NAME
Text::Tokenizer - Perl extension for tokenizing text (config) files
SYNOPSIS
  use Text::Tokenizer ':all';
  # open the file and add it to the tokenizer inputs
  open(F_CONFIG, "input.conf") || die("failed to open input.conf");
  $tok_id = tokenizer_new(F_CONFIG);
  tokenizer_options(TOK_OPT_NOUNESCAPE|TOK_OPT_PASSCOMMENT);
  while(1)
  {
    ($string, $tok_type, $line, $err, $errline) = tokenizer_scan();
    last if($tok_type == TOK_ERROR || $tok_type == TOK_EOF);
    # put the quote characters back around quoted tokens before printing
    if($tok_type == TOK_TEXT)        { }
    elsif($tok_type == TOK_BLANK)    { }
    elsif($tok_type == TOK_DQUOTE)   { $string = "\"$string\""; }
    elsif($tok_type == TOK_SQUOTE)   { $string = "\'$string\'"; }
    elsif($tok_type == TOK_SIQUOTE)  { $string = "\`$string\'"; }
    elsif($tok_type == TOK_IQUOTE)   { $string = "\`$string\`"; }
    elsif($tok_type == TOK_EOL)      { $string = "\n"; }
    elsif($tok_type == TOK_COMMENT)  { }
    elsif($tok_type == TOK_UNDEF)    { last; }
    else                             { last; }
    print $string;
  }
  tokenizer_delete($tok_id);
A more complex example of using Text::Tokenizer can be found in passwd_exp, a tool for password
expiration notification (http://devel.dob.sk/passwd_exp).
DESCRIPTION
Text::Tokenizer is a very fast lexical analyzer that can be used to split input text from a file or buffer into basic tokens:
NORMAL TEXT
DOUBLE QUOTED "TEXT"
SINGLE QUOTED 'TEXT'
INVERSE QUOTED `TEXT`
SINGLE-INVERSE QUOTED `TEXT'
WHITESPACE TEXT
#COMMENTS
END OF LINE
END OF FILE
EXPORT
None by default. Import the methods and constants you need by name, or use ':all' to import all constants and methods.
CONSTANTS
TOKEN TYPES
Token types that the tokenizer returns.
- TOK_UNDEF
  Undefined token (tokenizer error)
- TOK_TEXT
  Normal_text
- TOK_DQUOTE
  "Double quoted text"
- TOK_SQUOTE
  'Single quoted text'
- TOK_IQUOTE
  `Inverse quoted text`
- TOK_SIQUOTE
  `Single-inverse quoted text'
- TOK_BLANK
  Whitespace text
- TOK_COMMENT
  #Comment
- TOK_EOL
  End of line
- TOK_EOF
  End of file
- TOK_ERROR
  Error condition (see ERROR TYPES)
ERROR TYPES
Error codes that the tokenizer returns when an error occurs.
- NOERR
  No error
- UNCLOSED_DQUOTE
  Unclosed double quote found
- UNCLOSED_SQUOTE
  Unclosed single quote found
- UNCLOSED_IQUOTE
  Unclosed inverse quote found
- NOCONTEXT
  Failed to allocate tokenizer context (FATAL ERROR)
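
The error codes above can be turned into readable messages. The following is a minimal sketch, not the module's own error handling: it assumes that an unclosed quote makes tokenizer_scan return TOK_ERROR with one of the codes listed above in the fourth element and the offending line in the fifth. The names %error_text, $input and $report are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Hypothetical lookup table from the documented error codes to messages.
my %error_text = (
    UNCLOSED_DQUOTE() => 'unclosed double quote',
    UNCLOSED_SQUOTE() => 'unclosed single quote',
    UNCLOSED_IQUOTE() => 'unclosed inverse quote',
    NOCONTEXT()       => 'failed to allocate tokenizer context',
);

# Sample input with a double quote that is never closed.
my $input  = qq{name = "unterminated value\n};
my $tok_id = tokenizer_new_strbuf($input, length($input));

my $report = '';
while (1) {
    my ($string, $type, $line, $err, $errline) = tokenizer_scan();
    if ($type == TOK_ERROR) {
        # Assumption: $err holds one of the ERROR TYPES constants.
        my $msg = $error_text{$err} // "unknown error ($err)";
        $report = "line $errline: $msg";
        last;
    }
    last if $type == TOK_EOF || $type == TOK_UNDEF;
}
tokenizer_delete($tok_id);

print "$report\n" if $report;
```

Keeping the messages in a table keyed by the constants means the scan loop does not need a separate branch per error code.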
 
TOKENIZER OPTIONS
Options configurable for the tokenizer. Combine them with bitwise OR when passing them to tokenizer_options.
- TOK_OPT_DEFAULT
  Default option set; equal to TOK_OPT_NOUNESCAPE.
- TOK_OPT_NONE
  Set no options. The tokenizer uses its default behaviour: it will not unescape anything and it will not pass comments to you.
- TOK_OPT_NOUNESCAPE
  Disable unescaping of characters and lines.
- TOK_OPT_SIQUOTE
  Enable recognition of the `single-inverse quote' combination.
- TOK_OPT_UNESCAPE
  Unescape characters and lines.
- TOK_OPT_UNESCAPE_CHARS
  Unescape characters (inside quotes only).
- TOK_OPT_UNESCAPE_LINES
  Unescape lines (inside quotes only).
- TOK_OPT_PASSCOMMENT
  Enable passing of comments to user routines.
- TOK_OPT_UNESCAPE_NQ_LINES
  Unescape lines (outside quotes). An escaped end of line will not terminate value processing, so escaped multiline text is returned as a single-line string.
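
As an illustration of combining options, the sketch below ORs two of the flags listed above and collects the comments that the tokenizer then passes through. It mirrors the call order used in the SYNOPSIS (instance first, options second); the sample input and the @comments array are hypothetical, and whether the returned comment text includes the leading '#' is not assumed here.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Sample input: one comment line followed by one ordinary line.
my $input  = "# first comment\nkey value\n";
my $tok_id = tokenizer_new_strbuf($input, length($input));

# Combine flags with bitwise OR, as in the SYNOPSIS.
tokenizer_options(TOK_OPT_NOUNESCAPE | TOK_OPT_PASSCOMMENT);

my @comments;
while (1) {
    my ($string, $type) = tokenizer_scan();
    last if $type == TOK_EOF || $type == TOK_ERROR || $type == TOK_UNDEF;
    # With TOK_OPT_PASSCOMMENT set, comments arrive as TOK_COMMENT tokens.
    push @comments, $string if $type == TOK_COMMENT;
}
tokenizer_delete($tok_id);

print "comments seen: ", scalar(@comments), "\n";
```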
 
METHODS
- $options = tokenizer_options(OPTIONS)
  Set tokenizer options.
- $tok_id = tokenizer_new(FILE_HANDLE)
  Create a new tokenizer instance (context) that reads from FILE_HANDLE. Returns the instance id, $tok_id.
- $tok_id = tokenizer_new_strbuf(BUFFER, LENGTH)
  Create a new tokenizer instance from the string BUFFER, which is LENGTH characters long. Returns the tokenizer instance id.
- @tok = tokenizer_scan()
  Scan the current tokenizer instance and return the next token found. @tok = ($string, $type, $line, $error, $error_line)
- tokenizer_exists(TOK_ID)
  Test whether a tokenizer instance exists.
- tokenizer_switch(TOK_ID)
  Switch to another tokenizer instance (for example, when you process an include statement).
- tokenizer_delete(TOK_ID)
  Delete a tokenizer instance. You have to do this at EOF to release the tokenizer's reference to the file or buffer.
- tokenizer_flush(TOK_ID)
  Flush a tokenizer instance. This function discards the contents of the instance's buffer, so the next time the scanner attempts to match a token from the buffer, it will have to fill it first.
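
The include case mentioned for tokenizer_switch could be handled roughly as follows. This is only a sketch under stated assumptions: the "include NAME" directive, the %buffers table of in-memory inputs, and the @stack, @words and $want_name variables are invented for this illustration. It also assumes that an instance keeps its read position while another instance is current, and it calls tokenizer_switch explicitly after creating an instance rather than relying on the new instance becoming current by itself.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Hypothetical in-memory inputs; "include NAME" is an invented directive.
my %buffers = (
    main  => "a\ninclude extra\nb\n",
    extra => "x\n",
);

# Stack of open tokenizer instances; the top one is the current input.
my @stack = (tokenizer_new_strbuf($buffers{main}, length $buffers{main}));
tokenizer_switch($stack[-1]);

my @words;            # TOK_TEXT tokens in the order they were seen
my $want_name = 0;    # true right after an "include" keyword

while (@stack) {
    my ($string, $type) = tokenizer_scan();

    if ($type == TOK_EOF) {
        # Current input is exhausted: release it and resume the parent.
        tokenizer_delete(pop @stack);
        tokenizer_switch($stack[-1]) if @stack;
        next;
    }
    last if $type == TOK_ERROR || $type == TOK_UNDEF;
    next unless $type == TOK_TEXT;    # skip blanks, line ends, quoted text

    if ($want_name) {
        # The token after "include" names the input to read next.
        $want_name = 0;
        die "unknown include: $string\n" unless exists $buffers{$string};
        my $id = tokenizer_new_strbuf($buffers{$string}, length $buffers{$string});
        tokenizer_switch($id);
        push @stack, $id;
    }
    elsif ($string eq 'include') { $want_name = 1 }
    else                         { push @words, $string }
}

# Release anything still open after an early exit.
tokenizer_delete($_) for @stack;

print "@words\n";
```

A stack keeps nested includes simple: each TOK_EOF deletes the finished instance, as the tokenizer_delete description requires, and switches back to whichever instance was active before it.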
 
SEE ALSO
This tokenizer is based on code generated by flex - fast lexical analyzer generator (http://lex.sourceforge.net).
AUTHOR
Samuel Behan, (http://devel.dob.sk)
COPYRIGHT AND LICENSE
Copyright 2003-2011 by Samuel Behan
This library is free software; you can redistribute it and/or modify it under the terms of GNU/GPL v3.