NAME

Text::Mint::Tokenizer - Turn files into a stream of tokens

VERSION

Version 1.0

SYNOPSIS

This module was written as part of Text::Mint and is meant to be used from within it. For anyone who wants to use it as a general-purpose tokenizer:

use Text::Mint::Tokenizer;

my $t = Text::Mint::Tokenizer->new( files => \@filelist );

# fetch a single token
my $token = $t->next;
# fetch a token in verbatim mode
$token = $t->vnext;
# return the last parsed token again
my $lasttoken = $t->resend;

# the lexer has thrown an error; get file info for a diagnostic
# message to the user
my ($file, $linenum, $firstpart, $lastpart, $errtoken) = $t->stat;

Text::Mint::Tokenizer doesn't perform any kind of lexical analysis; it is strictly a tokenizer, and its definitions of "token" are not configurable. It does, however, move through its list of input files transparently and offers two definitions of "a token" to suit different text-processing needs.

When the end of the token stream is reached, next and vnext return undef.
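
Because a token such as "0" is a false value in Perl, a read loop is safer if it tests the result with defined rather than for truth. The following is a minimal sketch of that idiom. So that it runs on its own, the hypothetical closure $next_token stands in for a call to $t->next; it is not part of this module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $t->next: hands out tokens from a fixed list,
# then returns undef once the list is exhausted.
my @pending = ( 'alpha', '0', 'beta' );
my $next_token = sub { return shift @pending };

# Test with defined() so that the false-but-valid token "0" does not
# end the loop early.
my @seen;
while ( defined( my $token = $next_token->() ) ) {
    push @seen, $token;
}

print scalar(@seen), " tokens read: @seen\n";
```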

METHODS

new

Creates a new Tokenizer object. The only argument to new is files, an arrayref containing the names of the files to be tokenized. If this argument is omitted, new returns 1 (indicating failure) instead of an object.
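
Since failure is reported as 1, which is a true value, testing the result of new for truth will not catch it. Below is a minimal sketch of a stricter check, assuming only that a successful call returns a blessed object. The helper name is_tokenizer is hypothetical and not part of this module, and the two sample values are stand-ins for real calls to new.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: true only if $candidate is a blessed object,
# false for the plain value 1 that new returns on failure.
sub is_tokenizer {
    my ($candidate) = @_;
    return defined( blessed($candidate) ) ? 1 : 0;
}

# Stand-ins for the two possible outcomes of Text::Mint::Tokenizer->new.
my $failure = 1;
my $success = bless {}, 'Text::Mint::Tokenizer';

print "failure value accepted? ", is_tokenizer($failure), "\n";
print "object accepted?        ", is_tokenizer($success), "\n";
```

In calling code, the same check would wrap the value returned by new before any tokens are requested.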

next

The usual way of getting a token. Returns the next token from the present file, where "token" is defined as "string of contiguous nonwhitespace characters". Returns undef if a token cannot be read (usually when the end of the last input file has been reached).
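
To make the definition concrete, the sketch below splits a string into runs of contiguous non-whitespace characters with a regular expression. It only illustrates the token definition; it is not the module's implementation, and the function name plain_tokens is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical illustration: return every run of contiguous
# non-whitespace characters found in $text.
sub plain_tokens {
    my ($text) = @_;
    return $text =~ /(\S+)/g;
}

# Spaces, tabs and the newline all act as separators and are discarded.
my @tokens = plain_tokens("  foo\tbar  baz\n");
print "[$_]\n" for @tokens;
```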

vnext

Works like next, but treats leading whitespace and/or a trailing newline as part of the token. This is called when the lexer is inside a verbatim text trigger or quoted string. Returns undef if a token cannot be read (usually when the end of the last input file has been reached).
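
The difference from next can be sketched on a single line of input. The pattern below is one possible reading of the description above (optional leading whitespace, the token itself, then an optional trailing newline); the module's actual rules may differ, and the function name verbatim_tokens is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical illustration: each token keeps the whitespace in front
# of it, and the last token on the line keeps the trailing newline.
sub verbatim_tokens {
    my ($line) = @_;
    return $line =~ /(\s*\S+\n?)/g;
}

my $line   = "  foo bar\n";
my @tokens = verbatim_tokens($line);

# For this input, joining the tokens reproduces the line exactly.
my $rebuilt = join '', @tokens;
print $rebuilt eq $line ? "round trip ok\n" : "round trip lost text\n";
```

Keeping the surrounding whitespace is what allows verbatim or quoted text to be reassembled from its tokens without losing its original spacing.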

resend

Returns the previously-parsed (current) token.

stat

Returns the name of the current open file, the number of the last line read from that file, the current line fragment up to the last token, the current line fragment following the last token, and the last token parsed from the current line.
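
The five values map directly onto a diagnostic message. Here is a minimal sketch, assuming the return order shown in the SYNOPSIS. The helper name format_diagnostic and the sample values are hypothetical; in real use the list would come from a call to $t->stat.

```perl
use strict;
use warnings;

# Hypothetical helper: build a one-line diagnostic from the five
# values returned by stat, in the order given in the SYNOPSIS.
sub format_diagnostic {
    my ( $file, $linenum, $firstpart, $lastpart, $token ) = @_;
    return sprintf "%s, line %d: unexpected token '%s' (before: '%s', after: '%s')",
        $file, $linenum, $token, $firstpart, $lastpart;
}

# Made-up sample values standing in for: my @info = $t->stat;
my @info = ( 'chapter1.txt', 12, 'The quick ', ' fox', 'brown' );
my $message = format_diagnostic(@info);
print "$message\n";
```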

INTERNAL METHODS

_lf

Internal use only. Fetches next line of input. Returns zero on success, nonzero on failure.

_fopen

Internal use only. Opens the next input file. Returns zero on success, nonzero on failure.

AUTHOR

Shawn Boyette, <mdxi@cpan.org>

COPYRIGHT & LICENSE

Copyright 2004 Shawn Boyette, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.