NAME
String::Tokeniser - Perl extension for, uhm, tokenising strings.
SYNOPSIS
use String::Tokeniser;
DESCRIPTION
String::Tokeniser provides an interface to a tokeniser class, allowing one to manipulate strings on a token-by-token basis without having to keep track of list element numbers and so on.
CONSTRUCTOR
- new ( $sentence, [0|-1|$regexp], [$exception...] )
-
Creates a String::Tokeniser object, tokenises $sentence and resets the token counter.
The next argument determines how a ``token'' is defined: a value of 0 or undef means that underscores are included in a token; -1 means that they are not. Alternatively, you can supply your own regular expression, which will be fed to a split to determine the tokens.
An optional list of exceptions may then follow: tokens that would otherwise be split in two, but should be treated as one.
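As a rough illustration of the calling forms described above, here is a minimal sketch. The sample sentences, the custom pattern and the exception token are invented for the example. Passing the pattern as a plain string, and the idea that '$total' would otherwise be split in two, are assumptions rather than documented behaviour. The exact tokens produced depend on the module's internals, so none are shown.

```perl
use strict;
use warnings;
use String::Tokeniser;

# Sample input, invented for this example.
my $sentence = 'my_var = other_var + 1';

# 0 (or undef): underscores are included in a token.
my $default = String::Tokeniser->new($sentence, 0);

# -1: underscores are not included in a token.
my $no_underscores = String::Tokeniser->new($sentence, -1);

# A custom regular expression, which is fed to split.
# Assumption: the pattern may be passed as a plain string.
my $on_spaces = String::Tokeniser->new($sentence, '\s+');

# A trailing list of exceptions: tokens to keep in one piece.
# Assumption: '$total' would otherwise come out as two tokens.
my $with_exception = String::Tokeniser->new('$total = 1', 0, '$total');

print "tokens available\n" if $default->moretokens;
```

Each call returns an independent tokeniser object with its own token counter, so the objects above can be stepped through separately.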
METHODS
- moretokens
-
Tells you if you have any more tokens left to deal with.
- skiptoken([n])
-
Move the `pointer' forward one (or n) tokens.
- thistoken
-
Return the current token; that is, the token under the `pointer'.
- lasttoken
-
Return the previous token; that is, the one just before the `pointer'.
- gettoken
-
Equivalent to skiptoken;gettoken - the usual way of grabbing the next token in the list in turn.
- nexttoken
-
Looks ahead one token, but does not change the `pointer' position.
- lookahead([n])
-
Returns a string composed of the next n tokens, but does not change the `pointer' position.
- gimme($string)
-
Assuming a string of tokens will end in $string, returns everything from the current `pointer' position until the string is found. Returns a two-element list: firstly, why the search terminated (either EOF, meaning we hit the end of the token list without success, or FOUND, meaning $string was found), and secondly, the rest of the tokens up to and including $string (or the end of the list, whichever comes first).
- save
-
Saves one's pointer position. Can be called more than once, in which case it acts as a save stack.
- restore
-
Restores a previously saved position.
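The methods above might be combined along the following lines. This is only a sketch: the input string and the ';' terminator are invented for the example, and it assumes nothing about how the module splits this particular input, so no expected output is given.

```perl
use strict;
use warnings;
use String::Tokeniser;

# Sample input, invented for this example.
my $t = String::Tokeniser->new('print "hello" ; exit ;');

# Peek ahead without moving the pointer.
my $peek  = $t->nexttoken;
my $ahead = $t->lookahead(2);

# Remember the current position, read ahead with gimme, then come back.
$t->save;
my ($why, $rest) = $t->gimme(';');
print "gimme stopped because: $why\n";
$t->restore;

# Walk the tokens one at a time from the restored position.
my @seen;
while ($t->moretokens) {
    push @seen, $t->gettoken;
}
print scalar(@seen), " tokens read\n";
```

Pairing each save with a restore keeps the save stack balanced, which matters when speculative reads such as the gimme call above are nested.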
FEATURES
At present, there is no support for exceptions which spread over three or more tokens, although this is planned.
AUTHOR
Originally written by Simon Cozens; maintained by Alberto Simoes <ambs@cpan.org>