NAME
Perl::Tokenizer - A tiny Perl code tokenizer.
VERSION
Version 0.02
SYNOPSIS
use Perl::Tokenizer qw(perl_tokens);
my $code = 'my $num = 42;';
perl_tokens { print "@_\n" } $code;
DESCRIPTION
Perl::Tokenizer is a tiny tokenizer that splits Perl source code into a list of tokens using regular expressions.
SUBROUTINES
- perl_tokens(&$)
-
This function takes a callback subroutine and a string of Perl code. The callback is called once for each token, as soon as that token is recognized.
perl_tokens { my ($token, $pos_beg, $pos_end) = @_; ... } $code;
The positions are absolute offsets into the string.
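As a minimal sketch of the callback interface described above, the block below collects every token into an array of records shaped like the ones in the EXAMPLE section. The names @tokens and $record are illustrative only and are not part of the module.

```perl
use strict;
use warnings;
use Perl::Tokenizer qw(perl_tokens);

my $code = 'my $num = 42;';

# Accumulate one { token_name => [begin, end] } record per callback call.
my @tokens;
perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;
    push @tokens, +{ $token => [$pos_beg, $pos_end] };
} $code;

# Print each record as "name: begin end".
for my $record (@tokens) {
    my ($name) = keys %$record;
    print "$name: @{ $record->{$name} }\n";
}
```

Because the callback runs once per token, nothing is stored unless the callback stores it, so a caller that only needs a count or a filter can avoid building the full list.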
EXPORT
Nothing is exported by default. Only the function perl_tokens() is exportable.
TOKENS
- format
-
Format text.
- heredoc_beg
-
The beginning of a here-document.
- heredoc
-
The content of a here-document.
- pod
-
POD content.
- horizontal_space
-
Horizontal whitespace.
- vertical_space
-
Vertical whitespace.
- other_space
-
Other whitespace.
- var_name
-
Variable name.
- special_var_name
-
Special variable name.
- sub_name
-
Subroutine name.
- sub_proto
-
Prototype of a subroutine.
- comment
-
Comment.
- scalar_sigil
-
Scalar sigil. ($)
- array_sigil
-
Array sigil. (@)
- hash_sigil
-
Hash sigil. (%)
- glob_sigil
-
Glob sigil. (*)
- ampersand_sigil
-
Ampersand sigil. (&)
- parenthesis_open
-
Open parenthesis. (()
- parenthesis_close
-
Closed parenthesis. ())
- curly_bracket_open
-
Open curly bracket. ({)
- curly_bracket_close
-
Closed curly bracket. (})
- right_bracket_open
-
Open square bracket. ([)
- right_bracket_close
-
Closed square bracket. (])
- keyword
-
Perl keyword.
- substitution
-
Regex substitution. (s///)
- transliteration
-
Transliteration. (tr///)
- match_regex
-
Match regex. (m//)
- compiled_regex
-
Compiled regex. (qr//)
- q_string
-
Single quoted string. (q//)
- qq_string
-
Double quoted string. (qq//)
- qw_string
-
Word quoted string. (qw//)
- qx_string
-
Backtick quoted string. (qx//)
- double_quoted_string
-
Double quoted string. ("")
- single_quoted_string
-
Single quoted string. ('')
- backtick
-
Backtick quoted string. (``)
- bare_word
-
Unquoted string.
- semicolon
-
End of statement. (;)
- comma
-
Comma. (,)
- fat_comma
-
Fat comma. (=>)
- v_string
-
Version string. (vX or X.X.X)
- file_test
-
File test operator. (-X)
- data
-
__DATA__/__END__ content.
- special_keyword
-
Special keyword, such as __PACKAGE__, __FILE__, etc.
- glob_readline
-
Glob/readline angle brackets. (<...>)
- operator
-
Primitive operator, such as +, ||, etc.
- assignment_operator
-
Assignment operator, such as +=, ||=, etc.
- dereference_operator
-
The arrow dereference operator. (->)
- hex_number
-
Hex number. (0x...)
- binary_number
-
Binary number. (0b...)
- number
-
Decimal number, such as 42, 3.14, etc.
- special_fh
-
Special file-handle, such as STDIN, STDOUT, etc.
- unknown_char
-
An unknown or unexpected character.
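One way the token names listed above might be used is to filter source text by token type. The sketch below drops comment and pod tokens and concatenates everything else. It assumes that tokens cover the input contiguously and that the end position is exclusive, as the positions in the EXAMPLE section suggest. The names %skip and $stripped are hypothetical.

```perl
use strict;
use warnings;
use Perl::Tokenizer qw(perl_tokens);

my $code = <<'END_CODE';
my $x = 1;    # set x
print $x;
END_CODE

# Token names to drop; both come from the TOKENS list.
my %skip = map { $_ => 1 } qw(comment pod);

my $stripped = '';
perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;
    return if $skip{$token};    # leave out unwanted token types
    # Assumption: $pos_end is one past the last character of the token.
    $stripped .= substr($code, $pos_beg, $pos_end - $pos_beg);
} $code;

print $stripped;
```

Whitespace around a removed comment is kept here, since it arrives as separate space tokens; a stricter filter could also inspect the neighbouring tokens.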
EXAMPLE
For this code:
my $num = 42;
it creates the following tokens:
[
    # TOKEN                POS
    { keyword          => [0, 2] },
    { horizontal_space => [2, 3] },
    { scalar_sigil     => [3, 4] },
    { var_name         => [4, 7] },
    { horizontal_space => [7, 8] },
    { operator         => [8, 9] },
    { horizontal_space => [9, 10] },
    { number           => [10, 12] },
    { semicolon        => [12, 13] },
]
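The positions in this listing can be read as half-open ranges: the first number is the offset of the token's first character and the second is one past its last. The following sketch does not call the module at all. It copies the table above into a hypothetical @tokens array and uses substr to recover each token's text.

```perl
use strict;
use warnings;

my $code = 'my $num = 42;';

# Token table copied from the listing above: name => [begin, end].
my @tokens = (
    { keyword          => [0,  2]  },
    { horizontal_space => [2,  3]  },
    { scalar_sigil     => [3,  4]  },
    { var_name         => [4,  7]  },
    { horizontal_space => [7,  8]  },
    { operator         => [8,  9]  },
    { horizontal_space => [9,  10] },
    { number           => [10, 12] },
    { semicolon        => [12, 13] },
);

my @pieces;
for my $entry (@tokens) {
    my ($name)      = keys %$entry;
    my ($beg, $end) = @{ $entry->{$name} };
    # Half-open range: the length is end minus begin.
    my $text = substr($code, $beg, $end - $beg);
    push @pieces, $text;
    printf "%-18s [%2d, %2d] '%s'\n", $name, $beg, $end, $text;
}

# The ranges are contiguous, so joining the pieces rebuilds the input.
my $rebuilt = join '', @pieces;
print $rebuilt eq $code ? "round-trip ok\n" : "round-trip failed\n";
```

Since each range starts where the previous one ends, no character of the input is left unassigned in this example, whitespace included.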
AUTHOR
Daniel "Trizen" Șuteu, <trizenx@gmail.com>
COPYRIGHT AND LICENSE
Copyright (C) 2013-2015
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.22.0 or, at your option, any later version of Perl 5 you may have available.