NAME
Perl::Tokenizer - A tiny Perl code tokenizer.
VERSION
Version 0.02
SYNOPSIS
use Perl::Tokenizer qw(perl_tokens);
my $code = 'my $num = 42;';
perl_tokens { print "@_\n" } $code;
DESCRIPTION
Perl::Tokenizer is a tiny tokenizer that splits Perl source code into a list of tokens using regular expressions.
SUBROUTINES
- perl_tokens(&$)
-
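This function takes a callback subroutine and a string of Perl code. The callback is invoked once for each token, as soon as that token is recognized. It receives the token name and the token's start and end positions: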
perl_tokens { my ($token, $pos_beg, $pos_end) = @_; ... } $code;
The positions are absolute offsets into the string. Judging by the offsets listed in the EXAMPLE section, the end position is exclusive.
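Because the callback only receives positions, recovering the text of each token is left to the caller. The following is a minimal sketch of one way to do that with substr(); the @rows array and the output layout are illustrative choices, and the code assumes that the end position is exclusive, as the offsets in the EXAMPLE section suggest.

```perl
use strict;
use warnings;
use Perl::Tokenizer qw(perl_tokens);

my $code = 'my $num = 42;';
my @rows;    # hypothetical container: [token name, token text] pairs

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;

    # Assumption: $pos_end is exclusive, so the length is the difference.
    my $text = substr($code, $pos_beg, $pos_end - $pos_beg);
    push @rows, [$token, $text];
} $code;

# Print one line per token: name on the left, source text on the right.
printf "%-18s %s\n", @$_ for @rows;
```

Collecting the pairs first and printing afterwards keeps the callback short; printing directly inside the callback would work just as well.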
EXPORT
Nothing is exported by default. Only the function perl_tokens() is exportable.
TOKENS
- format
-
Format text.
- heredoc_beg
-
The beginning of a here-document.
- heredoc
-
The content of a here-document.
- pod
-
POD content.
- horizontal_space
-
Horizontal whitespace.
- vertical_space
-
Vertical whitespace.
- other_space
-
Other whitespace.
- var_name
-
Variable name.
- special_var_name
-
Special variable name.
- sub_name
-
Subroutine name.
- sub_proto
-
Prototype of a subroutine.
- comment
-
Comment.
- scalar_sigil
-
Scalar sigil. ($)
- array_sigil
-
Array sigil. (@)
- hash_sigil
-
Hash sigil. (%)
- glob_sigil
-
Glob sigil. (*)
- ampersand_sigil
-
Ampersand sigil. (&)
- parenthesis_open
-
Open parenthesis. (()
- parenthesis_close
-
Closed parenthesis. ())
- curly_bracket_open
-
Open curly bracket. ({)
- curly_bracket_close
-
Closed curly bracket. (})
- right_bracket_open
-
Open square bracket. ([)
- right_bracket_close
-
Closed square bracket. (])
- keyword
-
Perl keyword.
- substitution
-
Regex substitution. (s///)
- transliteration
-
Transliteration. (tr///)
- match_regex
-
Match regex. (m//)
- compiled_regex
-
Compiled regex. (qr//)
- q_string
-
Single quoted string. (q//)
- qq_string
-
Double quoted string. (qq//)
- qw_string
-
Word quoted string. (qw//)
- qx_string
-
Backtick quoted string. (qx//)
- double_quoted_string
-
Double quoted string. ("")
- single_quoted_string
-
Single quoted string. ('')
- backtick
-
Backtick quoted string. (``)
- bare_word
-
Unquoted string.
- semicolon
-
End of statement. (;)
- comma
-
Comma. (,)
- fat_comma
-
Fat comma. (=>)
- v_string
-
Version string. (vX or X.X.X)
- file_test
-
File test operator. (-X)
- data
-
__DATA__/__END__ content.
- special_keyword
-
Special keyword, such as __PACKAGE__, __FILE__, etc.
- glob_readline
-
Glob/readline angle brackets. (<...>)
- operator
-
Primitive operator, such as +, ||, etc.
- assignment_operator
-
Assignment operator, such as +=, ||=, etc.
- dereference_operator
-
The arrow dereference operator. (->)
- hex_number
-
Hex number. (0x...)
- binary_number
-
Binary number. (0b...)
- number
-
Decimal number, such as 42, 3.14, etc.
- special_fh
-
Special file-handle, such as STDIN, STDOUT, etc.
- unknown_char
-
An unknown or unexpected character.
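The token names listed above can be used to filter the token stream. As a hedged illustration, the sketch below rebuilds a piece of source code while dropping every comment and pod token. The %skip set, the $stripped variable, and the sample input are hypothetical, and the code assumes that the reported positions are half-open ranges into the input string.

```perl
use strict;
use warnings;
use Perl::Tokenizer qw(perl_tokens);

# Sample input for illustration only.
my $code = <<'EOT';
my $x = 1; # set x
print $x;
EOT

# Token names to drop, taken from the TOKENS list.
my %skip = map { $_ => 1 } qw(comment pod);

my $stripped = '';

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;
    return if $skip{$token};    # ignore comments and POD

    # Assumption: $pos_end is exclusive.
    $stripped .= substr($code, $pos_beg, $pos_end - $pos_beg);
} $code;

print $stripped;
```

Whitespace tokens are kept, so the output preserves the original layout apart from the removed text.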
EXAMPLE
For this code:
my $num = 42;
it creates the following tokens:
[   # TOKEN POS
    { keyword => [0, 2] },
    { horizontal_space => [2, 3] },
    { scalar_sigil => [3, 4] },
    { var_name => [4, 7] },
    { horizontal_space => [7, 8] },
    { operator => [8, 9] },
    { horizontal_space => [9, 10] },
    { number => [10, 12] },
    { semicolon => [12, 13] },
]
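perl_tokens() itself reports tokens through the callback instead of returning a list, so a structure like the one above has to be assembled by the caller. A minimal sketch, assuming one single-key hash per token (the @tokens array is an illustrative name, not part of the module):

```perl
use strict;
use warnings;
use Perl::Tokenizer qw(perl_tokens);

my $code = 'my $num = 42;';
my @tokens;    # hypothetical container mirroring the listing above

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;

    # One single-key hash per token: name => [begin, end]
    push @tokens, { $token => [$pos_beg, $pos_end] };
} $code;

# Print each entry in the same shape as the listing.
for my $entry (@tokens) {
    my ($name) = keys %$entry;
    my ($beg, $end) = @{ $entry->{$name} };
    printf "{ %s => [%d, %d] },\n", $name, $beg, $end;
}
```

An array of hashes is used here, not one flat hash, because token names such as horizontal_space repeat and the order of tokens matters.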
AUTHOR
Daniel "Trizen" Șuteu, <trizenx@gmail.com>
COPYRIGHT AND LICENSE
Copyright (C) 2013-2015
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.22.0 or, at your option, any later version of Perl 5 you may have available.