NAME
Perl::Tokenizer - A tiny Perl code tokenizer.
VERSION
Version 0.11
SYNOPSIS
DESCRIPTION
Perl::Tokenizer is a tiny tokenizer that splits Perl source code into a list of tokens using regular expressions.
SUBROUTINES
- perl_tokens(&$)
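This function takes a callback subroutine and a string of Perl code. The callback is called once for each token, as soon as the token is recognized, and receives the token name followed by the token's start and end positions.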
    perl_tokens {
        my ($token, $pos_beg, $pos_end) = @_;
        ...
    } $code;
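The positions are absolute offsets into the string: $pos_beg is the zero-based offset where the token starts, and $pos_end is the offset just past its last character.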
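As an illustration of how the positions can be used, the following sketch recovers the text of each token with substr. The variable names (@tokens, $code) are chosen for this example only, and the end position is treated as exclusive, which matches the listing in the EXAMPLE section.

```perl
use strict;
use warnings;
use Perl::Tokenizer;    # exports perl_tokens by default

my $code = 'my $num = 42;';
my @tokens;             # illustrative name: collects [token name, token text] pairs

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;

    # Assumption: $pos_end is exclusive, so the length is the difference.
    my $text = substr($code, $pos_beg, $pos_end - $pos_beg);
    push @tokens, [$token, $text];
} $code;

# Print each token name next to the source text it covers.
printf "%-20s '%s'\n", @$_ for @tokens;
```

Because the callback is an ordinary closure, it can read $code and append to @tokens directly; nothing needs to be returned from it.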
EXPORT
The function perl_tokens is exported by default. This is the only function provided by this module.
TOKENS
The standard token names that are available are:
    format .................. Format text
    heredoc_beg ............. The beginning of a here-document ('<<"EOT"')
    heredoc ................. The content of a here-document
    pod ..................... An inline POD document, until '=cut' or end of the file
    horizontal_space ........ Horizontal whitespace (matched by /\h/)
    vertical_space .......... Vertical whitespace (matched by /\v/)
    other_space ............. Whitespace that is neither vertical nor horizontal (matched by /\s/)
    var_name ................ Alphanumeric name of a variable (excluding the sigil)
    special_var_name ........ Non-alphanumeric name of a variable, such as $/ or $^H (excluding the sigil)
    sub_name ................ Subroutine name
    sub_proto ............... Subroutine prototype
    comment ................. A #-to-newline comment (excluding the newline)
    scalar_sigil ............ The sigil of a scalar variable: '$'
    array_sigil ............. The sigil of an array variable: '@'
    hash_sigil .............. The sigil of a hash variable: '%'
    glob_sigil .............. The sigil of a glob symbol: '*'
    ampersand_sigil ......... The sigil of a subroutine call: '&'
    parenthesis_open ........ Open parenthesis: '('
    parenthesis_close ....... Closed parenthesis: ')'
    right_bracket_open ...... Open square bracket: '['
    right_bracket_close ..... Closed square bracket: ']'
    curly_bracket_open ...... Open curly bracket: '{'
    curly_bracket_close ..... Closed curly bracket: '}'
    substitution ............ Regex substitution: s/.../.../
    transliteration ......... Transliteration: tr/.../.../ or y/.../.../
    match_regex ............. Regex in matching context: m/.../
    compiled_regex .......... Quoted compiled regex: qr/.../
    q_string ................ Single quoted string: q/.../
    qq_string ............... Double quoted string: qq/.../
    qw_string ............... List of quoted words: qw/.../
    qx_string ............... System command quoted string: qx/.../
    backtick ................ Backtick system command quoted string: `...`
    single_quoted_string .... Single quoted string, as: '...'
    double_quoted_string .... Double quoted string, as: "..."
    bare_word ............... Unquoted string
    glob_readline ........... <readline> or <shell glob>
    v_string ................ Version string: "vX" or "X.X.X"
    file_test ............... File test operator (-X), such as: "-d", "-e", etc.
    data .................... The content of `__DATA__` or `__END__` sections
    keyword ................. Regular Perl keyword, such as: `if`, `else`, etc.
    special_keyword ......... Special Perl keyword, such as: `__PACKAGE__`, `__FILE__`, etc.
    comma ................... Comma: ','
    fat_comma ............... Fat comma: '=>'
    semicolon ............... Semicolon: ';'
    operator ................ Primitive operator, such as: '+', '||', etc.
    assignment_operator ..... '=' or any assignment operator: '+=', '||=', etc.
    dereference_operator .... Arrow dereference operator: '->'
    hex_number .............. Hexadecimal literal number: 0x...
    binary_number ........... Binary literal number: 0b...
    number .................. Decimal literal number, such as 42, 3.1e4, etc.
    special_fh .............. Special file-handle name, such as 'STDIN', 'STDOUT', etc.
    unknown_char ............ Unknown or unexpected character
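Token names make it straightforward to filter source code by category. The sketch below drops comment and pod tokens and concatenates the text of everything else. It assumes that the reported tokens cover the input without gaps, as they do in the EXAMPLE section; $stripped and the sample input are illustrative and not part of the module.

```perl
use strict;
use warnings;
use Perl::Tokenizer;

# Sample input containing a full-line comment and a trailing comment.
my $code = <<'EOT';
# increment the counter
$count++;    # trailing note
EOT

my $stripped = '';    # illustrative name: the code without comments or POD

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;

    # Skip the token categories we want to remove.
    return if $token eq 'comment' or $token eq 'pod';

    # Keep every other token's text unchanged.
    $stripped .= substr($code, $pos_beg, $pos_end - $pos_beg);
} $code;

print $stripped;
```

Since a comment token excludes its newline, the line breaks survive and line numbers in the stripped output stay aligned with the original.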
EXAMPLE
For this code:
    my $num = 42;
it generates the following tokens:
    #  TOKEN                     POS
    ( keyword              => ( 0,  2) )
    ( horizontal_space     => ( 2,  3) )
    ( scalar_sigil         => ( 3,  4) )
    ( var_name             => ( 4,  7) )
    ( horizontal_space     => ( 7,  8) )
    ( assignment_operator  => ( 8,  9) )
    ( horizontal_space     => ( 9, 10) )
    ( number               => (10, 12) )
    ( semicolon            => (12, 13) )
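A listing like the one above can be produced with a short script. This is a minimal sketch: the sprintf layout and the @lines array are choices made for this example, not something the module prescribes.

```perl
use strict;
use warnings;
use Perl::Tokenizer;

my $code = 'my $num = 42;';
my @lines;    # illustrative name: one formatted line per token

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;

    # Left-align the token name and right-align both positions.
    push @lines, sprintf("( %-20s => (%2d, %2d) )", $token, $pos_beg, $pos_end);
} $code;

print "$_\n" for @lines;
```

Collecting the lines before printing keeps the callback free of I/O, which makes the result easier to reuse or inspect.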
REPOSITORY
https://github.com/trizen/Perl-Tokenizer
AUTHOR
Daniel Șuteu, <trizen at cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2013-2017 Daniel Șuteu
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.22.0 or, at your option, any later version of Perl 5 you may have available.