NAME

Perl::Tokenizer - A tiny Perl code tokenizer.

VERSION

Version 0.06

SYNOPSIS

use Perl::Tokenizer;
my $code = 'my $num = 42;';
perl_tokens { print "@_\n" } $code;

DESCRIPTION

Perl::Tokenizer is a tiny tokenizer that splits Perl source code into a list of tokens using regular expressions.

SUBROUTINES

perl_tokens(&$)

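This function takes a callback subroutine and a string of Perl code. The callback is invoked once for each token, as soon as that token is recognized, and receives the token name followed by the token's start and end positions.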

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;
    ...
} $code;

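The positions are absolute offsets into the string. As the EXAMPLE section shows, $pos_beg is the offset of the token's first character and $pos_end is the offset just past its last character.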
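Because perl_tokens reports tokens through a callback instead of returning a list, a caller that wants a list has to build one. The following is a minimal sketch, assuming that the first callback argument is the token name and that the end position is exclusive (both are inferred from the EXAMPLE section). The hash keys name, beg, end and text are illustrative choices, not part of the module.

```perl
use strict;
use warnings;
use Perl::Tokenizer;

my $code = 'my $num = 42;';
my @tokens;

perl_tokens {
    my ($name, $pos_beg, $pos_end) = @_;

    # Recover the token text from the offsets (end assumed exclusive).
    push @tokens, {
        name => $name,
        beg  => $pos_beg,
        end  => $pos_end,
        text => substr($code, $pos_beg, $pos_end - $pos_beg),
    };
} $code;

# Print one "name text" pair per line.
printf "%-20s %s\n", $_->{name}, $_->{text} for @tokens;
```

Storing the text next to the offsets keeps later processing independent of the original string.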

EXPORT

The function perl_tokens is exported by default. This is the only function provided by this module.

TOKENS

The following token names are available:

format .................. Format text
heredoc_beg ............. The beginning of a here-document ('<<"EOT"')
heredoc ................. The content of a here-document
pod ..................... An inline POD document, until '=cut' or end of the file
horizontal_space ........ Horizontal whitespace (matched by /\h/)
vertical_space .......... Vertical whitespace (matched by /\v/)
other_space ............. Whitespace that is neither vertical nor horizontal (matched by /\s/)
var_name ................ Alphanumeric name of a variable (excluding the sigil)
special_var_name ........ Non-alphanumeric name of a variable, such as $/ or $^H (excluding the sigil)
sub_name ................ Subroutine name
sub_proto ............... Subroutine prototype
comment ................. A #-to-newline comment (excluding the newline)
scalar_sigil ............ The sigil of a scalar variable: '$'
array_sigil ............. The sigil of an array variable: '@'
hash_sigil .............. The sigil of a hash variable: '%'
glob_sigil .............. The sigil of a glob symbol: '*'
ampersand_sigil ......... The sigil of a subroutine call: '&'
parenthesis_open ........ Open parenthesis: '('
parenthesis_close ....... Closed parenthesis: ')'
right_bracket_open ...... Open square bracket: '['
right_bracket_close ..... Closed square bracket: ']'
curly_bracket_open ...... Open curly bracket: '{'
curly_bracket_close ..... Closed curly bracket: '}'
substitution ............ Regex substitution: s/.../.../
transliteration ......... Transliteration: tr/.../.../ or y/.../.../
match_regex ............. A regex in matching context: m/.../
compiled_regex .......... A quoted 'compiled' regex: qr/.../
q_string ................ A single quoted string: q/.../
qq_string ............... A double quoted string: qq/.../
qw_string ............... A list of quoted strings: qw/.../
qx_string ............... A system command quoted string: qx/.../
backtick ................ A backtick system command quoted string: `...`
single_quoted_string .... A single quoted string, as: '...'
double_quoted_string .... A double quoted string, as: "..."
bare_word ............... An unquoted string
glob_readline ........... A <readline> or <shell glob>
v_string ................ A version string: "vX" or "X.X.X"
file_test ............... A file test operator (-X), such as: "-d", "-e", etc...
data .................... The content of `__DATA__` or `__END__` sections
keyword ................. A regular Perl keyword, such as: `if`, `else`, etc...
special_keyword ......... A special Perl keyword, such as: `__PACKAGE__`, `__FILE__`, etc...
comma ................... A comma: ','
fat_comma ............... A fat comma: '=>'
semicolon ............... A semicolon: ';'
operator ................ A primitive operator, such as: '+', '||', etc...
assignment_operator ..... A '=' or any operator assignment: '+=', '||=', etc...
dereference_operator .... The arrow dereference operator: '->'
hex_number .............. A hexadecimal literal number: 0x...
binary_number ........... A binary literal number: 0b...
number .................. A decimal literal number, such as 42, 3.1e4, etc...
special_fh .............. A special file-handle name, such as 'STDIN', 'STDOUT', etc...
unknown_char ............ An unknown or unexpected character
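
Since every token is reported with its name, the callback can select or skip tokens by name. The sketch below assumes the three whitespace token names listed above are the only names ending in "_space", and it skips them to collect the names of the remaining tokens. The variable @names is introduced here for illustration.

```perl
use strict;
use warnings;
use Perl::Tokenizer;

my $code = 'my $num = 42;';
my @names;

perl_tokens {
    my ($name) = @_;

    # Skip horizontal_space, vertical_space and other_space tokens.
    return if $name =~ /_space\z/;

    push @names, $name;
} $code;

print join(' ', @names), "\n";
```

The same pattern works for any other selection, for example keeping only comment and pod tokens.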

EXAMPLE

For this code:

my $num = 42;

it generates the following tokens:

#  TOKEN                     POS
( keyword              => ( 0,  2) )
( horizontal_space     => ( 2,  3) )
( scalar_sigil         => ( 3,  4) )
( var_name             => ( 4,  7) )
( horizontal_space     => ( 7,  8) )
( assignment_operator  => ( 8,  9) )
( horizontal_space     => ( 9, 10) )
( number               => (10, 12) )
( semicolon            => (12, 13) )
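
The positions in this table can be checked without running the tokenizer. The sketch below copies the spans from the table by hand (so it does not call Perl::Tokenizer at all), confirms that each span starts where the previous one ended, and slices the matching text out of the string with substr. The names @spans and @texts are illustrative.

```perl
use strict;
use warnings;

my $code = 'my $num = 42;';

# Token spans copied from the table above: [name, begin, end].
my @spans = (
    [keyword             =>  0,  2],
    [horizontal_space    =>  2,  3],
    [scalar_sigil        =>  3,  4],
    [var_name            =>  4,  7],
    [horizontal_space    =>  7,  8],
    [assignment_operator =>  8,  9],
    [horizontal_space    =>  9, 10],
    [number              => 10, 12],
    [semicolon           => 12, 13],
);

my $expected_beg = 0;
my @texts;
for my $span (@spans) {
    my ($name, $beg, $end) = @$span;

    # Each span must begin exactly where the previous one ended.
    die "gap or overlap before $name" unless $beg == $expected_beg;

    # The end offset is exclusive, so the length is end - begin.
    push @texts, substr($code, $beg, $end - $beg);
    $expected_beg = $end;
}
die "spans do not reach the end of the string" unless $expected_beg == length($code);

printf "%-20s '%s'\n", $spans[$_][0], $texts[$_] for 0 .. $#spans;
```

For this input the spans cover the string from offset 0 to its length with no gaps, which is why joining the sliced texts reproduces the original code.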

REPOSITORY

https://github.com/trizen/Perl-Tokenizer

AUTHOR

Daniel "Trizen" Șuteu, <trizenx@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2013-2017

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.22.0 or, at your option, any later version of Perl 5 you may have available.