NAME

Perl::Tokenizer - A tiny Perl code tokenizer.

VERSION

Version 0.02

SYNOPSIS

use Perl::Tokenizer qw(perl_tokens);
my $code = 'my $num = 42;';
perl_tokens { print "@_\n" } $code;

DESCRIPTION

Perl::Tokenizer is a tiny tokenizer that splits Perl source code into a list of tokens using regular expressions.

SUBROUTINES

perl_tokens(&$)

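This function takes a callback subroutine and a string of Perl code. The callback is invoked once for each token, as soon as the token is recognized, and receives the token name followed by its start and end positions.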

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;
    ...
} $code;

The positions are absolute offsets into the input string.
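As a minimal sketch of how the positions can be used, the callback below recovers each token's text with substr(). It assumes the end position is exclusive, as the listing in the EXAMPLE section suggests; the @tokens array is only an illustrative name.

```perl
use strict;
use warnings;
use Perl::Tokenizer qw(perl_tokens);

my $code = 'my $num = 42;';
my @tokens;    # illustrative container: [token name, token text]

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;

    # Assumption: $pos_end is exclusive, so the length is end minus begin.
    push @tokens, [$token, substr($code, $pos_beg, $pos_end - $pos_beg)];
} $code;

# Print one token per line: name, then the text it covers.
printf "%-20s '%s'\n", @$_ for @tokens;
```

Because every character of the input belongs to some token, joining the collected texts in order should give back the original string.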

EXPORT

Nothing is exported by default. Only the function perl_tokens() is exportable.

TOKENS

format

Format text.

heredoc_beg

The beginning of a here-document.

heredoc

The content of a here-document.

pod

POD content.

horizontal_space

Horizontal whitespace.

vertical_space

Vertical whitespace.

other_space

Other whitespace.

var_name

Variable name.

special_var_name

Special variable name.

sub_name

Subroutine name.

sub_proto

Prototype of a subroutine.

comment

Comment.

scalar_sigil

Scalar sigil. ($)

array_sigil

Array sigil. (@)

hash_sigil

Hash sigil. (%)

glob_sigil

Glob sigil. (*)

ampersand_sigil

Ampersand sigil. (&)

parenthesis_open

Open parenthesis. (()

parenthesis_close

Closed parenthesis. ())

curly_bracket_open

Open curly bracket. ({)

curly_bracket_close

Closed curly bracket. (})

right_bracket_open

Open square bracket. ([)

right_bracket_close

Closed square bracket. (])

keyword

Perl keyword.

substitution

Regex substitution. (s///)

transliteration

Transliteration. (tr///)

match_regex

Match regex. (m//)

compiled_regex

Compiled regex. (qr//)

q_string

Single quoted string. (q//)

qq_string

Double quoted string. (qq//)

qw_string

Word quoted string. (qw//)

qx_string

Backtick quoted string. (qx//)

double_quoted_string

Double quoted string. ("")

single_quoted_string

Single quoted string. ('')

backtick

Backtick quoted string. (``)

bare_word

Unquoted string.

semicolon

End of statement. (;)

comma

Comma. (,)

fat_comma

Fat comma. (=>)

v_string

Version string. (vX or X.X.X)

file_test

File test operator. (-X)

data

Content following __DATA__ or __END__.

special_keyword

Special keyword, such as __PACKAGE__, __FILE__, etc.

glob_readline

Glob/readline angle brackets. (<...>)

operator

Primitive operator, such as +, ||, etc.

assignment_operator

Assignment operator, such as +=, ||=, etc.

dereference_operator

The arrow dereference operator. (->)

hex_number

Hex number. (0x...)

binary_number

Binary number. (0b...)

number

Decimal number, such as 42, 3.14, etc.

special_fh

Special file-handle, such as STDIN, STDOUT, etc.

unknown_char

An unknown or unexpected character.
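The token names listed above can be used to filter a source string. The following is a hedged sketch that drops comment and pod tokens and copies everything else unchanged; the sample input and the $out variable are illustrative, and the end position is again assumed to be exclusive.

```perl
use strict;
use warnings;
use Perl::Tokenizer qw(perl_tokens);

# Illustrative input containing one trailing comment.
my $code = "my \$x = 1; # set x\nprint \$x;\n";
my $out  = '';

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;

    # Skip the token types we want to strip.
    return if $token eq 'comment' or $token eq 'pod';

    # Copy every other token verbatim (end position assumed exclusive).
    $out .= substr($code, $pos_beg, $pos_end - $pos_beg);
} $code;

print $out;
```

Whitespace tokens around a removed comment are kept as they are, so the result may contain trailing spaces.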

EXAMPLE

For this code:

my $num = 42;

it creates the following tokens:

[ #  TOKEN                    POS
  { keyword              => [0, 2] },
  { horizontal_space     => [2, 3] },
  { scalar_sigil         => [3, 4] },
  { var_name             => [4, 7] },
  { horizontal_space     => [7, 8] },
  { operator             => [8, 9] },
  { horizontal_space     => [9, 10] },
  { number               => [10, 12] },
  { semicolon            => [12, 13] },
]
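A structure like the one above can be built by collecting one single-key hash per token. This is only a sketch: the @tokens name and the use of Data::Dumper for display are choices made here, not part of the module.

```perl
use strict;
use warnings;
use Data::Dumper;
use Perl::Tokenizer qw(perl_tokens);

my $code = 'my $num = 42;';
my @tokens;    # illustrative name for the collected list

perl_tokens {
    my ($token, $pos_beg, $pos_end) = @_;

    # One hash per token: token name => [begin, end]
    push @tokens, { $token => [$pos_beg, $pos_end] };
} $code;

# Compact, stable dump of the collected structure.
local $Data::Dumper::Indent   = 1;
local $Data::Dumper::Sortkeys = 1;
print Dumper(\@tokens);
```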

AUTHOR

Daniel "Trizen" Șuteu, <trizenx@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2013-2015

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.22.0 or, at your option, any later version of Perl 5 you may have available.