NAME

C::Tokenize - reduce a C file to a series of tokens

REGULAR EXPRESSIONS

The regular expressions can be imported using, for example,

use C::Tokenize '$cpp_re';

to import $cpp_re.

None of the regular expressions does any capturing. If you want to capture, add your own parentheses around the regular expression.
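As a minimal sketch of adding your own parentheses, the following collects every traditional comment from a string of C source. The input string and variable names are made up for illustration, and the sketch assumes the module is installed.

```perl
use strict;
use warnings;
use C::Tokenize '$trad_comment_re';

# Hypothetical input: a fragment of C source held in a string.
my $c = "int x; /* first */ int y; /* second */\n";

# The exported regex does not capture anything itself, so wrap it
# in parentheses to collect each match in list context.
my @comments = $c =~ /($trad_comment_re)/g;

print "$_\n" for @comments;
```

Note that a match like this looks at the raw text only, so comment-like text inside a C string constant would also be picked up.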

$trad_comment_re

Match /* */ comments.

$cxx_comment_re

Match // comments.

$comment_re

Match both /* */ and // comments.

$cpp_re

Match a C preprocessor instruction.

$char_const_re

Match a character constant, such as 'a' or '\-'.

$operator_re

Match an operator such as + or --.

$number_re

Match a number, either integer, floating point, or hexadecimal. Octal numbers are not handled yet.

$word_re

Match a word, such as a function or variable name or a keyword of the language.

$grammar_re

Match other syntactic characters such as { or [.

$single_string_re

Match a single C string constant such as "this".

$string_re

Match a full-blown C string constant, including compound strings "like" "this".

FUNCTIONS

decomment

my $out = decomment ('/* comment */');
# $out = " comment ";

Remove the comment markers /* and */ from a string, leaving only the contents of the comment, as in the example above.

tokenize

my $tokens = tokenize ($file);

Convert $file into a series of tokens. The return value is an array reference which contains hash references. Each hash reference corresponds to one token in the C file. Each token contains the following keys:

leading

Any whitespace which comes before the token (called "leading whitespace").

name

The name of the token, which may be

comment

A comment, like

/* This */

or

// this.

cpp

A C preprocessor instruction like

#define THIS 1

or

#include "That.h".

char_const

A character constant, like '\0' or 'a'.

grammar

A piece of C "grammar", like { or ] or ->.

number

A number such as 42.

word

A word, which may be a variable name, a function name, or a keyword of the language.

string

A string, like "this", or even "like" "this".

$name

The value of the token, stored under the key given by $token->{name}. For example, if $token->{name} equals 'comment', then the text of the comment is in $token->{comment}.

if ($token->{name} eq 'string') {
    my $c_string = $token->{string};
}
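Putting the keys together, a loop over the tokens might look like the sketch below. It rests on two assumptions that this documentation does not state: that tokenize can be imported by name, and that its argument is the C source text itself rather than a file name. The sample input is invented.

```perl
use strict;
use warnings;
# Assumption: tokenize is exported on request, like the regular expressions.
use C::Tokenize 'tokenize';

# Hypothetical input. Assumption: tokenize takes the source text as a string.
my $c = <<'EOF';
#include <stdio.h>
/* Say hello. */
int main () { printf ("hello\n"); return 0; }
EOF

my $tokens = tokenize ($c);

for my $token (@$tokens) {
    my $name = $token->{name};
    # Skip any token without a name, in case the installed version
    # of the module differs from this documentation.
    next unless defined $name;
    # The text of the token is stored under the key given by its name.
    printf "%-10s %s\n", $name, $token->{$name};
}
```

Looking the value up as $token->{$name} avoids writing a separate branch for each kind of token.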

BUGS

Octal not parsed

It does not parse octal expressions.

No trigraphs

No handling of trigraphs.

Requires Perl 5.10

This module uses named captures in regular expressions, so it requires Perl 5.10 or later.

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

This package and associated files are copyright (C) 2012 Ben Bullock.

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.