NAME

C::Tokenize - reduce a C file to a series of tokens

SYNOPSIS

# Remove all C preprocessor instructions from a C program:
use C::Tokenize '$cpp_re';
$c =~ s/$cpp_re//g;

# Print all the comments in a C program:
use C::Tokenize '$comment_re';
while ($c =~ /($comment_re)/g) {
    print "$1\n";
}

DESCRIPTION

This module provides a tokenizer which breaks C source code into its smallest meaningful components, and the regular expressions which match each of these components. For example, the module supplies a regular expression "$comment_re" which matches a C comment.

REGULAR EXPRESSIONS

The following regular expressions can be imported from this module using, for example,

use C::Tokenize '$cpp_re'

to import $cpp_re.

None of the following regular expressions does any capturing. If you want to capture, add your own parentheses around the regular expression.

$trad_comment_re: Match /* */ comments.
$cxx_comment_re: Match // comments.
$comment_re: Match both /* */ and // comments.
$cpp_re: Match a C preprocessor instruction.
$char_const_re: Match a character constant, such as 'a' or '\0'.
$operator_re: Match an operator such as + or --.
$number_re: Match a number, either integer, floating point, or hexadecimal. Octal numbers are not yet supported.
$word_re: Match a word, such as a function or variable name or a keyword of the language.
$grammar_re: Match other syntactic characters such as { or [.
$single_string_re: Match a single C string constant such as "this".
$string_re: Match a full-blown C string constant, including compound strings "like" "this".
$reserved_re: Match a C reserved word like auto or goto.
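As a minimal sketch of adding your own capturing parentheses, the following wraps "$single_string_re" and "$string_re" in a capture group. The sample line of C and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use C::Tokenize qw/$single_string_re $string_re/;

# Hypothetical line of C containing a compound string.
my $c = 'const char * s = "like" "this";';

# The module's regexes do not capture, so wrap each one in
# parentheses to get the matched text back.
my ($single) = $c =~ /($single_string_re)/;
my ($compound) = $c =~ /($string_re)/;

print "single: $single\n";
print "compound: $compound\n";
```

Going by the descriptions above, "$single_string_re" should stop at the end of the first string constant, while "$string_re" should carry on through the adjacent one.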

VARIABLES

@fields

The array @fields contains a list of all the fields which are extracted by "tokenize".
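A short sketch of inspecting the list, assuming only that "@fields" is imported by name as shown under "tokenize" below:

```perl
use strict;
use warnings;
use C::Tokenize '@fields';

# Print the name of each field which "tokenize" may extract,
# one per line.
print "$_\n" for @fields;
```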

FUNCTIONS

decomment

my $out = decomment ('/* comment */');
# $out = " comment ";

Remove the traditional C comment marks /* and */ from the beginning and end of a string, leaving only the comment contents. The string has to begin and end with comment marks.
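The following sketch combines "decomment" with "$trad_comment_re" to collect the text of each traditional comment in a piece of C. The sample source and the whitespace trimming are illustrative additions, not part of the module.

```perl
use strict;
use warnings;
use C::Tokenize qw/decomment $trad_comment_re/;

# Hypothetical C source containing one traditional comment.
my $c = "/* Add two numbers. */\nint add (int a, int b);\n";

my @texts;
while ($c =~ /($trad_comment_re)/g) {
    # decomment removes only the comment marks, so trim the
    # surrounding whitespace separately.
    my $text = decomment ($1);
    $text =~ s/^\s+|\s+$//g;
    push @texts, $text;
}

print "$_\n" for @texts;
```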

tokenize

my $tokens = tokenize ($file);

Convert $file into a series of tokens. The return value is an array reference which contains hash references. Each hash reference corresponds to one token in the C file. Each token contains the following keys:

leading

Any whitespace which comes before the token (called "leading whitespace").

type

The type of the token, which may be

comment

A comment, like

/* This */

or

// this.

cpp

A C preprocessor instruction like

#define THIS 1

or

#include "That.h".

char_const

A character constant, like '\0' or 'a'.

grammar

A piece of C "grammar", like { or ] or ->.

number

A number such as 42.

word

A word, which may be a variable name or a function name.

string

A string, like "this", or even "like" "this".

reserved

A C reserved word, like auto or goto.

All of the fields which may be captured are available in the variable "@fields" which can be exported from the module:

use C::Tokenize '@fields';

$name

The value of the type. For example, if $token->{name} equals 'comment', then the value of the type is in $token->{comment}.

if ($token->{name} eq 'string') {
    my $c_string = $token->{string};
}

line

The line number of the C file where the token occurred. For a multi-line comment or preprocessor instruction, the line number refers to the final line.
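As a rough sketch of walking the return value, the following prints the line, type and text of each token. It rests on two assumptions which this documentation does not settle: that "tokenize" is given the C source text itself rather than a file name, and that the type of the token is stored under the key "type" (the code falls back to "name" in case it is stored there instead). The sample C source is made up.

```perl
use strict;
use warnings;
use C::Tokenize 'tokenize';

# Hypothetical C source to tokenize. Assumption: tokenize takes
# the text of the C source, not a file name.
my $c = <<'EOF';
#include <stdio.h>
/* Entry point. */
int main (void)
{
    return 0;
}
EOF

my $tokens = tokenize ($c);

for my $token (@$tokens) {
    # Assumption: the token's type is under "type"; fall back to
    # "name" if it is not.
    my $type = $token->{type} // $token->{name};
    next unless defined $type;
    # The text of the token is stored under a key named after
    # its type.
    my $value = $token->{$type} // '';
    my $line = $token->{line} // '?';
    print "$line: $type: $value\n";
}
```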

EXPORTS

use C::Tokenize ':all';

exports all the regular expressions from the module.
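For instance, a sketch which imports everything and then strips both preprocessor instructions and comments from a hypothetical piece of C:

```perl
use strict;
use warnings;
use C::Tokenize ':all';

# Hypothetical C source with a preprocessor instruction and a
# comment.
my $c = "#define N 10\nint a[N]; /* array */\n";

# Work on a copy so that the original text is left untouched.
(my $stripped = $c) =~ s/$cpp_re//g;
$stripped =~ s/$comment_re//g;

print $stripped;
```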

BUGS

Octal not parsed: It does not parse octal expressions.
No trigraphs: No handling of trigraphs.
Requires Perl 5.10: This module uses named captures in regular expressions, so it requires Perl 5.10 or later.
No line directives: The line numbers provided by "tokenize" do not respect C line directives.
Insufficient tests: The module has been used somewhat, but the included tests do not exercise many of the features of C.

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.
