NAME
YAPE::Regex - Yet Another Parser/Extractor for Regular Expressions
SYNOPSIS
use YAPE::Regex;
use strict;
my $regex = qr/reg(ular\s+)?exp?(ression)?/i;
my $parser = YAPE::Regex->new($regex);
# here is the tokenizing part
while (my $chunk = $parser->next) {
# ...
}
YAPE MODULES
The YAPE hierarchy of modules is an attempt at a unified means of parsing and extracting content. It attempts to maintain a generic interface, to promote simplicity and reusability. The API is powerful, yet simple. The modules do tokenization (which can be intercepted) and build trees, so that extraction of specific nodes is doable.
DESCRIPTION
This module is yet another (?) parser and tree-builder for Perl regular expressions. It builds a tree out of a regex, but at the moment, the extent of the extraction tool for the tree is quite limited (see "Extracting Sections"). However, the tree can be useful to extension modules.
USAGE
In addition to the base class, YAPE::Regex, there is the auxiliary class YAPE::Regex::Element (common to all YAPE base classes) that holds the individual nodes' classes. There is documentation for the node classes in that module's documentation.
Methods for YAPE::Regex
use YAPE::Regex;
use YAPE::Regex qw( MyExt::Mod );
If supplied no arguments, the module is loaded normally, and the node classes are given the proper inheritance (from YAPE::Regex::Element). If you supply a module (or list of modules), import will automatically include them (if needed) and set up their node classes with the proper inheritance; that is, it will append YAPE::Regex to @MyExt::Mod::ISA, and YAPE::Regex::xxx to each node class's @ISA (where xxx is the name of the specific node class).
package MyExt::Mod;
use YAPE::Regex 'MyExt::Mod';
# @MyExt::Mod::ISA = 'YAPE::Regex'
# @MyExt::Mod::text::ISA = 'YAPE::Regex::text'
# ...
my $p = YAPE::Regex->new($REx);
Creates a YAPE::Regex object, using the contents of $REx as a regular expression. The new method will attempt to convert $REx to a compiled regex (using qr//) if $REx isn't already one. If there is an error in the regex, this will fail, but the parser will pretend it was ok. It will then report the bad token when it gets to it, in the course of parsing.
my $text = $p->chunk($len);
Returns the next $len characters in the input string; $len defaults to 30 characters. This is useful for figuring out why a parsing error occurs.
my $done = $p->done;
Returns true if the parser is done with the input string, and false otherwise.
my $errstr = $p->error;
Returns the parser error message.
my $backref = $p->extract;
Returns a code reference that returns the next back-reference in the regex. For more information on enhancements in upcoming versions of this module, check "Extracting Sections".
my $node = $p->display(...);
Returns a string representation of the entire content. It calls the parse method in case there is more data that has not yet been parsed. This calls the fullstring method on the root nodes. Check the YAPE::Regex::Element docs on the arguments to fullstring.
my $node = $p->next;
Returns the next token, or undef if there is no valid token. There will be an error message (accessible with the error method) if there was a problem in the parsing.
my $node = $p->parse;
Calls next until all the data has been parsed.
my $node = $p->root;
Returns the root node of the tree structure.
my $state = $p->state;
Returns the current state of the parser. It is one of the following values:
alt, anchor, any, backref, capture(N), class, close, code, comment, cond(TYPE), ctrl, cut, done, error, flags, group, hex, later, lookahead(neg|pos), lookbehind(neg|pos), macro, oct, slash, and text.
For capture(N), N will be the number the captured pattern represents.
For cond(TYPE), TYPE will either be a number representing the back-reference that the conditional depends on, or the string assert.
For lookahead and lookbehind, one of neg and pos will be there, depending on the type of assertion.
my $node = $p->top;
Synonymous to root.
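The methods above can be combined into a small tokenizing loop. The following is only a sketch, assuming YAPE::Regex is installed from CPAN and that each node returned by next responds to the string method described in the YAPE::Regex::Element documentation; the output format is purely illustrative.

```perl
use strict;
use warnings;
use YAPE::Regex;   # assumption: installed from CPAN

my $parser = YAPE::Regex->new(qr/reg(ular\s+)?exp?(ression)?/i);

# Walk the regex one token at a time; state() names the kind of
# token that next() has just returned.
while (my $node = $parser->next) {
    printf "%-14s %s\n", $parser->state, $node->string;
}

if ($parser->done) {
    # Everything was consumed; show the regex rebuilt from the tree.
    print "parsed: ", $parser->display, "\n";
}
else {
    # next() stopped early: report the message and the unparsed input.
    warn "parse error: ", $parser->error, "\n";
    warn "near: ", $parser->chunk(20), "\n";
}
```

Checking done after the loop distinguishes a clean finish from an early stop, since next returns undef in both cases.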
Extracting Sections
While extraction of nodes is the goal of the YAPE modules, the author is at a loss for words as to what needs to be extracted from a regex. At the current time, all the extract method does is allow you access to the regex's set of back-references:
my $extor = $parser->extract;
while (my $backref = $extor->()) {
# ...
}
japhy is very open to suggestions as to the approach to node extraction (in how the API should look, in addition to what should be proffered). Preliminary ideas include extraction keywords like the output of -Dr (or the re module's debug option).
The YAPE::Regex::Wasted extension module, which suggests that regexes like /(.*?):/ be changed to /([^:]*):/ (and their ilk), could make use of an extraction technique that lets the user detect a node of .*? followed by a constant string or character class.
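To make the rewrite mentioned above concrete, here is a small plain-Perl sketch (it does not use YAPE::Regex, and the sample strings are made up) comparing the two patterns. On a single-line input both capture the same text; they can differ once the input contains a newline, because . does not match a newline without /s while [^:] does.

```perl
use strict;
use warnings;

# Single-line input: both patterns capture the text before the first colon.
my $line = 'key: value: more';
my ($lazy)    = $line =~ /(.*?):/;
my ($negated) = $line =~ /([^:]*):/;
print "lazy:    '$lazy'\n";
print "negated: '$negated'\n";

# Input with an embedded newline: the captures are no longer identical.
my $multi = "a\nb:c";
my ($lazy2)    = $multi =~ /(.*?):/;
my ($negated2) = $multi =~ /([^:]*):/;
print "captures differ on multi-line input\n" if $lazy2 ne $negated2;
```

Any tool that suggests this replacement would therefore need to consider whether the input may contain newlines.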
EXTENSIONS
YAPE::Regex::Explain 2.00
Presents an explanation of a regular expression, node by node.
YAPE::Regex::Reverse (Not released)
Reverses the nodes of a regular expression.
YAPE::Regex::Wasted (Not released)
Points out wasted /s and /m modifiers, and tries to suggest replacements for .*? nodes.
TO DO
This is a listing of things to add to future versions of this module.
API
Create a robust extract method
Open to suggestions.
Internals
Add Perl 5.6 character class support
The new character class syntaxes, [:posix:] and \p{UniCode}, aren't yet supported. These might be class objects, or have their own classes (posix_class and unicode_class).
BUGS
Following is a list of known or reported bugs.
Pending
NONE!
SUPPORT
Visit YAPE's web site at http://www.pobox.com/~japhy/YAPE/.
SEE ALSO
The YAPE::Regex::Element documentation, for information on the node classes. Also, Text::Balanced, Damian Conway's excellent module, used for
AUTHOR
Jeff "japhy" Pinyan
CPAN ID: PINYAN
japhy@pobox.com
http://www.pobox.com/~japhy/