NAME
YAX::Parser - fast pure Perl tree and stream parser
SYNOPSIS
    use YAX::Parser;

    my $xml_str = <<XML;
    <?xml version="1.0" ?>
    <doc>
    <content id="42"><![CDATA[
    This is a cdata section, so >>anything goes!<<
    ]]>
    </content>
    <!-- comments are nodes too -->
    </doc>
    XML

    # tree parse - the common case
    my $xml_doc = YAX::Parser->parse( $xml_str );

    # or read the input from a file
    $xml_doc = YAX::Parser->parse_file( $path );

    # shallow parse
    my @tokens = YAX::Parser->tokenize( $xml_str );

    # stream parse
    YAX::Parser->stream( $xml_str, $state, %handlers );
    YAX::Parser->stream_file( '/some/file.xml', $state, %handlers );
DESCRIPTION
This module implements a fast DOM and stream parser based on Robert D. Cameron's regular expression shallow parsing grammar and technique. By design, it does not implement the full W3C DOM API and takes a more pragmatic approach instead: every node in the DOM tree is an object, except attributes, which are stored as a hash reference.
We also borrow some ideas from browser implementations. In particular, nodes that have an id attribute are keyed by that id in a table on the document, so you can say:
    my $found = $xml_doc->get( $node_id );
Parsing is usually done by calling class methods on YAX::Parser. When invoked as a tree parser, these return an instance of YAX::Document:
    my $xml_doc = YAX::Parser->parse( $xml_str );
METHODS
See the "SYNOPSIS" for usage; here is just the list of methods for now:
- parse( $xml_str )
Parse $xml_str and return a YAX::Document object.
- parse_file( $path )
Same as parse, but reads the input from the file at $path.
- stream( $xml_str, $state, %handlers )
Although not its main focus, YAX::Parser also provides stream parsing. It tries to be a bit more sane than Expat in that it lets you specify a state holder, which can be anything and is passed as the first argument to every handler function. A typical case is a hash reference holding a stack (for tracking nesting):
    my $state = { stack => [ ] };
All handler functions are optional; the full list is:
    my %handlers = (
        text => \&handle_text,           # called for text nodes
        elmt => \&handle_element_open,   # called for open tags
        elcl => \&handle_element_close,  # called for tag close
        decl => \&handle_declaration,    # called for declarations
        proc => \&handle_proc_inst,      # called for processing instructions
        pass => \&handle_passthrough,    # called when no handlers match
    );
An element handler is passed the state, the tag name, and the attributes hash:
    sub handle_element_open {
        my ( $state, $name, %attributes ) = @_;
        if ( $name eq 'a' and $attributes{href} ) {
            ...
        }
    }
An element close handler takes two arguments, the state and the tag name:
    sub handle_element_close {
        my ( $state, $name ) = @_;
        die "not well formed" unless pop @{ $state->{stack} } eq $name;
    }
All other handlers take the state and the entire matched token:
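    sub handle_proc_inst {
        my ( $state, $token ) = @_;
        # capture everything between "<?" and "?>"; $instr is undef if
        # the token does not match (/s lets the body span several lines)
        my ($instr) = $token =~ /^<\?(.*?)\?>$/s;
        ...
    }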
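As a rough sketch of how the element open and close handlers for stream might share the state's stack: the text only shows the close handler popping, so the push in the open handler is an assumption about the intended usage, not documented behaviour. To stay runnable without the module installed, the handlers are called by hand here in the order a stream parse of `<doc><content id="42"></content></doc>` would presumably call them.

```perl
use strict;
use warnings;

# Shared state holder: a stack of the element names currently open.
my $state = { stack => [] };

# Open handler: record the element we just entered (assumed usage).
sub handle_element_open {
    my ( $state, $name, %attributes ) = @_;
    push @{ $state->{stack} }, $name;
}

# Close handler: the close tag must match the innermost open element.
sub handle_element_close {
    my ( $state, $name ) = @_;
    my $open = pop @{ $state->{stack} };
    die "not well formed: unexpected </$name>\n"
        unless defined $open and $open eq $name;
}

# Hand-driven calls standing in for the parser's callbacks.
handle_element_open( $state, 'doc' );
handle_element_open( $state, 'content', id => 42 );
handle_element_close( $state, 'content' );
handle_element_close( $state, 'doc' );

# Number of elements still open after the balanced sequence above.
my $depth = scalar @{ $state->{stack} };

# A mismatched close tag makes the close handler die.
handle_element_open( $state, 'a' );
my $ok    = eval { handle_element_close( $state, 'b' ); 1 };
my $error = $ok ? '' : $@;
```

In real use the same subs would be registered as elmt => \&handle_element_open and elcl => \&handle_element_close in %handlers and passed to stream together with $state.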
- stream_file( $path, $state, %handlers )
Same as stream, but reads the input from the file at $path.
- tokenize( $xml_str )
Useful for quick and dirty tokenizing of $xml_str. Returns a list of tokens.
SEE ALSO
LICENSE
This program is free software and may be modified and distributed under the same terms as Perl itself.
AUTHOR
Richard Hundt