NAME
XML::Reader - Reading XML and providing path information based on a pull-parser.
SYNOPSIS
use XML::Reader;
my $text = '<root>stu<test param="v">w</test>xyz</root>';
my $rdr = XML::Reader->new(\$text) or die "Error: $!";
while ($rdr->iterate) {
    print "Path = ", $rdr->path, ", Value = ", $rdr->value, "\n";
}
DESCRIPTION
XML::Reader provides a simple, easy-to-use interface for sequentially parsing XML files (so-called "pull-mode" parsing) while keeping track of the complete XML-path.
It was developed as a thin wrapper on top of XML::TokeParser. XML::TokeParser allows pull-mode parsing, but does not keep track of the complete XML-path. Also, the interface to XML::TokeParser (see $t->is_start_tag, $t->is_end_tag, $t->is_text) requires you to distinguish between start-tags, end-tags and text, which, in my view, complicates the interface.
There is also XML::TiePYX, which lets you pull-mode parse XML-Files (see http://www.xml.com/pub/a/2000/03/15/feature/index.html for an introduction to PYX). But still, with XML::TiePYX you need to account for start-tags, end-tags and text, and it does not provide the full XML-path.
By contrast, XML::Reader translates start-tags, end-tags and text into XPath-like expressions. So you don't need to worry about tags, you just get a path and a value, and that's it.
For example, the following XML...
<data>
  <item>abc</item>
  <item>
    <dummy/>
    fgh
    <inner name="ttt" id="fff">
      ooo <!-- comment --> ppp
    </inner>
  </item>
</data>
...corresponds to a sequence of path/value pairs.
You can also keep track of the start- and end-tags: the method is_start returns 1 or 0, depending on whether the XML-file had a start tag at the current position. There is also the equivalent method is_end. Just remember that these two methods only make sense if filter is switched off (otherwise they return constant 0). Finally, the method tag gives you the current tag-name (or attribute-name).
Here is the sequence of path/value pairs, including is_start, is_end and tag:
path = '/data'                  value = ''        is_start = 1 is_end = 0 tag = 'data'
path = '/data/item'             value = 'abc'     is_start = 1 is_end = 1 tag = 'item'
path = '/data'                  value = ''        is_start = 0 is_end = 0 tag = 'data'
path = '/data/item'             value = ''        is_start = 1 is_end = 0 tag = 'item'
path = '/data/item/dummy'       value = ''        is_start = 1 is_end = 1 tag = 'dummy'
path = '/data/item'             value = 'fgh'     is_start = 0 is_end = 0 tag = 'item'
path = '/data/item/inner'       value = ''        is_start = 1 is_end = 0 tag = 'inner'
path = '/data/item/inner/@id'   value = 'fff'     is_start = 0 is_end = 0 tag = 'id'
path = '/data/item/inner/@name' value = 'ttt'     is_start = 0 is_end = 0 tag = 'name'
path = '/data/item/inner'       value = 'ooo'     is_start = 0 is_end = 0 tag = 'inner'
path = '/data/item/inner/#'     value = 'comment' is_start = 0 is_end = 0 tag = ''
path = '/data/item/inner'       value = 'ppp'     is_start = 0 is_end = 1 tag = 'inner'
path = '/data/item'             value = ''        is_start = 0 is_end = 1 tag = 'item'
path = '/data'                  value = ''        is_start = 0 is_end = 1 tag = 'data'
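A listing of this kind can be produced with a short loop. The following is only a minimal sketch: it assumes the interface described in this document (the options filter and comment, and the methods path, value, is_start, is_end and tag), and the variable names $xml and @rows are chosen purely for illustration. Since comments are off and filter is on by default, both options are set explicitly here.

```perl
use strict;
use warnings;
use XML::Reader;

# The sample document from above, held in a string.
my $xml = <<'XML';
<data>
  <item>abc</item>
  <item>
    <dummy/>
    fgh
    <inner name="ttt" id="fff">
      ooo <!-- comment --> ppp
    </inner>
  </item>
</data>
XML

# filter => 0 keeps empty text lines, so is_start and is_end are meaningful;
# comment => 1 lets the comment pass through.
my $rdr = XML::Reader->new(\$xml, {filter => 0, comment => 1})
  or die "Error: $!";

# Collect one formatted line per path/value pair.
my @rows;
while ($rdr->iterate) {
    push @rows, sprintf "path = '%s' value = '%s' is_start = %d is_end = %d tag = '%s'",
        $rdr->path, $rdr->value, $rdr->is_start, $rdr->is_end, $rdr->tag;
}

print "$_\n" for @rows;
```

Collecting the lines in @rows before printing is a design choice made only to keep the parsing loop separate from the output; printing directly inside the loop works just as well.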
INTERFACE
Object creation
To create an XML::Reader object, the following syntax is used:
my $rdr = XML::Reader->new($data, {comment => 0, strip => 1, filter => 1})
or die "Error: $!";
The element $data (which is mandatory) is either the name of the XML-file, or a reference to a string, in which case the content of that string is taken as the text of the XML.
Here is an example to create an XML::Reader object with a file-name:
my $rdr = XML::Reader->new('input.xml') or die "Error: $!";
Here is another example to create an XML::Reader object with a reference:
my $rdr = XML::Reader->new(\'<data>abc</data>') or die "Error: $!";
One or more of the following options can be added as a hash-reference:
- option {comment => 0}
-
The option {comment => 1} allows comments to be passed through. The option {comment => 0} disables comments. The default is {comment => 0}.
- option {strip => 1}
-
The option {strip => 1} strips all leading and trailing spaces from text and comments (attributes are never stripped). The default is {strip => 1}.
- option {filter => 1}
-
The option {filter => 1} removes all empty text lines. Be careful if you want to use the is_start and is_end methods: in that case you have to set option {filter => 0}. The default is {filter => 1}.
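To see what the three options do, the same text can be read with different settings and the results compared. This is a hedged sketch that relies only on the options and methods documented here; the helper collect() and the sample string are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use XML::Reader;

my $xml = '<data>  abc  <!-- note --></data>';

# Hypothetical helper: read $xml with the given options and return
# one "path=[value]" string per path/value pair.
sub collect {
    my ($opt) = @_;
    my $rdr = XML::Reader->new(\$xml, $opt) or die "Error: $!";
    my @out;
    while ($rdr->iterate) {
        push @out, $rdr->path . '=[' . $rdr->value . ']';
    }
    return @out;
}

# The documented defaults, spelled out explicitly.
my @default  = collect({comment => 0, strip => 1, filter => 1});
# Same as the defaults, but comments are passed through.
my @comments = collect({comment => 1, strip => 1, filter => 1});
# No stripping and no filtering: spaces and empty text lines are kept.
my @raw      = collect({comment => 0, strip => 0, filter => 0});

print "default : @default\n";
print "comments: @comments\n";
print "raw     : @raw\n";
```

The square brackets around each value make leading and trailing spaces visible, which is useful when comparing strip => 1 with strip => 0.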
Methods
A successfully created object of type XML::Reader provides the following methods:
- iterate
-
Reads one single XML-value. It returns 1 after a successful read, or undef when it hits end-of-file.
- path
-
Provides the complete path of the currently selected value. Attributes are represented by leading '@'-signs, comments are represented by a '#'-symbol.
- value
-
Provides the actual value (i.e. text, attribute or comment).
- type
-
Provides the type of the value: 'T' for text, '@' for attributes, '#' for comments.
- tag
-
Provides the current tag-name (or attribute-name).
- is_start
-
Returns 1 or 0, depending on whether the XML-file had a start tag at the current position. Be careful, this method only makes sense if filter is switched off (otherwise constant 0 is returned).
- is_end
-
Returns 1 or 0, depending on whether the XML-file had an end tag at the current position. Be careful, this method only makes sense if filter is switched off (otherwise constant 0 is returned).
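The methods above can be combined in a single loop, for instance to sort values by their type. The following sketch assumes the methods behave as described in this list; the containers %attr, @text and @comments are illustrative names and not part of the module.

```perl
use strict;
use warnings;
use XML::Reader;

my $xml = '<root>stu<test param="v">w</test>xyz</root>';
my $rdr = XML::Reader->new(\$xml, {comment => 1}) or die "Error: $!";

my (%attr, @text, @comments);
while ($rdr->iterate) {
    my $type = $rdr->type;
    if ($type eq '@') {
        # Attribute: remember its value under the full path.
        $attr{$rdr->path} = $rdr->value;
    }
    elsif ($type eq '#') {
        # Comment (only seen because comment => 1 was requested).
        push @comments, $rdr->value;
    }
    elsif ($type eq 'T') {
        # Plain text.
        push @text, $rdr->value;
    }
}

print "attribute $_ = $attr{$_}\n" for sort keys %attr;
print "text: $_\n"                 for @text;
print "comment: $_\n"              for @comments;
```

Keying the attributes by their full path rather than by tag keeps attributes with the same name on different elements apart.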
AUTHOR
Klaus Eichner, March 2009
COPYRIGHT AND LICENSE
Copyright (C) 2009 by Klaus Eichner
All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
RELATED MODULES
If you also want to write XML, have a look at XML::Writer. This module provides a simple interface for writing XML. (If you are writing non-mixed content XML, consider setting DATA_MODE=>1 and DATA_INDENT=>2, which allows for proper indentation in your XML output file.)