NAME

XML::Reader - Reading XML and providing path information based on a pull-parser.

SYNOPSIS

use XML::Reader;

my $text = '<root>stu<test param="v">w</test>xyz</root>';
my $rdr = XML::Reader->new(\$text) or die "Error: $!";

while ($rdr->iterate) {
    print "Path = ", $rdr->path, ", Value = ", $rdr->value, "\n";
}

DESCRIPTION

XML::Reader provides a simple, easy-to-use interface for sequentially parsing XML files (so-called "pull-mode" parsing) while keeping track of the complete XML path.

It was developed as a wrapper on top of XML::Parser, with some basic functions copied from XML::TokeParser. Both XML::Parser and XML::TokeParser allow pull-mode parsing, but neither keeps track of the complete XML path. Also, their interfaces require you to distinguish between start-tags, end-tags and text, which, in my view, complicates the interface.

There is also XML::TiePYX, which lets you parse XML files in pull mode (see http://www.xml.com/pub/a/2000/03/15/feature/index.html for an introduction to PYX). But with XML::TiePYX you still need to account for start-tags, end-tags and text, and it does not provide the full XML path.

By contrast, XML::Reader translates start-tags, end-tags and text into XPath-like expressions. So you don't need to worry about tags: you just get a path and a value.
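
The idea of turning tag events into path/value pairs can be sketched in plain Perl. This is a deliberately simplified model and not the module's implementation: the event list and the helper events_to_pairs are hypothetical, attributes and comments are ignored, and a pair is emitted only for text.

```perl
use strict;
use warnings;

# Hypothetical event stream, as a tag-oriented parser might deliver it for
# '<root>stu<test>w</test>xyz</root>': [kind, data] with kind S(tart), E(nd), T(ext).
my @events = (
    [S => 'root'], [T => 'stu'],
    [S => 'test'], [T => 'w'], [E => 'test'],
    [T => 'xyz'],  [E => 'root'],
);

# Hypothetical helper: turn tag events into [path, value] pairs by keeping
# a stack of the currently open elements.
sub events_to_pairs {
    my (@ev) = @_;
    my (@stack, @pairs);
    for my $e (@ev) {
        my ($kind, $data) = @$e;
        if    ($kind eq 'S') { push @stack, $data }
        elsif ($kind eq 'E') { pop @stack }
        else                 { push @pairs, ['/' . join('/', @stack), $data] }
    }
    return @pairs;
}

# Print one line per text event, in the style of the SYNOPSIS.
printf "Path = %s, Value = %s\n", @$_ for events_to_pairs(@events);
```

The caller of such an interface never sees the stack; it only sees the path that results from it, which is the point of XML::Reader.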

For example, the following XML in variable '$line'...

my $line = q{
  <data>
    <item>abc</item>
    <item>
      <dummy/>
      fgh
      <inner name="ttt" id="fff">
        ooo <!-- comment --> ppp
      </inner>
    </item>
  </data>
};

...can be parsed with XML::Reader using the method iterate to step through the XML data one item at a time, and the methods path and value to extract the XML path and its value.

You can also keep track of the start- and end-tags: the method is_start returns 1 or 0, depending on whether the XML file had a start tag at the current position. There is also the equivalent method is_end. Just remember that these two methods only make sense if filter is switched off (otherwise they always return 0). Finally, there are the methods tag (which gives you the current tag-name or attribute-name), type (which is either 'T' for text, '@' for attributes or '#' for comments) and level (which indicates the current nesting level).

Here is a sample program to demonstrate the principle...

use XML::Reader;

my $rdr = XML::Reader->new(\$line) or die "Error: $!";
my $i = 0;
while ($rdr->iterate) { $i++;
    printf "%3d. pat=%-22s, val=%-9s, s=%-1s, e=%-1s, tag=%-6s, t=%-1s, lvl=%2d\n",
     $i, $rdr->path, $rdr->value, $rdr->is_start,
     $rdr->is_end, $rdr->tag, $rdr->type, $rdr->level;
}

...and here is the output:

 1. pat=/data                 , val=         , s=1, e=0, tag=data  , t=T, lvl= 1
 2. pat=/data/item            , val=abc      , s=1, e=1, tag=item  , t=T, lvl= 2
 3. pat=/data                 , val=         , s=0, e=0, tag=data  , t=T, lvl= 1
 4. pat=/data/item            , val=         , s=1, e=0, tag=item  , t=T, lvl= 2
 5. pat=/data/item/dummy      , val=         , s=1, e=1, tag=dummy , t=T, lvl= 3
 6. pat=/data/item            , val=fgh      , s=0, e=0, tag=item  , t=T, lvl= 2
 7. pat=/data/item/inner      , val=ooo ppp  , s=1, e=0, tag=inner , t=T, lvl= 3
 8. pat=/data/item/inner/@id  , val=fff      , s=0, e=0, tag=id    , t=@, lvl= 4
 9. pat=/data/item/inner/@name, val=ttt      , s=0, e=0, tag=name  , t=@, lvl= 4
10. pat=/data/item/inner/#    , val=comment  , s=0, e=0, tag=      , t=#, lvl= 4
11. pat=/data/item/inner      , val=         , s=0, e=1, tag=inner , t=T, lvl= 3
12. pat=/data/item            , val=         , s=0, e=1, tag=item  , t=T, lvl= 2
13. pat=/data                 , val=         , s=0, e=1, tag=data  , t=T, lvl= 1

If you want, you can set a filter to select only those lines that have a value:

my $rdr = XML::Reader->new(\$line, {filter => 1}) or die "Error: $!";

Then the output will be as follows (be careful not to interpret $rdr->is_start or $rdr->is_end when the filter has been activated):

1. pat=/data                 , val=         , s=0, e=0, tag=data  , t=T, lvl= 1
2. pat=/data/item            , val=abc      , s=0, e=0, tag=item  , t=T, lvl= 2
3. pat=/data/item            , val=         , s=0, e=0, tag=item  , t=T, lvl= 2
4. pat=/data/item            , val=fgh      , s=0, e=0, tag=item  , t=T, lvl= 2
5. pat=/data/item/inner      , val=ooo ppp  , s=0, e=0, tag=inner , t=T, lvl= 3
6. pat=/data/item/inner/@id  , val=fff      , s=0, e=0, tag=id    , t=@, lvl= 4
7. pat=/data/item/inner/@name, val=ttt      , s=0, e=0, tag=name  , t=@, lvl= 4
8. pat=/data/item/inner/#    , val=comment  , s=0, e=0, tag=      , t=#, lvl= 4

INTERFACE

Object creation

To create an XML::Reader object, the following syntax is used:

my $rdr = XML::Reader->new($data, {comment => 0, strip => 1, filter => 1})
  or die "Error: $!";

The argument $data (which is mandatory) is either the name of an XML file or a reference to a string, in which case the content of that string is taken as the text of the XML.

Here is an example to create an XML::Reader object with a file-name:

my $rdr = XML::Reader->new('input.xml') or die "Error: $!";

Here is another example to create an XML::Reader object with a reference:

my $rdr = XML::Reader->new(\'<data>abc</data>') or die "Error: $!";

One or more of the following options can be added as a hash-reference:

option {comment => 1}

The option {comment => 1} allows comments to be passed through. The option {comment => 0} disables comments. The default is {comment => 1}.

option {strip => 1}

The option {strip => 1} strips all leading and trailing spaces from text and comments (attributes are never stripped). The default is {strip => 1}.

option {filter => 0}

The option {filter => 1} removes all empty text lines. If you want to use the is_start and is_end methods, you have to set option {filter => 0}. The default is {filter => 0}.

option {using => ['/path1/path2/path3', '/path4/path5/path6']}

This option removes all lines whose path does not start with '/path1/path2/path3' or with '/path4/path5/path6'. The lines that remain have a shorter path, because the matching prefix ('/path1/path2/path3' or '/path4/path5/path6') is removed from it. The removed prefix, however, is available through the prefix method.

The paths '/path1/path2/path3' and '/path4/path5/path6' are supposed to be absolute and complete. Absolute means that they have to start with a '/' character. Complete means that the last item in the path ('path3' or 'path6') is completed internally by a trailing '/' character.
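
The matching rule can be sketched in plain Perl. This is only an illustration of the behaviour described here, not the module's actual code: the helper split_using is hypothetical, and the convention that an exact match leaves the relative path '/' is read off the output in the EXAMPLES section.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Reader): split an absolute path into
# the matching 'using' prefix and the remaining relative path.
# Returns an empty list if no prefix matches.
sub split_using {
    my ($path, @using) = @_;
    for my $prefix (@using) {
        # Append '/' to both sides so that only whole path items match:
        # '/a/b' must not match '/a/bc'.
        if (index("$path/", "$prefix/") == 0) {
            my $rest = substr($path, length $prefix);
            return ($prefix, $rest eq '' ? '/' : $rest);
        }
    }
    return;
}

my @using = ('/data/order/database/customer', '/data/supplier');

for my $path ('/data/order/database/customer/@name',
              '/data/supplier',
              '/data/suppliers',
              '/data/dummy') {
    my ($prefix, $rest) = split_using($path, @using);
    if (defined $prefix) {
        printf "%s: prefix=%s, path=%s\n", $path, $prefix, $rest;
    }
    else {
        printf "%s: removed\n", $path;
    }
}
```

In this sketch '/data/suppliers' is rejected even though it begins with the characters of '/data/supplier', which is what the internally appended '/' is assumed to achieve.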

Methods

A successfully created object of type XML::Reader provides the following methods:

iterate: Reads one single XML-value. It returns 1 after a successful read, or undef when it hits end-of-file.
path: Provides the complete path of the currently selected value. Attributes are represented by a leading '@' sign, comments by a '#' symbol.
value: Provides the actual value (i.e. text, attribute or comment).
type: Provides the type of the value: 'T' for text, '@' for attributes, '#' for comments.
tag: Provides the current tag-name (or attribute-name).
is_start: Returns 1 or 0, depending on whether the XML file had a start tag at the current position. Be careful, this method only makes sense if filter is switched off (otherwise a constant 0 is returned).
is_end: Returns 1 or 0, depending on whether the XML file had an end tag at the current position. Be careful, this method only makes sense if filter is switched off (otherwise a constant 0 is returned).
level: Indicates the nesting level of the XPath expression (a numeric value, normally greater than zero; it is 0 when option {using => ...} reduces the path to '/', as the output in the EXAMPLES section shows).
prefix: Shows the prefix which has been removed by option {using => ...}. Returns the empty string if option {using => ...} has not been specified.
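
How path, type, tag and level relate to each other can be sketched in plain Perl. The helper describe_path is hypothetical and not part of XML::Reader; it merely follows the conventions visible in the first sample output of the DESCRIPTION and assumes that option {using => ...} is not set.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Reader): derive type, tag and level
# from a complete path string.
sub describe_path {
    my ($path) = @_;

    # '/data/item' -> ('data', 'item'); the leading empty item is dropped.
    my @items = grep { length } split m{/}, $path;
    my $last  = @items ? $items[-1] : '';

    my ($type, $tag) =
        $last eq '#'        ? ('#', '')     # comment: no tag name
      : $last =~ /^\@(.+)/  ? ('@', $1)     # attribute: name without the '@'
      :                       ('T', $last); # text: name of the element

    return ($type, $tag, scalar @items);
}

for my $path ('/data', '/data/item/inner/@id', '/data/item/inner/#') {
    my ($type, $tag, $level) = describe_path($path);
    printf "pat=%-22s, tag=%-6s, t=%s, lvl=%2d\n", $path, $tag, $type, $level;
}
```

The three paths correspond to lines 1, 8 and 10 of that sample output, so the tag, type and level columns can be compared directly.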

EXAMPLES

Here is a sample piece of XML (in variable '$line'):

my $line = q{
<data>
  <order>
    <database>
      <customer name="aaa" />
      <customer name="bbb" />
      <customer name="ccc" />
      <customer name="ddd" />
    </database>
  </order>
  <dummy value="ttt">test</dummy>
  <supplier>hhh</supplier>
  <supplier>iii</supplier>
  <supplier>jjj</supplier>
</data>
};

An example with option 'using'

The following program takes this XML and parses it with XML::Reader, including the option 'using' to target specific elements:

use XML::Reader;

my $rdr = XML::Reader->new(\$line, {filter => 0,
  using => ['/data/order/database/customer', '/data/supplier']});

my $i = 0;
while ($rdr->iterate) { $i++;
    printf "%3d. prf=%-29s, pat=%-7s, val=%-3s, s=%-1s, e=%-1s, tag=%-11s, t=%-1s, lvl=%2d\n",
      $i, $rdr->prefix, $rdr->path, $rdr->value, $rdr->is_start,
      $rdr->is_end, $rdr->tag, $rdr->type, $rdr->level;
}

This is the output of that program:

 1. prf=/data/order/database/customer, pat=/      , val=   , s=1, e=0, tag=customer   , t=T, lvl= 0
 2. prf=/data/order/database/customer, pat=/@name , val=aaa, s=0, e=0, tag=name       , t=@, lvl= 1
 3. prf=/data/order/database/customer, pat=/      , val=   , s=0, e=1, tag=customer   , t=T, lvl= 0
 4. prf=/data/order/database/customer, pat=/      , val=   , s=1, e=0, tag=customer   , t=T, lvl= 0
 5. prf=/data/order/database/customer, pat=/@name , val=bbb, s=0, e=0, tag=name       , t=@, lvl= 1
 6. prf=/data/order/database/customer, pat=/      , val=   , s=0, e=1, tag=customer   , t=T, lvl= 0
 7. prf=/data/order/database/customer, pat=/      , val=   , s=1, e=0, tag=customer   , t=T, lvl= 0
 8. prf=/data/order/database/customer, pat=/@name , val=ccc, s=0, e=0, tag=name       , t=@, lvl= 1
 9. prf=/data/order/database/customer, pat=/      , val=   , s=0, e=1, tag=customer   , t=T, lvl= 0
10. prf=/data/order/database/customer, pat=/      , val=   , s=1, e=0, tag=customer   , t=T, lvl= 0
11. prf=/data/order/database/customer, pat=/@name , val=ddd, s=0, e=0, tag=name       , t=@, lvl= 1
12. prf=/data/order/database/customer, pat=/      , val=   , s=0, e=1, tag=customer   , t=T, lvl= 0
13. prf=/data/supplier               , pat=/      , val=hhh, s=1, e=1, tag=supplier   , t=T, lvl= 0
14. prf=/data/supplier               , pat=/      , val=iii, s=1, e=1, tag=supplier   , t=T, lvl= 0
15. prf=/data/supplier               , pat=/      , val=jjj, s=1, e=1, tag=supplier   , t=T, lvl= 0
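
One way to put the prefix to use is to group the values by it. The following is a minimal sketch under stated assumptions: the helper group_by_prefix is hypothetical, and the records are a hand-copied subset of the output above so that the sketch runs without the module. In a real program they would come from $rdr->prefix, $rdr->path and $rdr->value inside the iterate loop.

```perl
use strict;
use warnings;

# A few records copied from the output above, as [prefix, path, value].
my @records = (
    ['/data/order/database/customer', '/',      ''],
    ['/data/order/database/customer', '/@name', 'aaa'],
    ['/data/order/database/customer', '/',      ''],
    ['/data/order/database/customer', '/',      ''],
    ['/data/order/database/customer', '/@name', 'bbb'],
    ['/data/order/database/customer', '/',      ''],
    ['/data/supplier',                '/',      'hhh'],
    ['/data/supplier',                '/',      'iii'],
);

# Hypothetical helper: collect all non-empty values per prefix.
sub group_by_prefix {
    my (@recs) = @_;
    my %values;
    for my $rec (@recs) {
        my ($prefix, $path, $value) = @$rec;
        next if $value eq '';    # skip lines that carry no value
        push @{ $values{$prefix} }, $value;
    }
    return \%values;
}

my $grouped = group_by_prefix(@records);
for my $prefix (sort keys %$grouped) {
    printf "%s => %s\n", $prefix, join(', ', @{ $grouped->{$prefix} });
}
```

Because the prefix identifies which of the 'using' paths a line belongs to, no comparison against the full path is needed.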

An example without option 'using'

The following program takes the same XML and parses it with XML::Reader, but without the option 'using'.

use XML::Reader;

my $rdr = XML::Reader->new(\$line, {filter => 0});
my $i = 0;
while ($rdr->iterate) { $i++;
    printf "%3d. prf=%-1s, pat=%-37s, val=%-6s, s=%-1s, e=%-1s, tag=%-11s, t=%-1s, lvl=%2d\n",
     $i, $rdr->prefix, $rdr->path, $rdr->value, $rdr->is_start,
     $rdr->is_end, $rdr->tag, $rdr->type, $rdr->level;
}

As you can see in the following output, there are many more lines written, the prefix is empty and the path is much longer than in the previous program:

 1. prf= , pat=/data                                , val=      , s=1, e=0, tag=data       , t=T, lvl= 1
 2. prf= , pat=/data/order                          , val=      , s=1, e=0, tag=order      , t=T, lvl= 2
 3. prf= , pat=/data/order/database                 , val=      , s=1, e=0, tag=database   , t=T, lvl= 3
 4. prf= , pat=/data/order/database/customer        , val=      , s=1, e=0, tag=customer   , t=T, lvl= 4
 5. prf= , pat=/data/order/database/customer/@name  , val=aaa   , s=0, e=0, tag=name       , t=@, lvl= 5
 6. prf= , pat=/data/order/database/customer        , val=      , s=0, e=1, tag=customer   , t=T, lvl= 4
 7. prf= , pat=/data/order/database                 , val=      , s=0, e=0, tag=database   , t=T, lvl= 3
 8. prf= , pat=/data/order/database/customer        , val=      , s=1, e=0, tag=customer   , t=T, lvl= 4
 9. prf= , pat=/data/order/database/customer/@name  , val=bbb   , s=0, e=0, tag=name       , t=@, lvl= 5
10. prf= , pat=/data/order/database/customer        , val=      , s=0, e=1, tag=customer   , t=T, lvl= 4
11. prf= , pat=/data/order/database                 , val=      , s=0, e=0, tag=database   , t=T, lvl= 3
12. prf= , pat=/data/order/database/customer        , val=      , s=1, e=0, tag=customer   , t=T, lvl= 4
13. prf= , pat=/data/order/database/customer/@name  , val=ccc   , s=0, e=0, tag=name       , t=@, lvl= 5
14. prf= , pat=/data/order/database/customer        , val=      , s=0, e=1, tag=customer   , t=T, lvl= 4
15. prf= , pat=/data/order/database                 , val=      , s=0, e=0, tag=database   , t=T, lvl= 3
16. prf= , pat=/data/order/database/customer        , val=      , s=1, e=0, tag=customer   , t=T, lvl= 4
17. prf= , pat=/data/order/database/customer/@name  , val=ddd   , s=0, e=0, tag=name       , t=@, lvl= 5
18. prf= , pat=/data/order/database/customer        , val=      , s=0, e=1, tag=customer   , t=T, lvl= 4
19. prf= , pat=/data/order/database                 , val=      , s=0, e=1, tag=database   , t=T, lvl= 3
20. prf= , pat=/data/order                          , val=      , s=0, e=1, tag=order      , t=T, lvl= 2
21. prf= , pat=/data                                , val=      , s=0, e=0, tag=data       , t=T, lvl= 1
22. prf= , pat=/data/dummy                          , val=test  , s=1, e=0, tag=dummy      , t=T, lvl= 2
23. prf= , pat=/data/dummy/@value                   , val=ttt   , s=0, e=0, tag=value      , t=@, lvl= 3
24. prf= , pat=/data/dummy                          , val=      , s=0, e=1, tag=dummy      , t=T, lvl= 2
25. prf= , pat=/data                                , val=      , s=0, e=0, tag=data       , t=T, lvl= 1
26. prf= , pat=/data/supplier                       , val=hhh   , s=1, e=1, tag=supplier   , t=T, lvl= 2
27. prf= , pat=/data                                , val=      , s=0, e=0, tag=data       , t=T, lvl= 1
28. prf= , pat=/data/supplier                       , val=iii   , s=1, e=1, tag=supplier   , t=T, lvl= 2
29. prf= , pat=/data                                , val=      , s=0, e=0, tag=data       , t=T, lvl= 1
30. prf= , pat=/data/supplier                       , val=jjj   , s=1, e=1, tag=supplier   , t=T, lvl= 2
31. prf= , pat=/data                                , val=      , s=0, e=1, tag=data       , t=T, lvl= 1

AUTHOR

Klaus Eichner, March 2009

COPYRIGHT AND LICENSE

If you also want to write XML, have a look at XML::Writer. This module provides a simple interface for writing XML. (If you are writing non-mixed content XML, consider setting DATA_MODE=>1 and DATA_INDENT=>2, which allows for proper indentation in your XML output file.)
