NAME

XML::Reader - Reading XML and providing path information based on a pull-parser.

SYNOPSIS

use XML::Reader;

my $text = q{<init>n <?test pi?> t<page node="400">m <!-- remark --> r</page></init>};

my $rdr = XML::Reader->newhd(\$text) or die "Error: $!";
while ($rdr->iterate) {
    printf "Path: %-19s, Value: %s\n", $rdr->path, $rdr->value;
}

This program produces the following output:

Path: /init              , Value: n t
Path: /init/page/@node   , Value: 400
Path: /init/page         , Value: m r
Path: /init              , Value:

DESCRIPTION

XML::Reader provides a simple, easy-to-use interface for sequentially parsing XML files (so-called "pull-mode" parsing) while keeping track of the complete XML-path.

It was developed as a wrapper on top of XML::Parser (some basic functions have also been copied from XML::TokeParser). Both XML::Parser and XML::TokeParser allow pull-mode parsing, but do not keep track of the complete XML-path. Also, the interfaces to XML::Parser and XML::TokeParser require you to distinguish between start-tags, end-tags and text, which, in my view, complicates the interface.

There is also XML::TiePYX, which lets you pull-mode parse XML-Files (see http://www.xml.com/pub/a/2000/03/15/feature/index.html for an introduction to PYX). But still, with XML::TiePYX you need to account for start-tags, end-tags and text, and it does not provide the full XML-path.

By contrast, XML::Reader translates start-tags, end-tags and text into XPath-like expressions. So you don't need to worry about tags, you just get a path and a value, and that's it. (However, should you wish to operate XML::Reader in a PYX compatible mode, there is option {filter => 4}, as described below, which allows you to parse XML in that way).

But going back to the normal mode of operation, here is an example XML in variable '$line1':

my $line1 = 
q{<?xml version="1.0" encoding="ISO-8859-1"?>
  <data>
    <item>abc</item>
    <item><!-- c1 -->
      <dummy/>
      fgh
      <inner name="ttt" id="fff">
        ooo <!-- c2 --> ppp
      </inner>
    </item>
  </data>
};

This example can be parsed with XML::Reader using the method iterate to step one-by-one through the XML-data, and the methods path and value to extract the current XML-path and its value.

You can also keep track of the start- and end-tags: There is a method is_start, which returns 1 or 0, depending on whether the XML-file had a start tag at the current position. There is also the equivalent method is_end.

There are also the methods tag, attr, type and level. tag gives you the current tag-name, attr returns the attribute-name, type returns either 'T' for text or '@' for attributes and level indicates the current nesting-level (a number >= 0).

Here is a sample program which parses the XML in '$line1' from above to demonstrate the principle...

use XML::Reader;

my $rdr = XML::Reader->newhd(\$line1) or die "Error: $!";
my $i = 0;
while ($rdr->iterate) { $i++;
    printf "%3d. pat=%-22s, val=%-9s, s=%-1s, e=%-1s, tag=%-6s, atr=%-6s, t=%-1s, lvl=%2d\n", $i,
      $rdr->path, $rdr->value, $rdr->is_start, $rdr->is_end, $rdr->tag, $rdr->attr, $rdr->type, $rdr->level;
}

...and here is the output:

 1. pat=/data                 , val=         , s=1, e=0, tag=data  , atr=      , t=T, lvl= 1
 2. pat=/data/item            , val=abc      , s=1, e=1, tag=item  , atr=      , t=T, lvl= 2
 3. pat=/data                 , val=         , s=0, e=0, tag=data  , atr=      , t=T, lvl= 1
 4. pat=/data/item            , val=         , s=1, e=0, tag=item  , atr=      , t=T, lvl= 2
 5. pat=/data/item/dummy      , val=         , s=1, e=1, tag=dummy , atr=      , t=T, lvl= 3
 6. pat=/data/item            , val=fgh      , s=0, e=0, tag=item  , atr=      , t=T, lvl= 2
 7. pat=/data/item/inner/@id  , val=fff      , s=0, e=0, tag=@id   , atr=id    , t=@, lvl= 4
 8. pat=/data/item/inner/@name, val=ttt      , s=0, e=0, tag=@name , atr=name  , t=@, lvl= 4
 9. pat=/data/item/inner      , val=ooo ppp  , s=1, e=1, tag=inner , atr=      , t=T, lvl= 3
10. pat=/data/item            , val=         , s=0, e=1, tag=item  , atr=      , t=T, lvl= 2
11. pat=/data                 , val=         , s=0, e=1, tag=data  , atr=      , t=T, lvl= 1

If you want, you can set option {filter => 1} to select only those lines that have a value.

use XML::Reader;

my $rdr = XML::Reader->newhd(\$line1, {filter => 1}) or die "Error: $!";
my $i = 0;
while ($rdr->iterate) { $i++;
    printf "%3d. pat=%-22s, val=%-9s, tag=%-6s, atr=%-6s, t=%-1s, lvl=%2d\n",
     $i, $rdr->path, $rdr->value, $rdr->tag, $rdr->attr, $rdr->type, $rdr->level;
}

In this case the output will be as follows. (You should not interpret the methods $rdr->is_start or $rdr->is_end when option {filter => 1} is set; those methods will, in fact, return undef.)

1. pat=/data/item            , val=abc      , tag=item  , atr=      , t=T, lvl= 2
2. pat=/data/item            , val=fgh      , tag=item  , atr=      , t=T, lvl= 2
3. pat=/data/item/inner/@id  , val=fff      , tag=@id   , atr=id    , t=@, lvl= 4
4. pat=/data/item/inner/@name, val=ttt      , tag=@name , atr=name  , t=@, lvl= 4
5. pat=/data/item/inner      , val=ooo ppp  , tag=inner , atr=      , t=T, lvl= 3

INTERFACE

Object creation

To create an XML::Reader object, the following syntax is used:

my $rdr = XML::Reader->newhd($data,
  {strip => 1, filter => 2, using => ['/path1', '/path2']})
  or die "Error: $!";

The element $data (which is mandatory) is the name of the XML-file, or a reference to a string, in which case the content of that string is taken as the text of the XML.

Alternatively, $data can also be a previously opened filehandle, such as *STDIN, in which case that filehandle is used to read the XML.

Here is an example to create an XML::Reader object with a file-name:

my $rdr = XML::Reader->newhd('input.xml') or die "Error: $!";

Here is another example to create an XML::Reader object with a reference to a string:

my $rdr = XML::Reader->newhd(\'<data>abc</data>') or die "Error: $!";

Here is an example to create an XML::Reader object with an open filehandle:

open my $fh, '<', 'input.xml' or die "Error: $!";
my $rdr = XML::Reader->newhd($fh);

Here is an example to create an XML::Reader object with \*STDIN:

my $rdr = XML::Reader->newhd(\*STDIN);

One or more of the following options can be added as a hash-reference:

option {parse_ct => }

Option {parse_ct => 1} allows for comments to be parsed. The default is {parse_ct => 0}.

option {parse_pi => }

Option {parse_pi => 1} allows for processing-instructions and XML-Declarations to be parsed. The default is {parse_pi => 0}.

option {using => }

Option Using allows for selecting a sub-tree of the XML.

The syntax is {using => ['/path1/path2/path3', '/path4/path5/path6']}

option {filter => }

Option {filter => 1} activates a filter to remove lines with an empty value.

Option {filter => 2} deactivates the filter, so all lines are shown, even lines with an empty value.

Option {filter => 3} also deactivates the filter, but removes attribute lines (i.e. it removes lines where $rdr->type eq '@'). Instead, it returns the attributes in a hash reference via $rdr->att_hash.

Option {filter => 4} also deactivates the filter, but breaks down each line into its individual start-tags, end-tags, attributes, comments and processing-instructions. This allows the parsing of XML into pyx-formatted lines.

The syntax is {filter => 1|2|3|4}, default is {filter => 2}

option {strip => }

Option {strip => 1} strips all leading and trailing spaces from text and comments. (attributes are never stripped). {strip => 0} leaves text and comments unmodified.

The syntax is {strip => 0|1}, default is {strip => 1}

Methods

A successfully created object of type XML::Reader provides the following methods:

iterate

Reads one single XML-value. It returns 1 after a successful read, or undef when it hits end-of-file.

path

Provides the complete path of the currently selected value, attributes are represented by leading '@'-signs.

value

Provides the actual value (i.e. the value of the current text, attribute or comment).

comment

Provides the comments of the XML. This method only makes sense for option {filter => 2} (otherwise, in case of {filter => 1}, the method comment returns undef).

type

Provides the type of the value: 'T' for text, '@' for attributes.

If option {filter => 4} is in effect, then the type can be: 'T' for text, '@' for attributes, 'S' for start tags, 'E' for end-tags, '#' for comments, 'D' for the XML Declaration, '?' for processing-instructions.

tag

Provides the current tag-name.

attr

Provides the current attribute (returns the empty string for non-attribute lines).

level

Indicates the nesting level of the XPath expression (numeric value greater than or equal to zero; a level of 0 occurs when option {using => ...} has removed the whole path as a prefix).
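
In the sample outputs of this document, the level always equals the number of items in the path ('/data/item' has level 2, '/' has level 0). The following is a minimal sketch of that relationship only; level_of is a hypothetical helper, not part of XML::Reader, and the rule is inferred from the sample outputs rather than taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Reader): count the non-empty
# items of a path. This rule is an assumption inferred from the sample
# outputs in this document.
sub level_of {
    my ($path) = @_;
    return scalar grep { length } split m{/}, $path;
}

for my $path ('/', '/@name', '/data', '/data/item/inner/@id') {
    printf "%-22s lvl=%d\n", $path, level_of($path);
}
```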

prefix

Shows the prefix which has been removed in option {using => ...}. Returns the empty string if option {using => ...} has not been specified.

att_hash

Returns a reference to a hash with the current attributes of a start-tag (or empty hash if it is not a start-tag). This method does not make sense for option {filter => 1}, in which case undef is returned.

dec_hash

Returns a reference to a hash with the current attributes of an XML-Declaration (or empty hash if it is not an XML-Declaration). This method does not make sense for option {filter => 1}, in which case undef is returned.

proc_tgt

Returns the target (i.e. the first part) of a processing-instruction (or an empty string if the current event is not a processing-instruction). This method does not make sense for option {filter => 1}, in which case undef is returned.

proc_data

Returns the data (i.e. the second part) of a processing-instruction (or an empty string if the current event is not a processing-instruction). This method does not make sense for option {filter => 1}, in which case undef is returned.

pyx

Returns the pyx representation of the current XML-event.

The pyx representation is a string that starts with a specific first character. That first character of each line of PYX tells you what type of event you are dealing with: if the first character is '(', then you are dealing with a start event. If it's a ')', then you are dealing with an end event. If it's an 'A', then you are dealing with attributes. If it's '-', then you are dealing with text. If it's '?', then you are dealing with processing-instructions. (See http://www.xml.com/pub/a/2000/03/15/feature/index.html for an introduction to PYX.)

The method pyx makes sense only if option {filter => 4} is selected. For any filter other than 4, undef is returned.
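
The first-character convention can be sketched as a small lookup table. This is an illustration in plain Perl that does not call XML::Reader; classify_pyx and %pyx_event are hypothetical names, and the '#' entry covers the non-standard comment lines that XML::Reader produces with {parse_ct => 1}.

```perl
use strict;
use warnings;

# Map the first character of a PYX line to an event name.
# '#' is not part of the PYX specification; it is the marker that
# XML::Reader uses for comments.
my %pyx_event = (
    '(' => 'start',
    ')' => 'end',
    'A' => 'attribute',
    '-' => 'text',
    '?' => 'processing-instruction',
    '#' => 'comment',
);

# Hypothetical helper: returns (event name, rest of the line).
sub classify_pyx {
    my ($line) = @_;
    my $first = substr($line, 0, 1);
    my $event = exists $pyx_event{$first} ? $pyx_event{$first} : 'unknown';
    return ($event, substr($line, 1));
}

for my $line ('(dim', 'Aalter 511', '-car', '?tt dat', ')dim') {
    my ($event, $rest) = classify_pyx($line);
    printf "%-22s %s\n", $event, $rest;
}
```

Note that an XML-Declaration line also starts with '?' in the pyx string, so telling it apart from a processing-instruction requires $rdr->type or $rdr->is_decl.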

is_start

Returns 1 or 0, depending on whether the XML-file had a start tag at the current position. This method does not make sense for option {filter => 1}, in which case undef is returned.

is_end

Returns 1 or 0, depending on whether the XML-file had an end tag at the current position. This method does not make sense for option {filter => 1}, in which case undef is returned.

is_decl

Returns 1 or 0, depending on whether the XML-file had an XML-Declaration at the current position. This method does not make sense for option {filter => 1}, in which case undef is returned.

is_proc

Returns 1 or 0, depending on whether the XML-file had a processing-instruction at the current position. This method does not make sense for option {filter => 1}, in which case undef is returned.

is_comment

Returns 1 or 0, depending on whether the XML-file had a Comment at the current position. This method does not make sense for option {filter => 1}, in which case undef is returned.

is_text

Returns 1 or 0, depending on whether the XML-file has text data at the current position. The method is_text makes sense only if option {filter => 4} is selected. For any filter other than 4, undef is returned.

is_attr

Returns 1 or 0, depending on whether the XML-file has an attribute at the current position. This method does not make sense for option {filter => 1}, in which case undef is returned.

OPTION USING

Option Using allows for selecting a sub-tree of the XML.

Here is how it works in detail...

option {using => ['/path1/path2/path3', '/path4/path5/path6']} removes all lines which do not start with '/path1/path2/path3' (or with '/path4/path5/path6', for that matter). This effectively leaves only lines starting with '/path1/path2/path3' or '/path4/path5/path6'.

Those lines (which are not removed) will have a shorter path by effectively removing the prefix '/path1/path2/path3' (or '/path4/path5/path6') from the path. The removed prefix, however, shows up in the prefix-method.

The paths '/path1/path2/path3' and '/path4/path5/path6' are supposed to be absolute and complete. Absolute means that they have to start with a '/'-character. Complete means that the last item in the path ('path3' or 'path6') will be completed internally by a trailing '/'-character.
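
The matching rule can be sketched in plain Perl as follows. This is only an illustration of the behaviour described in this section, not the module's actual implementation; split_using is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Reader): split a full path into
# (prefix, remaining path) according to the rules described above.
sub split_using {
    my ($path, @using) = @_;
    for my $prefix (@using) {
        # Complete the prefix with a trailing '/', so that a prefix only
        # matches whole path items.
        my $completed = $prefix . '/';
        if (index($path . '/', $completed) == 0) {
            my $rest = substr($path, length $prefix);
            $rest = '/' if $rest eq '';
            return ($prefix, $rest);
        }
    }
    return;    # no prefix matched: the line would be removed
}

my @using = ('/data/order/database/customer', '/data/supplier');

for my $path ('/data/order/database/customer/@name', '/data/supplier', '/data/dummy') {
    my ($prefix, $rest) = split_using($path, @using);
    printf "%-37s => %s\n", $path,
      defined $prefix ? "prefix=$prefix, path=$rest" : 'removed';
}
```

With the completed prefix '/data/supplier/', a path such as '/data/supplierx' would not match, because only whole path items are compared.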

An example with option 'using'

The following program takes this XML and parses it with XML::Reader, including the option 'using' to target specific elements:

use XML::Reader;

my $line2 = q{
<data>
  <order>
    <database>
      <customer name="aaa" />
      <customer name="bbb" />
      <customer name="ccc" />
      <customer name="ddd" />
    </database>
  </order>
  <dummy value="ttt">test</dummy>
  <supplier>hhh</supplier>
  <supplier>iii</supplier>
  <supplier>jjj</supplier>
</data>
};

my $rdr = XML::Reader->newhd(\$line2,
  {using => ['/data/order/database/customer', '/data/supplier']});

my $i = 0;
while ($rdr->iterate) { $i++;
    printf "%3d. prf=%-29s, pat=%-7s, val=%-3s, tag=%-6s, t=%-1s, lvl=%2d\n",
      $i, $rdr->prefix, $rdr->path, $rdr->value, $rdr->tag, $rdr->type, $rdr->level;
}

This is the output of that program:

 1. prf=/data/order/database/customer, pat=/@name , val=aaa, tag=@name , t=@, lvl= 1
 2. prf=/data/order/database/customer, pat=/      , val=   , tag=      , t=T, lvl= 0
 3. prf=/data/order/database/customer, pat=/@name , val=bbb, tag=@name , t=@, lvl= 1
 4. prf=/data/order/database/customer, pat=/      , val=   , tag=      , t=T, lvl= 0
 5. prf=/data/order/database/customer, pat=/@name , val=ccc, tag=@name , t=@, lvl= 1
 6. prf=/data/order/database/customer, pat=/      , val=   , tag=      , t=T, lvl= 0
 7. prf=/data/order/database/customer, pat=/@name , val=ddd, tag=@name , t=@, lvl= 1
 8. prf=/data/order/database/customer, pat=/      , val=   , tag=      , t=T, lvl= 0
 9. prf=/data/supplier               , pat=/      , val=hhh, tag=      , t=T, lvl= 0
10. prf=/data/supplier               , pat=/      , val=iii, tag=      , t=T, lvl= 0
11. prf=/data/supplier               , pat=/      , val=jjj, tag=      , t=T, lvl= 0

An example without option 'using'

The following program takes the same XML and parses it with XML::Reader, but without the option 'using'.

use XML::Reader;

my $rdr = XML::Reader->newhd(\$line2);
my $i = 0;
while ($rdr->iterate) { $i++;
    printf "%3d. prf=%-1s, pat=%-37s, val=%-6s, tag=%-11s, t=%-1s, lvl=%2d\n",
     $i, $rdr->prefix, $rdr->path, $rdr->value, $rdr->tag, $rdr->type, $rdr->level;
}

As you can see in the following output, there are many more lines written, the prefix is empty and the path is much longer than in the previous program:

 1. prf= , pat=/data                                , val=      , tag=data       , t=T, lvl= 1
 2. prf= , pat=/data/order                          , val=      , tag=order      , t=T, lvl= 2
 3. prf= , pat=/data/order/database                 , val=      , tag=database   , t=T, lvl= 3
 4. prf= , pat=/data/order/database/customer/@name  , val=aaa   , tag=@name      , t=@, lvl= 5
 5. prf= , pat=/data/order/database/customer        , val=      , tag=customer   , t=T, lvl= 4
 6. prf= , pat=/data/order/database                 , val=      , tag=database   , t=T, lvl= 3
 7. prf= , pat=/data/order/database/customer/@name  , val=bbb   , tag=@name      , t=@, lvl= 5
 8. prf= , pat=/data/order/database/customer        , val=      , tag=customer   , t=T, lvl= 4
 9. prf= , pat=/data/order/database                 , val=      , tag=database   , t=T, lvl= 3
10. prf= , pat=/data/order/database/customer/@name  , val=ccc   , tag=@name      , t=@, lvl= 5
11. prf= , pat=/data/order/database/customer        , val=      , tag=customer   , t=T, lvl= 4
12. prf= , pat=/data/order/database                 , val=      , tag=database   , t=T, lvl= 3
13. prf= , pat=/data/order/database/customer/@name  , val=ddd   , tag=@name      , t=@, lvl= 5
14. prf= , pat=/data/order/database/customer        , val=      , tag=customer   , t=T, lvl= 4
15. prf= , pat=/data/order/database                 , val=      , tag=database   , t=T, lvl= 3
16. prf= , pat=/data/order                          , val=      , tag=order      , t=T, lvl= 2
17. prf= , pat=/data                                , val=      , tag=data       , t=T, lvl= 1
18. prf= , pat=/data/dummy/@value                   , val=ttt   , tag=@value     , t=@, lvl= 3
19. prf= , pat=/data/dummy                          , val=test  , tag=dummy      , t=T, lvl= 2
20. prf= , pat=/data                                , val=      , tag=data       , t=T, lvl= 1
21. prf= , pat=/data/supplier                       , val=hhh   , tag=supplier   , t=T, lvl= 2
22. prf= , pat=/data                                , val=      , tag=data       , t=T, lvl= 1
23. prf= , pat=/data/supplier                       , val=iii   , tag=supplier   , t=T, lvl= 2
24. prf= , pat=/data                                , val=      , tag=data       , t=T, lvl= 1
25. prf= , pat=/data/supplier                       , val=jjj   , tag=supplier   , t=T, lvl= 2
26. prf= , pat=/data                                , val=      , tag=data       , t=T, lvl= 1

OPTION PARSE_CT

Option {parse_ct => 1} allows for comments to be parsed (usually, comments are ignored by XML::Reader, that is, {parse_ct => 0} is the default).

Here is an example where comments are ignored by default:

use XML::Reader;

my $text = q{<?xml version="1.0"?><dummy>xyz <!-- remark --> stu <?ab cde?> test</dummy>};

my $rdr = XML::Reader->newhd(\$text) or die "Error: $!";

while ($rdr->iterate) {
    if ($rdr->is_decl)    { my %h = %{$rdr->dec_hash};
                            print "Found decl     ",  join('', map{" $_='$h{$_}'"} sort keys %h), "\n"; }
    if ($rdr->is_proc)    { print "Found proc      ", "t=", $rdr->proc_tgt, ", d=", $rdr->proc_data, "\n"; }
    if ($rdr->is_comment) { print "Found comment   ", $rdr->comment, "\n"; }
    print "Text '", $rdr->value, "'\n";
}

Here is the output:

Text 'xyz stu test'

Now, the very same XML data, and the same algorithm, except for the option {parse_ct => 1}, which is now activated:

use XML::Reader;

my $text = q{<?xml version="1.0"?><dummy>xyz <!-- remark --> stu <?ab cde?> test</dummy>};

my $rdr = XML::Reader->newhd(\$text, {parse_ct => 1}) or die "Error: $!";

while ($rdr->iterate) {
    if ($rdr->is_decl)    { my %h = %{$rdr->dec_hash};
                            print "Found decl     ",  join('', map{" $_='$h{$_}'"} sort keys %h), "\n"; }
    if ($rdr->is_proc)    { print "Found proc      ", "t=", $rdr->proc_tgt, ", d=", $rdr->proc_data, "\n"; }
    if ($rdr->is_comment) { print "Found comment   ", $rdr->comment, "\n"; }
    print "Text '", $rdr->value, "'\n";
}

Here is the output:

Text 'xyz'
Found comment   remark
Text 'stu test'

OPTION PARSE_PI

Option {parse_pi => 1} allows for processing-instructions and XML-Declarations to be parsed (usually, processing-instructions and XML-Declarations are ignored by XML::Reader, that is, {parse_pi => 0} is the default).

As an example, we use the very same XML data, and the same algorithm from the above paragraph, except for the option {parse_pi => 1}, which is now activated (together with option {parse_ct => 1}):

use XML::Reader;

my $text = q{<?xml version="1.0"?><dummy>xyz <!-- remark --> stu <?ab cde?> test</dummy>};

my $rdr = XML::Reader->newhd(\$text, {parse_ct => 1, parse_pi => 1}) or die "Error: $!";

while ($rdr->iterate) {
    if ($rdr->is_decl)    { my %h = %{$rdr->dec_hash};
                            print "Found decl     ",  join('', map{" $_='$h{$_}'"} sort keys %h), "\n"; }
    if ($rdr->is_proc)    { print "Found proc      ", "t=", $rdr->proc_tgt, ", d=", $rdr->proc_data, "\n"; }
    if ($rdr->is_comment) { print "Found comment   ", $rdr->comment, "\n"; }
    print "Text '", $rdr->value, "'\n";
}

Here is the output:

Found decl      version='1.0'
Text ''
Text 'xyz'
Found comment   remark
Text 'stu'
Found proc      t=ab, d=cde
Text 'test'

OPTION FILTER

Option Filter lets you select different operation modes when processing the XML data.

Option {filter => 2}

With option {filter => 2}, XML::Reader produces one line for each character event. If the text is preceded by a start-tag, the method is_start returns 1; if it is followed by an end-tag, the method is_end returns 1.

Also, attribute lines are added via the special '/@...' syntax.

Option {filter => 2} is the default.

Here is an example...

use XML::Reader;

my $text = q{<root><test param="v"><a><b>e<data id="z">g</data>f</b></a></test>x <!-- remark --> yz</root>};

my $rdr = XML::Reader->newhd(\$text) or die "Error: $!";
while ($rdr->iterate) {
    printf "Path: %-24s, Value: %s\n", $rdr->path, $rdr->value;
}

This program (with implicit option {filter => 2} as default) produces the following output:

Path: /root                   , Value:
Path: /root/test/@param       , Value: v
Path: /root/test              , Value:
Path: /root/test/a            , Value:
Path: /root/test/a/b          , Value: e
Path: /root/test/a/b/data/@id , Value: z
Path: /root/test/a/b/data     , Value: g
Path: /root/test/a/b          , Value: f
Path: /root/test/a            , Value:
Path: /root/test              , Value:
Path: /root                   , Value: x yz

The same {filter => 2} also lets you rebuild the structure of the XML with the help of the methods is_start and is_end. Please note that the first line ("Path: /root, Value:") is empty, but important for the structure of the XML. Therefore we can't ignore it.

Let us now look at the same example (with option {filter => 2}), but with an additional algorithm to reconstruct the original XML:

use XML::Reader;

my $text = q{<root><test param="v"><a><b>e<data id="z">g</data>f</b></a></test>x <!-- remark --> yz</root>};

my $rdr = XML::Reader->newhd(\$text) or die "Error: $!";

my %at;

while ($rdr->iterate) {
    my $indentation = '  ' x ($rdr->level - 1);

    if ($rdr->type eq '@')  { $at{$rdr->attr} = $rdr->value; }

    if ($rdr->is_start) {
        print $indentation, '<', $rdr->tag, join('', map{" $_='$at{$_}'"} sort keys %at), '>', "\n";
    }

    unless ($rdr->type eq '@') { %at = (); }

    if ($rdr->type eq 'T' and $rdr->value ne '') {
        print $indentation, '  ', $rdr->value, "\n";
    }

    if ($rdr->is_end) {
        print $indentation, '</', $rdr->tag, '>', "\n";
    }
}

...and here is the output:

<root>
  <test param='v'>
    <a>
      <b>
        e
        <data id='z'>
          g
        </data>
        f
      </b>
    </a>
  </test>
  x yz
</root>

...this is proof that the original structure of the XML is not lost.

Option {filter => 3}

Option {filter => 3} works very much like {filter => 2}.

The difference, though, is that with option {filter => 3} all attribute-lines are filtered out and instead, the attributes are presented for each start-line in the hash reference $rdr->att_hash().

This makes it possible to dispense with the global %at variable of the previous algorithm, and use a local %at variable instead:

my %at = %{$rdr->att_hash};

Here is the new algorithm for {filter => 3}. We don't need to worry about attributes (that is, we don't need to check for $rdr->type eq '@') and, as already mentioned, the %at variable is now local:

use XML::Reader;

my $text = q{<root><test param="v"><a><b>e<data id="z">g</data>f</b></a></test>x <!-- remark --> yz</root>};

my $rdr = XML::Reader->newhd(\$text, {filter => 3}) or die "Error: $!";

while ($rdr->iterate) {
    my $indentation = '  ' x ($rdr->level - 1);

    if ($rdr->is_start) {
        my %at = %{$rdr->att_hash};
        print $indentation, '<', $rdr->tag, join('', map{" $_='$at{$_}'"} sort keys %at), '>', "\n";
    }

    if ($rdr->type eq 'T' and $rdr->value ne '') {
        print $indentation, '  ', $rdr->value, "\n";
    }

    if ($rdr->is_end) {
        print $indentation, '</', $rdr->tag, '>', "\n";
    }
}

...the output for {filter => 3} is identical to the output for {filter => 2}:

<root>
  <test param='v'>
    <a>
      <b>
        e
        <data id='z'>
          g
        </data>
        f
      </b>
    </a>
  </test>
  x yz
</root>

Option {filter => 4}

Although this is not the main purpose of XML::Reader, option {filter => 4} can generate individual lines for start-tags, end-tags, comments, processing-instructions and XML-Declarations. Its aim is to generate a pyx string for further processing and analysis.

Here is an example:

use XML::Reader;

my $text = q{<?xml version="1.0" encoding="ISO-8859-1"?>
  <delta>
    <dim alter="511">
      <gamma />
      <beta>
        car <?tt dat?>
      </beta>
    </dim>
    dskjfh <!-- remark --> uuu
  </delta>};

my $rdr = XML::Reader->newhd(\$text, {filter => 4, parse_pi => 1}) or die "Error: $!";

while ($rdr->iterate) {
    printf "Type = %1s, pyx = %s\n", $rdr->type, $rdr->pyx;
}

And here is the output:

Type = D, pyx = ?xml version='1.0' encoding='ISO-8859-1'
Type = S, pyx = (delta
Type = S, pyx = (dim
Type = @, pyx = Aalter 511
Type = S, pyx = (gamma
Type = E, pyx = )gamma
Type = S, pyx = (beta
Type = T, pyx = -car
Type = ?, pyx = ?tt dat
Type = E, pyx = )beta
Type = E, pyx = )dim
Type = T, pyx = -dskjfh uuu
Type = E, pyx = )delta

Be aware that comments can be produced by pyx in a non-standard way if requested by {parse_ct => 1}. In fact, comments are produced with a leading hash symbol which is not part of the pyx specification, as can be seen by the following example:

use XML::Reader;

my $text = q{
  <delta>
    <!-- remark -->
  </delta>};

my $rdr = XML::Reader->newhd(\$text, {filter => 4, parse_ct => 1}) or die "Error: $!";

while ($rdr->iterate) {
    printf "Type = %1s, pyx = %s\n", $rdr->type, $rdr->pyx;
}

Here is the output:

Type = S, pyx = (delta
Type = #, pyx = #remark
Type = E, pyx = )delta

Finally, when operating with {filter => 4}, the usual methods (is_start, is_end, is_decl, is_proc, is_comment, is_attr, is_text, comment, proc_tgt, proc_data, dec_hash or att_hash) remain operational. Here is an example:

use XML::Reader;

my $text = q{<?xml version="1.0"?>
  <parent abc="def"> <?pt hmf?>
    dskjfh <!-- remark -->
  </parent>};

my $rdr = XML::Reader->newhd(\$text, {filter => 4, parse_pi => 1, parse_ct => 1}) or die "Error: $!";

while ($rdr->iterate) {
    if    ($rdr->is_start)   { print "Found start tag ", $rdr->tag, "\n"; }
    elsif ($rdr->is_end)     { print "Found end tag   ", $rdr->tag, "\n"; }
    elsif ($rdr->is_decl)    { my %h = %{$rdr->dec_hash};
                               print "Found decl     ",  join('', map{" $_='$h{$_}'"} sort keys %h), "\n"; }
    elsif ($rdr->is_proc)    { print "Found proc      ", "t=",    $rdr->proc_tgt, ", d=", $rdr->proc_data, "\n"; }
    elsif ($rdr->is_comment) { print "Found comment   ", $rdr->comment, "\n"; }
    elsif ($rdr->is_attr)    { print "Found attribute ", $rdr->attr, "='", $rdr->value, "'\n"; }
    elsif ($rdr->is_text)    { print "Found text      ", $rdr->value, "\n"; }
}

Here is the output:

Found decl      version='1.0'
Found start tag parent
Found attribute abc='def'
Found proc      t=pt, d=hmf
Found text      dskjfh
Found comment   remark
Found end tag   parent

Option {filter => 1}

Option {filter => 1} is similar to {filter => 2}, but reduces the number of output lines (i.e. it removes all lines that don't have a value).

With {filter => 1}, each of the methods is_start, is_end, is_decl, is_proc, is_comment, is_attr, is_text, comment, proc_tgt, proc_data, dec_hash and att_hash returns undef.

With option {filter => 1} we lose the ability to reconstruct the XML, but simple data processing is easier.

Here is a sample program:

use XML::Reader;

my $text = q{<root><test param="v"><a><b>e<data id="z">g</data>f</b></a></test></root>};

my $rdr = XML::Reader->newhd(\$text, {filter => 1}) or die "Error: $!";
while ($rdr->iterate) {
    printf "Path: %-24s, Value: %s\n", $rdr->path, $rdr->value;
}

...and here is the output:

Path: /root/test/@param       , Value: v
Path: /root/test/a/b          , Value: e
Path: /root/test/a/b/data/@id , Value: z
Path: /root/test/a/b/data     , Value: g
Path: /root/test/a/b          , Value: f
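
As an illustration of such simple data processing, the path/value lines can be grouped by path. The following is a minimal sketch in plain Perl; it does not call XML::Reader, and the pairs are copied by hand from the output above (in a real program they would come from $rdr->path and $rdr->value inside the iterate loop). The variable names are hypothetical.

```perl
use strict;
use warnings;

# Path/value pairs copied from the {filter => 1} output above.
my @lines = (
    ['/root/test/@param',       'v'],
    ['/root/test/a/b',          'e'],
    ['/root/test/a/b/data/@id', 'z'],
    ['/root/test/a/b/data',     'g'],
    ['/root/test/a/b',          'f'],
);

# Group the values by path, keeping document order within each path.
my %values_by_path;
for my $line (@lines) {
    my ($path, $value) = @$line;
    push @{ $values_by_path{$path} }, $value;
}

for my $path (sort keys %values_by_path) {
    printf "%-24s => %s\n", $path, join(', ', @{ $values_by_path{$path} });
}
```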

AUTHOR

Klaus Eichner, March 2009

COPYRIGHT AND LICENSE

RELATED MODULES

If you also want to write XML, have a look at XML::Writer. This module provides a simple interface for writing XML. (If you are writing non-mixed content XML, consider setting DATA_MODE=>1 and DATA_INDENT=>2, which allows for proper indentation in your XML output file.)
