NAME
XML::Trivial - A trivial tool representing parsed XML as a tree of read-only objects.
VERSION
Version 0.01
SYNOPSIS
use XML::Trivial ();
my $xml = XML::Trivial::parseFile('filename');
print "Names and text contents of /root/child/* elements:\n";
foreach ($$xml{0}{child}->ea) {
print "name:".$_->en;
print " text:".$_->ts."\n";
}
DESCRIPTION
This module provides easy, read-only, random access to previously parsed XML documents in Perl. The XML declaration, elements, attributes, comments, text nodes, CDATA sections and processing instructions are supported. The following limitations are assumed:
* The XML files are small, or more precisely, the parsed XML data fit in memory.
* The Perl structure representing the XML file is NOT serializable by Data::Dumper. (But every element can be serialized by its own sr() method.)
* The Perl structure is read-only.
The module is namespace-aware.
IDEAS
This module is designed for reading and traversing small XML files in Perl. No assumptions about the XML structure are made before parsing: every well-formed document can be parsed and traversed, and every element can be serialized, all without any loss of information.
DEPENDENCIES
XML::Parser::Expat is used to parse the XML files. This may change, or the dependency may become optional.
USAGE
use XML::Trivial ();
Module functions
Parsing
my $xml = XML::Trivial::parseFile('filename');
If the specified file does not exist or its content is not a well-formed XML document, the subroutine dies with expat's original message, because this module has no opinion about what to do in these situations.
Or:
my $xml = XML::Trivial::parse(q{<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
<home>/usr/local/myApplication</home>
<sections>
<section name="A" version="1.8" escaped="&apos;,&quot;,&lt;">
<a_specific>aaa</a_specific>
</section>
<section name="B">bbb</section>
<text>
...and there is another stuff
<![CDATA[<html><body><hr>Hello, world!<hr></body></html>]]>
...more stuff here...
<element/>
<![CDATA[2nd CDATA]]>
...]]&gt;...
</text>
</sections>
<!--processing instructions-->
<?first do something ?>
<?second do st. else ?>
<?first fake ?>
<!--namespaces-->
<meta xmlns="meta_ns" xmlns:p1="first_ns" xmlns:p2="second_ns">
<desc a="v" p1:a="v1" p2:a="v2"/>
<p1:desc a="v" p1:a="v1" p2:a="v2"/>
<p2:desc a="v" p1:a="v1" p2:a="v2"/>
</meta>
</root>});
This XML document, represented by $xml, is used in the examples below.
XML declaration
Dirty, but rarely used :)
print "version: ".$xml->a(1)->[0]."\n";
print "encoding: ".$xml->a(1)->[1]."\n";
print "standalone: ".$xml->a(1)->[2]."\n";
If any part of the XML declaration is not present, undef is returned for it. See the XML::Parser::Expat documentation (look for XMLDecl) for details.
Document tree
The parsed XML is organized into a tree data structure whose nodes represent the root node and the elements. All nodes have the same class, XML::Trivial::Element. The simplest navigation through the tree is shown in the following examples (the sr() method of the final element serializes that element, just for demonstration):
Navigation by element name:
print "homeelement: ".$$xml{root}{home}->sr."\n";
print "prefix based access: ".$$xml{root}{meta}{'p1:desc'}->sr."\n";
print "namespace based access: ".$$xml{root}{meta}{'first_ns*desc'}->sr."\n";
BE CAREFUL: if several sibling elements map to the same hash key, only the first of them is returned.
Navigation by element position:
print "first child element of rootelement: ".$$xml{root}{0}->sr."\n";
If a non-negative integer is used as a key, the child element at that position (counting from 0) is returned.
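The two lookup rules above (a name selects the first matching sibling, an integer selects the child at that zero-based position) can be modelled in plain Perl. This is a minimal sketch under assumptions, not the module's implementation: the ordered list of [name, content] pairs and the `child_lookup` helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical model of an element's children: an ordered list of
# [qualified name, serialized content] pairs.
my @children = (
    [ 'section', '<section name="A"/>' ],
    [ 'section', '<section name="B"/>' ],
    [ 'text',    '<text/>' ],
);

# Hypothetical helper (not part of XML::Trivial) resolving a key the way
# the documentation describes it.
sub child_lookup {
    my ($children, $key) = @_;
    # XML names cannot start with a digit, so an all-digit key never
    # clashes with an element name: treat it as a zero-based position.
    if ($key =~ /\A\d+\z/) {
        my $child = $children->[$key];
        return $child ? $child->[1] : undef;
    }
    # Any other key selects the FIRST child with that name; later
    # siblings with the same name are not reachable this way.
    for my $child (@$children) {
        return $child->[1] if $child->[0] eq $key;
    }
    return undef;
}

print child_lookup(\@children, 'section'), "\n";   # <section name="A"/>
print child_lookup(\@children, 1), "\n";           # <section name="B"/>
```

To reach the second `section` by name, a positional key or a list-returning method such as ea() is needed.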
Element methods
In the method descriptions below, the terms 'hash(ref)' and 'array(ref)' are used where the returned type depends on the calling context: in scalar context, the method returns a hashref or arrayref; in list context, it returns a list (hash or array).
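Context-dependent return values like these are usually built on Perl's `wantarray` builtin. The sketch below illustrates that general technique under the assumption that XML::Trivial does something similar; the package `My::Node` and its `attrs` method are hypothetical.

```perl
use strict;
use warnings;

package My::Node;

sub new {
    my ($class, %attrs) = @_;
    return bless { attrs => {%attrs} }, $class;
}

# Returns a flattened hash in list context and a hashref in scalar
# context, mirroring the 'hash(ref)' convention described above.
sub attrs {
    my $self = shift;
    my %copy = %{ $self->{attrs} };   # copy, so callers cannot modify the node's own data
    return wantarray ? %copy : \%copy;
}

package main;

my $node  = My::Node->new(name => 'A', version => '1.8');
my %attrs = $node->attrs;    # list context: a plain hash
my $ref   = $node->attrs;    # scalar context: a hashref
print "$attrs{version} $ref->{name}\n";    # 1.8 A
```

Returning a copy keeps the node itself read-only, which matches the module's stated design.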
- p()
parent node. Returns the parent element or the root node.
print "serializes whole document: ".$$xml{0}->p->sr."\n";
- en()
element (qualified) name
print "home element name: ".$$xml{0}{0}->en."\n";
print "name of 3rd child element of meta: ".$$xml{0}{meta}{2}->en."\n";
Returns the qualified element name (including the namespace prefix).
- ep()
element prefix
print "home element prefix: '".$$xml{0}{0}->ep."'\n";
print "prefix of 3rd child element of meta: '".$$xml{0}{meta}{2}->ep."'\n";
Returns the prefix of the qualified element name.
- ln()
element local (unqualified) name
print "home element localname: '".$$xml{0}{0}->ln."'\n";
print "localname of 3rd child element of meta: '".$$xml{0}{meta}{2}->ln."'\n";
Returns the unqualified element name (without the namespace prefix).
- ns()
namespaces. Returns a hash(ref) of the namespaces in the element's scope.
print "all namespaces of 'desc' element:\n";
my %ns = $$xml{0}{meta}{desc}->ns();
print " '$_'='$ns{$_}'\n" for sort keys %ns;
- ns(undef)
namespace of the element.
print "namespace of 'p2:desc' element: ".$$xml{0}{meta}{'p2:desc'}->ns(undef)."\n";
- ns($prefix)
namespace of specified prefix.
print "namespace of 'p2' prefix in <desc> element: ".$$xml{0}{meta}{desc}->ns('p2')."\n";
Returns the namespace bound to the specified prefix in the element's scope.
- ah()
attribute hash(ref). Returns the hash (in list context) or hashref (in scalar context) of all attributes; the keys of the hash are qualified attribute names.
print "all attributes of 'desc' element:\n";
my %attrs = $$xml{0}{meta}{desc}->ah();
print " '$_'='$attrs{$_}'\n" for sort keys %attrs;
- ah($attrname)
attribute hash. Returns the value of the specified attribute.
print "\n1st section version: ".$$xml{0}{sections}{section}->ah('version')."\n";
print "p1:a value of p2:desc element: ".$$xml{0}{meta}{'p2:desc'}->ah('p1:a')."\n";
Called with one argument, this method is namespace-naive: the argument has to be the qualified attribute name, with the same prefix as in the parsed document.
- ah($unprefixedattrname, $namespace)
attribute hash. If both arguments are defined, it returns the value of the attribute with the specified unprefixed name in the specified namespace.
print "attrval of 'a' in 'first_ns' in 'desc' element: ".$$xml{0}{2}{0}->ah('a','first_ns')."\n";
- ah($unprefixedattrname, undef)
attribute hash. If the second argument is present but undefined, it returns a hash(ref) of the values of the attributes with the specified unprefixed name, across all namespaces in which such an attribute actually occurs.
print "values of 'a' attrs of 'desc' element:\n";
my %a_values = $$xml{0}{meta}{desc}->ah('a',undef);
print " '$_'='$a_values{$_}'\n" for sort keys %a_values;
- ah(undef, $namespace)
attribute hash. If the first argument is undefined, it returns a hash(ref) of the attributes in the specified namespace.
print "attributes of 'desc' element in 'second_ns':\n";
my %second_ns_attrs = $$xml{0}{meta}{desc}->ah(undef,'second_ns');
print " '$_'='$second_ns_attrs{$_}'\n" for sort keys %second_ns_attrs;
- ah(undef, undef)
attribute hash. If both arguments are present but undefined, it returns a hash(ref) of the attributes in the element's own namespace.
print "attributes of 'p1:desc' element in its namespace:\n";
my %own_ns_attrs = $$xml{0}{meta}{'p1:desc'}->ah(undef,undef);
print " '$_'='$own_ns_attrs{$_}'\n" for sort keys %own_ns_attrs;
Remember that an unprefixed attribute does NOT inherit the namespace of its element.
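The namespace rules behind the two-argument forms of ah() can be sketched in plain Perl using the <desc> element of the example document. This is an illustration under assumptions, not the module's code: `expand_name` is a hypothetical helper, attributes are held in a plain hash, and "no namespace" is represented by the empty string.

```perl
use strict;
use warnings;

# In-scope prefixes of <desc>, as declared on <meta> in the example document.
# The empty key holds the default namespace (xmlns="meta_ns").
my %prefix_ns = ('' => 'meta_ns', p1 => 'first_ns', p2 => 'second_ns');

# Hypothetical helper: split a qualified name into (namespace, local name).
# Unprefixed element names take the default namespace; unprefixed
# attribute names get no namespace, represented here as ''.
sub expand_name {
    my ($qname, $is_attribute) = @_;
    my ($prefix, $local) = $qname =~ /\A(?:([^:]+):)?(.+)\z/;
    my $ns = defined $prefix ? $prefix_ns{$prefix}
           : $is_attribute   ? ''
           :                   $prefix_ns{''};
    return ($ns, $local);
}

# Attributes of <desc> from the example document, by qualified name.
my %desc_attrs = (a => 'v', 'p1:a' => 'v1', 'p2:a' => 'v2');

# Roughly what ah('a', undef) describes: the values of every attribute
# whose local name is 'a', keyed (by assumption) by namespace.
my %a_by_ns;
for my $qname (keys %desc_attrs) {
    my ($ns, $local) = expand_name($qname, 1);
    $a_by_ns{$ns} = $desc_attrs{$qname} if $local eq 'a';
}

# Roughly what ah(undef, undef) describes: attributes in the element's own
# namespace. <desc> is in 'meta_ns', but its unprefixed attribute 'a' is not.
my ($element_ns) = expand_name('desc', 0);
my %in_element_ns;
for my $qname (keys %desc_attrs) {
    my ($ns, $local) = expand_name($qname, 1);
    $in_element_ns{$local} = $desc_attrs{$qname} if $ns eq $element_ns;
}

print "'$_' => $a_by_ns{$_}\n" for sort keys %a_by_ns;
print "attributes of <desc> in $element_ns: ", scalar(keys %in_element_ns), "\n";
```

The empty result for the element's own namespace is exactly the "does NOT inherit" rule: the default namespace applies to <desc> but not to its unprefixed attribute.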
- eh()
element hash(ref). Returns a hash or hashref (depending on the calling context) of child elements. If several child elements share the same qualified name, only the first of them is included.
print "hash of child elements of 'sections':\n";
my %children = $$xml{0}{sections}->eh();
print " '$_'='".$children{$_}->sr."'\n" for sort keys %children;
- eh($childname)
element hash. Returns the first child element with the specified name.
print "first section: ".$$xml{0}{sections}->eh('section')->sr."\n";
- ea()
element array(ref). Returns the array or arrayref of child elements.
print "all childelements of sections:\n"; foreach ($$xml{0}{sections}->ea) { print " element name:".$_->en."\n"; }
- ea($index)
element array. Returns the $index'th child element.
print "second childelement of sections: ".$$xml{0}{sections}->ea(1)->sr."\n";
- ta()
text array(ref). Returns an array(ref) of all text nodes, including CDATA sections.
print "all texts under <text>:\n"; foreach ($$xml{0}{sections}{text}->ta) { print " piece of text:".$_."\n"; }
- ta($index)
text array. Returns the $index'th text node under the element, including CDATA sections.
print "second text under <text>: ".$$xml{0}{sections}{text}->ta(1)."\n";
- ca()
cdata array(ref). Returns an array(ref) of CDATA sections.
print "all cdatas under <text>:\n"; foreach ($$xml{0}{sections}{text}->ca) { print " cdata: ".$_."\n"; }
- ca($index)
cdata array. Returns the $index'th CDATA section under the element.
print "first cdata section under <text>: ".$$xml{0}{sections}{text}->ca(0)."\n";
- ts()
text serialized. Returns all text nodes, serialized into a single scalar string.
print "whole serialized text under <text>:".$$xml{0}{sections}{text}->ts."\n";
- pa()
processing instruction array(ref). Returns an array(ref) of all processing instructions if called without arguments. Each item of the returned array is an arrayref of two items: target and body.
print "processing instructions under rootelement:\n"; foreach ($$xml{0}->pa) { print " target:$$_[0] body:$$_[1]\n"; }
- pa($index)
processing instruction array. Returns the $index'th processing instruction under the element, as an arrayref of two items: target and body.
print "first processing instruction under rootelement: ".join(' ',@{$$xml{0}->pa(0)})."\n";
- ph()
processing instruction hash(ref). Returns a hash(ref) of processing instructions if called without arguments; if the same target occurs more than once, its first occurrence wins.
print "processing instructions with different targets under rootelement:\n";
my %pis = $$xml{0}->ph();
print " '$_'='$pis{$_}'\n" for sort keys %pis;
- ph($target)
processing instruction hash. Returns the first processing instruction with the specified target.
print "first processing instruction having target 'first' under rootelement: ".$$xml{0}->ph('first')."\n";
- na()
note array(ref). Returns an array(ref) of all comments if called without arguments.
print "notes under rootelement:\n"; foreach ($$xml{0}->na) { print " $_\n"; }
- na($index)
note array. Returns the $index'th note (comment) under the element.
print "second note under rootelement: ".$$xml{0}->na(1)."\n";
- a($index)
all. Returns the internal representation of the element. Helpful if the order of mixed content (elements, text nodes, PIs etc.) matters. See the code, for instance the body of the sr() method.
- sr()
serialize.
print "whole document, serialized:\n"; print $xml->sr;
Returns the serialized element or root node. For attribute values, it uses apostrophes as delimiters, escaping ampersands, apostrophes and left angle brackets inside. For text values, it escapes ampersands, left angle brackets and the ]]> sequence (to ]]&gt;). Due to expat behaviour, there is nothing to serialize under the root node except the root element.
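The escaping rules described for sr() can be sketched as two small Perl functions. This is an illustration under assumptions, not the module's code: `escape_attr` and `escape_text` are hypothetical names, and the exact entities (`&amp;`, `&apos;`, `&lt;`, `]]&gt;`) are a guess at what sr() emits.

```perl
use strict;
use warnings;

# Hypothetical helper: escape an attribute value and wrap it in apostrophes.
# '&' is replaced first so the entities added afterwards are not escaped again.
sub escape_attr {
    my ($value) = @_;
    $value =~ s/&/&amp;/g;
    $value =~ s/'/&apos;/g;
    $value =~ s/</&lt;/g;
    return "'$value'";
}

# Hypothetical helper: escape character data. A literal ']]>' is not
# allowed in XML text, so its '>' is written as '&gt;'.
sub escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/\]\]>/]]&gt;/g;
    return $text;
}

print "<section escaped=", escape_attr(q{',",<}), "/>\n";
print escape_text('...]]>... & <hr>'), "\n";
```

Double quotes need no escaping inside an apostrophe-delimited value, which is why escape_attr leaves them alone.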
SEE ALSO
XML::Parser::Expat
XML::Simple for much more sophisticated XML2perlstruct transformations.
XML::Twig for parsing and traversing huge xml documents.
XML::LibXML for a more complete range of XML capabilities in Perl.
AUTHOR
Jan Poslusny aka Pajout, <pajout at cpan.org>
BUGS
Please report any bugs or feature requests to bug-xml-trivial at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=XML-Trivial. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc XML::Trivial
You can also look for information at:
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
RT: CPAN's request tracker
Search CPAN
COPYRIGHT & LICENSE
Copyright 2007 Jan Poslusny, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.