NAME
XML::Snap - Makes simple XML tasks a snap!
VERSION
Version 0.04
SYNOPSIS
XML::Snap is a quick and relatively modern way to work with XML. If, like me, you have little patience for the endless reams of standards the XML community burdens you with, maybe this is the module for you. If you want to maintain compatibility with normal people, though, and you want to avoid scaling problems later, you're probably better off sitting down and understanding XML::LibXML and the SAX ecosystem.
One large omission from the model at present is namespaces. If you use namespaces (and honestly, most applications do) then again, you should be using libxml or one of the SAX parsers.
Still here? Cool. XML::Snap is my personal way of dealing with XML when I can't avoid it. It's roughly based on my experiences with my ANSI C library "xmlapi", which I wrote back in 2000 to wrap the Expat parser. Along the way, I ended up building a lot of handy functionality into the library that made C programming palatable - and a lot of that was string and list manipulation that Perl renders superfluous. So after working with a port for a while, I tossed it. This is what I ended up with.
XML::Snap works in DOM mode. That is, it reads in XML from a string or file and puts it into a tree for you to manipulate, then allows you to write it back out. The tree is pretty minimalistic. The children of a node can be either plain text (as strings) or elements (as XML::Snap objects or a subclass), and each element can have a hash of attributes. Order of attributes is maintained, although the XML specification itself does not treat attribute order as significant. There is also a clear distinction between content and tags. So some of the drawbacks to XML::Simple are averted with this setup.
Right at the moment, comments in the XML are not preserved. If you need to work with XML comments, XML::Snap is not your module.
Right at the moment, a streaming mode (like SAX) is also not provided, but it's something I want to get to soon. In streaming mode, comments will be preserved, but not available to the user until further notice. But since streaming has not yet been implemented, that's kind of moot. Streaming will be implemented in a separate module, probably to be named XML::Skim.
Some examples!
    use XML::Snap;

    my $xml = XML::Snap->load ('myfile.xml');
    my $query = $xml->search ('mynode');
    while (my $hit = <$query>) {
        # ... do things with $hit ...
    }
CREATING AND LOADING XML ELEMENTS
new (name, [attribute, value, ...])
The new function just creates a new, empty XML node, simple as that. It has a name and optional attributes with values. Note that the order of attributes will be retained. Duplicates are not permitted (storage is in a hash); well-formed XML doesn't allow duplicate attribute names within a tag anyway, and I know I've never personally encountered XML where it would make a difference.
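To make the "hash plus retained order" idea concrete, here is a minimal sketch in plain Perl. It is not XML::Snap's actual implementation; make_attrs and attrs_to_string are hypothetical helpers, and the assumption that a repeated key keeps its first position but takes the last value is mine.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Snap): store attributes in a hash,
# and remember the order in which keys were first seen.
sub make_attrs {
    my @pairs = @_;
    my (%value, @order);
    while (@pairs) {
        my ($key, $val) = splice(@pairs, 0, 2);
        push @order, $key unless exists $value{$key};  # first sighting fixes the position
        $value{$key} = $val;                           # a repeated key overwrites the value
    }
    return { value => \%value, order => \@order };
}

# Hypothetical helper: write the attributes back out in their stored order.
sub attrs_to_string {
    my ($attrs) = @_;
    return join ' ',
        map { sprintf '%s="%s"', $_, $attrs->{value}{$_} } @{ $attrs->{order} };
}

my $attrs = make_attrs(id => 'one', class => 'big', id => 'two');
print attrs_to_string($attrs), "\n";   # prints: id="two" class="big"
```

The point of the sketch is only that a hash alone would lose the order, so some separate record of key order is needed.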
parse (string), parse_with_refs (string)
The parse function uses the Expat parser wrapped in XML::Parser to parse the string supplied, building a tree from it. If you want text to be blessed scalar refs instead of just strings, use parse_with_refs. (This can be easier, depending on what you're going to do with the data structure later.)
load (filename)
The load function does the same as parse but takes a filename instead.
name, is
The name method returns the name of the node, that is, the tag used to create it, while the is method tests for equality to a given string (it's just a convenience function).
oob(key, value), unoob(key)
Sets/gets an out-of-band (OOB) value on a node. This isn't anything special, just a hash attached to each node, but it can be used by a template output for parameterization, and it doesn't affect the output or actions of the XML in any other way.
If a value isn't set in a given node, it will ask its parent.
Call unoob($key) to remove an OOB value, or unoob() to remove all OOB values on a node.
parent, ancestor, root
parent returns the node's parent, if it has been attached to a parent, while ancestor finds the ancestor with the tag you supply, or the root if you don't give a tag. root is provided as a shorthand for ancestor().
delete
Deletes a child from a node. Pass the actual reference to the child - or if you're using non-referenced text, the text itself. (In this case, duplicate text will all be deleted.)
detach
Detaches the node from its parent, if it is attached. This not only removes the parent reference, but also removes the child from its parent's list of children.
WORKING WITH ATTRIBUTES
Each tag in XML can have zero or more attributes, each of which has a value. Order is preserved.
set, unset
The set method sets one or more attributes; its parameter list is considered to be key, value, key, value, etc. The unset method removes one or more attributes from the list.
get (attribute, default), attr_eq (attribute, value)
Obviously, get retrieves an attribute value. Specify a default value to be used if the attribute is not found; otherwise undef is returned.
Since it's inconvenient to test attributes that can be undefined, there's an attr_eq method that checks that the given attribute is defined and equal to the value given.
attrs (attribute list)
The attrs method retrieves a list of the attributes set.
getlist (attribute list)
The getlist method retrieves a list of attribute values given a list of attributes. (It's just a map.)
getctx (attribute, default)
The getctx method looks at an attribute in the given node, but if it's not found, looks in the parent instead. If there is no parent, the default value is returned.
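The parent fallback can be pictured with a small sketch in plain Perl. This is not the module's code: lookup_ctx is a hypothetical function, the nodes are plain hashes standing in for XML::Snap objects, and I am assuming the lookup keeps climbing through every ancestor rather than stopping at the immediate parent.

```perl
use strict;
use warnings;

# Hypothetical stand-in for getctx: climb from the node towards the root
# until some node defines the attribute, else fall back to the default.
sub lookup_ctx {
    my ($node, $attr, $default) = @_;
    while ($node) {
        return $node->{attrs}{$attr} if defined $node->{attrs}{$attr};
        $node = $node->{parent};
    }
    return $default;
}

# Stand-in nodes: plain hashes with 'attrs' and 'parent' keys.
my $root  = { attrs => { lang => 'en' }, parent => undef };
my $child = { attrs => { id   => 'c1' }, parent => $root };

print lookup_ctx($child, 'lang', 'none'), "\n";   # prints: en  (inherited from the root)
print lookup_ctx($child, 'dir',  'ltr'),  "\n";   # prints: ltr (nobody sets it, so the default)
```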
attr_order (attribute list)
Moves the named attributes to the front of the list; if any appear that aren't set, they stay unset.
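As an illustration of that reordering rule, here is a sketch that works on a bare list of attribute names. move_to_front is a hypothetical helper, not an XML::Snap method, and the choice to put the named attributes in the order they were asked for is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: given the current attribute order (array ref) and the
# names to bring forward, return the new order. Names that aren't currently
# set are ignored, so they stay unset.
sub move_to_front {
    my ($order, @front) = @_;
    my %present = map { $_ => 1 } @$order;
    my %moved;
    my @head = grep { $present{$_} && !$moved{$_}++ } @front;
    my @tail = grep { !$moved{$_} } @$order;
    return (@head, @tail);
}

my @new_order = move_to_front([qw(id class lang)], qw(lang missing id));
print join(' ', @new_order), "\n";   # prints: lang id class
```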
WORKING WITH PLAIN TEXT CONTENT
Depending on your needs, XML::Snap can store plain text embedded in an XML structure as simple strings, or as scalar references blessed to XML::Snap. Since text may therefore not be blessed, you need to handle it with care unless you're sure it's all references (by parsing with parse_with_refs, for instance).
istext
Returns a flag indicating whether a given thing is text or not. "Text" means a scalar or a scalar reference; anything else will not be considered text.
This is a class method or an instance method - note that if you're using it as an instance method and you try to call it on a string, your call will die.
gettext
Returns the actual text of either a string (which is obviously just the string) or a scalar reference. Again, can be called as an instance method if you're sure it's an instance.
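The distinction between plain and referenced text can be sketched with core Perl only. The functions below are hypothetical stand-ins written for illustration (is_text_like and text_of are not the module's istext and gettext), and My::Text is a made-up class name.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical stand-in for istext: a plain string or any scalar reference
# (blessed or not) counts as text; everything else does not.
sub is_text_like {
    my ($thing) = @_;
    return 0 unless defined $thing;
    return 1 unless ref $thing;                      # plain string
    return reftype($thing) eq 'SCALAR' ? 1 : 0;      # scalar ref, blessed or not
}

# Hypothetical stand-in for gettext: dereference if needed.
sub text_of {
    my ($thing) = @_;
    return ref $thing ? $$thing : $thing;
}

my $plain   = 'hello';
my $blessed = bless \(my $copy = 'hello'), 'My::Text';   # made-up class name
my $element = { name => 'p' };                           # stands in for an element

for my $thing ($plain, $blessed, $element) {
    my $label = is_text_like($thing) ? 'text: ' . text_of($thing) : 'not text';
    print "$label\n";
}
```

Calling these as plain functions sidesteps the problem mentioned above, where calling a method on an unblessed string dies.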
bless_text
Iterates through the node given, and converts all plain texts into referenced texts.
unbless_text
Iterates through the node given, and converts all referenced texts into plain texts.
WORKING WITH XML STRUCTURE
add, add_pretty
The add method adds nodes and text as children to the current node. The add_pretty method is a convenience method that ensures that there is a line break if a node is inserted directly at the beginning of its parent (this makes building human-readable XML easier).
In addition to nodes and text, you can also add a coderef. This will have no effect on normal operations except for appearing in the list of children for the node, but during writing operations (either for string output or to streams) the coderef will be called to retrieve an iterator that delivers XML snippets. Those snippets will be inserted into the output as though they appeared at the point in the structure where the coderef appears. Extraction from the iterator stops when it returns undef.
The next time the writer is used, the original coderef will be called again to retrieve a new iterator.
The writer functions (string, stringcontent, write, etc.) can be called with optional parameters that will be passed to each coderef in the structure, if any. This allows an XML::Snap structure to be used as a generic template, for example for writing XML structures extracted from database queries.
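The generator mechanism is easier to see in a stripped-down model. The sketch below is not XML::Snap's writer; render_children is a hypothetical function that treats children as plain strings or coderefs, calls each coderef with the writer's parameters to get a fresh iterator, and drains that iterator until it returns undef.

```perl
use strict;
use warnings;

# Hypothetical mini-writer: children are strings or coderefs. A coderef is
# called with the writer's parameters and must return an iterator (another
# coderef) that yields snippets until it returns undef.
sub render_children {
    my ($children, @params) = @_;
    my $out = '';
    for my $child (@$children) {
        if (ref $child eq 'CODE') {
            my $iter = $child->(@params);            # fresh iterator on every write
            while (defined(my $snippet = $iter->())) {
                $out .= $snippet;
            }
        }
        else {
            $out .= $child;
        }
    }
    return $out;
}

# A generator that turns its parameters into <row> snippets.
my $rows = sub {
    my @names = @_;
    return sub {
        my $name = shift @names;
        return defined $name ? "<row>$name</row>" : undef;
    };
};

my $template = ['<table>', $rows, '</table>'];
print render_children($template, 'a', 'b'), "\n";   # prints: <table><row>a</row><row>b</row></table>
print render_children($template, 'c'), "\n";        # prints: <table><row>c</row></table>
```

Because the generator is called again on each write, the same structure can be reused as a template with different parameters.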
When adding a node that is already a child of another node, the source node will be copied into the target, not just added. (Otherwise confusion could ensue!)
Text is normally added as a simple string, but this can cause problems for consumers, as the output of an iterator might then return a mixture of unblessed strings and blessed nodes, so you end up having to test for blessedness when processing them. For ease of use, you can also add a reference to a string; it will work the same in terms of neighboring strings being coalesced, but they'll be stored as blessed string references. Then, use istext or is_node to determine what each element is when iterating through structure.
prepend, prepend_pretty
These do the same as add and add_pretty except at the beginning of the child list.
replacecontent, replacecontent_from
The replacecontent method first deletes the node's children, then calls add to add its parameters. Use replacecontent_from to use the children of the first parameter, with optional matches to effect filtration as the rest of the parameters.
These are holdovers from my old xmlapi C library, where I was using in-memory XML structures as "bags of data". Since Perl is basically built on bags of data to start with, I'm not sure these will ever get used in a real situation (certainly I've never needed them yet in Perl).
replace
The replace method is a little odd; it actually acts on the given node's parent, by replacing the callee with the passed parameters. In other words, the parent's children list is modified directly. If there's nothing provided as a replacement, this simply deletes the callee from its parent's child list.
children, elements
The children method just returns the list of children added with add (or the other addition-type methods). The elements method returns only those children that are elements, omitting text, comments, and generators.
COPYING AND TRANSFORMATION
copy, copy_from, filter
The copy method copies out a new node (recursively) that is independent, i.e. has no parent. If you give it some matches of the form [name, key, value, coderef], then the coderef will be called on the copy before it gets added, if the copy matches the match. If a match is just a coderef, it'll apply to all text instead.
filter is just an alias that's a little more self-documenting.
Note that the transformations specified will not fire for the root node you're copying, just its children.
STRING/FILE OUTPUT
The obvious thing to do with an XML structure once constructed is of course to write it to a file or extract a string from it. XML::Snap gives you one powerful option, which is the use of embedded generators to act as a live template.
string, rawstring
Extracts a string from the XML node passed in; string gives you an escaped string that can be parsed back into an equivalent XML structure, while rawstring does not escape anything, so you can't count on equivalence or even legal XML. This is useful if your XML structure is being used to build strings, otherwise it's the wrong tool to use.
content, rawcontent
These do the same, but don't include the parent tag or its closing tag in the string.
write
Given a filename and an optional prefix to write to the file first, writes the XML to that file.
writestream
Writes the XML to an open stream.
escape/unescape
These are convenience functions that escape a string for use in XML, or unescape the escaped string for non-XML use.
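For readers who want to see what escaping involves, here is a minimal sketch in plain Perl. It is not the module's escape/unescape; escape_xml and unescape_xml are hypothetical names, and which characters XML::Snap actually escapes is an assumption (this sketch handles &, <, > and the double quote).

```perl
use strict;
use warnings;

# Hypothetical escaper: the ampersand has to be handled first, otherwise the
# ampersands introduced by the other replacements would be escaped again.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical unescaper: the ampersand has to be handled last, for the
# mirror-image reason.
sub unescape_xml {
    my ($text) = @_;
    $text =~ s/&lt;/</g;
    $text =~ s/&gt;/>/g;
    $text =~ s/&quot;/"/g;
    $text =~ s/&amp;/&/g;
    return $text;
}

my $escaped = escape_xml('a < b & "c"');
print "$escaped\n";                   # prints: a &lt; b &amp; &quot;c&quot;
print unescape_xml($escaped), "\n";   # prints: a < b & "c"
```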
BOOKMARKING AND SEARCHING
Finally, there are searching and bookmarking functions for finding and locating given XML in a tree.
getloc
Retrieves a location for a given node in its tree, effectively a bookmark. The rules are simple. The bookmark consists of a set of dotted pairs, each being the name of the tag plus a disambiguator if necessary. If the tag is the first of its sibs with its own tag, no disambiguator is necessary. If the tag has an attribute named 'id' that doesn't have a dot or square brackets in it, then square brackets surrounding that value are used as the disambiguator. Otherwise, a number in parentheses identifies the sequence of the tag within the list of siblings with its own tag name.
So mytag[one] matches mytag id="one" and mytag(1) matches the second 'mytag' in its parent's list of elements. mytag[one].next(3) matches the fourth 'next' in mytag id="one".
This is essentially a much simplified XPath (I may be wrong, but I think I came up with it before XPath had been defined). It's quick and dirty, but works.
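The bookmark syntax described above is simple enough to take apart with a couple of regular expressions. The following is only a sketch of the format as documented here, not the module's own parser; parse_bookmark is a hypothetical function, and treating a bare tag name as index 0 is my reading of "the first of its sibs".

```perl
use strict;
use warnings;

# Hypothetical parser for the bookmark format: dot-separated steps, each one
# "tag", "tag[id]" or "tag(n)" with n counted from zero.
sub parse_bookmark {
    my ($loc) = @_;
    my @steps;
    for my $part (split /\./, $loc) {
        if ($part =~ /^([^\[\(]+)\[([^\]]*)\]$/) {
            push @steps, { tag => $1, id => $2 };           # disambiguated by id attribute
        }
        elsif ($part =~ /^([^\[\(]+)\((\d+)\)$/) {
            push @steps, { tag => $1, index => $2 + 0 };    # disambiguated by position
        }
        else {
            push @steps, { tag => $part, index => 0 };      # first sibling with this tag
        }
    }
    return @steps;
}

for my $step (parse_bookmark('mytag[one].next(3)')) {
    my $how = exists $step->{id} ? "id=$step->{id}" : "index=$step->{index}";
    print "$step->{tag} $how\n";
}
# prints:
#   mytag id=one
#   next index=3
```

Splitting on dots is safe only because ids containing a dot or square brackets are never used as disambiguators.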
loc
Given such a bookmark and the tree it pertains to, finds the bookmarked node.
all
Returns a list of XML snippets that meet the search criteria.
WALKING THE TREE
XML is a tree structure, and what do we do with trees? We walk them!
A walker is an iterator that visits each node in turn, then its children, one by one. Walkers come in two flavors: full walk or element walk; the element walk ignores text.
The walker constructor optionally takes a closure that will be called on each node before it's returned; the return from that closure will be what's returned. If it returns undef, the walk will skip that node and go on with the walk in the same order that it otherwise would have; if it returns a list of (value, 'prune') then the walk will not visit that node's children, and "value" will be taken as the return value (and it can obviously be undef as well).
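The skip and prune rules can be modelled in a few lines of plain Perl. This is a sketch of the behaviour as described, not XML::Snap's walker; make_walker is a hypothetical function and the tree is built from plain hashes and strings rather than XML::Snap nodes.

```perl
use strict;
use warnings;

# Hypothetical walker: returns an iterator that visits nodes in document
# order. The visitor's first return value is what the iterator yields (undef
# means "skip"); a second return value of 'prune' stops the descent.
sub make_walker {
    my ($root, $visit) = @_;
    my @pending = ($root);
    return sub {
        while (@pending) {
            my $node = shift @pending;
            my ($value, $flag) = $visit ? $visit->($node) : ($node);
            my $prune = defined $flag && $flag eq 'prune';
            unshift @pending, @{ $node->{children} }
                if ref $node eq 'HASH' && !$prune;   # children go next, in order
            return $value if defined $value;
        }
        return;                                       # walk is finished
    };
}

# Stand-in tree: elements are hashes, text is plain strings.
my $tree = {
    name     => 'doc',
    children => [
        { name => 'unit',  children => ['hello'] },
        { name => 'group', children => [ { name => 'unit', children => ['world'] } ] },
    ],
};

my $walker = make_walker($tree, sub {
    my ($node) = @_;
    return ($node, 'prune') if ref $node eq 'HASH' && $node->{name} eq 'unit';
    return undef;   # skip this node, but still walk into its children
});

my @found;
while (my $hit = $walker->()) {
    push @found, $hit->{children}[0];
}
print "@found\n";   # prints: hello world
```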
walk
walk is the complete walk. It returns an iterator. Pass it a closure to be called on each node as it's visited. Modifying the tree's structure is entirely fine as long as you're just manipulating the children of the current node; if you do other things, the walker might get confused.
walk_elem
For the sake of convenience, walk_elem does the same thing, except it only visits nodes, not text.
walk_all
A simplified walk that simply returns matching nodes.
    my $w = $self->{body}->walk(sub {
        my $node = shift;
        return ($node, 'prune') if $node->is('trans-unit'); # Segments are returned whole.
        return undef; # We don't want the details for anything else, but still walk into its children if it has any.
    });
first
Returns the first XML element (i.e. non-text thing) that meets the search criteria.
AUTHOR
Michael Roberts, <michael at vivtek.com>
BUGS
Please report any bugs or feature requests to bug-xml-snap at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=XML-Snap. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc XML::Snap
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
LICENSE AND COPYRIGHT
Copyright 2013 Michael Roberts.
This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:
http://www.perlfoundation.org/artistic_license_2_0
Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.
If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.
This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.
This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.
Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.