NAME
CAM::XML - Encapsulation of a simple XML data structure
LICENSE
Copyright 2005 Clotho Advanced Media, Inc., <cpan@clotho.com>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SYNOPSIS
my $pollTag = CAM::XML->new('poll');
foreach my $q (@questions) {
  my $questionTag = CAM::XML->new('question');
  $questionTag->add(-text => $q->{text});
  my $choicesTag = CAM::XML->new('choices');
  foreach my $c (@{$q->{choices}}) {
    my $choiceTag = CAM::XML->new('choice');
    $choiceTag->setAttributes('value', $c->{value});
    $choiceTag->add(-text => $c->{text});
    $choicesTag->add($choiceTag);
  }
  $questionTag->add($choicesTag);
  $pollTag->add($questionTag);
}
print CAM::XML->header();
print $pollTag->toString();
DESCRIPTION
This module reads and writes XML into a simple object model. It is optimized for ease of creating code that interacts with XML.
This module is not as powerful or as standards-compliant as, say, XML::LibXML, XML::SAX, XML::DOM, etc., but it's darn easy to use. I recommend it to people who just want to read or write a quick but valid XML file and don't want to bother with the bigger modules.
In our experience, this module is actually easier to use than XML::Simple because the latter makes some assumptions about XML structure that prevent it from handling all XML files well. YMMV.
However, one exception to the simplicity claimed above is our implementation of a subset of XPath. That's not very simple. Sorry.
CLASS METHODS
- parse XMLSTRING
- parse -string => XMLSTRING
- parse -filename => XMLFILENAME
- parse -filehandle => XMLFILEHANDLE
-
Parse an incoming stream of XML into a CAM::XML hierarchy. This method just hands the first argument off to XML::Parser, so it can accept any style of argument that XML::Parser can. Note that XML::Parser says the filehandle style should pass an IO::Handle object. This can be called as a class method or an instance method.
Additional meaningful flags:
-cleanwhitespace => 1
Traverse the document and remove non-significant whitespace, as per removeWhitespace().
-xmlopts => HASHREF
Any options in this hash are passed directly to XML::Parser.
NOTE: this method does NOT work well on subclasses. I tried, but failed, to fix it up. The problem is that CAM::XML::XMLTree has to be able to instantiate an object of this class, but there's no really good way to communicate with it yet.
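As a rough sketch of the calling styles listed above, the following parses the same string with and without -cleanwhitespace. The sample document and variable names are made up for illustration, and the `or die` checks assume that a false return value signals a parse failure, which this document does not state.

```perl
use strict;
use warnings;
use CAM::XML;

# Hypothetical sample document; any well-formed XML string would do.
my $xml = "<poll>\n  <question>Favorite color?</question>\n</poll>";

# A bare string and an explicit -string flag are the first two documented forms.
# Assumption: a false return value means the parse failed.
my $raw = CAM::XML->parse($xml)
   or die "could not parse sample XML\n";

# Same input, but with non-significant whitespace stripped after parsing.
my $clean = CAM::XML->parse(-string => $xml, -cleanwhitespace => 1)
   or die "could not parse sample XML\n";

# Compare how many children (tags plus text objects) each root holds.
my @raw_children   = $raw->getChildren();
my @clean_children = $clean->getChildren();
print 'root tag:       ', $clean->getName(), "\n";
print 'raw children:   ', scalar(@raw_children), "\n";
print 'clean children: ', scalar(@clean_children), "\n";
```

The child counts are printed rather than stated here because they depend on how whitespace between tags is kept by the parser.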
- new tagname
- new tagname, key => value, key => value, ...
-
Create a new XML tag. Optionally, you can set tag attributes at the same time.
- header
-
Return a string containing the following message, suffixed by a newline:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
INSTANCE METHODS
- getName
-
Returns the name of the node.
- setAttributes key, value, key => value, ...
-
Set the value of one or more XML attributes. If any keys are duplicated, only the last one set is recorded.
- getAttributeNames
-
Returns a list of the names of all the attributes of this node. The names are returned in arbitrary order.
- getAttributes
-
Returns a hash of all attributes.
- getAttribute KEY
-
Returns the value of the named attribute, or undef if it does not exist.
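A small sketch of the attribute methods described so far, using invented tag and attribute names. It sets attributes both through new() and through setAttributes(), then reads them back.

```perl
use strict;
use warnings;
use CAM::XML;

# Attributes may be supplied to new() right after the tag name.
my $choice = CAM::XML->new('choice', value => 'red');

# setAttributes() takes a flat key/value list; when a key is repeated,
# only the last value is kept.
$choice->setAttributes(label => 'Red', value => 'crimson', value => 'scarlet');

# Sort the names ourselves, since they come back in arbitrary order.
my @names = sort $choice->getAttributeNames();
print "attributes: @names\n";
print 'value: ', $choice->getAttribute('value'), "\n";

# A missing attribute yields undef rather than an empty string.
my $missing = $choice->getAttribute('nosuchkey');
print 'nosuchkey is ', (defined $missing ? 'set' : 'undef'), "\n";
```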
- getChildren
-
Returns an array of XML nodes and text objects contained by this node.
- getChild INDEX
-
Returns a child of this node. The argument is a zero-based index. Returns undef if the index is not valid.
- getChildNodes
-
Returns an array of XML nodes contained by this node (that is, unlike getChildren(), text nodes are ignored).
- getChildNode INDEX
-
Returns a CAM::XML child of this node (that is, unlike getChild(), text nodes are ignored). The argument is a zero-based index. Returns undef if the index is not valid.
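To illustrate the difference between the "children" and "child nodes" accessors, here is a minimal sketch with one text child and two tag children. The tag names are hypothetical.

```perl
use strict;
use warnings;
use CAM::XML;

# One text child followed by two tag children.
my $question = CAM::XML->new('question');
$question->add(-text => 'Favorite color?');
$question->add(CAM::XML->new('choice', value => 'red'));
$question->add(CAM::XML->new('choice', value => 'blue'));

my @all   = $question->getChildren();     # text objects and tags
my @nodes = $question->getChildNodes();   # tags only
print scalar(@all), ' children, ', scalar(@nodes), " child nodes\n";

# Index 0 means different things to the two single-item accessors:
# getChild(0) is the text object, getChildNode(0) is the first tag.
my $first_tag = $question->getChildNode(0);
print 'first tag value: ', $first_tag->getAttribute('value'), "\n";

# Out-of-range indices return undef instead of dying.
my $none = $question->getChildNode(5);
print 'index 5 is ', (defined $none ? 'defined' : 'undef'), "\n";
```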
- setChildren VALUE, VALUE, ...
-
Removes all the children from this node and replaces them with the supplied values. All of the values MUST be CAM::XML or CAM::XML::Text objects, or this method will abort and return false before any changes are made.
- add CAM::XML object
- add -text => text
- add -cdata => text
- add -xml => rawxml
- add <multiple elements of the above types>
-
Add content within the current tag. Order of addition may be significant. This content can be any one of 1) subsidiary XML tags (CAM::XML), 2) literal text content (-text or -cdata), or 3) pre-formatted XML content (-xml).
In -text and -cdata content, any reserved characters will be automatically escaped. Those two modes differ only in their XML representation: -cdata is more human-readable if there are a lot of "&", "<" and ">" characters in your text, whereas -text is usually more compact for short strings. These strings are not escaped until output.
Content in -xml mode is parsed in as CAM::XML objects. If it is not valid XML, a warning will be emitted and the add will fail.
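The three content modes can be compared side by side with a sketch like the one below. The tag names and strings are invented, and the serialized result is printed rather than quoted here, since the exact escaping is up to the module.

```perl
use strict;
use warnings;
use CAM::XML;

my $note = CAM::XML->new('note');

# Several items may be passed in one call; order of addition is preserved.
$note->add(
   -text  => 'a < b & c',    # escaped with entities on output
   -cdata => 'x < y & z',    # wrapped in a CDATA section on output
);

# Raw XML is parsed into CAM::XML objects before being attached.
$note->add(-xml => '<ref id="1">see also</ref>');

my $out = $note->toString();
print $out, "\n";

# The parsed fragment is an ordinary child node afterwards.
my @tags = $note->getChildNodes();
print 'tag children: ', join(', ', map { $_->getName() } @tags), "\n";
```

Using -cdata for the second string keeps the "<" and "&" readable in the output, which is the trade-off described above.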
- removeWhitespace
-
Clean out all non-significant whitespace. Whitespace is deemed non-significant if it is bracketed by tags. This might not be true in some data formats (e.g. HTML) so don't use this function carelessly.
- getInnerText
-
For the given node, descend through all of its children and concatenate all the text values that are found. If none, this method returns an empty string (not undef).
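A quick sketch of getInnerText() on mixed content, with made-up tag names. Text from nested tags is folded into one string, and a node without text gives an empty string.

```perl
use strict;
use warnings;
use CAM::XML;

# Build <p>Hello, <b>world</b>!</p> by hand.
my $bold = CAM::XML->new('b');
$bold->add(-text => 'world');

my $para = CAM::XML->new('p');
$para->add(-text => 'Hello, ');
$para->add($bold, -text => '!');

# Text from all descendants is concatenated in document order.
my $text = $para->getInnerText();
print $text, "\n";    # prints "Hello, world!"

# A node with no text content yields '' rather than undef.
my $empty = CAM::XML->new('br')->getInnerText();
print 'empty length: ', length($empty), "\n";
```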
- getNodes -tag => TAGNAME
- getNodes -attr => ATTRNAME, -value => ATTRVALUE
- getNodes -path => PATH
-
Return an array of CAM::XML objects representing nodes that match the requested properties.
A path is a syntactic path into the XML doc something like an XPath:
  '/' divides nodes
  '//' means any number of nodes
  '/[n]' means the nth child of a node (1-based)
  '<tag>[n]' means the nth instance of this node
  '/[-n]' means the nth child of a node, counting backward
  '/[last()]' means the last child of a node (same as [-1])
  '/[@attr="value"]' means a node with this attribute value
  '/text()' means all of the text data inside a node (note this returns just one node, not all the nodes)
For example,
/html/body//table/tr[1]/td/a[@target="_blank"]
searches an XHTML body for all tables, and returns all anchor nodes in the first row which pop new windows.
Please note that while this syntax resembles XPath, it is FAR from a complete (or even correct) implementation. It's useful for basic delving into an XML document, however.
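The three lookup styles might be exercised as in the sketch below, on a small invented poll document. The -path expression assumes that the first path segment names the root tag, as in the /html/body example; its result is printed rather than promised, given how loose the XPath subset is.

```perl
use strict;
use warnings;
use CAM::XML;

# Hypothetical sample document.
my $xml = <<'END_XML';
<poll>
  <question id="q1">
    <choice value="red">Red</choice>
    <choice value="blue">Blue</choice>
  </question>
</poll>
END_XML

# Assumption: a false return value means the parse failed.
my $poll = CAM::XML->parse($xml, -cleanwhitespace => 1)
   or die "could not parse sample XML\n";

# Match by tag name anywhere below the starting node.
my @choices = $poll->getNodes(-tag => 'choice');
print 'choices: ', scalar(@choices), "\n";

# Match by attribute name and value.
my @blue = $poll->getNodes(-attr => 'value', -value => 'blue');
print 'blue text: ', $blue[0]->getInnerText(), "\n";

# Match by path, using the '<tag>[n]' form for the second choice.
my @second = $poll->getNodes(-path => '/poll/question/choice[2]');
print 'path matches: ', scalar(@second), "\n";
```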
- toString [OPTIONS...]
-
Serializes the tag and all subsidiary tags into an XML string. This is called recursively on any subsidiary CAM::XML objects. Note that the XML header is not prepended to this output.
The following optional arguments apply:
-formatted => boolean
If true, the XML is indented nicely. Otherwise, no whitespace is inserted between tags.
-textformat => boolean
Only relevant if -formatted is true. If false, this prevents the formatting of pure text values.
-level => number
Indents this tag by the number of levels indicated. This implies -formatted => 1.
-indent => number
The number of spaces to indent per level if the output is formatted. By default, this is 2 (i.e. two spaces).
Example: -formatted => 0
<foo><bar>Baz</bar></foo>
Example: -formatted => 1
<foo>
  <bar>
    Baz
  </bar>
</foo>
Example: -formatted => 1, -textformat => 0
<foo>
  <bar>Baz</bar>
</foo>
Example: -formatted => 1, -textformat => 0, -indent => 4
<foo>
    <bar>Baz</bar>
</foo>
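The examples above could be reproduced with a short script along these lines. It builds the same foo/bar/Baz structure and serializes it with a few option combinations; only the unformatted result is asserted in a comment, since that one is spelled out verbatim above.

```perl
use strict;
use warnings;
use CAM::XML;

# Build <foo><bar>Baz</bar></foo>.
my $bar = CAM::XML->new('bar');
$bar->add(-text => 'Baz');
my $foo = CAM::XML->new('foo');
$foo->add($bar);

# toString() never emits the XML declaration, so print header() first.
print CAM::XML->header();

# Default: no whitespace between tags.
my $compact = $foo->toString();
print $compact, "\n";    # <foo><bar>Baz</bar></foo>

# Indented output, with text values placed on their own lines.
my $pretty = $foo->toString(-formatted => 1);
print $pretty;

# Indented output that leaves pure text values inline, four spaces per level.
my $wide = $foo->toString(-formatted => 1, -textformat => 0, -indent => 4);
print $wide;
```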
ENCODING
It is assumed that all text will be UTF-8. This includes any tag names, attribute keys and values, text content, and raw XML content that are added to the data structure.
CODING
This module has just over 90% code coverage in its regression tests, as reported by Devel::Cover via "perl Build testcover". The remaining 10% is mostly error conditions and a few conditional defaults.
This module passes many of the Perl Best Practices guidelines, as enforced by Perl::Critic v0.09. Notable exceptions are the legacy camelCase subroutine names.
AUTHOR
Clotho Advanced Media Inc., cpan@clotho.com
Primary Developer: Chris Dolan