NAME

CAM::XML - Encapsulation of a simple XML data structure

LICENSE

Copyright 2005 Clotho Advanced Media, Inc., <cpan@clotho.com>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SYNOPSIS

my $pollTag = CAM::XML->new('poll');

foreach my $q (@questions) {
  my $questionTag = CAM::XML->new('question');
  
  $questionTag->add(-text => $q->{text});
  my $choicesTag = CAM::XML->new('choices');
  
  foreach my $c (@{$q->{choices}}) {
    my $choiceTag = CAM::XML->new('choice');
    $choiceTag->setAttributes('value', $c->{value});
    $choiceTag->add(-text => $c->{text});
    $choicesTag->add($choiceTag);
  }
  $questionTag->add($choicesTag);
  $pollTag->add($questionTag);
}
print CAM::XML->header();
print $pollTag->toString();

DESCRIPTION

This module reads and writes XML into a simple object model. It is optimized for ease of creating code that interacts with XML.

This module is not as powerful or as standards-compliant as, say, XML::LibXML, XML::SAX, XML::DOM, etc., but it's darn easy to use. I recommend it to people who just want to read or write a quick but valid XML file and don't want to bother with the bigger modules.

In our experience, this module is actually easier to use than XML::Simple because the latter makes some assumptions about XML structure that prevent it from handling all XML files well. YMMV.

However, one exception to the simplicity claimed above is our implementation of a subset of XPath. That's not very simple. Sorry.

CLASS METHODS

parse XMLSTRING
parse -string => XMLSTRING
parse -filename => XMLFILENAME
parse -filehandle => XMLFILEHANDLE

Parse an incoming stream of XML into a CAM::XML hierarchy. This method just hands the first argument off to XML::Parser, so it can accept any style of argument that XML::Parser can. Note that XML::Parser says the filehandle style should pass an IO::Handle object. This can be called as a class method or an instance method.

Additional meaningful flags:

-cleanwhitespace => 1

Traverse the document and remove non-significant whitespace, as per removeWhitespace().

-xmlopts => HASHREF

Any options in this hash are passed directly to XML::Parser.

NOTE: this method does NOT work well on subclasses. I tried, but failed, to fix it up. The problem is that CAM::XML::XMLTree has to be able to instantiate an object of this class, but there's no really good way to communicate with it yet.
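As a minimal sketch of the calling styles listed above, parsing an in-memory string might look like the following. The input string and variable names are illustrative only. The failure behavior is not documented here, so the check on the return value assumes that an unparsable document yields a false value.

```perl
use strict;
use warnings;
use CAM::XML;

# Illustrative input; any well-formed XML string would do.
my $xml = '<poll> <question>Tea or coffee?</question> </poll>';

# -string is implied when the first argument is the XML itself.
# -cleanwhitespace asks for removeWhitespace() to be applied afterwards.
my $root = CAM::XML->parse($xml, -cleanwhitespace => 1);

# Assumption: an unparsable document gives back a false value.
die "could not parse XML\n" if !$root;

# Name of the root tag of the parsed document.
print $root->getName(), "\n";
```

The -filename and -filehandle styles take the same trailing flags; only the first pair of arguments changes.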

new tagname
new tagname, key => value, key => value, ...

Create a new XML tag. Optionally, you can set tag attributes at the same time.

header

Return a string containing the following XML declaration, suffixed by a newline:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

INSTANCE METHODS

getName

Returns the name of the node.

setAttributes key, value, key => value, ...

Set the value of one or more XML attributes. If any keys are duplicated, only the last one set is recorded.

getAttributeNames

Returns a list of the names of all the attributes of this node. The names are returned in arbitrary order.

getAttributes

Returns a hash of all attributes.

getAttribute KEY

Returns the value of the named attribute, or undef if it does not exist.
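The attribute accessors above could be combined roughly as follows. This is a sketch that uses only the methods documented in this section; the tag and attribute names are made up for illustration.

```perl
use strict;
use warnings;
use CAM::XML;

# Attributes can be supplied at construction time...
my $choice = CAM::XML->new('choice', value => 'a');

# ...or set later, several at once.
$choice->setAttributes(value => 'b', rank => 1);

# Names come back in arbitrary order, so sort them for stable output.
my @names = sort $choice->getAttributeNames();
print join(', ', @names), "\n";

# Read one attribute back.
print $choice->getAttribute('value'), "\n";

# A missing attribute yields undef rather than an empty string.
print "no such attribute\n" if !defined $choice->getAttribute('missing');
```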

getChildren

Returns an array of XML nodes and text objects contained by this node.

getChild INDEX

Returns a child of this node. The argument is a zero-based index. Returns undef if the index is not valid.

getChildNodes

Returns an array of XML nodes contained by this node (that is, unlike getChildren(), text nodes are ignored).

getChildNode INDEX

Returns a CAM::XML child of this node (that is, unlike getChild(), text nodes are ignored). The argument is a zero-based index. Returns undef if the index is not valid.

setChildren VALUE, VALUE, ...

Removes all the children from this node and replaces them with the supplied values. All of the values MUST be CAM::XML or CAM::XML::Text objects, or this method will abort and return false before any changes are made.
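To illustrate the difference between the child accessors, here is a small sketch with mixed text and element content. The tag names are illustrative, and the example relies on add() accepting several pieces of content in one call, as described in its own entry.

```perl
use strict;
use warnings;
use CAM::XML;

# A node with text, an element, and more text as children.
my $p = CAM::XML->new('p');
$p->add(-text => 'Hello ', CAM::XML->new('br'), -text => 'world');

# getChildren() includes text objects; getChildNodes() skips them.
my @all   = $p->getChildren();
my @nodes = $p->getChildNodes();

# Index 0 among element children only, so this is the <br> node.
my $first_node = $p->getChildNode(0);
print $first_node->getName(), "\n";

# Replace all children with just that element. A false return would
# mean a value was rejected and nothing was changed.
$p->setChildren($first_node)
   or warn "setChildren rejected a value\n";
```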

add CAM::XML object
add -text => text
add -cdata => text
add -xml => rawxml
add <multiple elements of the above types>

Add content within the current tag. Order of addition may be significant. This content can be any one of 1) subsidiary XML tags (CAM::XML), 2) literal text content (-text or -cdata), or 3) pre-formatted XML content (-xml).

In -text and -cdata content, any reserved characters will be automatically escaped. Those two modes differ only in their XML representation: -cdata is more human-readable if there are a lot of "&", "<" and ">" characters in your text, whereas -text is usually more compact for short strings. These strings are not escaped until output.

Content in -xml mode is parsed in as CAM::XML objects. If it is not valid XML, a warning will be emitted and the add will fail.
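The content modes described above can be sketched side by side like this. The tag names and strings are illustrative only; the exact serialized output is not shown because it depends on the module's formatting choices.

```perl
use strict;
use warnings;
use CAM::XML;

my $note = CAM::XML->new('note');

# Reserved characters are passed in raw; escaping happens on output.
$note->add(-text => 'AT&T');

# The same kind of content, but serialized as a CDATA section.
$note->add(-cdata => 'if (a < b && b > c) { ... }');

# Pre-formatted XML is parsed into CAM::XML objects before being added.
$note->add(-xml => '<em>important</em>');

# Several pieces of content may be given in one call.
my $ref = CAM::XML->new('ref');
$note->add($ref, -text => ' trailing text');

my $out = $note->toString();
print $out, "\n";
```

Passing raw, unescaped strings to -text and -cdata matters: escaping them yourself first would result in double escaping at output time.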

removeWhitespace

Clean out all non-significant whitespace. Whitespace is deemed non-significant if it is bracketed by tags. This might not be true in some data formats (e.g. HTML) so don't use this function carelessly.

getInnerText

For the given node, descend through all of its children and concatenate all the text values that are found. If none, this method returns an empty string (not undef).

getNodes -tag => TAGNAME
getNodes -attr => ATTRNAME, -value => ATTRVALUE
getNodes -path => PATH

Return an array of CAM::XML objects representing nodes that match the requested properties.

A path is a syntactic path into the XML document, somewhat like an XPath expression:

'/' divides nodes
'//' means any number of nodes
'/[n]' means the nth child of a node (1-based)
'<tag>[n]' means the nth instance of this node
'/[-n]' means the nth child of a node, counting backward
'/[last()]' means the last child of a node (same as [-1])
'/[@attr="value"]' means a node with this attribute value
'/text()' means all of the text data inside a node
          (note this returns just one node, not all the nodes)

For example, /html/body//table/tr[1]/td/a[@target="_blank"] searches an XHTML body for all tables, and returns all anchor nodes in the first row which pop new windows.

Please note that while this syntax resembles XPath, it is FAR from a complete (or even correct) implementation. It's useful for basic delving into an XML document, however.
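The three query styles might be used as in the sketch below. The document built here is illustrative. One assumption is labeled in the code: that a path starting with '/' names the tag of the node on which getNodes() is called as its first step, as the /html/body example above suggests.

```perl
use strict;
use warnings;
use CAM::XML;

# Build a small document with the methods described earlier.
my $poll = CAM::XML->new('poll');
for my $text ('First?', 'Second?') {
   my $q = CAM::XML->new('question', kind => 'yesno');
   $q->add(-text => $text);
   $poll->add($q);
}

# All nodes with a given tag name.
my @by_tag = $poll->getNodes(-tag => 'question');

# All nodes carrying a given attribute value.
my @by_attr = $poll->getNodes(-attr => 'kind', -value => 'yesno');

# A path query. Assumption: the leading step matches the root tag itself.
my @by_path = $poll->getNodes(-path => '/poll/question[1]');

print scalar(@by_tag), " question node(s) found by tag\n";
print $_->getInnerText(), "\n" for @by_path;
```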

toString [OPTIONS...]

Serializes the tag and all subsidiary tags into an XML string. This is called recursively on any subsidiary CAM::XML objects. Note that the XML header is not prepended to this output.

The following optional arguments apply:

-formatted => boolean
      If true, the XML is indented nicely.  Otherwise, no whitespace
      is inserted between tags.
-textformat => boolean
      Only relevant if -formatted is true.  If false, this prevents
      the formatting of pure text values.
-level => number
      Indents this tag by the number of levels indicated.  This implies
      -formatted => 1
-indent => number
      The number of spaces to indent per level if the output is
      formatted.  By default, this is 2 (i.e. two spaces).

Example: -formatted => 0

<foo><bar>Baz</bar></foo>

Example: -formatted => 1

<foo>
  <bar>
    Baz
  </bar>
</foo>

Example: -formatted => 1, -textformat => 0

<foo>
  <bar>Baz</bar>
</foo>

Example: -formatted => 1, -textformat => 0, -indent => 4

<foo>
    <bar>Baz</bar>
</foo>
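For reference, a sketch of code that could produce the outputs above. It builds the same foo/bar structure using only methods documented in this file and prepends the XML header by hand, since toString() does not emit it.

```perl
use strict;
use warnings;
use CAM::XML;

# Build <foo><bar>Baz</bar></foo>.
my $foo = CAM::XML->new('foo');
my $bar = CAM::XML->new('bar');
$bar->add(-text => 'Baz');
$foo->add($bar);

# Compact form, with no whitespace inserted between tags.
my $compact = $foo->toString(-formatted => 0);

# Indented form: four spaces per level, text kept on the tag's line.
my $pretty = $foo->toString(-formatted => 1, -textformat => 0, -indent => 4);

# The header is a separate call and is printed first.
print CAM::XML->header();
print $pretty, "\n";
```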

ENCODING

It is assumed that all text will be UTF-8. This includes any tag names, attribute keys and values, text content, and raw XML content that are added to the data structure.

CODING

This module has just over 90% code coverage in its regression tests, as reported by Devel::Cover via perl Build testcover. The remaining 10% is mostly error conditions and a few conditional defaults.

This module passes many of the Perl Best Practices guidelines, as enforced by Perl::Critic v0.09. Notable exceptions are the legacy camelCase subroutine names.

AUTHOR

Clotho Advanced Media Inc., cpan@clotho.com

Primary Developer: Chris Dolan