NAME
XML::PugiXML - Perl binding for the pugixml C++ XML parser
SYNOPSIS
use XML::PugiXML;
my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>');
my $root = $doc->root;
my $item = $root->child('item');
print $item->text, "\n"; # Hello
print $item->attr('id')->value, "\n"; # 1
# XPath
my $node = $doc->select_node('//item[@id="1"]');
print $node->text, "\n";
# Compiled XPath (faster for repeated queries)
my $xpath = $doc->compile_xpath('//item');
my @items = $xpath->evaluate_nodes($root);
# Modification
my $new = $root->append_child('item');
$new->set_text('World');
$new->set_attr('id', '2');
$doc->save_file('output.xml');
# Formatting options
print $doc->to_string(" ", XML::PugiXML::FORMAT_INDENT());
# Node cloning
my $copy = $root->append_copy($item);
DESCRIPTION
XML::PugiXML provides a Perl interface to the pugixml C++ XML parsing library. It offers fast parsing, XPath support, and a clean API.
String inputs must be UTF-8: either Perl Unicode strings (the internal representation is passed through), or raw UTF-8 bytes. Latin-1 byte strings are not auto-converted; call utf8::upgrade on them first if needed. All string outputs are UTF-8 flagged.
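The encoding rule above can be illustrated with a short sketch. It assumes a Latin-1 byte string as input (the sample data is made up for illustration) and uses only utf8::upgrade from core Perl plus the methods shown in the SYNOPSIS.

```perl
use strict;
use warnings;
use XML::PugiXML;

# A Latin-1 byte string: "\xE9" is e-acute as a single byte, which is not valid UTF-8.
my $xml = "<root>caf\xE9</root>";

# Switch Perl's internal representation to UTF-8; the characters themselves are unchanged.
utf8::upgrade($xml);

my $doc = XML::PugiXML->new;
$doc->load_string($xml) or die "Parse failed: $@";

# The returned string is UTF-8 flagged, so length() counts characters, not bytes.
my $text = $doc->root->text;
print length($text), "\n";
```

Strings that are already Perl Unicode strings or raw UTF-8 bytes need no such step.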
METHODS
XML::PugiXML (Document)
- new()
-
Create a new empty XML document. Subclassing is not supported: the returned object is always blessed into XML::PugiXML regardless of the calling class.
- load_file($path, $parse_options?)
-
Load and parse XML from a file. Returns true on success. Optional $parse_options (default PARSE_DEFAULT).
- load_string($xml, $parse_options?)
-
Parse XML from a string. Returns true on success. Optional $parse_options (default PARSE_DEFAULT).
- save_file($path, $indent?, $flags?)
-
Save the document to a file. Returns true on success. Optional $indent (default "\t") and $flags (default FORMAT_DEFAULT).
- to_string($indent?, $flags?)
-
Serialize the document to an XML string. Optional $indent (default "\t") and $flags (default FORMAT_DEFAULT).
- reset()
-
Clear the document, removing all nodes. Existing Node and Attr handles become stale: accessing them croaks with "Stale node/attribute handle". Use valid() to check without croaking. The same applies after load_file() or load_string() replaces content.
- root()
-
Return the document element (root node).
- child($name)
-
Get a direct child by name.
- select_node($xpath)
-
Execute XPath query, return single result. Returns an XML::PugiXML::Node or XML::PugiXML::Attr depending on the query, or undef if there is no match.
- select_nodes($xpath)
-
Execute XPath query, return list of results. Returns a mix of XML::PugiXML::Node and XML::PugiXML::Attr objects as appropriate. Empty list if no match.
- compile_xpath($xpath)
-
Compile an XPath expression for repeated use. Returns an XML::PugiXML::XPath object.
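The stale-handle behaviour described under reset() can be sketched as follows. This is a minimal illustration based only on the description above; the exact wording of the croak message beyond "Stale node/attribute handle" is not assumed.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>')
    or die "Parse failed: $@";

my $item = $doc->root->child('item');
my $valid_before = $item->valid ? 1 : 0;

$doc->reset;    # every handle obtained before this point is now stale

# valid() reports the state without croaking.
my $valid_after = $item->valid ? 1 : 0;

# Any other access to a stale handle croaks, so guard it with eval.
my $name  = eval { $item->name };
my $error = $@;

print "valid before: $valid_before, after: $valid_after\n";
print "error: $error\n" if $error;
```

Checking valid() before use is the cheaper option when a handle may have outlived a reset() or a reload.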
Format Constants
- FORMAT_DEFAULT()
-
Default formatting (indent with tabs).
- FORMAT_INDENT()
-
Indent output.
- FORMAT_NO_DECLARATION()
-
Omit XML declaration.
- FORMAT_RAW()
-
No formatting (compact output).
- FORMAT_WRITE_BOM()
-
Write BOM (byte order mark).
- FORMAT_INDENT_ATTRIBUTES()
-
Indent each attribute on its own line (adds to other format flags).
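As a rough sketch of how the format constants might be used with to_string(): the example below assumes the constants are bit flags that can be combined with bitwise OR, as in pugixml itself. The note on FORMAT_INDENT_ATTRIBUTES suggests this, but it is an assumption here.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>')
    or die "Parse failed: $@";

# Indented output using two spaces per level instead of the default tab.
my $pretty = $doc->to_string("  ", XML::PugiXML::FORMAT_INDENT());

# Compact output without the XML declaration.
# Assumption: flags combine with bitwise OR, as they do in pugixml.
my $compact = $doc->to_string(
    "\t",
    XML::PugiXML::FORMAT_RAW() | XML::PugiXML::FORMAT_NO_DECLARATION(),
);

print $pretty;
print $compact, "\n";
```

The same $indent and $flags arguments apply to save_file().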
Node Type Constants
- NODE_NULL, NODE_DOCUMENT, NODE_ELEMENT, NODE_PCDATA
- NODE_CDATA, NODE_COMMENT, NODE_PI, NODE_DECLARATION, NODE_DOCTYPE
-
Integer values returned by $node->type.
Parse Constants
- PARSE_DEFAULT()
-
Default parsing options.
- PARSE_MINIMAL()
-
Minimal parsing (fastest, no comments/PI/DOCTYPE).
- PARSE_PI()
-
Parse processing instructions.
- PARSE_COMMENTS()
-
Parse comments.
- PARSE_CDATA()
-
Parse CDATA sections.
- PARSE_WS_PCDATA()
-
Preserve whitespace-only PCDATA nodes.
- PARSE_WS_PCDATA_SINGLE()
-
Preserve whitespace-only PCDATA only when it is the sole child of its parent (a leaner alternative to PARSE_WS_PCDATA).
- PARSE_ESCAPES()
-
Parse character/entity references.
- PARSE_EOL()
-
Normalize end-of-line characters.
- PARSE_DECLARATION()
-
Parse XML declaration.
- PARSE_DOCTYPE()
-
Parse DOCTYPE.
- PARSE_FULL()
-
Full parsing (all features enabled).
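The effect of a parse flag can be seen by loading the same input twice. This sketch assumes that the constants combine with bitwise OR and that PARSE_DEFAULT mirrors pugixml's default, which skips comments; count_comments is a hypothetical helper written for this example.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $xml = '<root><!-- note --><item/></root>';
my $doc = XML::PugiXML->new;

# Hypothetical helper: parse $xml with the given flags and count
# the comment nodes directly under the document element.
sub count_comments {
    my ($flags) = @_;
    $doc->load_string($xml, $flags) or die "Parse failed: $@";
    return scalar grep { $_->type == XML::PugiXML::NODE_COMMENT() }
        $doc->root->children;
}

my $default_count = count_comments(XML::PugiXML::PARSE_DEFAULT());

# Assumption: flags are bit masks, so OR adds comment parsing to the defaults.
my $with_comments = count_comments(
    XML::PugiXML::PARSE_DEFAULT() | XML::PugiXML::PARSE_COMMENTS()
);

print "default: $default_count, with PARSE_COMMENTS: $with_comments\n";
```

PARSE_FULL is the simpler choice when every node kind should be kept.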
XML::PugiXML::Node
- name(), value(), text()
-
Get node name, value, or text content.
- type()
-
Return the node type as an integer. Compare against the NODE_* constants listed under "Node Type Constants".
- path($delimiter?)
-
Return the absolute XPath path to this node. Default delimiter is '/'.
- hash()
-
Return a hash value derived from the node's internal pointer. Useful for handle-identity comparison within a process. Not stable across process runs; do not persist.
- offset_debug()
-
Return the source offset of this node (for debugging).
- valid()
-
Return true if this is a valid node handle.
- root()
-
Return the document element from any node (consistent with $doc->root).
Navigation
- parent()
-
Get parent node.
- first_child(), last_child()
-
Get first or last child node.
- next_sibling($name?), previous_sibling($name?)
-
Get next or previous sibling. Optionally filter by name.
- child($name)
-
Get a named child node.
- children($name?)
-
Return list of child nodes, optionally filtered by name. All node types are returned; under PARSE_WS_PCDATA or similar, PCDATA, comment, and PI nodes appear too. Filter by ->type to keep only elements: grep { $_->type == XML::PugiXML::NODE_ELEMENT() } $node->children
- find_child_by_attribute($tag, $attr_name, $attr_value)
-
Find first child with given tag name and attribute value.
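The navigation methods above can be combined as in this small sketch. The sample document is invented for illustration; only methods listed in this section are used.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<list><item id="a">one</item><item id="b">two</item><note>n</note></list>'
) or die "Parse failed: $@";

my $list = $doc->root;

# Children filtered by name.
my @items = $list->children('item');

# Walk from the first child to the next sibling named "item".
my $first  = $list->first_child;
my $second = $first->next_sibling('item');

# Look up a child by tag name and attribute value.
my $match = $list->find_child_by_attribute('item', 'id', 'b');

print scalar(@items), " items\n";
print "second: ", $second->text, "\n";
print "match parent: ", $match->parent->name, "\n";
print "last child: ", $list->last_child->name, "\n";
```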
Attributes
- attr($name)
-
Get attribute by name.
- attrs()
-
Return list of all attributes.
- set_attr($name, $value)
-
Set attribute value, creating the attribute if it does not exist. Returns the attribute.
- append_attr($name), prepend_attr($name)
-
Add attribute at end or beginning.
- remove_attr($name)
-
Remove an attribute by name. Returns true on success.
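A brief sketch of the attribute methods working together, using an invented one-element document. Attribute names are sorted before printing because the order returned by attrs() is not stated above.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1"/></root>')
    or die "Parse failed: $@";

my $item = $doc->root->child('item');

$item->set_attr('id', '2');       # overwrites the existing attribute
$item->set_attr('lang', 'en');    # creates a new attribute

my @names = sort map { $_->name } $item->attrs;

my $removed = $item->remove_attr('lang');
my @after   = map { $_->name } $item->attrs;

print "before removal: @names\n";
print "after removal: @after\n";
print "id is now ", $item->attr('id')->value, "\n";
```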
Modification
- append_child($name), prepend_child($name)
-
Add child element at end or beginning.
- insert_child_before($name, $ref_node), insert_child_after($name, $ref_node)
-
Insert child element before or after a reference node.
- append_copy($source), prepend_copy($source)
-
Clone and append/prepend a node (deep copy).
- insert_copy_before($source, $ref), insert_copy_after($source, $ref)
-
Clone and insert node before/after reference.
- append_cdata($content)
-
Add a CDATA section with the given content.
- append_comment($content)
-
Add a comment node with the given content.
- append_pi($target, $data?)
-
Add a processing instruction (e.g., <?target data?>).
- remove_child($node)
-
Remove a child node. Returns true on success.
- set_name($name), set_value($value), set_text($text)
-
Set the node's name, value, or text content.
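The modification methods can be sketched on a tiny document. This example assumes that insert_child_before returns the newly created node, as append_child does in the SYNOPSIS; the element names are made up.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><old/></root>') or die "Parse failed: $@";

my $root = $doc->root;
my $old  = $root->child('old');

# Assumption: insert_child_before returns the new node, like append_child.
my $first = $root->insert_child_before('first', $old);
$first->set_text('one');

my $last = $root->append_child('last');
$last->append_cdata('1 < 2');    # kept verbatim inside a CDATA section

$root->append_comment(' done ');

my $removed = $root->remove_child($old);

# FORMAT_RAW gives compact output that is easy to inspect.
my $xml = $doc->to_string("\t", XML::PugiXML::FORMAT_RAW());
print $xml, "\n";
```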
XPath
- select_node($xpath)
-
Execute XPath relative to this node, return single result (Node or Attr).
- select_nodes($xpath)
-
Execute XPath relative to this node, return list of results (Node and/or Attr).
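Because an XPath query may yield either a node or an attribute, callers may want to branch on the class of the result. The following is a minimal sketch of one way to do that with isa(); the queries and the description strings are illustrative only.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>')
    or die "Parse failed: $@";

my $root = $doc->root;
my @described;

for my $query ('item', 'item/@id') {
    my $hit = $root->select_node($query);
    next unless defined $hit;    # no match

    if ($hit->isa('XML::PugiXML::Attr')) {
        push @described, 'attr ' . $hit->name . '=' . $hit->value;
    }
    else {
        push @described, 'node ' . $hit->name;
    }
}

print "$_\n" for @described;
```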
XML::PugiXML::Attr
- name(), value()
-
Get attribute name and value.
- as_int(), as_uint()
-
Get value as 32-bit signed/unsigned integer.
- as_llong(), as_ullong()
-
Get value as 64-bit signed/unsigned integer. On 32-bit Perl (IVSIZE < 8), returns a string to avoid truncation.
- as_double()
-
Get value as floating-point number.
- as_bool()
-
Get value as boolean (recognizes "true", "1", "yes", "on").
- element()
-
Return the parent element node that owns this attribute.
- set_value($value)
-
Set attribute value.
- set_name($name)
-
Set attribute name. Returns true on success.
- valid()
-
Return true if this is a valid attribute handle.
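The typed accessors save manual conversion of attribute strings. A short sketch with invented attribute names:

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item count="42" ratio="0.5" enabled="true"/></root>')
    or die "Parse failed: $@";

my $item = $doc->root->child('item');

my $count   = $item->attr('count')->as_int;
my $ratio   = $item->attr('ratio')->as_double;
my $enabled = $item->attr('enabled')->as_bool ? 1 : 0;

# element() leads back from the attribute to the node that owns it.
my $owner = $item->attr('count')->element->name;

print "count=$count ratio=$ratio enabled=$enabled owner=$owner\n";
```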
XML::PugiXML::XPath (Compiled Queries)
- evaluate_node($context_node)
-
Evaluate XPath and return single result (Node or Attr).
- evaluate_nodes($context_node)
-
Evaluate XPath and return list of results (Node and/or Attr).
- evaluate_string($context_node)
-
Evaluate XPath and return string result.
- evaluate_number($context_node)
-
Evaluate XPath and return numeric result.
- evaluate_boolean($context_node)
-
Evaluate XPath and return boolean result.
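Each evaluate_* method suits a different XPath result type. The sketch below compiles a few standard XPath 1.0 expressions once and evaluates them against the document element; the expressions and variable names are chosen for illustration.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<root><item id="1">Hello</item><item id="2">World</item></root>'
) or die "Parse failed: $@";

my $root = $doc->root;

# Compile once, evaluate as often as needed.
my $items_q = $doc->compile_xpath('item');
my $count_q = $doc->compile_xpath('count(item)');
my $first_q = $doc->compile_xpath('string(item[1])');
my $has_q   = $doc->compile_xpath('boolean(item[@id="2"])');

my @items     = $items_q->evaluate_nodes($root);
my $first_hit = $items_q->evaluate_node($root);
my $count     = $count_q->evaluate_number($root);
my $first     = $first_q->evaluate_string($root);
my $has_two   = $has_q->evaluate_boolean($root) ? 1 : 0;

print "nodes=", scalar(@items), " count=$count first=$first has_two=$has_two\n";
```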
ERROR HANDLING
Parse and save operations return false on failure and set $@ with an error message. XPath syntax errors throw exceptions via croak().
# Parse errors - check return value
my $ok = $doc->load_string('<bad>');
if (!$ok) {
    warn "Parse failed: $@";
}
# XPath errors - use eval
eval { $doc->select_node('[invalid'); };
if ($@) {
    warn "XPath error: $@";
}
MEMORY MODEL
Node and attribute handles keep the parent document alive through reference counting. You can safely use a node after the document variable goes out of scope:
my $node;
{
    my $doc = XML::PugiXML->new;
    $doc->load_string('<root><item/></root>');
    $node = $doc->root->child('item');
}
# $node is still valid here
PERFORMANCE
Benchmarked against XML::LibXML (100-5000 element documents):
Parsing: 8-12x faster
XPath queries: 2-13x faster
Tree traversal: 15-17x faster
DOM modification: 2-11x faster
Serialization: 2-4x faster
See bench/benchmark.pl for details.
SECURITY
pugixml does not process external entities (XXE) and is therefore safe against XXE attacks by default.
THREAD SAFETY
Different document instances may be used concurrently from different threads. Concurrent access to the same document is not safe.
SEE ALSO
Alien::pugixml, XML::LibXML, the pugixml home page at https://pugixml.org/.
AUTHOR
vividsnow
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.