NAME
XML::PugiXML - Perl binding for pugixml C++ XML parser
SYNOPSIS
use XML::PugiXML;
my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>');
my $root = $doc->root;
my $item = $root->child('item');
print $item->text, "\n"; # Hello
print $item->attr('id')->value, "\n"; # 1
# XPath
my $node = $doc->select_node('//item[@id="1"]');
print $node->text, "\n";
# Compiled XPath (faster for repeated queries)
my $xpath = $doc->compile_xpath('//item');
my @items = $xpath->evaluate_nodes($root);
# Modification
my $new = $root->append_child('item');
$new->set_text('World');
$new->set_attr('id', '2'); # Convenience method
$doc->save_file('output.xml');
# Formatting options
print $doc->to_string(" ", XML::PugiXML::FORMAT_INDENT());
# Node cloning
my $copy = $root->append_copy($item);
DESCRIPTION
XML::PugiXML provides a Perl interface to the pugixml C++ XML parsing library. It offers fast parsing, XPath support, and a clean API.
METHODS
XML::PugiXML (Document)
- new()
-
Create a new empty XML document.
- load_file($path, $parse_options?)
-
Load and parse XML from a file. Returns true on success. Optional $parse_options (default PARSE_DEFAULT).
- load_string($xml, $parse_options?)
-
Parse XML from a string. Returns true on success. Optional $parse_options (default PARSE_DEFAULT).
- save_file($path, $indent?, $flags?)
-
Save the document to a file. Returns true on success. Optional $indent (default "\t") and $flags (default FORMAT_DEFAULT).
- to_string($indent?, $flags?)
-
Serialize the document to an XML string. Optional $indent (default "\t") and $flags (default FORMAT_DEFAULT).
- reset()
-
Clear the document, removing all nodes. Existing Node and Attr handles become invalid after this call.
- root()
-
Return the document element (root node).
- child($name)
-
Get a direct child by name.
- select_node($xpath)
-
Execute an XPath query and return the first matching node.
- select_nodes($xpath)
-
Execute XPath query, return list of nodes.
Note: XPath expressions that select attributes (e.g. //@id) are not supported and will return undef/empty list. Only element nodes are returned.
- compile_xpath($xpath)
-
Compile an XPath expression for repeated use. Returns an XML::PugiXML::XPath object.
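The document methods above can be combined as in the following minimal sketch. The sample XML and variable names are invented for illustration; only calls listed in this section are used.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;

# load_string returns true on success; on failure the message is in $@.
$doc->load_string('<catalog><book id="b1">Perl</book><book id="b2">XML</book></catalog>')
    or die "Parse failed: $@";

# root() is the document element; child() looks up a direct child by name.
my $catalog = $doc->root;
my $first   = $catalog->child('book');

# select_node returns one node, select_nodes returns a list.
my $b2    = $doc->select_node('//book[@id="b2"]');
my @books = $doc->select_nodes('//book');

printf "root=%s first=%s b2=%s count=%d\n",
    $catalog->name, $first->text, $b2->text, scalar @books;

# Serialize with two-space indentation instead of the default tab.
my $xml = $doc->to_string('  ', XML::PugiXML::FORMAT_INDENT());
print $xml;
```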
Format Constants
- FORMAT_DEFAULT()
-
Default formatting (indent with tabs).
- FORMAT_INDENT()
-
Indent output.
- FORMAT_NO_DECLARATION()
-
Omit XML declaration.
- FORMAT_RAW()
-
No formatting (compact output).
- FORMAT_WRITE_BOM()
-
Write BOM (byte order mark).
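A short sketch of how the format constants might be used with to_string. It assumes the constants are integer bit flags that can be combined with bitwise OR, as they are in pugixml itself; this section does not state that explicitly.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>')
    or die "Parse failed: $@";

# Assumption: flags combine with | (bitwise OR), mirroring pugixml.
my $pretty_flags  = XML::PugiXML::FORMAT_INDENT() | XML::PugiXML::FORMAT_NO_DECLARATION();
my $compact_flags = XML::PugiXML::FORMAT_RAW()    | XML::PugiXML::FORMAT_NO_DECLARATION();

# Indented output, two spaces per level, without the XML declaration.
my $pretty = $doc->to_string('  ', $pretty_flags);

# Compact output; the indent string is not used when nothing is indented.
my $compact = $doc->to_string('', $compact_flags);

print $pretty, "\n", $compact, "\n";
```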
Parse Constants
- PARSE_DEFAULT()
-
Default parsing options.
- PARSE_MINIMAL()
-
Minimal parsing (fastest, no comments/PI/DOCTYPE).
- PARSE_PI()
-
Parse processing instructions.
- PARSE_COMMENTS()
-
Parse comments.
- PARSE_CDATA()
-
Parse CDATA sections.
- PARSE_WS_PCDATA()
-
Preserve whitespace-only PCDATA nodes.
- PARSE_ESCAPES()
-
Parse character/entity references.
- PARSE_EOL()
-
Normalize end-of-line characters.
- PARSE_DECLARATION()
-
Parse XML declaration.
- PARSE_DOCTYPE()
-
Parse DOCTYPE.
- PARSE_FULL()
-
Full parsing (all features enabled).
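The effect of the parse options can be observed through type(). The sketch below parses the same input twice and inspects the first child of the root element. It assumes PARSE_DEFAULT mirrors pugixml's default, which does not keep comment nodes.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $xml = '<root><!-- note --><item/></root>';

# Default options (assumed to discard comments, as in pugixml).
my $default = XML::PugiXML->new;
$default->load_string($xml) or die "Parse failed: $@";

# PARSE_FULL enables all features, so the comment is kept as a node.
my $full = XML::PugiXML->new;
$full->load_string($xml, XML::PugiXML::PARSE_FULL()) or die "Parse failed: $@";

# type() values: 2 = element, 5 = comment.
my $type_default = $default->root->first_child->type;
my $type_full    = $full->root->first_child->type;

print "first child type with default options: $type_default\n";
print "first child type with PARSE_FULL:      $type_full\n";
```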
XML::PugiXML::Node
- name(), value(), text()
-
Get node name, value, or text content.
- type()
-
Return the node type as an integer. Values: 0=null, 1=document, 2=element, 3=pcdata, 4=cdata, 5=comment, 6=pi, 7=declaration, 8=doctype.
- path($delimiter?)
-
Return the absolute XPath path to this node. Default delimiter is '/'.
- hash()
-
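Return a hash value for this node. Handles that refer to the same node return the same value, so it is useful for testing whether two handles point at the same node.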
- offset_debug()
-
Return the source offset of this node (for debugging).
- valid()
-
Return true if this is a valid node handle.
- root()
-
Return the document node (type=1) from any node. Note: this returns the document node, not the document element. Use $node->root->first_child to get the document element from a node.
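The difference between the document node and the document element is easy to miss, so here is a small sketch of it. The sample document is illustrative, and it is parsed with default options so that the document element is the first child of the document node.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>')
    or die "Parse failed: $@";

my $item = $doc->root->child('item');

# Node::root() climbs to the document node (type 1), not to <root>.
my $doc_node = $item->root;

# The document element is the document node's first child here.
my $doc_elem = $doc_node->first_child;

printf "document node type=%d, document element type=%d name=%s\n",
    $doc_node->type, $doc_elem->type, $doc_elem->name;
printf "item name=%s text=%s\n", $item->name, $item->text;
```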
Navigation
- parent()
-
Get parent node.
- first_child(), last_child()
-
Get first or last child node.
- next_sibling($name?), previous_sibling($name?)
-
Get next or previous sibling. Optionally filter by name.
- child($name)
-
Get a named child node.
- children($name?)
-
Return list of child nodes, optionally filtered by name.
- find_child_by_attribute($tag, $attr_name, $attr_value)
-
Find first child with given tag name and attribute value.
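The navigation methods can be sketched as follows. The element names and ids are made up for the example.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<list><item id="a">one</item><note/><item id="b">two</item></list>')
    or die "Parse failed: $@";

my $list = $doc->root;

# children($name) filters by element name, so <note/> is skipped.
my @ids = map { $_->attr('id')->value } $list->children('item');

# next_sibling($name) jumps to the next sibling with that name.
my $first     = $list->first_child;
my $next_item = $first->next_sibling('item');

# Look up a child by tag name and attribute value in one call.
my $item_b      = $list->find_child_by_attribute('item', 'id', 'b');
my $parent_name = $item_b->parent->name;

print "ids: @ids\n";
print "next item after the first: ", $next_item->text, "\n";
print "parent of item b: $parent_name\n";
```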
Attributes
- attr($name)
-
Get attribute by name.
- attrs()
-
Return list of all attributes.
- set_attr($name, $value)
-
Set an attribute's value, creating the attribute if it doesn't exist. Returns the attribute.
- append_attr($name), prepend_attr($name)
-
Add attribute at end or beginning.
- remove_attr($name)
-
Remove an attribute by name. Returns true on success.
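A minimal sketch of the attribute methods. Because the return value of append_attr is not described above, the example fetches the new attribute again with attr() instead of relying on it.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<user name="ann"/>') or die "Parse failed: $@";
my $user = $doc->root;

# set_attr creates a missing attribute and overwrites an existing one.
$user->set_attr('role', 'admin');
$user->set_attr('name', 'bob');

# append_attr adds an empty attribute; give it a value via attr().
$user->append_attr('lang');
$user->attr('lang')->set_value('en');

# attrs() returns every attribute handle on the element.
my %before = map { $_->name => $_->value } $user->attrs;

# remove_attr returns true on success.
my $removed = $user->remove_attr('lang');
my @after   = sort map { $_->name } $user->attrs;

print join(', ', map { "$_=$before{$_}" } sort keys %before), "\n";
print "remaining: @after\n";
```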
Modification
- append_child($name), prepend_child($name)
-
Add child element at end or beginning.
- insert_child_before($name, $ref_node), insert_child_after($name, $ref_node)
-
Insert child element before or after a reference node.
- append_copy($source), prepend_copy($source)
-
Clone and append/prepend a node (deep copy).
- insert_copy_before($source, $ref), insert_copy_after($source, $ref)
-
Clone and insert node before/after reference.
- append_cdata($content)
-
Add a CDATA section with the given content.
- append_comment($content)
-
Add a comment node with the given content.
- append_pi($target, $data?)
-
Add a processing instruction, e.g. <?target data?>.
- remove_child($node)
-
Remove a child node. Returns true on success.
- set_name($name), set_value($value), set_text($text)
-
Modify node properties.
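The modification methods can be combined as in this sketch. Element names are illustrative, and the removed node's handle is not used after remove_child.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><second/></root>') or die "Parse failed: $@";
my $root   = $doc->root;
my $second = $root->child('second');

# Insert relative to an existing child, then append at the end.
my $first = $root->insert_child_before('first', $second);
my $third = $root->append_child('third');
$first->set_text('1');
$third->set_text('3');

# Element order at this point, before non-element nodes are added.
my @names = map { $_->name } $root->children;

# CDATA keeps markup characters literal; comments are ordinary child nodes.
$second->append_cdata('a < b');
$root->append_comment(' generated ');

# remove_child returns true on success.
my $removed = $root->remove_child($third);

my $xml = $doc->to_string('', XML::PugiXML::FORMAT_RAW());
print "order: @names\n$xml\n";
```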
XPath
- select_node($xpath)
-
Execute XPath relative to this node, return single node.
- select_nodes($xpath)
-
Execute XPath relative to this node, return list of nodes.
Note: XPath expressions that select attributes (e.g. //@id) are not supported and will return undef/empty list. Only element nodes are returned.
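A sketch of node-relative XPath, including one way to work around the attribute limitation: select the elements that carry the attribute, then read it with attr(). The sample data is invented.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<shop>'
  . '<dept name="books"><item>Dune</item><item>Emma</item></dept>'
  . '<dept name="music"><item>Kind of Blue</item></dept>'
  . '</shop>'
) or die "Parse failed: $@";

# A relative path is evaluated against the node it is called on.
my $books  = $doc->select_node('//dept[@name="books"]');
my @titles = map { $_->text } $books->select_nodes('item');

my $music_item = $doc->root->select_node('dept[@name="music"]/item');

# Instead of //dept/@name, select the elements and read the attribute.
my @dept_names = map { $_->attr('name')->value } $doc->select_nodes('//dept[@name]');

print "books: @titles\n";
print "music: ", $music_item->text, "\n";
print "departments: @dept_names\n";
```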
XML::PugiXML::Attr
- name(), value()
-
Get attribute name and value.
- as_int(), as_uint()
-
Get value as 32-bit signed/unsigned integer.
- as_llong(), as_ullong()
-
Get value as 64-bit signed/unsigned integer.
- as_double()
-
Get value as floating-point number.
- as_bool()
-
Get value as boolean (recognizes "true", "1", "yes", "on").
- set_value($value)
-
Set attribute value.
- valid()
-
Return true if this is a valid attribute handle.
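The typed accessors save manual conversion when attributes hold numbers or flags. A minimal sketch with an invented configuration element follows; the 64-bit example assumes a Perl built with 64-bit integers.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<cfg port="8080" ratio="0.75" debug="true" size="5000000000"/>')
    or die "Parse failed: $@";
my $cfg = $doc->root;

my $port  = $cfg->attr('port')->as_int;       # 32-bit signed integer
my $ratio = $cfg->attr('ratio')->as_double;   # floating-point number
my $debug = $cfg->attr('debug')->as_bool;     # "true" counts as true
my $size  = $cfg->attr('size')->as_llong;     # does not fit in 32 bits

# set_value changes the stored string; the accessors see the new value.
$cfg->attr('port')->set_value('9090');
my $new_port = $cfg->attr('port')->as_int;

printf "port=%d ratio=%.2f debug=%s size=%s new_port=%d\n",
    $port, $ratio, ($debug ? 'on' : 'off'), $size, $new_port;
```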
XML::PugiXML::XPath (Compiled Queries)
- evaluate_node($context_node)
-
Evaluate XPath and return single node result.
- evaluate_nodes($context_node)
-
Evaluate XPath and return list of nodes.
Note: evaluate_node and evaluate_nodes only return element nodes. Attribute-selecting XPath expressions return undef/empty list.
- evaluate_string($context_node)
-
Evaluate XPath and return string result.
- evaluate_number($context_node)
-
Evaluate XPath and return numeric result.
- evaluate_boolean($context_node)
-
Evaluate XPath and return boolean result.
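Compiled queries pay off when the same expression runs against many context nodes. The sketch below compiles each expression once and evaluates it per order element. The data is invented; attributes are only used inside string and number expressions, never returned as nodes.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<orders>'
  . '<order id="1"><line qty="2"/><line qty="3"/></order>'
  . '<order id="2"><line qty="5"/></order>'
  . '</orders>'
) or die "Parse failed: $@";

# Compile once, reuse for every context node.
my $id_of      = $doc->compile_xpath('string(@id)');
my $line_count = $doc->compile_xpath('count(line)');
my $total_qty  = $doc->compile_xpath('sum(line/@qty)');
my $has_lines  = $doc->compile_xpath('boolean(line)');
my $lines      = $doc->compile_xpath('line');

my @rows;
for my $order ($doc->root->children('order')) {
    my @line_nodes = $lines->evaluate_nodes($order);
    push @rows, [
        $id_of->evaluate_string($order),
        $line_count->evaluate_number($order),
        $total_qty->evaluate_number($order),
        ($has_lines->evaluate_boolean($order) ? 1 : 0),
        scalar @line_nodes,
    ];
}

printf "order %s: lines=%d qty=%d has_lines=%d nodes=%d\n", @$_ for @rows;
```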
ERROR HANDLING
Parse and save operations return false on failure and set $@ with an error message. XPath syntax errors throw exceptions via croak().
# Parse errors - check return value
my $ok = $doc->load_string('<bad>');
if (!$ok) {
warn "Parse failed: $@";
}
# XPath errors - use eval
eval { $doc->select_node('[invalid'); };
if ($@) {
warn "XPath error: $@";
}
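If a single error style is preferred, the false return value can be turned into an exception. The helper below, load_xml_or_die, is hypothetical and not part of XML::PugiXML; it is only a sketch of that pattern.

```perl
use strict;
use warnings;
use XML::PugiXML;

# Hypothetical helper: parse a string or die with the message from $@.
sub load_xml_or_die {
    my ($xml) = @_;
    my $doc = XML::PugiXML->new;
    $doc->load_string($xml) or die "Parse failed: $@\n";
    return $doc;
}

# Well-formed input yields a document.
my $good = eval { load_xml_or_die('<root><item/></root>') };

# The mismatched closing tag makes the parse fail, so eval returns undef.
my $bad = eval { load_xml_or_die('<root><item></root>') };
my $err = $@;

print "good document loaded\n" if $good;
print "caught: $err" if !$bad;
```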
MEMORY MODEL
Node and attribute handles keep the parent document alive through reference counting. You can safely use a node after the document variable goes out of scope:
my $node;
{
my $doc = XML::PugiXML->new;
$doc->load_string('<root><item/></root>');
$node = $doc->root->child('item');
}
# $node is still valid here
PERFORMANCE
Benchmarked against XML::LibXML (100-5000 element documents):
Parsing: 8-12x faster
XPath queries: 2-13x faster
Tree traversal: 15-17x faster
DOM modification: 2-11x faster
Serialization: 2-4x faster
See bench/benchmark.pl for details.
SECURITY
This module uses pugixml which does NOT process external entities (XXE) by default, making it safe against XXE attacks.
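One way to check this on your own installation is to parse a document that declares an external entity and see what the text contains. The sketch below makes no claim about the exact text returned; it only checks that the entity reference is not replaced by file contents. The file path is an arbitrary example.

```perl
use strict;
use warnings;
use XML::PugiXML;

# A classic XXE probe: an external entity pointing at a local file.
my $xml = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE root [ <!ENTITY xxe SYSTEM "file:///etc/hostname"> ]>
<root>&xxe;</root>
XML

my $doc = XML::PugiXML->new;
my $ok  = $doc->load_string($xml);

if ($ok) {
    # If external entities are not resolved, the reference stays unexpanded.
    my $text = $doc->root->text;
    print "text content: $text\n";
    print "entity was not expanded\n" if $text =~ /xxe/;
}
else {
    print "document rejected: $@\n";
}
```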
THREAD SAFETY
This module is not thread-safe. Each thread should use its own document instances.
AUTHOR
vividsnow
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.