NAME

XML::PugiXML - Perl binding for pugixml C++ XML parser

SYNOPSIS

use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>');

my $root = $doc->root;
my $item = $root->child('item');
print $item->text, "\n";           # Hello
print $item->attr('id')->value, "\n";  # 1

# XPath
my $node = $doc->select_node('//item[@id="1"]');
print $node->text, "\n";

# Compiled XPath (faster for repeated queries)
my $xpath = $doc->compile_xpath('//item');
my @items = $xpath->evaluate_nodes($root);

# Modification
my $new = $root->append_child('item');
$new->set_text('World');
$new->set_attr('id', '2');
$doc->save_file('output.xml');

# Formatting options
print $doc->to_string("  ", XML::PugiXML::FORMAT_INDENT());

# Node cloning
my $copy = $root->append_copy($item);

DESCRIPTION

XML::PugiXML provides a Perl interface to the pugixml C++ XML parsing library. It offers fast DOM parsing and serialization, XPath 1.0 queries (including precompiled expressions), and in-place tree modification. All string inputs are automatically upgraded to UTF-8, and all returned strings have the UTF-8 flag set.
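The UTF-8 behaviour can be checked with a short sketch. It assumes a Latin-1 byte string (not yet upgraded by Perl) as input, and uses core Perl's utf8::is_utf8 to inspect the flag on the returned text.

```perl
use strict;
use warnings;
use XML::PugiXML;

# "\x{e9}" (e-acute) in a string Perl has not upgraded to UTF-8 yet.
my $input = "<r>caf\x{e9}</r>";

my $doc = XML::PugiXML->new;
$doc->load_string($input) or die "parse failed: $@";

# The input is upgraded before parsing and the returned text is
# UTF-8 flagged, so it compares equal character by character.
my $text = $doc->root->text;
print $text eq "caf\x{e9}" ? "match" : "mismatch", "\n";
print utf8::is_utf8($text) ? "flagged" : "not flagged", "\n";
```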

METHODS

XML::PugiXML (Document)

new()

Create a new empty XML document. Subclassing is not supported: the returned object is always blessed into XML::PugiXML regardless of the calling class.

load_file($path, $parse_options?)

Load and parse XML from a file. Returns true on success. Optional $parse_options (default PARSE_DEFAULT).

load_string($xml, $parse_options?)

Parse XML from a string. Returns true on success. Optional $parse_options (default PARSE_DEFAULT).
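Parse options are bit flags, so extra features can be OR-ed into PARSE_DEFAULT. A minimal sketch, using only the constants listed under "Parse Constants", showing how PARSE_COMMENTS changes the resulting tree:

```perl
use strict;
use warnings;
use XML::PugiXML;

my $xml = '<root><!-- note --><item/></root>';

# PARSE_DEFAULT drops comments, so <root> ends up with one child.
my $plain = XML::PugiXML->new;
$plain->load_string($xml) or die "parse failed: $@";
my @plain_kids = $plain->root->children;

# OR extra flags into the default set to keep comment nodes.
my $full = XML::PugiXML->new;
$full->load_string($xml,
    XML::PugiXML::PARSE_DEFAULT() | XML::PugiXML::PARSE_COMMENTS())
    or die "parse failed: $@";
my @full_kids = $full->root->children;

print scalar(@plain_kids), " vs ", scalar(@full_kids), "\n";
```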

save_file($path, $indent?, $flags?)

Save the document to a file. Returns true on success. Optional $indent (default "\t") and $flags (default FORMAT_DEFAULT).

to_string($indent?, $flags?)

Serialize the document to an XML string. Optional $indent (default "\t") and $flags (default FORMAT_DEFAULT).
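Format flags combine the same way. This sketch contrasts compact output without a declaration against two-space indentation; the indent string passed in raw mode is ignored.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">Hello</item></root>') or die $@;

# Single-line output, no <?xml ...?> declaration.
my $compact = $doc->to_string("\t",
    XML::PugiXML::FORMAT_RAW() | XML::PugiXML::FORMAT_NO_DECLARATION());

# Two-space indentation; the declaration is written.
my $pretty = $doc->to_string('  ', XML::PugiXML::FORMAT_INDENT());

print $compact, "\n", $pretty;
```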

reset()

Clear the document, removing all nodes. Existing Node and Attr handles become stale: accessing them croaks with "Stale node/attribute handle". Use valid() to check without croaking. The same applies after load_file() or load_string() replaces content.
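A short sketch of the stale-handle behaviour: valid() reports the state without croaking, while any other accessor dies.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item/></root>') or die $@;
my $item = $doc->root->child('item');

$doc->reset;    # $item now points at a node that no longer exists

# valid() is the non-croaking check...
print $item->valid ? "valid\n" : "stale\n";

# ...while other accessors croak.
my $ok  = eval { $item->name; 1 };
my $err = $@;
print "croaked: $err" unless $ok;
```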

root()

Return the document element (root node).

child($name)

Get a direct child by name.

select_node($xpath)

Execute XPath query, return single result. Returns an XML::PugiXML::Node or XML::PugiXML::Attr depending on the query, or undef if there is no match.

select_nodes($xpath)

Execute XPath query, return list of results. Returns a mix of XML::PugiXML::Node and XML::PugiXML::Attr objects as appropriate. Empty list if no match.
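Because results may be either class, callers can branch on the blessed class name. A minimal sketch, assuming the returned objects are ordinary blessed references so that ->isa works:

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1">a</item><item id="2">b</item></root>') or die $@;

# An attribute step yields Attr objects; an element path yields Nodes.
my @ids   = map { $_->value } $doc->select_nodes('//item/@id');
my @texts = map { $_->text  } $doc->select_nodes('//item');

# A union can return both kinds, so dispatch on the class.
for my $hit ($doc->select_nodes('//item | //item/@id')) {
    if ($hit->isa('XML::PugiXML::Attr')) { print "attr ", $hit->value, "\n" }
    else                                  { print "node ", $hit->name,  "\n" }
}

# No match: select_node returns undef.
my $missing = $doc->select_node('//nope');
```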

compile_xpath($xpath)

Compile an XPath expression for repeated use. Returns an XML::PugiXML::XPath object.
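A compiled expression can be evaluated against many context nodes. This sketch compiles a relative expression once and reuses it for each child element.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<list><item id="a"/><item id="b"/><item id="c"/></list>') or die $@;

# Compile once; the relative path is resolved against each context node.
my $id_of = $doc->compile_xpath('string(@id)');
my @ids = map { $id_of->evaluate_string($_) } $doc->root->children('item');

print "@ids\n";    # a b c
```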

Format Constants

FORMAT_DEFAULT()

Default formatting, equivalent to FORMAT_INDENT. The tabs come from the default $indent argument ("\t"), not from the flag itself.

FORMAT_INDENT()

Indent output.

FORMAT_NO_DECLARATION()

Omit XML declaration.

FORMAT_RAW()

No formatting (compact output).

FORMAT_WRITE_BOM()

Write BOM (byte order mark).

FORMAT_INDENT_ATTRIBUTES()

Write each attribute on its own indented line. Combine it with other format flags using bitwise OR (|).

Node Type Constants

NODE_NULL, NODE_DOCUMENT, NODE_ELEMENT, NODE_PCDATA
NODE_CDATA, NODE_COMMENT, NODE_PI, NODE_DECLARATION, NODE_DOCTYPE

Integer values returned by $node->type.
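Since the constants are plain integers, they work as hash keys for dispatch. A minimal sketch that labels each child of a mixed-content element (comments are kept by adding PARSE_COMMENTS):

```perl
use strict;
use warnings;
use XML::PugiXML;

# Map integer type codes to readable labels.
my %label = (
    XML::PugiXML::NODE_ELEMENT() => 'element',
    XML::PugiXML::NODE_PCDATA()  => 'pcdata',
    XML::PugiXML::NODE_CDATA()   => 'cdata',
    XML::PugiXML::NODE_COMMENT() => 'comment',
    XML::PugiXML::NODE_PI()      => 'pi',
);

my $doc = XML::PugiXML->new;
$doc->load_string('<r>text<![CDATA[raw]]><e/><!--c--></r>',
    XML::PugiXML::PARSE_DEFAULT() | XML::PugiXML::PARSE_COMMENTS()) or die $@;

my @kinds = map { $label{ $_->type } // 'other' } $doc->root->children;
print "@kinds\n";    # pcdata cdata element comment
```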

Parse Constants

PARSE_DEFAULT()

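Default parsing options: CDATA sections, character/entity references, and end-of-line normalization are processed; comments, processing instructions, the XML declaration, and DOCTYPE are skipped.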

PARSE_MINIMAL()

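Minimal parsing (fastest): only elements and PCDATA are added to the tree, and no text conversion is performed (no entity expansion or end-of-line normalization).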

PARSE_PI()

Parse processing instructions.

PARSE_COMMENTS()

Parse comments.

PARSE_CDATA()

Parse CDATA sections.

PARSE_WS_PCDATA()

Preserve whitespace-only PCDATA nodes.

PARSE_WS_PCDATA_SINGLE()

Preserve whitespace-only PCDATA only when it is the sole child of its parent (a leaner alternative to PARSE_WS_PCDATA).
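The effect of the whitespace flags is easiest to see by counting children. A sketch comparing the default options with PARSE_WS_PCDATA on indented input:

```perl
use strict;
use warnings;
use XML::PugiXML;

my $xml = "<r>\n  <a/>\n</r>";

my $default = XML::PugiXML->new;
$default->load_string($xml) or die $@;

my $ws = XML::PugiXML->new;
$ws->load_string($xml,
    XML::PugiXML::PARSE_DEFAULT() | XML::PugiXML::PARSE_WS_PCDATA()) or die $@;

my $n_default = () = $default->root->children;   # whitespace dropped
my $n_ws      = () = $ws->root->children;        # whitespace kept as PCDATA
print "$n_default vs $n_ws\n";    # 1 vs 3
```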

PARSE_ESCAPES()

Parse character/entity references.

PARSE_EOL()

Normalize end-of-line characters.

PARSE_DECLARATION()

Parse XML declaration.

PARSE_DOCTYPE()

Parse DOCTYPE.

PARSE_FULL()

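PARSE_DEFAULT plus processing instructions, comments, the XML declaration, and DOCTYPE. Whitespace-only PCDATA is still dropped unless PARSE_WS_PCDATA is also set.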

XML::PugiXML::Node

name(), value(), text()

Get node name, value, or text content.

type()

Return the node type as an integer. Compare against the NODE_* constants listed under "Node Type Constants".

path($delimiter?)

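Return the absolute path to this node as element names joined by $delimiter, e.g. "/root/item". Default delimiter is '/'. The path carries no position predicates, so same-named siblings share a path.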

hash()

Return a hash value derived from the node's internal pointer. Useful for handle-identity comparison within a process. Not stable across process runs; do not persist.
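A sketch of path() and hash() together: two handles obtained by different routes point at the same node, so their hashes match.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item id="1"/></root>') or die $@;

my $via_child = $doc->root->child('item');
my $via_xpath = $doc->select_node('//item[@id="1"]');

print $via_child->path, "\n";    # /root/item

# Separate handles to the same underlying node share a hash value.
print "same node\n" if $via_child->hash == $via_xpath->hash;
```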

offset_debug()

Return the source offset of this node (for debugging).

valid()

Return true if this is a valid node handle.

root()

Return the document element from any node (consistent with $doc->root).

parent()

Get parent node.

first_child(), last_child()

Get first or last child node.

next_sibling($name?), previous_sibling($name?)

Get next or previous sibling. Optionally filter by name.
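Manual traversal with first_child and next_sibling is sketched below. This section does not say what these methods return past the last child, so the loop guards against both undef and an invalid handle; that is a defensive assumption, not documented behaviour.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<list><a/><b/><a/></list>') or die $@;
my $list = $doc->root;

# Stop on either undef or an invalid (null) handle.
my @names;
for (my $n = $list->first_child; $n && $n->valid; $n = $n->next_sibling) {
    push @names, $n->name;
}

# With a name argument, next_sibling skips ahead to the next <a>.
my $second_a = $list->first_child->next_sibling('a');
print "@names\n";    # a b a
```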

child($name)

Get a named child node.

children($name?)

Return a list of child nodes, optionally filtered by name. All node types are returned, not only elements: text nodes (PCDATA and CDATA) are included, and so are comment, PI, and whitespace-only PCDATA nodes when the document was parsed with PARSE_COMMENTS, PARSE_PI, or PARSE_WS_PCDATA. Filter by ->type to keep only elements:

grep { $_->type == XML::PugiXML::NODE_ELEMENT() } $node->children

find_child_by_attribute($tag, $attr_name, $attr_value)

Find first child with given tag name and attribute value.
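A minimal lookup sketch: find a child element by an attribute value without writing an XPath query.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<users><user id="7" name="ann"/><user id="9" name="bob"/></users>') or die $@;

# First <user> child whose id attribute equals "9".
my $bob = $doc->root->find_child_by_attribute('user', 'id', '9');
print $bob->attr('name')->value, "\n";    # bob
```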

Attributes

attr($name)

Get attribute by name.

attrs()

Return list of all attributes.

set_attr($name, $value)

Set an attribute's value, creating the attribute if it does not exist. Returns the attribute.

append_attr($name), prepend_attr($name)

Add an attribute at the end or beginning of the attribute list.

remove_attr($name)

Remove an attribute by name. Returns true on success.
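A sketch covering the attribute methods above: overwrite, create, remove, then collect what remains into a hash.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<item id="1"/>') or die $@;
my $item = $doc->root;

my $id_value = $item->set_attr('id', '2')->value;   # overwrites "1"
$item->set_attr('lang', 'en');                      # creates a new attribute
my $removed = $item->remove_attr('id');

# Collect the remaining attributes as name => value pairs.
my %attrs = map { $_->name => $_->value } $item->attrs;
print join(',', map { "$_=$attrs{$_}" } sort keys %attrs), "\n";   # lang=en
```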

Modification

append_child($name), prepend_child($name)

Add a child element at the end or beginning. Returns the new node.

insert_child_before($name, $ref_node), insert_child_after($name, $ref_node)

Insert a child element before or after $ref_node, which must be a child of this node.

append_copy($source), prepend_copy($source)

Deep-copy $source and append or prepend the copy. Returns the new node.

insert_copy_before($source, $ref), insert_copy_after($source, $ref)

Deep-copy $source and insert the copy before or after $ref.
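A sketch of positional insertion and deep copying. Changing the copy afterwards leaves the original element untouched.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<list><a/><c/></list>') or die $@;
my $list = $doc->root;

$list->insert_child_before('b', $list->child('c'));     # a b c
my $copy = $list->append_copy($list->child('a'));       # deep copy at the end
$copy->set_attr('copied', 'yes');                       # only the copy changes

my @names = map { $_->name } $list->children;
print "@names\n";    # a b c a
```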

append_cdata($content)

Add a CDATA section with the given content.

append_comment($content)

Add a comment node with the given content.

append_pi($target, $data?)

Add a processing instruction (e.g., <?target data?>).

remove_child($node)

Remove a child node. Returns true on success.

set_name($name), set_value($value), set_text($text)

Set the node's name, value, or text content.
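The special-node helpers and remove_child are sketched below, with compact serialization to show the result. CDATA content is written verbatim, so "<" and "&" need no escaping.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><old/></root>') or die $@;
my $root = $doc->root;

$root->append_cdata('x < y && y > z');    # stored verbatim
$root->append_comment(' generated ');
$root->append_pi('app', 'run');           # <?app run?>
my $removed = $root->remove_child($root->child('old'));
$root->set_name('doc');

my $xml = $doc->to_string("\t",
    XML::PugiXML::FORMAT_RAW() | XML::PugiXML::FORMAT_NO_DECLARATION());
print $xml, "\n";
```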

XPath

select_node($xpath)

Execute XPath relative to this node, return single result (Node or Attr).

select_nodes($xpath)

Execute XPath relative to this node, return list of results (Node and/or Attr).
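A sketch of node-relative queries: a relative path stays within the context node, while a leading "//" searches the whole document.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string(
    '<root><sec n="1"><item/></sec><sec n="2"><item/><item/></sec></root>') or die $@;

my $sec2 = $doc->select_node('/root/sec[@n="2"]');

my $relative = () = $sec2->select_nodes('item');     # children of sec 2 only
my $absolute = () = $sec2->select_nodes('//item');   # whole document
print "$relative vs $absolute\n";    # 2 vs 3
```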

XML::PugiXML::Attr

name(), value()

Get attribute name and value.

as_int(), as_uint()

Get value as 32-bit signed/unsigned integer.

as_llong(), as_ullong()

Get value as 64-bit signed/unsigned integer. On 32-bit Perl (IVSIZE < 8), returns a string to avoid truncation.

as_double()

Get value as floating-point number.

as_bool()

Get value as boolean. pugixml treats a value as true when it starts with '1', 't', 'T', 'y', or 'Y' (so "true", "1", and "yes" are true).

element()

Return the parent element node that owns this attribute.

set_value($value)

Set attribute value.

set_name($name)

Set attribute name. Returns true on success.

valid()

Return true if this is a valid attribute handle.
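A sketch of the typed accessors, plus element() to get back to the owning node. The values chosen are exactly representable, so the numeric comparisons are safe.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<cfg port="8080" ratio="0.25" debug="yes" verbose="false"/>') or die $@;
my $cfg = $doc->root;

my $port  = $cfg->attr('port')->as_int;        # 8080
my $ratio = $cfg->attr('ratio')->as_double;    # 0.25
my $debug = $cfg->attr('debug')->as_bool;      # true
my $quiet = !$cfg->attr('verbose')->as_bool;   # "false" reads as false

my $owner = $cfg->attr('port')->element;       # back to <cfg>
$cfg->attr('port')->set_value('9090');
print $owner->name, " now listens on ", $cfg->attr('port')->as_int, "\n";
```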

XML::PugiXML::XPath (Compiled Queries)

evaluate_node($context_node)

Evaluate XPath and return single result (Node or Attr).

evaluate_nodes($context_node)

Evaluate XPath and return list of results (Node and/or Attr).

evaluate_string($context_node)

Evaluate XPath and return string result.

evaluate_number($context_node)

Evaluate XPath and return numeric result.

evaluate_boolean($context_node)

Evaluate XPath and return boolean result.
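A sketch of the typed evaluators on compiled expressions. XPath's string() of a node-set yields the value of the first node in document order.

```perl
use strict;
use warnings;
use XML::PugiXML;

my $doc = XML::PugiXML->new;
$doc->load_string('<root><item price="3"/><item price="4.5"/></root>') or die $@;
my $root = $doc->root;

my $count = $doc->compile_xpath('count(//item)')->evaluate_number($root);       # 2
my $total = $doc->compile_xpath('sum(//item/@price)')->evaluate_number($root);  # 7.5
my $any   = $doc->compile_xpath('boolean(//item[@price > 4])')->evaluate_boolean($root);
my $first = $doc->compile_xpath('string(//item/@price)')->evaluate_string($root);

print "$count items, total $total, first $first\n";
```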

ERROR HANDLING

Parse and save operations return false on failure and set $@ to an error message. Invalid XPath expressions croak, so catch them with eval.

# Parse errors - check return value
my $ok = $doc->load_string('<bad>');
if (!$ok) {
    warn "Parse failed: $@";
}

# XPath errors - use eval
eval { $doc->select_node('[invalid'); };
if ($@) {
    warn "XPath error: $@";
}

MEMORY MODEL

Node and attribute handles keep the parent document alive through reference counting. You can safely use a node after the document variable goes out of scope:

my $node;
{
    my $doc = XML::PugiXML->new;
    $doc->load_string('<root><item/></root>');
    $node = $doc->root->child('item');
}
# $node is still valid here

PERFORMANCE

Benchmarked against XML::LibXML (100-5000 element documents):

Parsing:          8-12x faster
XPath queries:    2-13x faster
Tree traversal:   15-17x faster
DOM modification: 2-11x faster
Serialization:    2-4x faster

See bench/benchmark.pl for details.

SECURITY

pugixml never resolves external entities or fetches external resources, so it is not vulnerable to XXE (XML External Entity) attacks.

THREAD SAFETY

Different document instances may be used concurrently from different threads. Concurrent access to the same document is not safe.

SEE ALSO

Alien::pugixml, XML::LibXML, the pugixml home page at https://pugixml.org/.

AUTHOR

vividsnow

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.