NAME
Mojo::DOM - Minimalistic XML/HTML5 DOM Parser With CSS3 Selectors
SYNOPSIS
use Mojo::DOM;
# Parse
my $dom = Mojo::DOM->new('<div><p id="a">A</p><p id="b">B</p></div>');
# Find
my $p = $dom->at('#b');
print $p->text;
# Walk
print $dom->div->p->[0]->text;
print $dom->div->p->[1]->{id};
# Iterate
$dom->find('p[id]')->each(sub { print shift->{id} });
# Loop
for my $e ($dom->find('p[id]')->each) {
print $e->text;
}
# Modify
$dom->div->p->[1]->append('<p id="c">C</p>');
# Render
print $dom;
DESCRIPTION
Mojo::DOM is a minimalistic and very relaxed XML/HTML5 DOM parser with support for CSS3 selectors. It will even try to interpret broken XML, so you should not use it for validation.
SELECTORS
All CSS3 selectors that make sense for a standalone parser are supported.
*
Any element.
my $first = $dom->at('*');
E
An element of type E.
my $title = $dom->at('title');
E[foo]
An E element with a foo attribute.
my $links = $dom->find('a[href]');
E[foo="bar"]
An E element whose foo attribute value is exactly equal to bar.
my $fields = $dom->find('input[name="foo"]');
E[foo~="bar"]
An E element whose foo attribute value is a list of whitespace-separated values, one of which is exactly equal to bar.
my $fields = $dom->find('input[name~="foo"]');
E[foo^="bar"]
An E element whose foo attribute value begins exactly with the string bar.
my $fields = $dom->find('input[name^="f"]');
E[foo$="bar"]
An E element whose foo attribute value ends exactly with the string bar.
my $fields = $dom->find('input[name$="o"]');
E[foo*="bar"]
An E element whose foo attribute value contains the substring bar.
my $fields = $dom->find('input[name*="fo"]');
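The attribute operators above reduce to plain string tests on the attribute value. The following is a rough sketch of those semantics in Perl, not Mojo::DOM's actual implementation; %attr_op and attr_matches are hypothetical names. It assumes a missing attribute never matches and follows the CSS3 rule that an empty bar matches nothing for ^=, $= and *=.

```perl
use strict;
use warnings;

# Hypothetical table: each CSS3 attribute operator maps to a test of the
# attribute value ($v) against the string from the selector ($w).
my %attr_op = (
    '='  => sub { my ($v, $w) = @_; $v eq $w },
    '~=' => sub { my ($v, $w) = @_; grep { $_ eq $w } split ' ', $v },
    '^=' => sub { my ($v, $w) = @_; length($w) && $v =~ /^\Q$w\E/ },
    '$=' => sub { my ($v, $w) = @_; length($w) && $v =~ /\Q$w\E\z/ },
    '*=' => sub { my ($v, $w) = @_; length($w) && $v =~ /\Q$w\E/ },
);

# Returns 1 if $value satisfies [foo OP "want"], 0 otherwise.
sub attr_matches {
    my ($value, $op, $want) = @_;
    return 0 unless defined $value;    # element lacks the attribute
    my $test = $attr_op{$op} or die "Unsupported operator: $op\n";
    return $test->($value, $want) ? 1 : 0;
}
```

For an attribute value of foo, the ^="f", $="o" and *="fo" tests used in the examples above all succeed, while ~="bar" fails on foobar because it only compares whole whitespace-separated words.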
E:root
An E element, root of the document.
my $root = $dom->at(':root');
E:checked
A user interface element E which is checked (for instance a radio button or checkbox).
my $input = $dom->at(':checked');
E:empty
An E element that has no children (including text nodes).
my $empty = $dom->find(':empty');
E:nth-child(n)
An E element, the n-th child of its parent.
my $third = $dom->at('div:nth-child(3)');
my $odd = $dom->find('div:nth-child(odd)');
my $even = $dom->find('div:nth-child(even)');
my $top3 = $dom->find('div:nth-child(-n+3)');
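The nth-child argument is an an+b expression: an element at 1-based position p matches if p = a*n + b for some integer n >= 0, with odd meaning 2n+1 and even meaning 2n. A minimal sketch of that rule in Perl, using hypothetical helper names (parse_nth, nth_matches) and parsing only the common forms:

```perl
use strict;
use warnings;

# Hypothetical parser: turns odd, even, 3, 2n+1, -n+3 and similar
# arguments into ($step, $offset) for step*n + offset.
sub parse_nth {
    my $expr = lc shift;
    $expr =~ s/\s+//g;
    return (2, 1) if $expr eq 'odd';
    return (2, 0) if $expr eq 'even';
    return (0, 0 + $expr) if $expr =~ /^[+-]?\d+$/;
    if ($expr =~ /^([+-]?\d*)n([+-]\d+)?$/) {
        my $step = $1 eq '' || $1 eq '+' ? 1 : $1 eq '-' ? -1 : 0 + $1;
        my $offset = defined $2 ? 0 + $2 : 0;
        return ($step, $offset);
    }
    die "Unsupported nth expression: $expr\n";
}

# True if the 1-based position $pos equals step*n + offset for some
# integer n >= 0.
sub nth_matches {
    my ($pos, $expr) = @_;
    my ($step, $offset) = parse_nth($expr);
    return $pos == $offset ? 1 : 0 if $step == 0;
    my $n = ($pos - $offset) / $step;
    return $n >= 0 && $n == int($n) ? 1 : 0;
}
```

Under this rule -n+3 selects positions 1, 2 and 3 only, which is why the last example above is named $top3.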
E:nth-last-child(n)
An E element, the n-th child of its parent, counting from the last one.
my $third = $dom->at('div:nth-last-child(3)');
my $odd = $dom->find('div:nth-last-child(odd)');
my $even = $dom->find('div:nth-last-child(even)');
my $bottom3 = $dom->find('div:nth-last-child(-n+3)');
E:nth-of-type(n)
An E element, the n-th sibling of its type.
my $third = $dom->at('div:nth-of-type(3)');
my $odd = $dom->find('div:nth-of-type(odd)');
my $even = $dom->find('div:nth-of-type(even)');
my $top3 = $dom->find('div:nth-of-type(-n+3)');
E:nth-last-of-type(n)
An E element, the n-th sibling of its type, counting from the last one.
my $third = $dom->at('div:nth-last-of-type(3)');
my $odd = $dom->find('div:nth-last-of-type(odd)');
my $even = $dom->find('div:nth-last-of-type(even)');
my $bottom3 = $dom->find('div:nth-last-of-type(-n+3)');
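The four :nth-* variants differ only in which position gets tested against the expression: the index among all element siblings or only among siblings of the same type, counted from the front or from the back. The hypothetical helper below computes all four positions for one element from its siblings' tag names in document order; it ignores text nodes and is a sketch, not Mojo::DOM's code.

```perl
use strict;
use warnings;

# Hypothetical helper: $siblings holds the tag names of all element
# children of one parent, in document order; $i is a 0-based index.
# Returns the 1-based position each :nth-* pseudo-class would test.
sub nth_positions {
    my ($siblings, $i) = @_;
    my $type = $siblings->[$i];
    # Indexes of the siblings that share this element's type
    my @same = grep { $siblings->[$_] eq $type } 0 .. $#$siblings;
    # Rank of this element among them (0-based)
    my ($rank) = grep { $same[$_] == $i } 0 .. $#same;
    return {
        'nth-child'        => $i + 1,
        'nth-last-child'   => @$siblings - $i,
        'nth-of-type'      => $rank + 1,
        'nth-last-of-type' => @same - $rank,
    };
}
```

For the siblings p, div, p, div, p, the second div is the 4th child but only the 2nd div, and it is also the last div.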
E:first-child
An E element, first child of its parent.
my $first = $dom->at('div p:first-child');
E:last-child
An E element, last child of its parent.
my $last = $dom->at('div p:last-child');
E:first-of-type
An E element, first sibling of its type.
my $first = $dom->at('div p:first-of-type');
E:last-of-type
An E element, last sibling of its type.
my $last = $dom->at('div p:last-of-type');
E:only-child
An E element, only child of its parent.
my $lonely = $dom->at('div p:only-child');
E:only-of-type
An E element, only sibling of its type.
my $lonely = $dom->at('div p:only-of-type');
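The first, last and only variants are special cases of the same sibling counting. A hypothetical sketch (structural_flags is a made-up name) that reports which of them apply to one element, working on a plain list of sibling tag names rather than a real Mojo::DOM tree:

```perl
use strict;
use warnings;

# Hypothetical helper: which structural pseudo-classes apply to the
# element at 0-based index $i among its element siblings $siblings
# (tag names in document order)?
sub structural_flags {
    my ($siblings, $i) = @_;
    my $type = $siblings->[$i];
    # Indexes of the siblings that share this element's type
    my @same = grep { $siblings->[$_] eq $type } 0 .. $#$siblings;
    return {
        'first-child'   => $i == 0,
        'last-child'    => $i == $#$siblings,
        'only-child'    => @$siblings == 1,
        'first-of-type' => $same[0] == $i,
        'last-of-type'  => $same[-1] == $i,
        'only-of-type'  => @same == 1,
    };
}
```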
E.warning
An E element whose class is "warning".
my $warning = $dom->at('div.warning');
E#myid
An E element with ID equal to "myid".
my $foo = $dom->at('div#foo');
E:not(s)
An E element that does not match simple selector s.
my $others = $dom->find('div p:not(:first-child)');
E F
An F element descendant of an E element.
my $headlines = $dom->find('div h1');
E > F
An F element child of an E element.
my $headlines = $dom->find('html > body > div > h1');
E + F
An F element immediately preceded by an E element.
my $second = $dom->find('h1 + h2');
E ~ F
An F element preceded by an E element.
my $second = $dom->find('h1 ~ h2');
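The two sibling combinators differ in how far back they look: E + F only checks the element directly before F, while E ~ F accepts any earlier sibling. A hypothetical sketch over a flat list of sibling tag names, returning 0-based indexes (adjacent_sibling and general_sibling are made-up names):

```perl
use strict;
use warnings;

# Indexes of F elements matched by "E + F": the sibling directly
# before F must be an E.
sub adjacent_sibling {
    my ($siblings, $e, $f) = @_;
    return grep { $siblings->[$_] eq $f && $siblings->[$_ - 1] eq $e }
        1 .. $#$siblings;
}

# Indexes of F elements matched by "E ~ F": any earlier sibling
# may be an E.
sub general_sibling {
    my ($siblings, $e, $f) = @_;
    my ($seen, @hits) = (0);
    for my $i (0 .. $#$siblings) {
        # Check before updating, so an element never precedes itself
        push @hits, $i if $seen && $siblings->[$i] eq $f;
        $seen ||= $siblings->[$i] eq $e;
    }
    return @hits;
}
```

For the siblings h1, h2, p, h2, 'h1 + h2' matches only the first h2, while 'h1 ~ h2' matches both.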
E, F, G
Elements of type E, F and G.
my $headlines = $dom->find('h1, h2, h3');
E[foo=bar][bar=baz]
An E element whose attributes match all of the following attribute selectors.
my $links = $dom->find('a[foo^="b"][foo$="ar"]');
CASE SENSITIVITY
Mojo::DOM defaults to HTML5 semantics, which means all tags and attributes are lowercased and selectors need to be lowercase as well.
my $dom = Mojo::DOM->new('<P ID="greeting">Hi!</P>');
print $dom->at('p')->text;
print $dom->p->{id};
If XML processing instructions are found, the parser will automatically switch into XML mode and everything becomes case sensitive.
my $dom = Mojo::DOM->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
print $dom->at('P')->text;
print $dom->P->{ID};
Automatic XML detection can also be overridden with the xml method.
# XML semantics
$dom->xml(1);
# HTML5 semantics
$dom->xml(0);
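The rules above can be summarized in a few lines of Perl. This is a hypothetical sketch (xml_mode and normalize_name are made-up names), assuming that detection simply looks for an <?xml processing instruction; the real parser may recognize more cases.

```perl
use strict;
use warnings;

# Decide between XML and HTML5 semantics. An explicit setting (like
# $dom->xml(1) or $dom->xml(0)) wins; otherwise look for an XML
# processing instruction (assumption: only <?xml is checked here).
sub xml_mode {
    my ($source, $forced) = @_;
    return $forced if defined $forced;
    return $source =~ /<\?xml\b/i ? 1 : 0;
}

# Tag and attribute names are lowercased in HTML5 mode only.
sub normalize_name {
    my ($name, $xml) = @_;
    return $xml ? $name : lc $name;
}
```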
METHODS
Mojo::DOM inherits all methods from Mojo::Base and implements the following new ones.
new
my $dom = Mojo::DOM->new;
my $dom = Mojo::DOM->new(xml => 1);
my $dom = Mojo::DOM->new('<foo bar="baz">test</foo>');
my $dom = Mojo::DOM->new('<foo bar="baz">test</foo>', xml => 1);
Construct a new Mojo::DOM object.
all_text
my $text = $dom->all_text;
Extract all text content from the DOM structure.
append
$dom = $dom->append('<p>Hi!</p>');
Append markup after this element, as a new sibling.
# "<div><h1>A</h1><h2>B</h2></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->append('<h2>B</h2>');
append_content
$dom = $dom->append_content('<p>Hi!</p>');
Append to element content.
# "<div><h1>AB</h1></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->append_content('B');
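The difference between append and append_content is where the new node ends up: next to the element, or inside it. A hypothetical sketch using a toy tree of hash references (not Mojo::DOM's internal tree format; append_after, append_inside and render are made-up names) reproduces both results from the examples above:

```perl
use strict;
use warnings;

# Toy layout for this sketch only: elements are hash refs
# { tag => NAME, children => [NODES] }, text nodes are plain strings.

# Like append: insert $new as a sibling right after child $i of $parent.
sub append_after {
    my ($parent, $i, $new) = @_;
    splice @{ $parent->{children} }, $i + 1, 0, $new;
}

# Like append_content: add $new at the end of the element's own children.
sub append_inside {
    my ($el, $new) = @_;
    push @{ $el->{children} }, $new;
}

# Serialize the toy tree back to markup for comparison.
sub render {
    my $n = shift;
    return $n unless ref $n;
    return "<$n->{tag}>" . join('', map { render($_) } @{ $n->{children} })
        . "</$n->{tag}>";
}
```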
at
my $result = $dom->at('html title');
Find a single element with CSS3 selectors.
attrs
my $attrs = $dom->attrs;
my $foo = $dom->attrs('foo');
$dom = $dom->attrs({foo => 'bar'});
$dom = $dom->attrs(foo => 'bar');
Element attributes.
# Direct hash access to attributes is also available
print $dom->{foo};
print $dom->div->{id};
charset
my $charset = $dom->charset;
$dom = $dom->charset('UTF-8');
Charset used for decoding and encoding XML.
children
my $collection = $dom->children;
my $collection = $dom->children('div');
Return a collection containing the children of this element, similar to find.
# Child elements are also automatically available as object methods
print $dom->div->text;
print $dom->div->[23]->text;
$dom->div->each(sub { print $_->text });
content_xml
my $xml = $dom->content_xml;
Render content of this element to XML.
find
my $collection = $dom->find('html title');
Find elements with CSS3 selectors and return a collection.
print $dom->find('div')->[23]->text;
Collections are blessed arrays supporting these methods.
each
my @elements = $dom->find('div')->each;
$dom = $dom->find('div')->each(sub { print shift->text });
$dom = $dom->find('div')->each(sub {
  my ($e, $count) = @_;
  print "$count: ", $e->text;
});
Iterate over whole collection.
to_xml
my $xml = $dom->find('div')->to_xml;
Render collection to XML. Note that this method is EXPERIMENTAL and might change without warning!
until
$dom = $dom->find('div')->until(sub { $_->text =~ /x/ && print $_->text });
$dom = $dom->find('div')->until(sub {
  my ($e, $count) = @_;
  $e->text =~ /x/ && print "$count: ", $e->text;
});
Iterate over collection until closure returns true.
while
$dom = $dom->find('div')->while(sub { print($_->text) && $_->text =~ /x/ });
$dom = $dom->find('div')->while(sub {
  my ($e, $count) = @_;
  print("$count: ", $e->text) && $e->text =~ /x/;
});
Iterate over collection while closure returns true.
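To make the callback conventions of each, until and while concrete, here is a hypothetical stand-in collection class (MiniCollection is a made-up name, not part of Mojolicious). It assumes, as the examples suggest, that $_ is set to the current element and that the callback receives the element and a counter; starting the counter at 1 is an assumption.

```perl
use strict;
use warnings;

package MiniCollection;

# A blessed array reference, like the collections returned by find.
sub new { my $class = shift; return bless [@_], $class }

# Without a callback return all elements, otherwise call it for each.
sub each {
    my ($self, $cb) = @_;
    return @$self unless $cb;
    my $count = 1;
    $cb->($_, $count++) for @$self;    # $_ is aliased to the element
    return $self;
}

# Stop after the first element for which the callback returns true.
sub until {
    my ($self, $cb) = @_;
    my $count = 1;
    for (@$self) { last if $cb->($_, $count++) }
    return $self;
}

# Stop after the first element for which the callback returns false.
sub while {
    my ($self, $cb) = @_;
    my $count = 1;
    for (@$self) { last unless $cb->($_, $count++) }
    return $self;
}

package main;
```

In both until and while, the element that ends the loop has already been passed to the closure, so any side effect such as printing still happens for it.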
namespace
my $namespace = $dom->namespace;
Find element namespace.
parent
my $parent = $dom->parent;
Parent of element.
parse
$dom = $dom->parse('<foo bar="baz">test</foo>');
Parse XML document.
prepend
$dom = $dom->prepend('<p>Hi!</p>');
Prepend markup before this element, as a new sibling.
# "<div><h1>A</h1><h2>B</h2></div>"
$dom->parse('<div><h2>B</h2></div>')->at('h2')->prepend('<h1>A</h1>');
prepend_content
$dom = $dom->prepend_content('<p>Hi!</p>');
Prepend to element content.
# "<div><h2>AB</h2></div>"
$dom->parse('<div><h2>B</h2></div>')->at('h2')->prepend_content('A');
replace
$dom = $dom->replace('<div>test</div>');
Replace this element.
# "<div><h2>B</h2></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->replace('<h2>B</h2>');
replace_content
$dom = $dom->replace_content('test');
Replace element content.
# "<div><h1>B</h1></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->replace_content('B');
root
my $root = $dom->root;
Find root element.
text
my $text = $dom->text;
Extract text content from element only, not including child elements.
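The difference between text and all_text is whether descendant elements contribute. A hypothetical sketch on a toy tree (text nodes as strings, elements as hash references; not Mojo::DOM's internal format, and without any of the real methods' whitespace handling):

```perl
use strict;
use warnings;

# Toy layout for this sketch only: text nodes are plain strings,
# elements are hash refs { tag => NAME, children => [NODES] }.

# Like text: only the element's own text nodes.
sub own_text {
    my $el = shift;
    return join '', grep { !ref } @{ $el->{children} };
}

# Like all_text: text nodes of the element and all its descendants.
sub all_text {
    my $el = shift;
    return join '', map { ref $_ ? all_text($_) : $_ } @{ $el->{children} };
}
```

For <div>A<p>B</p>C</div> the first function yields AC and the second ABC.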
to_xml
my $xml = $dom->to_xml;
Render DOM to XML.
tree
my $tree = $dom->tree;
$dom = $dom->tree(['root', ['text', 'lalala']]);
The Document Object Model, as a tree of array references.
type
my $type = $dom->type;
$dom = $dom->type('html');
Element type.
xml
my $xml = $dom->xml;
$dom = $dom->xml(1);
Disable HTML5 semantics in the parser and activate case sensitivity. Defaults to auto detection based on processing instructions. Note that this method is EXPERIMENTAL and might change without warning!
DEBUGGING
You can set the MOJO_DOM_DEBUG environment variable to get some advanced diagnostic information printed to STDERR.
MOJO_DOM_DEBUG=1