NAME
Mojo::DOM - Minimalistic HTML5/XML DOM Parser With CSS3 Selectors
SYNOPSIS
use Mojo::DOM;
# Parse
my $dom = Mojo::DOM->new('<div><p id="a">A</p><p id="b">B</p></div>');
# Find
my $p = $dom->at('#b');
print $p->text;
# Walk
print $dom->div->p->[0]->text;
print $dom->div->p->[1]->{id};
# Iterate
$dom->find('p[id]')->each(sub { print shift->{id} });
# Loop
for my $e ($dom->find('p[id]')->each) {
print $e->text;
}
# Modify
$dom->div->p->[1]->append('<p id="c">C</p>');
# Render
print $dom;
DESCRIPTION
Mojo::DOM is a minimalistic and relaxed HTML5/XML DOM parser with CSS3 selector support. It will even try to interpret broken XML, so you should not use it for validation.
CASE SENSITIVITY
Mojo::DOM defaults to HTML5 semantics, which means that all tags and attributes are lowercased, and selectors need to be lowercase as well.
my $dom = Mojo::DOM->new('<P ID="greeting">Hi!</P>');
# "Hi!"
print $dom->at('p')->text;
# "greeting"
print $dom->p->{id};
If XML processing instructions are found, the parser will automatically switch into XML mode and everything becomes case sensitive.
my $dom = Mojo::DOM->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
# "Hi!"
print $dom->at('P')->text;
# "greeting"
print $dom->P->{ID};
XML detection can also be overridden with the xml method.
# XML semantics
$dom->xml(1);
# HTML5 semantics
$dom->xml(0);
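A minimal sketch of forcing XML semantics on markup that has no processing instruction, assuming xml(1) is set before parse is called so the parser never applies HTML5 lowercasing. The sample markup is the same as above; the variable names are illustrative.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $markup = '<P ID="greeting">Hi!</P>';

# HTML5 semantics (the default): tag and attribute names are lowercased
my $html = Mojo::DOM->new($markup);
print $html->at('p')->text, "\n";    # Hi!
print $html->at('p')->{id}, "\n";    # greeting

# XML semantics: switch modes first, then parse, so case is preserved
my $xml = Mojo::DOM->new->xml(1)->parse($markup);
print $xml->at('P')->text, "\n";     # Hi!
print $xml->at('P')->{ID}, "\n";     # greeting

# In XML mode a lowercase selector no longer matches the uppercase tag
print defined $xml->at('p') ? "match\n" : "no match\n";    # no match
```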
METHODS
Mojo::DOM inherits all methods from Mojo::Base and implements the following new ones.
new
my $dom = Mojo::DOM->new;
my $dom = Mojo::DOM->new(xml => 1);
my $dom = Mojo::DOM->new('<foo bar="baz">test</foo>');
my $dom = Mojo::DOM->new('<foo bar="baz">test</foo>', xml => 1);
Construct a new Mojo::DOM object.
all_text
my $text = $dom->all_text;
Extract all text content from the DOM structure, including child elements.
append
$dom = $dom->append('<p>Hi!</p>');
Append HTML5/XML fragment after this element.
# "<div><h1>A</h1><h2>B</h2></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->append('<h2>B</h2>');
append_content
$dom = $dom->append_content('<p>Hi!</p>');
Append HTML5/XML fragment to the content of this element.
# "<div><h1>AB</h1></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->append_content('B');
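Because find returns a collection, the same edit can be applied to every match. A short sketch, assuming the callback passed to each receives the current element as its first argument, as in the SYNOPSIS:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<ul><li>a</li><li>b</li></ul>');

# append_content adds inside each <li>, after its existing content;
# append would instead add a sibling after the closing </li>
$dom->find('li')->each(sub { shift->append_content('!') });

print $dom, "\n";    # <ul><li>a!</li><li>b!</li></ul>
```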
at
my $result = $dom->at('html title');
Find a single element with CSS3 selectors. All selectors from Mojo::DOM::CSS are supported.
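Since at returns a single element, it pairs naturally with a definedness check. A small sketch, assuming at returns undef when no element matches:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div><p id="a">A</p><p id="b">B</p></div>');

# at returns only the first matching element, in document order
my $first = $dom->at('div p');
print $first->text, "\n";    # A

# With no match, check before chaining further method calls
my $missing = $dom->at('span');
print defined $missing ? $missing->text : 'not found', "\n";    # not found
```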
attrs
my $attrs = $dom->attrs;
my $foo = $dom->attrs('foo');
$dom = $dom->attrs({foo => 'bar'});
$dom = $dom->attrs(foo => 'bar');
Element attributes.
# Direct hash access to attributes is also available
print $dom->{foo};
print $dom->div->{id};
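Hash access combines well with find when collecting one attribute from many elements. A sketch, assuming each called without a callback returns the elements as a plain list, as the loop in the SYNOPSIS does:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new(
  '<ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul>');

# Read the href attribute of every link that has one
my @hrefs = map { $_->{href} } $dom->find('a[href]')->each;
print join(',', @hrefs), "\n";    # /a,/b
```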
charset
my $charset = $dom->charset;
$dom = $dom->charset('UTF-8');
Charset used for decoding and encoding HTML5/XML.
children
my $collection = $dom->children;
my $collection = $dom->children('div');
Return a Mojo::DOM::Collection object containing the child elements of this element, similar to find. Passing an argument such as div limits the result to matching children.
# Child elements are also automatically available as object methods
print $dom->div->text;
print $dom->div->[23]->text;
$dom->div->each(sub { print $_->text });
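A short sketch contrasting the unfiltered and filtered forms, assuming the collection can be dereferenced like the array reference the find example below indexes into:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div><h1>T</h1><p>a</p><p>b</p></div>');
my $div = $dom->at('div');

# All direct child elements of <div>
my $all = $div->children;
print scalar(@$all), "\n";        # 3

# Only the <p> children
my $paras = $div->children('p');
print scalar(@$paras), "\n";      # 2
print $paras->[1]->text, "\n";    # b
```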
content_xml
my $xml = $dom->content_xml;
Render content of this element to XML.
find
my $collection = $dom->find('html title');
Find elements with CSS3 selectors and return a Mojo::DOM::Collection object. All selectors from Mojo::DOM::CSS are supported.
print $dom->find('div')->[23]->text;
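A sketch of the usual ways to consume a find result, assuming the collection behaves as an array reference and that each without a callback returns a list:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom   = Mojo::DOM->new('<ol><li>one</li><li>two</li><li>three</li></ol>');
my $items = $dom->find('li');

# Count and index the collection like an array reference
print scalar(@$items), "\n";       # 3
print $items->[-1]->text, "\n";    # three

# Turn the matches into a plain list of strings
my @texts = map { $_->text } $items->each;
print "@texts\n";                  # one two three
```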
namespace
my $namespace = $dom->namespace;
Find element namespace.
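A sketch of namespace resolution for an element that relies on a default namespace declared on its parent, assuming the lookup walks up through the ancestors. The Atom URI is only sample data.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<?xml version="1.0"?>'
  . '<feed xmlns="http://www.w3.org/2005/Atom"><title>News</title></feed>');

# <title> declares no xmlns of its own, so the namespace is taken
# from the nearest ancestor that declares one
my $ns = $dom->at('title')->namespace;
print "$ns\n";    # http://www.w3.org/2005/Atom
```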
parent
my $parent = $dom->parent;
Parent of element.
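A minimal sketch of walking one level upward from a matched element and reading an attribute there:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div id="box"><p>A</p></div>');

# From <p> up to the enclosing <div>
my $parent = $dom->at('p')->parent;
print $parent->{id}, "\n";    # box
```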
parse
$dom = $dom->parse('<foo bar="baz">test</foo>');
Parse HTML5/XML document with Mojo::DOM::HTML.
prepend
$dom = $dom->prepend('<p>Hi!</p>');
Prepend HTML5/XML fragment before this element.
# "<div><h1>A</h1><h2>B</h2></div>"
$dom->parse('<div><h2>B</h2></div>')->at('h2')->prepend('<h1>A</h1>');
prepend_content
$dom = $dom->prepend_content('<p>Hi!</p>');
Prepend HTML5/XML fragment to the content of this element.
# "<div><h2>AB</h2></div>"
$dom->parse('<div><h2>B</h2></div>')->at('h2')->prepend_content('A');
replace
$dom = $dom->replace('<div>test</div>');
Replace this element with HTML5/XML fragment.
# "<div><h2>B</h2></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->replace('<h2>B</h2>');
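Combined with find, replace can rewrite every matching element. A sketch with one caveat: the element text is concatenated into new markup without escaping, so this only suits trusted content such as the sample below.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<p><b>bold</b> and <b>more</b></p>');

# Swap every <b> for a <strong> that keeps the original text
# (text is inserted unescaped, which is fine for this sample only)
$dom->find('b')->each(sub {
  my $e = shift;
  $e->replace('<strong>' . $e->text . '</strong>');
});

print $dom, "\n";    # <p><strong>bold</strong> and <strong>more</strong></p>
```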
replace_content
$dom = $dom->replace_content('test');
Replace element content.
# "<div><h1>B</h1></div>"
$dom->parse('<div><h1>A</h1></div>')->at('h1')->replace_content('B');
root
my $root = $dom->root;
Find root node.
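A minimal sketch of getting back to the whole document from a nested element, assuming the root node stringifies to the full document just as print $dom does in the SYNOPSIS:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div id="box"><p>A</p></div>');

# From a nested element, root leads back to the whole document
my $root = $dom->at('p')->root;
print $root, "\n";    # <div id="box"><p>A</p></div>
```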
text
my $text = $dom->text;
Extract text content from element only, not including child elements.
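A sketch contrasting text with all_text on an element that mixes its own text with a nested element. Whitespace handling of the element's own text may vary, so only the nested-content difference is shown with an expected value.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div>Hello <b>World</b></div>');
my $div = $dom->at('div');

# text skips the nested <b>, all_text descends into it
my $own = $div->text;
my $all = $div->all_text;
print "[$own]\n";    # only the text directly inside <div>
print "[$all]\n";    # [Hello World]
```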
to_xml
my $xml = $dom->to_xml;
Render DOM to XML.
tree
my $tree = $dom->tree;
$dom = $dom->tree(['root', ['text', 'lalala']]);
Document Object Model.
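A sketch of inspecting the raw tree, assuming each node is an array reference that starts with its type, as in the setter example above, and that tag nodes continue with their name and attribute hash:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom  = Mojo::DOM->new('<p class="x">A</p>');
my $tree = $dom->tree;

# The top node is the document root; its children follow from index 1
print $tree->[0], "\n";    # root

# A tag node leads with its type, name and attribute hash
my $p = $tree->[1];
print "$p->[0] $p->[1] $p->[2]{class}\n";    # tag p x
```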
type
my $type = $dom->type;
$dom = $dom->type('html');
Element type.
xml
my $xml = $dom->xml;
$dom = $dom->xml(1);
Disable HTML5 semantics in the parser and activate case sensitivity. Defaults to auto detection based on processing instructions. Note that this method is EXPERIMENTAL and might change without warning!