NAME
Mojo::HTML - Minimalistic HTML5/XML Parser
SYNOPSIS
use Mojo::HTML;
# Parse
my $html = Mojo::HTML->new('<div><p id="a">A</p><p id="b">B</p></div>');
# Walk
print $html->div->p->[0]->text;
print $html->div->p->[1]->{id};
# Modify
$html->div->p->[1]->append('<p id="c">C</p>');
# Render
print $html;
DESCRIPTION
Mojo::HTML is a minimalistic and very relaxed HTML5/XML parser and the foundation of Mojo::DOM. It will even try to interpret broken HTML and XML, so you should not use it for validation.
CASE SENSITIVITY
Mojo::HTML defaults to HTML5 semantics, which means all tag and attribute names are lowercased.
my $html = Mojo::HTML->new('<P ID="greeting">Hi</P>');
print $html->p->{id};
If XML processing instructions are found, the parser will automatically switch into XML mode and everything becomes case sensitive.
my $html = Mojo::HTML->new('<?xml version="1.0"?><P ID="greeting">Hi</P>');
print $html->P->{ID};
XML detection can also be overridden with the xml method.
# XML semantics
$html->xml(1);
# HTML5 semantics
$html->xml(0);
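The detection rule can be modeled in a few lines of plain Perl. This is a minimal sketch, not Mojo::HTML's own code: the helpers wants_xml and normalize_name are hypothetical, and treating any <?xml ...?> instruction as the trigger is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a document gets XML semantics.
# An explicit setting (like calling the xml method) wins; otherwise an
# XML processing instruction such as <?xml ...?> switches XML mode on.
sub wants_xml {
    my ($source, $explicit) = @_;
    return $explicit if defined $explicit;
    return $source =~ /<\?xml\b/i ? 1 : 0;
}

# Hypothetical helper: normalize a tag or attribute name for the chosen mode.
# HTML5 mode lowercases names, XML mode keeps them case sensitive.
sub normalize_name {
    my ($name, $xml) = @_;
    return $xml ? $name : lc $name;
}

my $html_doc = '<P ID="greeting">Hi</P>';
my $xml_doc  = '<?xml version="1.0"?><P ID="greeting">Hi</P>';

for my $doc ($html_doc, $xml_doc) {
    my $xml = wants_xml($doc);
    printf "%s mode: tag %s, attribute %s\n",
        ($xml ? 'XML' : 'HTML5'), normalize_name('P', $xml), normalize_name('ID', $xml);
}
```

Passing an explicit second argument to wants_xml plays the role of $html->xml(1) or $html->xml(0): it bypasses detection entirely.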
METHODS
Mojo::HTML inherits all methods from Mojo::Base and implements the following new ones.
new
my $html = Mojo::HTML->new;
my $html = Mojo::HTML->new(xml => 1);
my $html = Mojo::HTML->new('<foo bar="baz">test</foo>');
my $html = Mojo::HTML->new('<foo bar="baz">test</foo>', xml => 1);
Construct a new Mojo::HTML object.
all_text
my $text = $html->all_text;
Extract all text content from the element and its child elements.
append
$html = $html->append('<p>Hi!</p>');
Insert an HTML/XML fragment after the element.
# "<div><h1>A</h1><h2>B</h2></div>"
$html->parse('<div><h1>A</h1></div>')->at('h1')->append('<h2>B</h2>');
append_content
$html = $html->append_content('<p>Hi!</p>');
Append an HTML/XML fragment to the end of the element's content.
# "<div><h1>AB</h1></div>"
$html->parse('<div><h1>A</h1></div>')->at('h1')->append_content('B');
attrs
my $attrs = $html->attrs;
my $foo = $html->attrs('foo');
$html = $html->attrs({foo => 'bar'});
$html = $html->attrs(foo => 'bar');
Element attributes.
# Direct hash access to attributes is also available
print $html->{foo};
print $html->div->{id};
charset
my $charset = $html->charset;
$html = $html->charset('UTF-8');
Charset used for decoding and encoding HTML/XML.
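What a charset means for decoding and encoding can be shown with the core Encode module. This is a sketch of the general idea only; it assumes the charset is a name Encode understands and does not show where Mojo::HTML applies it internally.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $charset = 'UTF-8';

# Raw bytes as they might arrive from a file or socket: "<p>Grüße</p>" in UTF-8.
my $bytes = "<p>Gr\xc3\xbc\xc3\x9fe</p>";

# Decoding turns bytes into Perl characters before parsing...
my $chars = decode($charset, $bytes);

# ...and encoding turns rendered characters back into bytes for output.
my $out = encode($charset, $chars);

printf "%d bytes, %d characters, round trip %s\n",
    length $bytes, length $chars, ($out eq $bytes ? 'ok' : 'failed');
```

The two multi-byte sequences collapse into single characters after decoding, which is why the byte and character counts differ.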
children
my $children = $html->children;
my $children = $html->children('div');
Return an array containing the child elements of this element, optionally only those of the given type.
# Child elements are also automatically available as object methods
print $html->div->text;
print $html->div->[23]->text;
content_xml
my $xml = $html->content_xml;
Render the content of this element to XML.
namespace
my $namespace = $html->namespace;
Find element namespace.
parent
my $parent = $html->parent;
Parent of element.
parse
$html = $html->parse('<foo bar="baz">test</foo>');
Parse HTML/XML document.
prepend
$html = $html->prepend('<p>Hi!</p>');
Insert an HTML/XML fragment before the element.
# "<div><h1>A</h1><h2>B</h2></div>"
$html->parse('<div><h2>B</h2></div>')->at('h2')->prepend('<h1>A</h1>');
prepend_content
$html = $html->prepend_content('<p>Hi!</p>');
Prepend an HTML/XML fragment to the beginning of the element's content.
# "<div><h2>AB</h2></div>"
$html->parse('<div><h2>B</h2></div>')->at('h2')->prepend_content('A');
replace
$html = $html->replace('<div>test</div>');
Replace the element with an HTML/XML fragment.
# "<div><h2>B</h2></div>"
$html->parse('<div><h1>A</h1></div>')->at('h1')->replace('<h2>B</h2>');
replace_content
$html = $html->replace_content('test');
Replace the element's content with an HTML/XML fragment.
# "<div><h1>B</h1></div>"
$html->parse('<div><h1>A</h1></div>')->at('h1')->replace_content('B');
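The six methods above differ only in where the new markup lands: append, prepend and replace work on the element's position among its siblings, while the _content variants work on its children. A minimal sketch of that distinction on a simplified, hypothetical tree (not Mojo::HTML's internals; the helpers el, pos_in, insert_after, insert_before, swap_out, add_last, add_first, set_only and to_str are invented for illustration) could look like this:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical, simplified model: an element is a hash ref with a name and an
# array ref of children; a child is either another element or a text string.
# There are no parent pointers, so sibling operations take the parent explicitly.
sub el { my ($name, @kids) = @_; return +{name => $name, kids => [@kids]} }

# Find the position of element $e in $parent's child list.
sub pos_in {
    my ($parent, $e) = @_;
    my $kids = $parent->{kids};
    for my $i (0 .. $#$kids) {
        return $i if ref $kids->[$i] && refaddr($kids->[$i]) == refaddr($e);
    }
    die "element is not a child of this parent\n";
}

# Sibling operations, analogous to append, prepend and replace.
sub insert_after  { my ($p, $e, $new) = @_; splice @{$p->{kids}}, pos_in($p, $e) + 1, 0, $new; return }
sub insert_before { my ($p, $e, $new) = @_; splice @{$p->{kids}}, pos_in($p, $e), 0, $new; return }
sub swap_out      { my ($p, $e, $new) = @_; splice @{$p->{kids}}, pos_in($p, $e), 1, $new; return }

# Content operations, analogous to append_content, prepend_content and replace_content.
sub add_last  { my ($e, $new) = @_; push @{$e->{kids}}, $new; return }
sub add_first { my ($e, $new) = @_; unshift @{$e->{kids}}, $new; return }
sub set_only  { my ($e, $new) = @_; $e->{kids} = [$new]; return }

# Render the simplified tree back to markup (no escaping, for brevity).
sub to_str {
    my $n = shift;
    return $n unless ref $n;
    return "<$n->{name}>" . join('', map { to_str($_) } @{$n->{kids}}) . "</$n->{name}>";
}

my $h1  = el('h1', 'A');
my $div = el('div', $h1);
insert_after($div, $h1, el('h2', 'B'));
print to_str($div), "\n";   # <div><h1>A</h1><h2>B</h2></div>
```

Each helper reproduces one example from this section: insert_before, swap_out, add_last, add_first and set_only correspond to prepend, replace, append_content, prepend_content and replace_content.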
root
my $root = $html->root;
Find root element.
text
my $text = $html->text;
Extract text content from element only, not including child elements.
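The difference between text and all_text comes down to how far the walk goes. The sketch below uses a simplified, hypothetical array layout loosely modeled on the tree method's example (['root', ...] and ['text', ...]); the ['tag', $name, \%attrs, @children] shape for elements and the helpers kids, own_text and deep_text are assumptions, and whitespace handling in the real methods may differ.

```perl
use strict;
use warnings;

# Simplified, hypothetical node layout (only these three node types):
#   ['root', @children]
#   ['tag', $name, \%attrs, @children]
#   ['text', $string]
sub kids {
    my $node = shift;
    return $node->[0] eq 'tag' ? @$node[3 .. $#$node] : @$node[1 .. $#$node];
}

# Like text: only the text nodes that are direct children of the element.
sub own_text {
    my $node = shift;
    return join '', map { $_->[1] } grep { $_->[0] eq 'text' } kids($node);
}

# Like all_text: text from the element and all of its descendants, in order.
sub deep_text {
    my $node = shift;
    return join '', map { $_->[0] eq 'text' ? $_->[1] : deep_text($_) } kids($node);
}

# <p>A<b>B</b>C</p>
my $p = ['tag', 'p', {}, ['text', 'A'], ['tag', 'b', {}, ['text', 'B']], ['text', 'C']];
print own_text($p), "\n";    # AC
print deep_text($p), "\n";   # ABC
```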
to_xml
my $xml = $html->to_xml;
Render element and child elements to XML.
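to_xml and content_xml differ in whether the element's own tags are included in the output. A minimal sketch on a simplified, hypothetical tree layout (['tag', $name, \%attrs, @children] and ['text', $string]; the helpers escape, render and render_content are invented for illustration, not Mojo::HTML's renderer) shows the distinction, along with the escaping any XML renderer needs:

```perl
use strict;
use warnings;

# Simplified, hypothetical node layout:
#   ['tag', $name, \%attrs, @children]   and   ['text', $string]

# Escape the characters that are special in XML text and attribute values.
sub escape {
    my $s = shift;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Like content_xml: render only the children of an element.
sub render_content {
    my $node = shift;
    return join '', map { render($_) } @$node[3 .. $#$node];
}

# Like to_xml: render the element itself plus its children.
sub render {
    my $node = shift;
    return escape($node->[1]) if $node->[0] eq 'text';
    my ($name, $attrs) = @$node[1, 2];
    my $attr_str = join '', map { sprintf ' %s="%s"', $_, escape($attrs->{$_}) } sort keys %$attrs;
    return "<$name$attr_str>" . render_content($node) . "</$name>";
}

my $div = ['tag', 'div', {id => 'a'}, ['tag', 'p', {}, ['text', 'x < y']]];
print render($div), "\n";           # <div id="a"><p>x &lt; y</p></div>
print render_content($div), "\n";   # <p>x &lt; y</p>
```

This sketch always writes an explicit closing tag; a real HTML renderer also has to handle void elements such as <br>.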
tree
my $tree = $html->tree;
$html = $html->tree(['root', ['text', 'lalala']]);
HTML5/XML tree.
type
my $type = $html->type;
$html = $html->type('title');
Element type.
xml
my $xml = $html->xml;
$html = $html->xml(1);
Disable HTML5 semantics in the parser and activate case sensitivity; defaults to auto-detection based on processing instructions. Note that this method is EXPERIMENTAL and might change without warning!