NAME

Mojo::DOM - Minimalistic HTML5/XML DOM Parser With CSS3 Selectors

SYNOPSIS

use Mojo::DOM;

# Parse
my $dom = Mojo::DOM->new('<div><p id="a">A</p><p id="b">B</p></div>');

# Find
my $p = $dom->at('#b');
print $p->text;

# Walk
print $dom->div->p->[0]->text;
print $dom->div->p->[1]->{id};

# Iterate
$dom->find('p[id]')->each(sub { print shift->{id} });

# Loop
for my $e ($dom->find('p[id]')->each) {
  print $e->text;
}

# Modify
$dom->div->p->[1]->append('<p id="c">C</p>');

# Render
print $dom;

DESCRIPTION

Mojo::DOM is a minimalistic and very relaxed HTML5/XML DOM parser with support for CSS3 selectors. It will even try to interpret broken XML, so you should not use it for validation.

SELECTORS

All CSS3 selectors that make sense for a standalone parser are supported.

*

Any element.

my $first = $dom->at('*');

E

An element of type E.

my $title = $dom->at('title');

E[foo]

An E element with a foo attribute.

my $links = $dom->find('a[href]');

E[foo="bar"]

An E element whose foo attribute value is exactly equal to bar.

my $fields = $dom->find('input[name="foo"]');

E[foo~="bar"]

An E element whose foo attribute value is a list of whitespace-separated values, one of which is exactly equal to bar.

my $fields = $dom->find('input[name~="foo"]');

E[foo^="bar"]

An E element whose foo attribute value begins exactly with the string bar.

my $fields = $dom->find('input[name^="f"]');

E[foo$="bar"]

An E element whose foo attribute value ends exactly with the string bar.

my $fields = $dom->find('input[name$="o"]');

E[foo*="bar"]

An E element whose foo attribute value contains the substring bar.

my $fields = $dom->find('input[name*="fo"]');
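The attribute operators above are simple string tests on the attribute value. The plain-Perl sketch below illustrates those comparison rules. `attr_matches` and `%ATTR_OPS` are hypothetical names used only for illustration, not part of Mojo::DOM, and the real selector engine may treat edge cases (such as empty values) differently.

```perl
use strict;
use warnings;

# Hypothetical table of attribute operators (illustration only, not Mojo::DOM's code).
# Each entry receives the attribute value from the document and the string from the selector.
my %ATTR_OPS = (
  '='  => sub { my ($v, $w) = @_; $v eq $w },                                # exact match
  '~=' => sub { my ($v, $w) = @_; scalar grep { $_ eq $w } split ' ', $v }, # one whitespace-separated word
  '^=' => sub { my ($v, $w) = @_; $v =~ /^\Q$w\E/ },                         # prefix
  '$=' => sub { my ($v, $w) = @_; $v =~ /\Q$w\E\z/ },                        # suffix
  '*=' => sub { my ($v, $w) = @_; $v =~ /\Q$w\E/ },                          # substring
);

# Hypothetical helper: $op is '' for a plain E[foo] presence test.
sub attr_matches {
  my ($op, $value, $want) = @_;
  return 0 unless defined $value;    # attribute missing: nothing matches
  return 1 if $op eq '';             # E[foo]: presence is enough
  my $test = $ATTR_OPS{$op} or die "unknown attribute operator '$op'\n";
  return $test->($value, $want) ? 1 : 0;
}

print attr_matches('~=', 'foo bar', 'bar'), "\n";   # 1
print attr_matches('~=', 'foobar',  'bar'), "\n";   # 0
```

Quoting the selector string with \Q...\E keeps characters such as "." or "*" in attribute values from being read as regular expression syntax.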

E:root

An E element, root of the document.

my $root = $dom->at(':root');

E:checked

A user interface element E that is checked (for instance a radio button or checkbox).

my $input = $dom->at(':checked');

E:empty

An E element that has no children, not even text nodes.

my $empty = $dom->find(':empty');

E:nth-child(n)

An E element, the n-th child of its parent.

my $third = $dom->at('div:nth-child(3)');
my $odd   = $dom->find('div:nth-child(odd)');
my $even  = $dom->find('div:nth-child(even)');
my $top3  = $dom->find('div:nth-child(-n+3)');
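The argument n follows the CSS3 an+b notation: odd is 2n+1, even is 2n, and -n+3 selects the first three positions. The plain-Perl sketch below shows the arithmetic. `nth_matches` is a hypothetical helper, not Mojo::DOM's own code; it reports whether a 1-based position equals a*n+b for some integer n >= 0.

```perl
use strict;
use warnings;

# Hypothetical helper: does the 1-based position $pos equal $step*n + $offset
# for some integer n >= 0? ($step and $offset are the a and b of an+b.)
sub nth_matches {
  my ($step, $offset, $pos) = @_;
  return $pos == $offset ? 1 : 0 if $step == 0;   # plain index such as nth-child(3)
  my $diff = $pos - $offset;
  # n = $diff / $step has to be a whole number and must not be negative
  return $diff % $step == 0 && $diff / $step >= 0 ? 1 : 0;
}

# odd = 2n+1, even = 2n+0, -n+3 = first three positions
my @odd  = grep { nth_matches(2, 1, $_) } 1 .. 6;    # (1, 3, 5)
my @top3 = grep { nth_matches(-1, 3, $_) } 1 .. 6;   # (1, 2, 3)
print "@odd | @top3\n";                              # 1 3 5 | 1 2 3
```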

E:nth-last-child(n)

An E element, the n-th child of its parent, counting from the last one.

my $third    = $dom->at('div:nth-last-child(3)');
my $odd      = $dom->find('div:nth-last-child(odd)');
my $even     = $dom->find('div:nth-last-child(even)');
my $bottom3  = $dom->find('div:nth-last-child(-n+3)');
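Counting from the last child only changes how positions are numbered: among k siblings, position p counted from the end is position k - p + 1 counted from the start. The sketch below, built around the hypothetical helper `nth_last_positions` (not part of Mojo::DOM), enumerates the values of an+b and converts each hit back to a position from the first sibling.

```perl
use strict;
use warnings;

# Hypothetical helper: which positions (counted from the first of $count
# siblings) does nth-last-child($step n + $offset) select?
sub nth_last_positions {
  my ($step, $offset, $count) = @_;
  my %hit;
  # n never needs to exceed $count + |$offset| to cover every position
  for my $n (0 .. $count + abs($offset)) {
    my $from_end = $step * $n + $offset;    # 1 = last sibling
    next if $from_end < 1 || $from_end > $count;
    $hit{ $count - $from_end + 1 } = 1;     # convert to a position from the first sibling
  }
  return sort { $a <=> $b } keys %hit;
}

print join(',', nth_last_positions(-1, 3, 6)), "\n";   # 4,5,6
print join(',', nth_last_positions(2, 1, 6)), "\n";    # 2,4,6
```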

E:nth-of-type(n)

An E element, the n-th sibling of its type.

my $third = $dom->at('div:nth-of-type(3)');
my $odd   = $dom->find('div:nth-of-type(odd)');
my $even  = $dom->find('div:nth-of-type(even)');
my $top3  = $dom->find('div:nth-of-type(-n+3)');

E:nth-last-of-type(n)

An E element, the n-th sibling of its type, counting from the last one.

my $third    = $dom->at('div:nth-last-of-type(3)');
my $odd      = $dom->find('div:nth-last-of-type(odd)');
my $even     = $dom->find('div:nth-last-of-type(even)');
my $bottom3  = $dom->find('div:nth-last-of-type(-n+3)');
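The of-type variants use the same counting, but only siblings with the same tag name take part. A minimal sketch on a flat list of sibling tag names follows. `nth_of_type_index` is a hypothetical helper that handles a plain index k only (not the full an+b form) and can optionally count from the last sibling.

```perl
use strict;
use warnings;

# Hypothetical helper: given sibling tag names in document order, return the
# 0-based index of the $k-th sibling of type $type, or undef if there is none.
# A true $from_last counts from the last sibling instead (nth-last-of-type).
sub nth_of_type_index {
  my ($tags, $type, $k, $from_last) = @_;
  my @same = grep { $tags->[$_] eq $type } 0 .. $#$tags;   # only same-type siblings count
  @same = reverse @same if $from_last;
  return $k >= 1 && $k <= @same ? $same[$k - 1] : undef;
}

my @siblings = qw(h1 div p div p div);
# div:nth-child(2) would be index 1, but div:nth-of-type(2) is index 3
print nth_of_type_index(\@siblings, 'div', 2), "\n";    # 3
print nth_of_type_index(\@siblings, 'p', 1, 1), "\n";   # 4
```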

E:first-child

An E element, first child of its parent.

my $first = $dom->at('div p:first-child');

E:last-child

An E element, last child of its parent.

my $last = $dom->at('div p:last-child');

E:first-of-type

An E element, first sibling of its type.

my $first = $dom->at('div p:first-of-type');

E:last-of-type

An E element, last sibling of its type.

my $last = $dom->at('div p:last-of-type');

E:only-child

An E element, only child of its parent.

my $lonely = $dom->at('div p:only-child');

E:only-of-type

An E element, only sibling of its type.

my $lonely = $dom->at('div p:only-of-type');

E.warning

An E element whose class is "warning".

my $warning = $dom->at('div.warning');

E#myid

An E element with ID equal to "myid".

my $element = $dom->at('div#myid');

E:not(s)

An E element that does not match simple selector s.

my $others = $dom->find('div p:not(:first-child)');

E F

An F element descendant of an E element.

my $headlines = $dom->find('div h1');

E > F

An F element child of an E element.

my $headlines = $dom->find('html > body > div > h1');

E + F

An F element immediately preceded by an E element.

my $second = $dom->find('h1 + h2');

E ~ F

An F element preceded by an E element.

my $second = $dom->find('h1 ~ h2');
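The difference between the two sibling combinators is only how far back the E element may be. The sketch below models both on a flat list of element sibling tag names (text nodes are assumed to be ignored); `adjacent_matches` and `general_matches` are hypothetical helpers, not Mojo::DOM functions.

```perl
use strict;
use warnings;

# Hypothetical helper for 'E + F': indexes of F siblings whose immediately
# preceding sibling is an E.
sub adjacent_matches {
  my ($tags, $e, $f) = @_;
  return grep { $tags->[$_] eq $f && $tags->[$_ - 1] eq $e } 1 .. $#$tags;
}

# Hypothetical helper for 'E ~ F': indexes of F siblings with an E anywhere earlier.
sub general_matches {
  my ($tags, $e, $f) = @_;
  my ($seen_e, @hits) = (0);
  for my $i (0 .. $#$tags) {
    push @hits, $i if $seen_e && $tags->[$i] eq $f;   # check before updating, so
    $seen_e ||= $tags->[$i] eq $e;                    # an element never precedes itself
  }
  return @hits;
}

my @siblings = qw(h1 h2 p h2);
my @plus  = adjacent_matches(\@siblings, 'h1', 'h2');   # (1)
my @tilde = general_matches(\@siblings, 'h1', 'h2');    # (1, 3)
```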

E, F, G

Elements of type E, F and G.

my $headlines = $dom->find('h1, h2, h3');

E[foo=bar][bar=baz]

An E element whose attributes match all of the listed attribute selectors.

my $links = $dom->find('a[foo^="b"][foo$="ar"]');

CASE SENSITIVITY

Mojo::DOM defaults to HTML5 semantics, which means that all tag and attribute names are lowercased, so selectors need to be lowercase as well.

my $dom = Mojo::DOM->new('<P ID="greeting">Hi!</P>');
print $dom->at('p')->text;
print $dom->p->{id};

If XML processing instructions are found, the parser will automatically switch into XML mode and everything becomes case sensitive.

my $dom = Mojo::DOM->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
print $dom->at('P')->text;
print $dom->P->{ID};

XML detection can also be overridden with the xml method, which switches between the two modes explicitly.

# XML semantics
$dom->xml(1);

# HTML5 semantics
$dom->xml(0);
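The rule above can be made concrete with a tiny model of how a tag selector is compared in each mode. `tag_matches` is a hypothetical helper, not part of Mojo::DOM; it assumes, as described above, that HTML5 mode lowercases names taken from the document while the selector is compared exactly as written.

```perl
use strict;
use warnings;

# Hypothetical model of tag matching (not Mojo::DOM's code): in HTML5 mode the
# parser stores tag names lowercased, in XML mode it keeps them as written;
# the selector is compared as written in both modes.
sub tag_matches {
  my ($source_tag, $selector_tag, $xml) = @_;
  my $stored = $xml ? $source_tag : lc $source_tag;
  return $stored eq $selector_tag ? 1 : 0;
}

print join(' ', tag_matches('P', 'p', 0), tag_matches('P', 'P', 0)), "\n";   # 1 0 (HTML5)
print join(' ', tag_matches('P', 'P', 1), tag_matches('P', 'p', 1)), "\n";   # 1 0 (XML)
```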

METHODS

Mojo::DOM inherits all methods from Mojo::Base and implements the following new ones.

at

my $result = $dom->at('html title');

Find the first element matching a CSS3 selector.

children

my $collection = $dom->children;
my $collection = $dom->children('div');

Return a collection containing the child elements of this element, similar to find.

# Child elements are also automatically available as object methods
print $dom->div->text;
print $dom->div->[23]->text;
$dom->div->each(sub { print $_->text });

find

my $collection = $dom->find('html title');

Find all elements matching a CSS3 selector and return them as a collection.

print $dom->find('div')->[23]->text;

Collections are blessed arrays supporting these methods.

each
my @elements = $dom->find('div')->each;
$dom         = $dom->find('div')->each(sub { print shift->text });
$dom         = $dom->find('div')->each(sub {
  my ($e, $count) = @_;
  print "$count: ", $e->text;
});

Iterate over the whole collection.
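The callback form of each receives the element and a running count, as in the examples above. The sketch below models that behavior on a plain array reference; `each_element` is a hypothetical name, not Mojo::DOM's implementation, and both the 1-based count and the aliasing of $_ to the current element are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a collection's each (not Mojo::DOM's code).
# Without a callback it returns the elements; with one it calls
# $cb->($element, $count) for every element, with $_ aliased to the element.
sub each_element {
  my ($items, $cb) = @_;
  return @$items unless $cb;
  my $count = 1;                      # assumed to start at 1
  $cb->($_, $count++) for @$items;
  return $items;
}

each_element([qw(a b c)], sub { my ($e, $count) = @_; print "$count: $e\n" });
```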

to_xml
my $xml = $dom->find('div')->to_xml;

Render the collection to XML. Note that this method is EXPERIMENTAL and might change without warning!

until
$dom = $dom->find('div')->until(sub { $_->text =~ /x/ && print $_->text });
$dom = $dom->find('div')->until(sub {
  my ($e, $count) = @_;
  $e->text =~ /x/ && print "$count: ", $e->text;
});

Iterate over the collection until the closure returns true.

while
$dom = $dom->find('div')->while(sub {
  print($_->text) && $_->text =~ /x/
});
$dom = $dom->find('div')->while(sub {
  my ($e, $count) = @_;
  print("$count: ", $e->text) && $e->text =~ /x/;
});

Iterate over the collection while the closure returns true.
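until and while differ only in when they stop. The sketch below models both on a plain array reference; `until_true` and `while_true` are hypothetical helpers, and the assumption that the closure is still called for the element that ends the loop is read from the examples above rather than taken from Mojo::DOM's source.

```perl
use strict;
use warnings;

# Hypothetical helpers (not Mojo::DOM's code). Both call $cb->($element, $count)
# with $_ aliased to the element and return the collection; only the stop
# condition differs. The element that ends the loop is still passed to $cb.
sub until_true {
  my ($items, $cb) = @_;
  my $count = 1;
  for (@$items) { last if $cb->($_, $count++) }       # stop after the first true result
  return $items;
}

sub while_true {
  my ($items, $cb) = @_;
  my $count = 1;
  for (@$items) { last unless $cb->($_, $count++) }   # stop after the first false result
  return $items;
}

my (@until_seen, @while_seen);
until_true([qw(a bx c)],     sub { push @until_seen, $_; /x/ });   # visits a, bx
while_true([qw(ax bx c dx)], sub { push @while_seen, $_; /x/ });   # visits ax, bx, c
```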

SEE ALSO

Mojolicious, Mojolicious::Guides, http://mojolicio.us.