NAME

Mojo::DOM - Minimalistic XML/HTML5 DOM Parser With CSS3 Selectors

SYNOPSIS

use Mojo::DOM;

# Parse
my $dom = Mojo::DOM->new;
$dom->parse('<div><div id="a">A</div><div id="b">B</div></div>');

# Find
my $b = $dom->at('#b');
print $b->text;

# Iterate
$dom->find('div[id]')->each(sub { print shift->text });

# Loop
for my $e ($dom->find('div[id]')->each) {
  print $e->text;
}

# Get the first 10 links
$dom->find('a[href]')
  ->while(sub { print(shift->attrs->{href}) && pop() < 10 });

# Search for a link about a specific topic
$dom->find('a[href]')
  ->until(sub { $_->text =~ m/kraih/ && print $_->attrs->{href} });

DESCRIPTION

Mojo::DOM is a minimalistic and very relaxed XML/HTML5 DOM parser with support for CSS3 selectors. It will even try to interpret broken XML, so you should not use it for validation.
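As a rough illustration of the relaxed parsing described above, the following sketch feeds the parser markup with unclosed tags and then looks elements up by id. The markup and ids are made up for this example, and it assumes that `at` returns an undefined value when nothing matches. How the broken input is interpreted may vary between versions, so no output is shown.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Deliberately broken markup: neither <p> is closed (made up for this sketch)
my $dom = Mojo::DOM->new;
$dom->parse('<div><p id="a">unclosed<p id="b">also unclosed</div>');

# Look up both paragraphs; guard against elements the parser did not recover
for my $id (qw(a b)) {
  my $e = $dom->at("#$id");
  print "$id: ", (defined $e ? $e->text : '(not found)'), "\n";
}
```

Because the parser accepts input like this without complaint, a successful parse says nothing about whether the document was well-formed.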

SELECTORS

All CSS3 selectors that make sense for a standalone parser are supported.

`*`

Any element.

my $first = $dom->at('*');

`E`

An element of type E.

my $title = $dom->at('title');

`E[foo]`

An E element with a foo attribute.

my $links = $dom->find('a[href]');

`E[foo="bar"]`

An E element whose foo attribute value is exactly equal to bar.

my $fields = $dom->find('input[name="foo"]');

`E[foo~="bar"]`

An E element whose foo attribute value is a list of whitespace-separated values, one of which is exactly equal to bar.

my $fields = $dom->find('input[name~="foo"]');

`E[foo^="bar"]`

An E element whose foo attribute value begins exactly with the string bar.

my $fields = $dom->find('input[name^="f"]');

`E[foo$="bar"]`

An E element whose foo attribute value ends exactly with the string bar.

my $fields = $dom->find('input[name$="o"]');

`E[foo*="bar"]`

An E element whose foo attribute value contains the substring bar.

my $fields = $dom->find('input[name*="fo"]');
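The attribute operators above can be read as plain string tests. The sketch below is pure Perl and is not how Mojo::DOM implements them; `attr_matches` is a hypothetical helper written only for this illustration, and it does not treat empty selector values specially.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mojo::DOM: compares an attribute value
# against a selector value the way each operator is described above.
sub attr_matches {
  my ($op, $value, $want) = @_;

  # [foo="bar"]: exact match
  if ($op eq '=') { return $value eq $want ? 1 : 0 }

  # [foo~="bar"]: one of the whitespace-separated words equals bar
  if ($op eq '~=') {
    my $hits = grep { $_ eq $want } split ' ', $value;
    return $hits ? 1 : 0;
  }

  # [foo^="bar"]: value begins with bar
  if ($op eq '^=') { return index($value, $want) == 0 ? 1 : 0 }

  # [foo$="bar"]: value ends with bar
  if ($op eq '$=') { return $value =~ /\Q$want\E\z/ ? 1 : 0 }

  # [foo*="bar"]: value contains bar anywhere
  if ($op eq '*=') { return index($value, $want) >= 0 ? 1 : 0 }

  die "Unknown operator: $op";
}

print attr_matches('~=', 'foo bar', 'bar'), "\n";    # 1, "bar" is a whole word
print attr_matches('~=', 'foobar',  'bar'), "\n";    # 0, only a substring
print attr_matches('*=', 'foobar',  'bar'), "\n";    # 1, substring is enough
```

The difference between `~=` and `*=` is the one most easily overlooked: the first matches whole words only, the second matches any substring.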

`E:root`

An E element, root of the document.

my $root = $dom->at(':root');

`E:checked`

A user interface element E which is checked (for instance a radio-button or checkbox).

my $input = $dom->at(':checked');

`E:empty`

An E element that has no children (including text nodes).

my $empty = $dom->find(':empty');

`E:nth-child(n)`

An E element, the n-th child of its parent.

my $third = $dom->at('div:nth-child(3)');
my $odd   = $dom->find('div:nth-child(odd)');
my $even  = $dom->find('div:nth-child(even)');
my $top3  = $dom->find('div:nth-child(-n+3)');
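The arguments used above follow the CSS3 `an+b` notation: a child at 1-based position `p` matches if `p = a*n + b` for some integer `n >= 0`. Here `odd` stands for `2n+1`, `even` for `2n`, and a plain number such as `3` for `0n+3`. The pure-Perl sketch below works through that arithmetic; `nth_matches` is a hypothetical helper for illustration, not a Mojo::DOM function.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mojo::DOM: true if the 1-based position
# $pos can be written as $step * n + $offset for some integer n >= 0.
sub nth_matches {
  my ($step, $offset, $pos) = @_;

  # With a step of 0 only the position equal to the offset matches
  return $pos == $offset ? 1 : 0 if $step == 0;

  # Otherwise ($pos - $offset) must be a non-negative multiple of $step
  my $diff = $pos - $offset;
  return 0 if $diff % $step;
  return $diff / $step >= 0 ? 1 : 0;
}

# The four forms used in the examples above, tested against positions 1..6
for my $form (['3', 0, 3], ['odd', 2, 1], ['even', 2, 0], ['-n+3', -1, 3]) {
  my ($name, $step, $offset) = @$form;
  my @hits = grep { nth_matches($step, $offset, $_) } 1 .. 6;
  printf "%-5s matches positions %s\n", $name, join(',', @hits);
}
```

For six children this reports position 3 for `3`, positions 1,3,5 for `odd`, 2,4,6 for `even`, and 1,2,3 for `-n+3`, which is why the last form selects the first three children.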

`E:nth-last-child(n)`

An E element, the n-th child of its parent, counting from the last one.

my $third    = $dom->at('div:nth-last-child(3)');
my $odd      = $dom->find('div:nth-last-child(odd)');
my $even     = $dom->find('div:nth-last-child(even)');
my $bottom3  = $dom->find('div:nth-last-child(-n+3)');

`E:nth-of-type(n)`

An E element, the n-th sibling of its type.

my $third = $dom->at('div:nth-of-type(3)');
my $odd   = $dom->find('div:nth-of-type(odd)');
my $even  = $dom->find('div:nth-of-type(even)');
my $top3  = $dom->find('div:nth-of-type(-n+3)');

`E:nth-last-of-type(n)`

An E element, the n-th sibling of its type, counting from the last one.

my $third    = $dom->at('div:nth-last-of-type(3)');
my $odd      = $dom->find('div:nth-last-of-type(odd)');
my $even     = $dom->find('div:nth-last-of-type(even)');
my $bottom3  = $dom->find('div:nth-last-of-type(-n+3)');

`E:first-child`

An E element, first child of its parent.

my $first = $dom->at('div p:first-child');

`E:last-child`

An E element, last child of its parent.

my $last = $dom->at('div p:last-child');

`E:first-of-type`

An E element, first sibling of its type.

my $first = $dom->at('div p:first-of-type');

`E:last-of-type`

An E element, last sibling of its type.

my $last = $dom->at('div p:last-of-type');

`E:only-child`

An E element, only child of its parent.

my $lonely = $dom->at('div p:only-child');

`E:only-of-type`

An E element, only sibling of its type.

my $lonely = $dom->at('div p:only-of-type');

`E:not(s)`

An E element that does not match simple selector s.

my $others = $dom->find('div p:not(:first-child)');

`E F`

An F element descendant of an E element.

my $headlines = $dom->find('div h1');

`E > F`

An F element child of an E element.

my $headlines = $dom->find('html > body > div > h1');

`E + F`

An F element immediately preceded by an E element.

my $second = $dom->find('h1 + h2');

`E ~ F`

An F element preceded by an E element.

my $second = $dom->find('h1 ~ h2');
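The two sibling combinators differ only in how close the preceding element has to be. The following pure-Perl sketch models the children of one parent as a list of element types in document order; both helper names are hypothetical and this is not Mojo::DOM's implementation.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Mojo::DOM. @siblings holds the element
# types of one parent's children in document order.

# E + F: indexes of $f elements whose previous sibling is an $e element
sub adjacent_matches {
  my ($e, $f, @siblings) = @_;
  return grep { $_ > 0 && $siblings[$_] eq $f && $siblings[$_ - 1] eq $e }
    0 .. $#siblings;
}

# E ~ F: indexes of $f elements that have any earlier $e sibling
sub general_matches {
  my ($e, $f, @siblings) = @_;
  my $seen = 0;
  my @hits;
  for my $i (0 .. $#siblings) {
    push @hits, $i if $seen && $siblings[$i] eq $f;
    $seen = 1 if $siblings[$i] eq $e;
  }
  return @hits;
}

my @siblings = qw(h1 h2 p h2);
print join(',', adjacent_matches('h1', 'h2', @siblings)), "\n";    # 1
print join(',', general_matches('h1', 'h2', @siblings)), "\n";     # 1,3
```

In this list only the first `h2` directly follows the `h1`, so `h1 + h2` selects one element while `h1 ~ h2` selects both.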

`E, F, G`

Elements of type E, F and G.

my $headlines = $dom->find('h1, h2, h3');

`E[foo=bar][bar=baz]`

An E element whose attributes match all following attribute selectors.

my $links = $dom->find('a[foo^="b"][foo$="ar"]');

ATTRIBUTES

Mojo::DOM implements the following attributes.

`charset`

my $charset = $dom->charset;
$dom        = $dom->charset('UTF-8');

Charset used for decoding and encoding XML.

`tree`

my $array = $dom->tree;
$dom      = $dom->tree(['root', ['text', 'lalala']]);

Document Object Model.

METHODS

Mojo::DOM inherits all methods from Mojo::Base and implements the following new ones.

`add_after`

$dom = $dom->add_after('<p>Hi!</p>');

Add the given XML after this element.

$dom->parse('<div><h1>A</h1></div>')->at('h1')->add_after('<h2>B</h2>');

`add_before`

$dom = $dom->add_before('<p>Hi!</p>');

Add the given XML before this element.

$dom->parse('<div><h2>A</h2></div>')->at('h2')->add_before('<h1>B</h1>');

`all_text`

my $text = $dom->all_text;

Extract all text content from the DOM structure, including the text of child elements.

`at`

my $result = $dom->at('html title');

Find a single element with CSS3 selectors.

`attrs`

my $attrs = $dom->attrs;
my $foo   = $dom->attrs('foo');
$dom      = $dom->attrs({foo => 'bar'});
$dom      = $dom->attrs(foo => 'bar');

Element attributes.

`children`

my $children = $dom->children;

Children of element.

`find`

my $collection = $dom->find('html title');

Find elements with CSS3 selectors and return a collection.

print $dom->find('div')->[23]->text;
$dom->find('div')->each(sub { print shift->text });
$dom->find('div')->while(sub { print($_->text) && $_->text =~ /foo/ });
$dom->find('div')->until(sub { $_->text =~ /foo/ && print $_->text });

`inner_xml`

my $xml = $dom->inner_xml;

Render content of this element to XML.

`namespace`

my $namespace = $dom->namespace;

Element namespace.

`parent`

my $parent = $dom->parent;

Parent of element.

`parse`

$dom = $dom->parse('<foo bar="baz">test</foo>');

Parse XML document.

`replace`

$dom = $dom->replace('<div>test</div>');

Replace this element with the given XML.

$dom->parse('<div><h1>A</h1></div>')->at('h1')->replace('<h2>B</h2>');

`replace_inner`

$dom = $dom->replace_inner('test');

Replace element content.

$dom->parse('<div><h1>A</h1></div>')->at('h1')->replace_inner('B');

`root`

my $root = $dom->root;

Find root element.

`text`

my $text = $dom->text;

Extract text content from element only, not including child elements.
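To see how `text` differs from `all_text`, the sketch below calls both on the same element. The markup is made up for this example, and it assumes `at` returns an undefined value when nothing matches. Whitespace handling may differ between versions, so the exact output is not shown.

```perl
use strict;
use warnings;
use Mojo::DOM;

# A <div> with text of its own plus a child element that also contains text
my $dom = Mojo::DOM->new;
$dom->parse('<div>outer <b>inner</b> tail</div>');

my $div = $dom->at('div');
if (defined $div) {

  # Only the text that belongs directly to the <div>
  print 'text:     ', $div->text, "\n";

  # Text of the <div> and of everything nested inside it
  print 'all_text: ', $div->all_text, "\n";
}
```

Going by the descriptions of the two methods, the word "inner" should appear only in the `all_text` line.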

`to_xml`

my $xml = $dom->to_xml;

Render DOM to XML.

`type`

my $type = $dom->type;
$dom     = $dom->type('html');

Element type.
