NAME

HTML::DOM - A Perl implementation of the HTML Document Object Model

VERSION

Version 0.001 (alpha)

WARNING: This module is still at an experimental stage. Only a few features have been implemented so far. The API is subject to change without notice.

SYNOPSIS

use HTML::DOM;

my $dom_tree = new HTML::DOM; # empty tree
$dom_tree->parse_file($filename);

$dom_tree->getElementsByTagName('body')->[0]->appendChild(
         $dom_tree->createElement('input')
);

# print $dom_tree->documentElement->outerHTML, "\n";
# (doesn't work yet)

DESCRIPTION

This module implements the HTML Document Object Model by extending the HTML::Tree modules. The HTML::DOM class serves both as an HTML parser and as the document class.

METHODS

$tree = new HTML::DOM

This class method constructs and returns a new HTML::DOM object.

$tree = new_from_file HTML::DOM
$tree = new_from_content HTML::DOM

Not yet implemented.

$tree->elem_handler($elem_name => sub { ... })

This method has no effect unless you call it before building the DOM tree. If you call this method, then, when the DOM tree is in the process of being built, the subroutine will be called after each $elem_name element is added to the tree. If you give '*' as the element name, the subroutine will be called for each element that does not have a handler. The subroutine's two arguments will be the tree itself and the element in question. The subroutine can call the DOM object's write method to insert HTML code into the source after the element.

Here is a lame example (which does not take Content-Script-Type headers or security into account):

$tree->elem_handler(script => sub {
    my($document,$elem) = @_;
    my $type = $elem->attr('type');
    # skip scripts with no type attribute or with a non-Perl type
    return unless defined $type && $type eq 'application/x-perl';
    eval($elem->firstChild->data);
});

$tree->parse(
    '<p>The time is
         <script type="application/x-perl">
              $document->write(scalar localtime)
         </script>
         precisely.
     </p>'
);
$tree->eof;

print $tree->documentElement->as_text, "\n";
# (as_text doesn't work yet)

$tree->parse_file($file)
$tree->parse(...)
$tree->eof()

These three methods simply call HTML::TreeBuilder's methods of the same name (q.v., and see also HTML::Element). Note, however, that parse_file and eof may each be called only once for a given HTML::DOM object, because the object deletes its parser as soon as it no longer needs it. For the same reason, parse may not be called after eof.
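
The calling order can be sketched as follows. Because these methods are documented as delegating to HTML::TreeBuilder, this sketch drives HTML::TreeBuilder directly; that an HTML::DOM object behaves the same way at this alpha stage is an assumption. The sample markup and the @chunks variable are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Feed the document in two chunks, as it might arrive from a network read.
my @chunks = ('<p>Hello, world.', '</p><p>Goodbye.</p>');

my $tree = HTML::TreeBuilder->new;
$tree->parse($_) for @chunks;   # parse may be called repeatedly...
$tree->eof;                     # ...but only before eof, which completes the tree

# Count the paragraph elements that ended up in the finished tree.
my @paragraphs = $tree->look_down(_tag => 'p');
print scalar(@paragraphs), " paragraph elements\n";
```

Once eof has been called, treat the tree as complete and do not feed it any more source.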

CLASSES AND DOM INTERFACES

Here is the inheritance hierarchy of HTML::DOM's various classes, along with the DOM interfaces those classes implement:

HTML::DOM::Exception                 DOMException
HTML::DOM::Implementation            DOMImplementation
HTML::Element
    HTML::DOM::Node                  Node
        HTML::DOM::DocumentFragment  DocumentFragment
        HTML::DOM                    Document
        HTML::DOM::CharacterData     CharacterData
            HTML::DOM::Text          Text
            HTML::DOM::Comment       Comment
        HTML::DOM::Element           Element
HTML::DOM::NodeList                  NodeList
HTML::DOM::NodeList::Magic           NodeList
HTML::DOM::NamedNodeMap              NamedNodeMap
HTML::DOM::Attr                      Node, Attr

In a later version, HTML::DOM::Element will have subclasses for the various element types.

Although HTML::DOM::Node inherits from HTML::Element, methods of HTML::Element that make a distinction between text and elements either will not work or will work slightly differently.

IMPLEMENTATION NOTES

  • Node attributes are accessed via methods of the same name. When the method is invoked, the current value is returned. If an argument is supplied, the attribute is set (unless it is read-only) and its old value returned.

  • Where the DOM specification calls for null, HTML::DOM uses undef or an empty list.

  • Instead of UTF-16 strings, HTML::DOM uses Perl's Unicode strings (which happen to be stored as UTF-8 internally). The only significant difference this makes is to length, substringData and other methods of Text and Comment nodes. These methods behave in a Perlish way (i.e., the offsets and lengths are specified in Unicode characters, not in UTF-16 code units). The alternate methods length16, substringData16 et al. use UTF-16 code units for offsets and are standards-compliant in that regard (but the string returned by substringData16 is still a regular Perl string).
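
The difference between the two ways of counting only shows up for characters outside the Basic Multilingual Plane, which UTF-16 encodes as a surrogate pair. The following sketch uses only core Perl and the Encode module, not HTML::DOM itself; utf16_length is a hypothetical helper written for this illustration, not part of the module's API.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the number of UTF-16 code units needed for a string.
# UTF-16LE is used so that no byte-order mark is counted.
sub utf16_length {
    my ($string) = @_;
    return length(encode('UTF-16LE', $string)) / 2;
}

# "a", then U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP), then "b"
my $text = "a\x{1D11E}b";

my $chars   = length $text;          # counts Unicode characters
my $units16 = utf16_length($text);   # counts UTF-16 code units

print "characters:        $chars\n";
print "UTF-16 code units: $units16\n";
```

For text that stays within the Basic Multilingual Plane the two counts are equal, so the character-based and UTF-16-based methods should agree on offsets.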
