NAME
HTML::DOM - A Perl implementation of the HTML Document Object Model
VERSION
Version 0.002 (alpha)
WARNING: This module is still at an experimental stage. Only a few features have been implemented so far. The API is subject to change without notice.
SYNOPSIS
use HTML::DOM;
my $dom_tree = new HTML::DOM; # empty tree
$dom_tree->parse_file($filename);
$dom_tree->getElementsByTagName('body')->[0]->appendChild(
    $dom_tree->createElement('input')
);
my $text = $dom_tree->createTextNode('text');
$text->data; # get attribute
$text->data('new value'); # set attribute
# print $dom_tree->documentElement->outerHTML, "\n";
# (doesn't work yet)
DESCRIPTION
This module implements the HTML Document Object Model by extending the HTML::Tree modules. The HTML::DOM class serves both as an HTML parser and as the document class.
METHODS
Non-DOM Methods
- $tree = new HTML::DOM
This class method constructs and returns a new HTML::DOM object.
- $tree = new_from_file HTML::DOM
- $tree = new_from_content HTML::DOM
Not yet implemented.
- $tree->elem_handler($elem_name => sub { ... })
This method has no effect unless you call it before building the DOM tree. If you call this method, then, when the DOM tree is in the process of being built, the subroutine will be called after each $elem_name element is added to the tree. If you give '*' as the element name, the subroutine will be called for each element that does not have a handler. The subroutine's two arguments will be the tree itself and the element in question. The subroutine can call the DOM object's write method to insert HTML code into the source after the element.
Here is a lame example (which does not take Content-Script-Type headers or security into account):
$tree->elem_handler(script => sub {
    my($document,$elem) = @_;
    return unless $elem->attr('type') eq 'application/x-perl';
    eval($elem->firstChild->data);
});
$tree->parse(
    '<p>The time is
         <script type="application/x-perl">
              $document->write(scalar localtime)
         </script>
         precisely.
     </p>'
);
$tree->eof;
print $tree->documentElement->as_text, "\n"; # as_text doesn't work yet
- $tree->parse_file($file)
- $tree->parse(...)
- $tree->eof()
These three methods simply call HTML::TreeBuilder's methods with the same name (q.v., and see also HTML::Element), but note that parse_file and eof may only be called once for each HTML::DOM object (since it deletes its parser when it no longer needs it). Similarly, parse may not be called after eof.
- $tree->event_attr_handler
- $tree->default_event_handler
See "EVENT HANDLING", below.
DOM Methods
(to be written)
- etc. etc. etc.
- createEvent
This method currently ignores its arguments. Later, the argument passed to it will determine the class into which the newly created event object is blessed.
EVENT HANDLING
HTML::DOM supports both the DOM Level 2 event model and the HTML 4 event model, at least in part. So far, the Event base class is implemented, but none of its subclasses, and no events are triggered automatically yet.
An event listener (aka handler) is a coderef, an object with a handleEvent method, or an object with &{} overloading. HTML::DOM does not implement any classes that provide a handleEvent method, but will support any object that has one.
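The three kinds of listener can be told apart at call time. The following is only a sketch of how that might be done, not HTML::DOM's actual code: call_listener, My::Listener and My::Callable are hypothetical names invented for this illustration, and a plain string stands in for the event object.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper (not part of HTML::DOM): invoke one listener.
sub call_listener {
    my ($listener, $event) = @_;
    if (blessed($listener) && $listener->can('handleEvent')) {
        return $listener->handleEvent($event);   # object with handleEvent
    }
    # Plain coderefs and objects with &{} overloading can both be
    # dereferenced as code.
    return $listener->($event);
}

{
    package My::Listener;    # hypothetical: an object with handleEvent
    sub new         { bless {}, shift }
    sub handleEvent { my ($self, $event) = @_; "method: $event" }
}

{
    package My::Callable;    # hypothetical: an object overloading &{}
    use overload '&{}' => sub { sub { "overloaded: $_[0]" } };
    sub new { bless {}, shift }
}

my $from_code   = call_listener(sub { "code: $_[0]" }, 'click');
my $from_method = call_listener(My::Listener->new,     'click');
my $from_object = call_listener(My::Callable->new,     'click');
```

The handleEvent check comes first here only by choice; the text above does not say which form takes precedence when an object offers both.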
To specify the default actions associated with an event, provide a subroutine via the default_event_handler method. The first argument will be the event object. For instance:
$dom_tree->default_event_handler(sub {
    my($event) = @_;   # the event object is the first argument
    my $type = $event->type;
    my $tag = (my $target = $event->target)->nodeName;
    if ($type eq 'click' && $tag eq 'A') {
        # ...
    }
    # etc.
});
Called without any arguments, default_event_handler returns the currently assigned coderef. Called with an argument, it assigns the new coderef and returns the old one.
HTML::DOM::Node's dispatchEvent method triggers the appropriate event listeners, but does not call any default actions associated with the event. The return value is a boolean that indicates whether the default action should be taken.
HTML::DOM::Node's trigger_event method will trigger the event for real. It will call dispatchEvent and, provided it returns true, will call the default event handler.
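The relationship between the two methods can be modelled in a few lines of plain Perl. This is a simplified sketch of the behaviour described above, not the module's implementation: My::Event, dispatch and trigger are hypothetical stand-ins, and listeners are assumed to be plain coderefs.

```perl
use strict;
use warnings;

{
    package My::Event;   # hypothetical stand-in for an event object
    sub new            { my ($class, $type) = @_;
                         bless { type => $type, prevented => 0 }, $class }
    sub type           { $_[0]{type} }
    sub preventDefault { $_[0]{prevented} = 1 }
    sub prevented      { $_[0]{prevented} }
}

# Models dispatchEvent: run the listeners, then report whether the
# default action should still be taken.
sub dispatch {
    my ($listeners, $event) = @_;
    $_->($event) for @$listeners;
    return !$event->prevented;
}

# Models trigger_event: dispatch, then call the default handler only if
# dispatch returned true.
sub trigger {
    my ($listeners, $default_handler, $event) = @_;
    dispatch($listeners, $event) or return;
    $default_handler->($event) if $default_handler;
    return;
}

my @log;
my $default = sub { push @log, 'default' };

trigger([ sub { push @log, 'listener' } ], $default, My::Event->new('click'));
my $normal_run = "@log";          # listener ran, then the default action

@log = ();
trigger([ sub { $_[0]->preventDefault } ], $default, My::Event->new('click'));
my $prevented_run = "@log";       # default action was suppressed
```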
The event_attr_handler method can be used to assign a coderef that will turn text assigned to an event attribute (e.g., onclick) into a listener. The arguments to the routine will be (0) the element, (1) the name (aka type) of the event (without the initial 'on') and (2) the value of the attribute. As with default_event_handler, you can replace an existing handler with a new one, in which case the old handler is returned. If you call this method without arguments, it returns the current handler. Here is an example of its use, which assumes that handlers are Perl code:
$dom_tree->event_attr_handler(sub {
    my($elem, $name, $code) = @_;
    my $sub = eval "sub { $code }";   # compile the attribute text
    return sub {
        my($event) = @_;
        local *_ = \$elem;            # make $_ refer to the element
        my $ret = &$sub;              # call the compiled sub, passing @_ through
        defined $ret and !$ret and
            $event->preventDefault;
    };
});
The event attribute handler will be called whenever an element attribute whose name begins with 'on' (case-tolerant) is modified.
CLASSES AND DOM INTERFACES
Here is the inheritance hierarchy of HTML::DOM's various classes and the DOM interfaces those classes implement:
Class Hierarchy                         Interfaces
---------------                         ----------
HTML::DOM::Exception                    DOMException, EventException
HTML::DOM::Implementation               DOMImplementation
HTML::Element
    HTML::DOM::Node                     Node, EventTarget
        HTML::DOM::DocumentFragment     DocumentFragment
            HTML::DOM                   Document, DocumentEvent
        HTML::DOM::CharacterData        CharacterData
            HTML::DOM::Text             Text
            HTML::DOM::Comment          Comment
        HTML::DOM::Element              Element
HTML::DOM::NodeList                     NodeList
HTML::DOM::NodeList::Magic              NodeList
HTML::DOM::NamedNodeMap                 NamedNodeMap
HTML::DOM::Attr                         Node, Attr
HTML::DOM::Event                        Event
Later, HTML::DOM::Element will have subclasses for the various element types.
Although HTML::DOM::Node inherits from HTML::Element, methods of HTML::Element that make a distinction between text and elements either will not work or will work slightly differently.
The EventListener interface is not implemented by HTML::DOM, but is supported. See "EVENT HANDLING", above.
IMPLEMENTATION NOTES
Node attributes are accessed via methods of the same name. When the method is invoked, the current value is returned. If an argument is supplied, the attribute is set (unless it is read-only) and its old value returned.
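The accessor convention can be illustrated with a small stand-alone class. This is a sketch of the calling convention only: My::Node is a hypothetical class written for this example and says nothing about how HTML::DOM stores its attributes. Read-only attributes are left out, since their exact behaviour when given an argument is not specified above.

```perl
use strict;
use warnings;

{
    package My::Node;   # hypothetical class, for illustration only
    sub new { my ($class, %args) = @_; bless {%args}, $class }

    # Combined getter/setter: always returns the value the attribute
    # had before the call; sets a new value if an argument is given.
    sub data {
        my $self = shift;
        my $old  = $self->{data};
        $self->{data} = shift if @_;
        return $old;
    }
}

my $node     = My::Node->new(data => 'text');
my $current  = $node->data;                # get: 'text'
my $previous = $node->data('new value');   # set: returns the old value
my $updated  = $node->data;                # get: 'new value'
```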
Where the DOM specification calls for null, HTML::DOM uses undef or an empty list.
Instead of UTF-16 strings, HTML::DOM uses Perl's Unicode strings (which happen to be stored as UTF-8 internally). The only significant difference this makes is to length, substringData and other methods of Text and Comment nodes. These methods behave in a Perlish way (i.e., the offsets and lengths are specified in Unicode characters, not in UTF-16 code units). The alternate methods length16, substringData16 et al. use UTF-16 for offsets and are standards-compliant in that regard (but the string returned by substringData is still a regular Perl string).
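The two ways of counting differ only for characters outside the Basic Multilingual Plane, which UTF-16 encodes as a surrogate pair. The sketch below shows the difference using the core Encode module; utf16_length is a hypothetical helper written for this example and is not how length16 is necessarily implemented.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: length of a Perl string in UTF-16 code units.
sub utf16_length {
    my ($string) = @_;
    # UTF-16LE adds no byte-order mark; each code unit is two bytes.
    return length(encode('UTF-16LE', $string)) / 2;
}

my $string = "a\x{1D11E}b";   # U+1D11E lies outside the BMP

my $chars   = length $string;          # counted in Unicode characters
my $units16 = utf16_length($string);   # counted in UTF-16 code units
```

For this string, length counts three characters, while the UTF-16 count is four, because U+1D11E takes two code units.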