NAME
WWW::Mechanize::Plugin::DOM - HTML Document Object Model plugin for Mech
VERSION
0.012 (alpha)
SYNOPSIS
    use WWW::Mechanize;
    my $m = WWW::Mechanize->new;

    $m->use_plugin('DOM',
        script_handlers => {
            default => \&script_handler,
            qr/(?:^|\/)(?:x-)?javascript/ => \&script_handler,
        },
        event_attr_handlers => {
            default => \&event_attr_handler,
            qr/(?:^|\/)(?:x-)?javascript/ => \&event_attr_handler,
        },
    );

    sub script_handler {
        my($mech, $dom_tree, $code, $url, $line, $is_inline) = @_;
        # ... code to run the script ...
    }

    sub event_attr_handler {
        my($mech, $elem, $event_name, $code, $url, $line) = @_;
        # ... code that returns a coderef ...
    }

    $m->plugin('DOM')->tree;   # DOM tree for the current page
    $m->plugin('DOM')->window; # Window object
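The handler bodies in the synopsis are left as comments. As a hedged illustration only, here is a minimal sketch of a pair of handlers that do not execute any JavaScript: the script handler merely records each script it receives, and the event-attribute handler returns a coderef that describes the event when called. The argument lists are the ones shown in the synopsis; the names recording_script_handler and logging_event_attr_handler, the @seen_scripts array and the returned message are hypothetical, chosen so the handlers can be exercised without a scripting engine.

```perl
use strict;
use warnings;

# Hypothetical log of the scripts handed to the script handler.
my @seen_scripts;

# Script handler with the argument list from the synopsis.
# This sketch does not run the script; it only records it.
sub recording_script_handler {
    my ($mech, $dom_tree, $code, $url, $line, $is_inline) = @_;
    push @seen_scripts, {
        code      => $code,
        url       => $url,
        line      => $line,
        is_inline => $is_inline ? 1 : 0,
    };
    return;
}

# Event-attribute handler: it must return a coderef. Here the coderef
# just describes which event fired and the source it came from.
sub logging_event_attr_handler {
    my ($mech, $elem, $event_name, $code, $url, $line) = @_;
    return sub {
        return "would run $event_name handler from $url line $line: $code";
    };
}

# Call the handlers directly, passing undef for the Mech, DOM tree and
# element arguments, which these sketches never use.
recording_script_handler(undef, undef, 'alert(1)', 'http://example.com/', 3, 1);
my $cb = logging_event_attr_handler(
    undef, undef, 'onclick', 'go()', 'http://example.com/', 7);
print $cb->(), "\n";
```

To try them with the plugin, you would pass \&recording_script_handler and \&logging_event_attr_handler in place of \&script_handler and \&event_attr_handler in the use_plugin call.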
DESCRIPTION
This is a plugin for WWW::Mechanize that provides support for the HTML Document Object Model. It is part of the WWW::Mechanize::Plugin::JavaScript distribution, but it can be used on its own.
USAGE
To enable this plugin, use Mech's use_plugin method, as shown in the synopsis.
To access the DOM tree, use $mech->plugin('DOM')->tree, which returns an HTML::DOM object.
You may provide a subroutine that runs an inline script like this:
    $mech->use_plugin('DOM',
        script_handlers => {
            qr/.../ => sub { ... },
            qr/.../ => sub { ... },
            # etc
        }
    );
You may also provide a subroutine that turns HTML event attributes into subroutines, like this:
    $mech->use_plugin('DOM',
        event_attr_handlers => {
            qr/.../ => sub { ... },
            qr/.../ => sub { ... },
            # etc
        }
    );
In both cases, each qr/.../ key should be a regular expression that matches the scripting language to which the handler applies; alternatively, the key can be the string 'default'. The scripting language will be either a MIME type or, if a script element's type attribute is not present, the contents of its language attribute. The subroutine specified as the 'default' will be used if there is no handler for the scripting language in question, or if there is no Content-Script-Type header and, for script_handlers, the script element has no 'type' or 'language' attribute.
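The selection rules in the previous paragraph can be sketched as follows. This is not the plugin's actual code: the choose_handler function is hypothetical, and the order in which it consults the type attribute, the language attribute and the Content-Script-Type header is an assumption based on the description above. It does rely on one genuine Perl detail: a qr/.../ object used as a hash key is stringified (into something like (?^:...)), and that string still works as a pattern.

```perl
use strict;
use warnings;

# Hypothetical dispatcher illustrating the rules described above.
# $handlers is a hash like the one passed as script_handlers; its qr//
# keys have been stringified by Perl, but each string is still a regex.
sub choose_handler {
    my ($handlers, $type_attr, $language_attr, $content_script_type) = @_;

    # Assumed precedence: type attribute, then language attribute,
    # then the page's Content-Script-Type header.
    my $lang = defined $type_attr     ? $type_attr
             : defined $language_attr ? $language_attr
             :                          $content_script_type;

    # Nothing to go on: fall back to the 'default' handler.
    return $handlers->{default} unless defined $lang;

    # If several patterns match, which one wins depends on hash order
    # in this sketch.
    for my $pattern (grep { $_ ne 'default' } keys %$handlers) {
        return $handlers->{$pattern} if $lang =~ $pattern;
    }

    # No handler for this particular language.
    return $handlers->{default};
}

my %handlers = (
    default                       => sub { 'default' },
    qr/(?:^|\/)(?:x-)?javascript/ => sub { 'javascript' },
);

print choose_handler(\%handlers, 'text/javascript')->(), "\n";       # prints javascript
print choose_handler(\%handlers, undef, undef, 'text/tcl')->(), "\n"; # prints default
```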
Each time you move to another page with WWW::Mechanize, a different copy of the DOM plugin object is created. So, if you must refer to it in a callback routine, don't capture it in a closure; instead, get it from the $mech object that is passed as the first argument.
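To illustrate why a closure is the wrong tool here, the sketch below replaces WWW::Mechanize with a tiny hypothetical stand-in class, FakeMech, whose get method creates a fresh "plugin" object for each page, mimicking the behaviour described above. A callback that captured the first page's plugin object keeps seeing that stale object, whereas one that asks its first argument for the plugin sees the current one. FakeMech models only this one aspect of Mech; it is not the real class.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a plugin-enabled WWW::Mechanize object.
# Each get() replaces the "DOM plugin" object, as moving to another
# page does with the real plugin.
package FakeMech;
sub new {
    my ($class) = @_;
    return bless { page => 0, dom => undef }, $class;
}
sub get {
    my ($self, $url) = @_;
    $self->{page}++;
    $self->{dom} = { url => $url, page => $self->{page} };
    return $self;
}
sub plugin { my ($self, $name) = @_; return $self->{dom} }

package main;

my $mech = FakeMech->new;
$mech->get('http://example.com/one');

# Wrong: the closure captures the plugin object for page one.
my $stale_dom       = $mech->plugin('DOM');
my $closure_handler = sub { return $stale_dom->{url} };

# Right: look the plugin up through the $mech passed as the first argument.
my $arg_handler = sub { my ($m) = @_; return $m->plugin('DOM')->{url} };

$mech->get('http://example.com/two');
print $closure_handler->($mech), "\n";  # prints http://example.com/one
print $arg_handler->($mech), "\n";      # prints http://example.com/two
```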
METHODS
This is the usual boring list of methods. Those that are described above are listed here without descriptions.
- window
  This returns the window object.
- tree
  This returns the DOM tree (aka the document object).
- check_timers
  This evaluates the code associated with each timeout registered with the window's setTimeout function, if the appropriate interval has elapsed.
- count_timers
  This returns the number of timers currently registered.
- scripts_enabled ( $new_val )
  This returns a boolean indicating whether scripts are enabled. It is true by default. You can disable scripts by passing a false value. When you disable scripts, event handlers are also disabled, as is the registration of event handlers by HTML event attributes.
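Because timeouts are run by calling check_timers, a program that wants to wait for pending setTimeout callbacks can poll with count_timers and check_timers. The sketch below shows such a loop. So that it runs without the plugin installed, it uses a hypothetical StubDOM class that imitates only those two methods (plus a made-up set_timeout helper standing in for the window's setTimeout, with delays in whole seconds); with the real plugin you would pass $mech->plugin('DOM') instead. The drain_timers helper, its polling interval and its round limit (a guard against a timeout that keeps re-registering itself) are likewise assumptions of this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $mech->plugin('DOM'), implementing only
# count_timers and check_timers. Each timer is [due_time, callback].
package StubDOM;
sub new { my ($class) = @_; return bless { timers => [] }, $class }
sub set_timeout {    # hypothetical; stands in for the window's setTimeout
    my ($self, $delay, $cb) = @_;
    push @{ $self->{timers} }, [ time() + $delay, $cb ];
    return;
}
sub count_timers { return scalar @{ $_[0]{timers} } }
sub check_timers {
    my ($self) = @_;
    my $now = time();
    my @due = grep { $_->[0] <= $now } @{ $self->{timers} };
    # Keep the timers that are not yet due, then run the due ones;
    # timers registered by those callbacks are kept for a later check.
    $self->{timers} = [ grep { $_->[0] > $now } @{ $self->{timers} } ];
    $_->[1]->() for @due;
    return;
}

package main;

# Poll until no timers remain, giving up after $max_rounds checks.
# Returns true if every timer has run.
sub drain_timers {
    my ($dom, $max_rounds, $interval) = @_;
    for (1 .. $max_rounds) {
        return 1 unless $dom->count_timers;
        $dom->check_timers;
        sleep $interval if $interval && $dom->count_timers;
    }
    return !$dom->count_timers;
}

my $dom = StubDOM->new;
my @fired;
$dom->set_timeout(0, sub { push @fired, 'a' });
$dom->set_timeout(0, sub {
    push @fired, 'b';
    $dom->set_timeout(0, sub { push @fired, 'c' });  # nested timeout
});
my $done = drain_timers($dom, 10, 0);
print "fired: @fired\n";
```

The round limit matters because a timeout that schedules itself again would otherwise keep the loop running forever.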
THE 'LOAD' EVENT
Currently the (on)load event is triggered when the page finishes parsing. This plugin assumes that you're not going to be loading any images, etc.
THE %Interface HASH
If you are creating your own script binding, you'll probably want to access the hash named %WWW::Mechanize::Plugin::DOM::Interface, which lists, in a machine-readable format, the interface members of the location and navigator objects. It follows the same format as %HTML::DOM::Interface.
See also "THE %Interface HASH" in WWW::Mechanize::Plugin::DOM::Window for a list of members of the window object.
PREREQUISITES
HTML::DOM 0.021 or later
The current stable release of WWW::Mechanize does not support plugins. See WWW::Mechanize::Plugin::JavaScript for more info.
BUGS
The onunload event is not yet supported.
The location object's replace method does not currently work correctly if the current page is the first page. In that case it acts like an assignment to href.
The window object's document property does not currently get updated when you go back.
This plugin does not hook into WWW::Mechanize's follow_link feature to run event handlers.
There is no support for XHTML.
The 'about:blank' URL is not yet supported.
If you try to get any of the attributes of the location object (or stringify the location object) before any browsing has happened, you'll get an error. (This should return 'about:blank'.)
Fetching a URL that differs from the current page's only by the fragment currently creates a brand new DOM object and scripting environment.
There is nothing to prevent infinite recursion when frames have circular references.
AUTHOR & COPYRIGHT
Copyright (C) 2007-8 Father Chrysostomos <join '@', sprout => join '.', reverse org => 'cpan'>
This program is free software; you may redistribute it and/or modify it under the same terms as perl.
SEE ALSO
WWW::Mechanize::Plugin::DOM::Window
WWW::Mechanize::Plugin::DOM::Location
WWW::Mechanize::Plugin::JavaScript