NAME
WWW::Scripter - For scripting web sites that have scripts
VERSION
0.006 (alpha)
SYNOPSIS
use WWW::Scripter;
$w = new WWW::Scripter;
$w->use_plugin('Ajax'); # packaged separately
$w->get('http://some.site.com/that/relies/on/ajax');
$w->eval(' alert("Hello from JavaScript") ');
$w->document->getElementsByTagName('div')->[0]->....
$w->content; # returns the HTML content, possibly modified
# by scripts
DESCRIPTION
This is a subclass of WWW::Mechanize that uses the W3C DOM and provides support for scripting.
No actual scripting engines are provided with WWW::Scripter, but are available as separate plugins. (See also the "SEE ALSO" section below.)
INTERFACE
See WWW::Mechanize for a vast list of methods that this module inherits.
In addition to those, this module implements the well-known Window interface, providing also a few routines for attaching scripting engines and what-not.
In the descriptions below, $w
refers to the WWW::Scripter object. You can think of it as short for either 'WWW::Scripter' or 'window'.
Constructor
my $w = new WWW::Scripter %args
The constructor accepts named arguments. There are only two that WWW::Scripter itself deals with directly. The rest are passed on to the superclass. See WWW::Mechanize and LWP::UserAgent for details on what other arguments the constructor accepts.
The two arguments are:
- max_docs
-
The maximum number of document objects to keep in history (along with their corresponding request and response objects). If this is omitted, Mech's
stack_depth
+ 1 will be used. This is off by one becausestack_depth
is the number of pages you can go back to, so it is one less than the number of recorded pages.max_docs
considers 0 to be equivalent to infinity. - max_history
-
If the number of items in history exceeds
max_docs
, WWW::Scripter will still keep the request objects (so you can go back more thanmax_docs
times and previously visited pages will reload).max_history
restricts the total number of items in history (whether full document objects or just requests). 0 is equivalent to infinity.
The Window Interface
In addition to the methods listed here, see also HTML::DOM::View and HTML::DOM::EventTarget. (WWW::Scripter does not itself inherit from EventTarget, but provides the same interface, and also a DOES
method that returns true for 'HTML::DOM::EventTarget'.)
- location
-
Returns the location object (see WWW::Scripter::Location). If you pass an argument, it sets the
href
attribute of the location object. - alert
- confirm
- prompt
-
Each of these calls the function assigned by one of the
set_*
methods below under "Window-Related Methods". -
Returns the navigator object. This currently has three properties,
appName
(set toref $w
)appVersion
(ref($w)->VERSION
) anduserAgent
(same as$w->agent
).You can pass values to
appName
andappVersion
to set them. - setTimeout ( $code, $ms );
-
This schedules the
$code
to run after$ms
seconds have elapsed, returning a number uniquely identifying the time-out. - clearTimeout ( $timeout_id )
-
The cancels the time-out corresponding to the
$timeout_id
. - open ( $url )
-
This is a temporary placeholder. Right now it ignores all its args except the first, and goes to the given URL, such that
->open(foo)
is equivalent to->location('foo')
. - history
-
Returns the history object. See WWW::Scripter::History.
- window
- self
-
These two return the window object itself.
- frames
-
Although the W3C DOM specifies that this return
$w
(the window itself), for efficiency's sake this returns a separate object which one can use as a hash or array reference to access its sub-frames. (The window object itself cannot be used that way.) The frames object (class WWW::Scripter::Frames) also has awindow
method that returns$w
.In list context a list of frames is returned.
- length
-
Returns the number of frames.
$w->length
is equivalent toscalar @{$w->frames}
. - top
-
Returns the 'top' window, which is the window itself if there are no frames.
- parent
-
Returns the parent frame, if there is one, or the window object itself otherwise.
Window-Related Methods
These methods are not part of the Window interface, but are closely related to the object's window behaviour.
- set_alert_function
- set_confirm_function
- set_prompt_function
-
Use these to set the functions called by the above methods. There are no default
confirm
andprompt
functions. The defaultalert
prints to the currently selected file handle, with a line break tacked on the end. - check_timers
-
This evaluates the code associated with each timeout registered with the
setTimeout
method, if the appropriate interval has elapsed. - count_timers
-
This returns the number of timers currently registered.
Methods for Plugins, Scripting, etc.
- eval ( $code [, $scripting_language] )
-
Evaluates the
$code
passed to it. This method dies if there is no script handler registered for the$scripting_language
. - use_plugin ( $plugin_name [, @options] )
-
This will automatically
require()
the plugin for you, and then initialise it. To pass extra options to the plugin after loading it, just use the same syntax again. This will return the plugin object if the plugin has one. - plugin ( $plugin_name )
-
This will return the plugin object, if it has one. Some plugins may provide this as a way to communicate directly with the plugin.
You can also use the return value as a boolean, to see whether a plugin is loaded.
- scripts_enabled ( $new_val )
-
This returns a boolean indicating whether scripts are enabled. It is true by default. You can disable scripts by passing a false value. When you disable scripts, event handlers are also disabled, as is the registration of event handlers by HTML event attributes.
- script_handler ( $language_re, $object )
-
A script handler is a special object that knows how to run scripts in a particular language. Use this method to register such an object.
$language_re
is a regular expression that will be matched against a scripting language name (from a 'language' HTML attribute) or MIME type (<script type=...). You can also use the special value 'default'.$object
is the script handler object. For its interface, see "SCRIPT HANDLERS", below. - class_info ( \%interfaces )
-
With this you can provide information for Perl classes to scripting languages, so that scripts can handle objects of those classes.
You should pass a hash ref that has the structure described in HTML::DOM::Interface, except that this method also accepts a
_constructor
hash element, which should be set to the name of the method to be called when the constructor function is called from the scripting language (e.g.,_constructor => 'new'
) or a subroutine reference.The return value is a list of all hashrefs passed to
class_info
so far plus a few that WWW::Scripter has by default (to support the DOM). You can call it without any arguments just to get that list.
Other Methods
- forward
-
The equivalent of hitting the 'forward' button in a browser. This, of course, only works after
back
. - clear_history ( $including_current_page )
-
This clears the history, preventing
back
from working until after the next request, and freeing up some memory. If supplied with a true argument, it also clears the current page. It returns$w
. - max_history
- max_history ( $new_value )
- max_docs
- max_docs ( $new_value )
-
These two return what was passed to the constructor, optionally setting it.
=back
CAVEATS
WWW::Scripter does not implement any event loop, so you have to call check_timers
yourself to trigger any timeouts. After fetching a page, you could do something like this:
sleep 1, $w->check_timers while $w->count_timers;
but beware that this may cause an infinite loop if a timeout sets another timeout. It may also cause problems with future versions of WWW::Scripter that support setInterval
. You basically have to know what works with the pages you are browsing.
THE %WindowInterface
HASH
The hash named %WWW::Scripter::WindowInterface
lists the interface members for the window object. It follows the same format as hashes within %HTML::DOM::Interface, like this:
(
alert => VOID|METHOD,
confirm => BOOL|METHOD,
...
)
It only includes those methods listed above under "The Window Interface".
SCRIPT HANDLERS
This section is only of interest to those implementing scripting engines. If you are not writing one, skip this section (or just read it anyway).
A script handler object must provide the following methods:
- eval ( $w, $code, $url, $line, $is_inline )
-
(where
$w
is the WWW::Scripter object)This is supposed to run the
$code
passed to it. - event2sub ( $w, $elem, $event_name, $code, $url, $line )
-
This is called for each HTML event attribute (onclick, etc.). It should return a coderef that runs the
$code
.
WRITING PLUGINS
Plugins are usually under the WWW::Scripter::Plugin:: namespace. If a plugin name has a hyphen (-) in it, the module name will contain a double colon (::). If, when you pass a plugin name to use_plugin
or plugin
, it has a double colon in its name, it will be treated as a fully-qualified module name (possibly) outside the usual plugin namespace. Here are some examples:
Plugin Name Module Name
----------- -----------
Chef WWW::Scripter::Plugin::Chef
Man-Page WWW::Scripter::Plugin::Man::Page
My::Odd::Plugin My::Odd::Plugin
This module will need to have an init
method, and possibly two more named options
and clone
, respectively:
- init
-
init
will be called as a class method the first timeuse_plugin
is called for a particular plugin. The second argument ($_[1]
) will be the WWW::Scripter object. The third argument will be an array ref of options (see "options", below).It may return an object if the plugin has one.
- options
-
When
$w->use_plugin
is called, if there are any arguments after the plugin name, then the plugin object'soptions
method will be called with the options themselves as the arguments.If a plugin does not provide an object, an error will be thrown if options are passed to
use_plugin
.The
init
method can override this, however. When it is called, its third argument is a reference to an array containing the options passed touse_plugin
. The contents of that same array will be used whenoptions
is called, soinit
can modify it and even preventoptions
from being called altogether, by emptying the array. - clone
-
When the WWW::Scripter object is cloned (via the
clone
method), every plugin that has a clone method (as determined by->can('clone')
), will also be cloned. The new clone of the WWW::Scripter object is passed as its argument.
If the plugin needs to record data pertinent to the current page, it can do so by associating them with the document or the request via a field hash. See Hash::Util::FieldHash and Hash::Util::FieldHash::Compat.
Handlers
See LWP's Handlers feature.
From within LWP's request_*
and response_*
handlers, you can call WWW::Scripter::abort
to abort the request and prevent a new entry from being created in browser history. (The JavaScript plugin does this with javascript: URLs.)
WWW::Scripter will export this function upon request:
use WWW::Scripter qw[ abort ];
or you can call it with a fully qualified name:
WWW::Scripter::abort();
BUGS
This is still an unfinished work. There are probably scores of bugs crawling all over the place. Here are some that are known (apart from the fact that so many features are still missing):
There is no support for XHTML, but HTML::Parser can handle most XHTML pages anyway, so maybe this is not a problem.
There is nothing to prevent infinite recursion when frames have circular references.
PREREQUISITES
perl 5.8.3 or higher (5.8.4 or higher recommended)
HTML::DOM 0.030 or higher
WWW::Mechanize 1.2 or higher
AUTHOR & COPYRIGHT
Copyright (C) 2009, Father Chrysostomos (sprout at, um, cpan dot org)
This program is free software; you may redistribute or modify it (or both) under the same terms as perl.
CONFESSION
Some of the code in here was stolen from the immediate superclass, WWW::Mechanize, as were some of the tests and test data.
SEE ALSO
WWW::Scripter sub-modules: ::Location and ::History.
See WWW::Mechanize, of which this is a subclass.
See also the following plugins:
And, if you are curious, have a look at the plugin version of WWW::Mechanize and WWW::Mechanize::Plugin::DOM (experimental and now deprecated) that this was originally based on: http://www-mechanize.googlecode.com/svn/wm/branches/plugins/
2 POD Errors
The following errors were encountered while parsing the POD:
- Around line 283:
You forgot a '=back' before '=head1'
- Around line 330:
You forgot a '=back' before '=head1'