NAME
LWP::Parallel::UserAgent - A class for parallel User Agents
SYNOPSIS
require LWP::Parallel::UserAgent;
$ua = LWP::Parallel::UserAgent->new();
...
$ua->redirect (0); # prevents automatic following of redirects
$ua->max_hosts(5); # sets maximum number of locations accessed in parallel
$ua->max_req (5); # sets maximum number of parallel requests per host
...
$ua->register ($request); # or
$ua->register ($request, '/tmp/sss'); # or
$ua->register ($request, \&callback, 4096);
...
$ua->wait ( $timeout );
...
sub callback { my($data, $response, $protocol) = @_; .... }
DESCRIPTION
This class implements a user agent that accesses web sources in parallel.
Using an LWP::Parallel::UserAgent as your user agent, you typically start by registering your requests, along with how you want the Agent to process the incoming results (see $ua->register).
Then you wait for the results by calling $ua->wait. This method returns only when all requests have returned an answer or the Agent has timed out. Individual callback functions can also tell the Agent to stop waiting for requests and return (see $ua->register).
See LWP::Parallel for a set of simple examples.
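The register-then-wait workflow described above can be sketched as follows. This is a minimal sketch, not the module's reference example: the URLs are hypothetical, timeout() is inherited from LWP::UserAgent, and it assumes (as the LWP::Parallel examples do) that wait() returns a hash reference of entries, each of which provides the stored response via ->response.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

# Hypothetical URLs, for illustration only.
my @urls = ('http://www.example.com/', 'http://www.example.org/');

my $pua = LWP::Parallel::UserAgent->new();
$pua->timeout(10);   # seconds; timeout() is inherited from LWP::UserAgent

# 1. Register each request. register() returns undef on success and an
#    error response otherwise.
foreach my $url (@urls) {
    my $err = $pua->register(HTTP::Request->new(GET => $url));
    print STDERR $err->error_as_HTML if $err;
}

# 2. Wait until every request has answered or the agent has timed out.
my $entries = $pua->wait();

# 3. Each entry holds the response for one registered request.
foreach my $key (keys %$entries) {
    my $res = $entries->{$key}->response;
    print $res->request->uri, ': ', $res->code, ' ', $res->message, "\n";
}
```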
METHODS
The LWP::Parallel::UserAgent is a sub-class of LWP::UserAgent, but not all of its methods are available here. However, you can use its main methods, $ua->simple_request and $ua->request, to simulate access to a single URL with this package. Of course, if a single request is all you need, then you should probably use LWP::UserAgent in the first place, since it will be faster than our emulation here.
For parallel access, you will need to use the new methods that come with LWP::Parallel::UserAgent, called $pua->register and $pua->wait. See below for more information on each method.
- $ua = LWP::Parallel::UserAgent->new();
-
Constructor for the parallel UserAgent. Returns a reference to an LWP::Parallel::UserAgent object.
Optionally, you can give it an existing LWP::Parallel::UserAgent (or even an LWP::UserAgent) as a first argument, and it will "clone" a new one from this (This just copies the behavior of LWP::UserAgent. I have never actually tried this, so let me know if this does not do what you want).
- $ua->initialize;
-
Takes no arguments and initializes the UserAgent. It is automatically called in LWP::Parallel::UserAgent::new, so usually there is no need to call this explicitly.
However, if you want to re-use the same UserAgent object for a number of "runs", you should call $ua->initialize after you have processed the results of the previous call to $ua->wait, but before registering any new requests.
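Re-using one agent for several runs, as described above, might look like the following sketch. The URL batches are hypothetical, and the per-run processing (collecting status codes) is only a placeholder; the point is that $ua->initialize is called after the results of each $ua->wait have been processed and before the next batch is registered.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

my $pua = LWP::Parallel::UserAgent->new();
$pua->timeout(10);

# Two hypothetical batches of URLs, each processed in its own "run".
my @batches = (
    ['http://www.example.com/'],
    ['http://www.example.org/', 'http://www.example.net/'],
);

my @runs;
foreach my $batch (@batches) {
    $pua->register(HTTP::Request->new(GET => $_)) foreach @$batch;
    my $entries = $pua->wait();

    # Process the results of this run before resetting the agent.
    push @runs, [ map { $entries->{$_}->response->code } keys %$entries ];

    # Clear the state of the finished run before registering new requests.
    $pua->initialize;
}
```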
- $ua->redirect ( $ok )
-
Changes the default value for permitting Parallel::UserAgent to follow redirects and authentication-requests. The standard value is 'true'.
See $ua->register for how to change the behaviour for particular requests only.
- $ua->duplicates ( $ok )
-
Changes the default value for permitting Parallel::UserAgent to ignore duplicate requests. The standard value is 'false'.
- $ua->in_order ( $ok )
-
Changes the default value for restricting Parallel::UserAgent to connect to the registered sites in the order they were registered. The default value, FALSE, allows Parallel::UserAgent to make the connections in an apparently random order.
- $ua->remember_failures ( $yes )
-
If set to one, enables Parallel::UserAgent to ignore requests or connections to sites that it previously failed to connect to during this "run". If set to zero (the default), Parallel::UserAgent will try to connect to every single URL you registered, even if it constantly fails to connect to a particular site.
- $ua->max_hosts ( $max )
-
Changes the maximum number of locations accessed in parallel. The default value is 7.
Note: Although it says 'host', it really means 'netloc/server'! That is, multiple servers on the same host (e.g. one server running on port 80 and another on port 6060) count as two 'hosts'.
- $ua->max_req ( $max )
-
Changes the maximum number of requests issued per host in parallel. The default value is 5.
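The agent-wide defaults described above are all set through single-argument setters. The values below are purely illustrative, not recommendations; the sketch only shows which method controls which behaviour.

```perl
use strict;
use warnings;
use LWP::Parallel::UserAgent;

my $pua = LWP::Parallel::UserAgent->new();

# Agent-wide defaults; individual requests can still override redirects
# through the $redirect_ok argument of register().
$pua->redirect(0);           # do not follow redirects/authentication requests
$pua->in_order(1);           # connect in the order the requests were registered
$pua->remember_failures(1);  # skip sites that already failed during this run
$pua->max_hosts(3);          # at most 3 servers (netlocs) in parallel
$pua->max_req(2);            # at most 2 parallel requests per server
```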
- $ua->register ( $request [, $arg [, $size [, $redirect_ok]]] )
-
Registers the given request with the User Agent. In case of an error, an HTTP::Response object containing the HTML error message is returned. Otherwise (that is, in case of success) it returns undef.
The $request should be a reference to an HTTP::Request object with values defined for at least the method() and url() attributes.
$size specifies the number of bytes Parallel::UserAgent should try to read each time some new data arrives. Setting it to '0' or 'undef' will make Parallel::UserAgent use the default (8k).
Specifying $redirect_ok will alter the redirection behaviour for this particular request only. '1' or any other true value will force Parallel::UserAgent to follow redirects, even if the default is set to 'no_redirect' (see $ua->redirect). '0' or any other false value should do the reverse. Please note that redirects for POST requests are never followed, regardless of the $redirect_ok value!
If $arg is a scalar it is taken as a filename where the content of the response is stored.
If $arg is a reference to a subroutine, then this routine is called as chunks of the content are received. An optional $size argument is taken as a hint for an appropriate chunk size. The callback function is called with 3 arguments: the data received this time, a reference to the response object and a reference to the protocol object. The callback can use the predefined constants C_ENDCON, C_LASTCON and C_ENDALL as a return value in order to influence pending and active connections. C_ENDCON will end this connection immediately, whereas C_LASTCON will indicate that no further connections should be made. C_ENDALL will immediately end all requests and let the Parallel::UserAgent return from $pua->wait().
If $arg is omitted, then the content is stored in the response object itself.
If $arg is an LWP::Parallel::UserAgent::Entry object, then this request will be registered as a follow-up request to this particular entry. This will not create a new entry, but instead link the current response (i.e. the reason for re-registering) as $response->previous to the new response of this request. All other fields are either re-initialized ($request, $fullpath, $proxy) or left untouched ($arg, $size). (This should only be used internally.)
$ua->register thus also allows the registration of follow-up requests to existing requests that required redirection or authentication. In order to do this, a Parallel::UserAgent::Entry object is passed as the second argument to the call. Usually, this should not be used directly, but left to the internal $ua->handle_response method!
NOTE: As of ParallelUA v2.36 ftp-handling has been disabled! Apparently this never really worked properly in the first place, but no one actually used ParallelUA with ftp-requests so far :-) Thanks to Gary Foster for pointing this out. I will disable ftp access until I have figured out why it's not working! Sorry 'bout that.
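The three public forms of $arg (omitted, filename, code reference) can be combined in one run, as in this sketch. The URLs, the output path /tmp/logo.png and the byte-counting callback are hypothetical; the callback only counts bytes and returns undef to keep reading, so that request's content is not kept anywhere.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

my $pua = LWP::Parallel::UserAgent->new();
$pua->timeout(10);

# Hypothetical URLs and output file, for illustration only.
my $page  = HTTP::Request->new(GET => 'http://www.example.com/');
my $image = HTTP::Request->new(GET => 'http://www.example.com/logo.png');
my $large = HTTP::Request->new(GET => 'http://www.example.org/large.html');

my $bytes = 0;
sub count_bytes {
    my ($data, $response, $protocol) = @_;
    $bytes += length $data;   # the chunk is only counted, not stored
    return undef;             # undef: keep reading from this connection
}

my @errors = grep { defined } (
    # No $arg: the content is stored in the response object itself.
    $pua->register($page),
    # Scalar $arg: the content is written to this file.
    $pua->register($image, '/tmp/logo.png'),
    # Code ref $arg, 4096-byte chunk size hint, and $redirect_ok = 0 so
    # that redirects are not followed for this request only.
    $pua->register($large, \&count_bytes, 4096, 0),
);
print STDERR $_->error_as_HTML foreach @errors;

my $entries = $pua->wait();
```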
- $ua->on_connect ( $request, $response, $entry )
-
This method should be overridden in an (otherwise empty) subclass in order to present customized messages for each connection attempted by the User Agent.
- $ua->on_failure ( $request, $response, $entry )
-
This method should be overridden in an (otherwise empty) subclass in order to present customized messages for each connection or registration that failed.
- $ua->on_return ( $request, $response, $entry )
-
This method should be overridden in an (otherwise empty) subclass in order to present customized messages for each request returned. If a callback function was registered with this request, this callback function is called before $pua->on_return.
Please note that while $pua->on_return is a method (which should be overridden in a subclass), a callback function is NOT a method, and does not have $self as its first parameter. (See more on callbacks below)
The purpose of $pua->on_return is mainly to provide messages when a request returns. However, you can also re-register follow-up requests in case you need them.
If you need specialized follow-up requests depending on the request that just returned, use a callback function instead (which can be different for each request registered). Otherwise you might end up writing a HUGE if..elsif..else.. branch in this global method.
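A subclass that overrides the three hooks described above could look like this sketch, modelled on the subclassing approach used in the LWP::Parallel examples. The class name MyUA and the URL are hypothetical. Since $response may not be set for every failure, on_failure checks it before use.

```perl
package MyUA;
use strict;
use warnings;
use LWP::Parallel::UserAgent;
our @ISA = ('LWP::Parallel::UserAgent');

# Called for each connection the agent attempts.
sub on_connect {
    my ($self, $request, $response, $entry) = @_;
    print "Connecting to ", $request->uri, "\n";
}

# Called for each connection or registration that failed.
sub on_failure {
    my ($self, $request, $response, $entry) = @_;
    print "Failed: ", $request->uri,
        ($response ? " (" . $response->code . ")" : ""), "\n";
}

# Called for each returned request, after any per-request callback.
sub on_return {
    my ($self, $request, $response, $entry) = @_;
    print $request->uri, " returned ", $response->code, " ",
        $response->message, "\n";
    return;
}

package main;
use HTTP::Request;

my $pua = MyUA->new();
$pua->timeout(10);
$pua->register(HTTP::Request->new(GET => 'http://www.example.com/'));  # hypothetical URL
my $entries = $pua->wait();
```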
- $ua->discard_entry ( $entry )
-
Completely removes an entry from memory, in case its output is not needed. Use this in callbacks such as on_return or on_failure if you want to make sure an entry that you do not need does not occupy valuable main memory.
- $ua->wait ( $timeout )
-
Waits for available sockets to write to or read from. Times out after $timeout seconds, and blocks if a $timeout of 0 is specified. If $timeout is omitted, the Agent's default timeout value is used.
- $ua->handle_response($request, $arg [, $size])
-
Analyses results, handling redirects and security. This method may actually register several different, additional requests.
This method should not be called directly. Instead, indicate for each individual request registered with $ua->register() whether or not you want Parallel::UserAgent to handle redirects and security, or specify a default value for all requests in Parallel::UserAgent by using $ua->redirect().
- $ua->simple_request($request, [$arg [, $size]])
-
This method simulates the behavior of LWP::UserAgent->simple_request. It is actually overkill to use this method in Parallel::UserAgent, and it is mainly here for testing backward compatibility with the original LWP::UserAgent. The following description is taken directly from the corresponding libwww pod:
$ua->simple_request dispatches a single WWW request on behalf of a user, and returns the response received. The $request should be a reference to an HTTP::Request object with values defined for at least the method() and url() attributes.
If $arg is a scalar it is taken as a filename where the content of the response is stored.
If $arg is a reference to a subroutine, then this routine is called as chunks of the content are received. An optional $size argument is taken as a hint for an appropriate chunk size.
If $arg is omitted, then the content is stored in the response object itself.
- $ua->request($request, $arg [, $size])
-
Included for compatibility testing with LWP::UserAgent. Everyday usage is deprecated! Here is what LWP::UserAgent has to say about it:
Process a request, including redirects and security. This method may actually send several different simple requests.
The arguments are the same as for simple_request(). In this package, the method is implemented as:
sub request {
  my $self = shift;
  my $ua = LWP::Parallel::UserAgent->new();
  $ua->agent($self->agent);
  $ua->from($self->from);
  $ua->redirect(1);
  &_single_request($ua, @_);
}
- $ua->as_string
-
Returns text that describes the state of the UA. This would be useful for debugging if it printed out anything important, but it does not (at least not yet). Try using LWP::Debug...
ADDITIONAL METHODS
- $ua->use_alarm([$boolean])
-
This function is not in use anymore and will display a warning when called and warnings are enabled.
Callback functions
You can register a callback function. See LWP::UserAgent for details.
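As a sketch of how a callback can use the return constants described under $ua->register, the following caps each download at a hypothetical 16 KB and then returns C_ENDCON to end only that connection. It assumes the constants are imported with the :CALLBACK tag, as in the LWP::Parallel examples, and stores each chunk with add_content the same way those examples do. The URLs and the limit are illustrative.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent qw(:CALLBACK);  # imports C_ENDCON, C_LASTCON, C_ENDALL

my $limit = 16 * 1024;   # hypothetical per-URL cap of 16 KB
my %received;            # bytes received so far, keyed by URL

sub capped_download {
    my ($data, $response, $protocol) = @_;
    my $url = $response->request->uri->as_string;
    $received{$url} += length $data;
    # Keep the chunk in the response, as the LWP::Parallel examples do.
    $response->add_content($data);
    # C_ENDCON ends this connection only; undef keeps reading.
    return C_ENDCON if $received{$url} >= $limit;
    return undef;
}

my $pua = LWP::Parallel::UserAgent->new();
$pua->timeout(10);
foreach my $url ('http://www.example.com/', 'http://www.example.org/') {  # hypothetical URLs
    $pua->register(HTTP::Request->new(GET => $url), \&capped_download, 4096);
}
my $entries = $pua->wait();
```

Returning C_LASTCON or C_ENDALL instead would, per the description of $ua->register, stop further connections or end all requests and return from wait().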
BUGS
Probably lots! This is only an interim release until this functionality is incorporated into LWPng, the next generation libwww module.
Needs a lot more documentation on how callbacks work!
SEE ALSO
AUTHOR
Marc Langheinrich <marclang@cs.washington.edu>