NAME

HTTP::ProxySelector::Persistent - Locally cache and use a list of proxy servers for high volume, proxied LWP::UserAgent transactions

VERSION

Version 0.01

SYNOPSIS

This module is a fork from HTTP::ProxySelector (written by Eyal Udassin) that is modified to:

  • Require fewer trips to look up proxy lists by caching them locally (bandwidth economy and speed).

  • Almost always set your useragent to a valid proxy (reliability).

  • Ensure that you never retry a failed proxy in a subsequent proxy selection call (the minimum possible number of timeouts).

  • Leave the cache of proxy servers in place after execution for the next call to use (persistence).

  use HTTP::ProxySelector::Persistent;
  use LWP::UserAgent;

  # Instantiate (db_file is mandatory)
  my $selector = HTTP::ProxySelector::Persistent->new( db_file => '/tmp/proxy_cache.bdb' );
  my $ua = LWP::UserAgent->new();

  # Assign a random proxy to the UserAgent object.
  $selector->set_proxy($ua);

  # Just in case you need to know the chosen proxy
  print 'Selected proxy: ', $selector->get_proxy(), "\n";

PREREQUISITES

HTTP::ProxySelector::Persistent requires these Perl modules to be installed:

  • BerkeleyDB

  • LWP::UserAgent

  • Date::Manip

PUBLIC METHODS

new()

new is the constructor for HTTP::ProxySelector::Persistent objects.

Returns a new HTTP::ProxySelector::Persistent object.

Accepts a key-value list of options as arguments. The option keys are:

db_file

The full path and filename for the proxy cache database file. You must have permission to write in this directory. This option is mandatory; it has no default.

Example:

  $select = HTTP::ProxySelector::Persistent->new( db_file => "/tmp/proxy_cache.bdb" );
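
Since db_file has no default and its directory must be writable, it can help to check this before constructing the object. The following is a minimal sketch using only core modules; cache_dir_writable is a hypothetical helper for illustration and is not part of HTTP::ProxySelector::Persistent.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper (not part of the module): returns 1 if the directory
# that would hold $db_file exists and is writable by the current user, else 0.
sub cache_dir_writable {
    my ($db_file) = @_;
    my $dir = dirname($db_file);
    return ( -d $dir && -w $dir ) ? 1 : 0;
}

my $db_file = '/tmp/proxy_cache.bdb';
print cache_dir_writable($db_file)
    ? "OK to use $db_file\n"
    : "Cannot write to the directory of $db_file\n";
```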

sites

Reference to a list of sites containing the proxy lists.

Example:

  $select = HTTP::ProxySelector::Persistent->new( sites => ['http://www.proxylist.com/list.htm'] );

Default:

  [ 'http://www.multiproxy.org/txt_anon/proxy.txt', 'http://www.samair.ru/proxy/fresh-proxy-list.htm' ]

update_interval

How often to update the cached list of proxy servers. Must be a string that Date::Manip can parse.

Example:

  $select = HTTP::ProxySelector::Persistent->new( update_interval => "20 minutes" );

Default:

  "15 minutes"
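
To illustrate what an update interval implies, here is a rough sketch of a staleness check. It rests on two assumptions that the documentation above does not confirm: the age of the cache is taken from the file's modification time, and the interval has already been converted to seconds (the module itself relies on Date::Manip to read the interval string). cache_is_stale is a hypothetical helper, not a method of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the cache file is missing or older than
# $max_age_seconds, else 0.
# ASSUMPTION: cache age is judged by file mtime; the real module may track
# its last update differently.
sub cache_is_stale {
    my ( $db_file, $max_age_seconds, $now ) = @_;
    $now = time unless defined $now;
    my @st = stat $db_file;
    return 1 unless @st;    # a missing file counts as stale
    my $mtime = $st[9];
    return ( $now - $mtime > $max_age_seconds ) ? 1 : 0;
}

# "15 minutes" expressed in seconds
my $stale = cache_is_stale( '/tmp/proxy_cache.bdb', 15 * 60 );
print $stale ? "refresh needed\n" : "cache is fresh\n";
```

The optional third argument lets a caller pass in the current time, which makes the check easy to exercise without waiting for a real interval to elapse.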

testsite

Destination site to test the proxy with.

Example:

  $select = HTTP::ProxySelector::Persistent->new( testsite => 'http://yahoo.com' );

Default:

  'http://www.google.com'

set_proxy()

Chooses a proxy at random from the cache database and sets it as the proxy for an LWP::UserAgent. Automatically tests the proxy. If the proxy fails the test, it removes the proxy from the cache and chooses another one until it finds a working proxy. If necessary, this sub will try every proxy in the cache database.

Arguments: An LWP::UserAgent object

Returns: 0 normally, or an error message if it fails

Example:

  $select->set_proxy( $ua );
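
The selection behaviour described above (random pick, test, drop on failure, repeat) can be sketched roughly as follows. This is an illustration under stated assumptions and not the module's actual code: a plain hash stands in for the BerkeleyDB cache, a code reference stands in for test_proxy() (0 for success, 1 for failure), and pick_working_proxy is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch of the documented selection loop. %$cache stands in for the
# BerkeleyDB cache; $tester stands in for test_proxy() and follows its
# convention: 0 means the proxy works, 1 means it failed.
# Returns (0, $proxy) on success, or an error message string on failure.
sub pick_working_proxy {
    my ( $cache, $tester ) = @_;
    while ( my @candidates = keys %$cache ) {
        my $proxy = $candidates[ int rand @candidates ];
        return ( 0, $proxy ) if $tester->($proxy) == 0;
        delete $cache->{$proxy};    # never retry a failed proxy
    }
    return 'No working proxy left in the cache';
}

# Made-up addresses; only the second one "passes" the stand-in test.
my %cache = map { $_ => 1 } qw(10.0.0.1:8080 10.0.0.2:3128 10.0.0.3:80);
my ( $err, $proxy ) =
    pick_working_proxy( \%cache, sub { $_[0] eq '10.0.0.2:3128' ? 0 : 1 } );
print $err ? "Error: $err\n" : "Selected proxy: $proxy\n";
```

Because failed entries are deleted as they are found, the loop terminates after at most one test per cached proxy, which matches the "try every proxy in the cache" worst case.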

get_proxy()

Arguments: none.

Returns: a scalar string containing the address of the selected proxy.

Example:

  $proxy_address = $select->get_proxy();

test_proxy()

Tests the proxy by trying to access a site (specified using the "testsite" option when constructing an HTTP::ProxySelector::Persistent object).

Arguments: an LWP::UserAgent object.

Returns: 0 for success, 1 for failure.

PRIVATE FUNCTIONS

_fetch_proxies()

Retrieves the proxy lists, extracts the proxy servers, and caches them locally in a BerkeleyDB database. This sub assumes that the cache database file does not exist. If it is called while an old cache file is still in place, the cache ends up with duplicate entries and old, potentially inoperative proxies.

You should never have to call this method yourself. The new() method of the HTTP::ProxySelector::Persistent object calls it when the proxy cache database is expired, malformed, empty, or missing.

Arguments: none.

Returns: 0 upon success, or an error message string upon failure.
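
As a rough illustration of the "extracts the proxy servers" step, the sketch below pulls IPv4 host:port pairs out of page text with a regular expression and drops duplicates. The pattern and the extract_proxies helper are assumptions made for this example; the documentation does not show the expression the module actually uses.

```perl
use strict;
use warnings;

# Hypothetical helper: pull unique "host:port" entries (IPv4 only) out of
# fetched page text, preserving first-seen order.
# ASSUMPTION: this pattern is illustrative, not the module's own.
sub extract_proxies {
    my ($text) = @_;
    my ( %seen, @proxies );
    while ( $text =~ /\b(\d{1,3}(?:\.\d{1,3}){3}:\d{2,5})\b/g ) {
        push @proxies, $1 unless $seen{$1}++;
    }
    return @proxies;
}

# Made-up sample input with one duplicate and one non-matching entry.
my $page = "<td>10.0.0.1:8080</td> 10.0.0.2:3128\n10.0.0.1:8080 not-a-proxy:80";
print "$_\n" for extract_proxies($page);
```

De-duplicating at extraction time addresses only repeats within a single fetch; it does not replace the requirement that the old cache file be absent before the fetch.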

TODO

  • Code beautification. 1st draft is always rough.

AUTHOR

Michael Trowbridge, <michael.a.trowbridge at gmail.com>

BUGS

Please report any bugs or feature requests to bug-http-proxyselector-persistent at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=HTTP-ProxySelector-Persistent. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc HTTP::ProxySelector::Persistent

ACKNOWLEDGEMENTS

This module is a fork from HTTP::ProxySelector v0.02. HTTP::ProxySelector v0.02 is copyright 2003 Eyal Udassin.

COPYRIGHT & LICENSE

Copyright 2007 Michael Trowbridge, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.