NAME
HTTP::Proxy - A pure Perl HTTP proxy
SYNOPSIS
use HTTP::Proxy;
use LWP::RobotUA;
# initialisation
my $proxy = HTTP::Proxy->new( port => 3128 );
# alternate initialisation
$proxy = HTTP::Proxy->new;
$proxy->port( 3128 ); # the classical accessors are here!
# you can also use your own UserAgent
# (LWP::RobotUA->new requires an agent name and a From: address)
my $agent = LWP::RobotUA->new( 'MyProxy/0.1', 'me@example.com' );
$proxy->agent( $agent );
# this is a MainLoop-like method
$proxy->start;
DESCRIPTION
This module implements an HTTP proxy, using an HTTP::Daemon to accept client connections, and an LWP::UserAgent to ask for the requested pages.
The most interesting feature of this proxy object is its ability to filter the HTTP requests and responses through user-defined filters.
METHODS
Constructor
Accessors
HTTP::Proxy objects have several accessors.
Called with no arguments, an accessor returns the current value. Called with a single argument, it sets the current value and returns the previous one, in case you want to keep it.
If you call a read-only accessor with a parameter, this parameter will be ignored.
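To make this get/set convention concrete, here is a minimal sketch in plain Perl. The class My::Proxy, its default values and its internals are hypothetical; the sketch only assumes the behaviour described above: no argument returns the current value, one argument stores a new value and returns the old one, and a read-only accessor ignores its argument.

```perl
use strict;
use warnings;

# Hypothetical class: it only illustrates the accessor convention,
# it is not HTTP::Proxy's implementation.
package My::Proxy;

sub new {
    my ( $class, %args ) = @_;
    return bless { port => 8080, conn => 0, %args }, $class;
}

# Read-write accessor: no argument returns the current value;
# one argument stores it and returns the previous value.
sub port {
    my $self = shift;
    my $old  = $self->{port};
    $self->{port} = shift if @_;
    return $old;
}

# Read-only accessor: any argument is silently ignored.
sub conn { return $_[0]{conn} }

package main;

my $proxy = My::Proxy->new;
my $old   = $proxy->port(3128);
# prints "previous port: 8080, current port: 3128"
print "previous port: $old, current port: ", $proxy->port, "\n";
$proxy->conn(42);    # ignored, conn is read-only
print "connections: ", $proxy->conn, "\n";    # prints "connections: 0"
```

Returning the previous value lets a caller temporarily change a setting and restore it afterwards without a separate read.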
The defined accessors are (in alphabetical order):
- agent
The LWP::UserAgent object used internally to connect to remote sites.
- conn (read-only)
The number of connections processed by this HTTP::Proxy instance.
- control
The default hostname for controlling the proxy (see CONTROL). The default is "proxy", which corresponds to the URL http://proxy:port/ (where port is the listening port of the proxy).
- daemon
The HTTP::Daemon object used to accept incoming connections. (You will rarely need to use this directly.)
- host
The host on which the proxy's HTTP::Daemon listens (default: 'localhost').
- logfh
A filehandle to a logfile (default: *STDERR).
- logmask( [$mask] )
Controls how verbose the logs are (default: NONE).
Here are the various elements that can be added to the mask:
NONE - Log only errors
STATUS - Requested URL, response status and total number of connections processed
PROCESS - Subprocess information (fork, wait, etc.)
HEADERS - Full request and response headers
FILTER - Filter information
ALL - Log all of the above
If you only want status and process information, you can use:
$proxy->logmask( STATUS | PROCESS );
Note that the logging constants are not exported by default: import them all with the :log tag (use HTTP::Proxy qw( :log );), or import them individually.
- maxchild
The maximum number of child processes the HTTP::Proxy object will spawn to handle client requests (default: 16).
- maxconn
The maximum number of TCP connections the proxy will accept before returning from start(). 0 (the default) means never stop accepting connections.
- maxserve
The maximum number of requests the proxy will serve in a single connection. (same as MaxRequestsPerChild in Apache)
- port
The port on which the proxy's HTTP::Daemon listens (default: 8080).
- timeout
The timeout, in seconds, used by the internal LWP::UserAgent (default: 60).
- url (read-only)
The URL at which the proxy can be reached.
The start() method
This method works like Tk's MainLoop: you hand over control to the HTTP::Proxy object you created and configured.
If maxconn is not zero, start() will return after accepting at most that many connections.
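A rough sketch of a MainLoop-style accept loop honouring a maxconn-like limit, written with the core IO::Socket::INET module. The serve() function is hypothetical and far simpler than HTTP::Proxy (no forking, no HTTP); it only assumes the semantics stated above: 0 means never stop, otherwise return after at most that many connections. To stay self-contained, it queues its own client connections on an ephemeral localhost port.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical sketch of a MainLoop-like method: accept connections
# until $maxconn have been processed (0 means never stop).
# No forking and no HTTP here, unlike HTTP::Proxy.
sub serve {
    my ( $server, $maxconn, $handler ) = @_;
    my $conn = 0;
    while ( my $client = $server->accept ) {
        $conn++;
        $handler->($client);    # a real proxy would serve requests here
        close $client;
        last if $maxconn && $conn >= $maxconn;
    }
    return $conn;               # number of connections processed
}

# Listen on an ephemeral localhost port.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 5,
    ReuseAddr => 1,
) or die "listen: $!";

# Queue three client connections in the listen backlog.
my @clients;
for ( 1 .. 3 ) {
    my $client = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => $server->sockport,
        Proto    => 'tcp',
    ) or die "connect: $!";
    push @clients, $client;
}

my $served = serve( $server, 2, sub { } );
print "served $served connections\n";    # prints "served 2 connections"
```

The third queued connection is never accepted: the loop returns as soon as the limit is reached, which is the behaviour the maxconn description implies.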
Filters
You can alter the default behaviour of HTTP::Proxy by plugging callbacks into different stages of the request/response handling.
When a request is received by the HTTP::Proxy object, it is filtered through a standard filter that transforms the request according to RFC 2616 (by adding the Via: header, and a few other transformations).
The response is filtered in the same manner. There are four filter chains in total: request-headers, request-body, response-headers and response-body.
You can add your own filters to the default ones with the push_headers_filter() and push_body_filter() methods. Both methods work more or less the same way: each pushes a filter onto the corresponding filter stack, for example:
$proxy->push_body_filter( response => $coderef );
The method name selects the headers or body part, while the named parameter selects the request or response part.
It is possible to push the same coderef on the request and response stacks, as in the following example:
$proxy->push_headers_filter( request => $coderef, response => $coderef );
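The naming scheme (the method name selects headers or body, the named parameter selects request or response) can be pictured as four arrays of coderefs. The following is a hypothetical sketch: the package My::FilterStacks, its run() method and its storage are assumptions rather than HTTP::Proxy's internals, and it ignores the matching parameters described next.

```perl
use strict;
use warnings;

# Hypothetical sketch: the four filter chains as arrays of coderefs.
# Method names follow the documentation; the internals are assumed.
package My::FilterStacks;

sub new {
    my $class = shift;
    my %stack = map { ( $_ => [] ) }
        qw( request-headers request-body response-headers response-body );
    return bless { stack => \%stack }, $class;
}

# The method name picks the part (headers or body) ...
sub push_headers_filter { my $self = shift; $self->_push( headers => @_ ) }
sub push_body_filter    { my $self = shift; $self->_push( body    => @_ ) }

# ... and each named parameter (request, response) picks the side.
sub _push {
    my ( $self, $part, %args ) = @_;
    for my $side (qw( request response )) {
        push @{ $self->{stack}{"$side-$part"} }, $args{$side}
            if exists $args{$side};
    }
    return $self;
}

# Run the filters of one chain in the order they were pushed.
sub run {
    my ( $self, $chain, @params ) = @_;
    $_->(@params) for @{ $self->{stack}{$chain} };
    return;
}

package main;

my $stacks = My::FilterStacks->new;
my $upper  = sub { ${ $_[0] } = uc ${ $_[0] } };
$stacks->push_body_filter( request => $upper, response => $upper );

my $data = "hello";
$stacks->run( 'response-body', \$data );
print "$data\n";    # prints "HELLO"
```

Pushing the same coderef with both request and response simply stores it in two stacks, as in the push_headers_filter call above.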
Named parameters can also be passed to restrict which messages a filter applies to. They are:
mime - the MIME type (for a response-body filter)
method - the request method
scheme - the URI scheme
host - the URI authority (host:port)
path - the URI path
A filter is applied only when all the parameters match the request or the response. All these named parameters have default values, which are:
mime => 'text/*'
method => 'GET, POST, HEAD'
scheme => 'http'
host => ''
path => ''
The mime parameter is a glob-like string, with a required / character and * as a wildcard. Thus, */* matches all responses with a Content-Type: header, and "" matches those with no Content-Type: header. To match any response (with or without a Content-Type: header), use undef.
The mime parameter is only meaningful with the response-body filter stack. It is ignored if passed to any other filter stack.
The method and scheme parameters are strings consisting of comma-separated values. The host and path parameters are regular expressions.
A match routine is compiled by the proxy and used to check if a particular request or response must be filtered through a particular filter.
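The following is a hypothetical sketch of such a match routine, compiled from the named parameters and their defaults. build_matcher() and the hashref it receives (method, scheme, host, path, content_type) are assumptions made for illustration; HTTP::Proxy's own routine, and how it reads these values from HTTP::Request or HTTP::Response objects, may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch: compile the named parameters into a closure that
# decides whether a filter applies. Not HTTP::Proxy's actual code.
sub build_matcher {
    my %args = (
        mime   => 'text/*',
        method => 'GET, POST, HEAD',
        scheme => 'http',
        host   => '',
        path   => '',
        @_,
    );

    # Comma-separated strings become lookup tables.
    my %method = map { ( $_ => 1 ) } split /\s*,\s*/, $args{method};
    my %scheme = map { ( $_ => 1 ) } split /\s*,\s*/, $args{scheme};

    # host and path are regular expressions; '' matches anything.
    my $host_re = qr/(?:$args{host})/;
    my $path_re = qr/(?:$args{path})/;

    # Glob-like MIME pattern: '*' is the only joker, '/' is required.
    my $mime = $args{mime};
    my $mime_re;
    if ( defined $mime && $mime ne '' ) {
        die "mime pattern needs a '/': $mime\n" unless $mime =~ m{/};
        my $glob = join '',
            map { $_ eq '*' ? '[^/]*' : quotemeta $_ } split /(\*)/, $mime;
        $mime_re = qr/\A$glob\z/;
    }

    return sub {
        my ($r) = @_;    # hashref: method scheme host path content_type
        return 0 unless $method{ $r->{method} } && $scheme{ $r->{scheme} };
        return 0 unless $r->{host} =~ $host_re && $r->{path} =~ $path_re;

        my $ct = $r->{content_type};
        return 1 unless defined $mime;                # undef: any response
        return defined $ct ? 0 : 1 if $mime eq '';    # "": no Content-Type
        return 0 unless defined $ct;
        $ct =~ s/\s*;.*//s;                           # drop "; charset=..."
        return $ct =~ $mime_re ? 1 : 0;
    };
}

my $match = build_matcher( method => 'GET', mime => 'text/*' );
my %req = (
    method       => 'GET',
    scheme       => 'http',
    host         => 'www.example.com',
    path         => '/index.html',
    content_type => 'text/html; charset=UTF-8',
);
print $match->( \%req ) ? "filtered\n" : "skipped\n";    # prints "filtered"
```

Compiling the parameters once into a closure keeps the per-message check cheap, which matters because it runs for every request and response.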
The signature for the "headers" filters is:
sub header_filter { my ( $headers, $message) = @_; ... }
where $headers is an HTTP::Headers object, and $message is either an HTTP::Request or an HTTP::Response object.
The signature for the "body" filters is:
sub body_filter { my ( $dataref, $message, $protocol ) = @_; ... }
$dataref is a reference to the chunk of data received.
Note that this subroutine signature looks a lot like that of the LWP::UserAgent content callbacks (except that $message is either an HTTP::Request or an HTTP::Response object).
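Because a body filter receives a reference to each chunk as it arrives, it edits the data in place and never sees the whole body at once. The plain-Perl sketch below feeds two hypothetical chunks to the PERL to Perl filter used in the examples that follow; it assumes the body happens to be split in the middle of a word, in which case that occurrence is not rewritten. $message and $protocol are left undefined here.

```perl
use strict;
use warnings;

# The PERL -> Perl body filter: it rewrites the chunk in place
# through the scalar reference passed as its first argument.
my $filter = sub {
    my ( $dataref, $message, $protocol ) = @_;
    ${$dataref} =~ s/PERL/Perl/g;
};

# Hypothetical chunking: the response body arrives in two pieces,
# split in the middle of the word. $message and $protocol stay undef.
my @chunks = ( 'I love PERL. P', 'ERL rocks.' );
my $body   = '';
for my $chunk (@chunks) {
    $filter->( \$chunk, undef, undef );
    $body .= $chunk;
}
print "$body\n";    # prints "I love Perl. PERL rocks."
```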
Here are a few example filters:
# fixes a common typo ;-)
# but chances are that this will modify a correct URL
$proxy->push_body_filter( response => sub { ${$_[0]} =~ s/PERL/Perl/g } );
# mess up trace requests
$proxy->push_headers_filter(
method => 'TRACE',
response => sub {
my $headers = shift;
$headers->header( X_Trace => "Something's wrong!" );
},
);
# a simple anonymiser
$proxy->push_headers_filter(
mime => undef,
request => sub {
$_[0]->remove_header(qw( User-Agent From Referer Cookie ));
},
response => sub {
$_[0]->remove_header(qw( Set-Cookie ));
},
);
IMPORTANT: If you use your own LWP::UserAgent, you must install it before your calls to push_headers_filter() or push_body_filter(), or the match method will make wrong assumptions about the schemes your agent supports.
- push_headers_filter( type => coderef, %args )
- push_body_filter( type => coderef, %args )
- log( $level, $prefix, $message )
Adds $message at the end of logfh, if $level matches logmask. The log() method also prints a timestamp. The output looks like:
[Thu Dec 5 12:30:12 2002] $prefix $message
If $message is a multiline string, several log lines will be output, each starting with $prefix.
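Taken together, logmask and log() behave like a bitmask test, as in the STATUS | PROCESS example above. The following sketch assumes numeric values for the constants (the real values exported by the :log tag are not documented here) and uses a hypothetical log_message() function, writing to an in-memory filehandle in place of logfh. It treats NONE as the level for errors, which are always logged, mirroring the description of NONE above.

```perl
use strict;
use warnings;

# Assumed values: the real constants from HTTP::Proxy's :log tag
# may differ. Only the bitmask idea is illustrated here.
use constant {
    NONE    => 0,
    STATUS  => 1,
    PROCESS => 2,
    HEADERS => 4,
    FILTER  => 8,
};
use constant ALL => STATUS | PROCESS | HEADERS | FILTER;

# Hypothetical log(): write the message when its level is NONE (errors
# are always logged) or when its bit is set in the mask.
sub log_message {
    my ( $fh, $mask, $level, $prefix, $message ) = @_;
    return 0 unless $level == NONE || ( $level & $mask );
    my $stamp = scalar localtime;    # e.g. "Thu Dec  5 12:30:12 2002"
    print {$fh} "[$stamp] $prefix $_\n" for split /\n/, $message;
    return 1;
}

my $mask = STATUS | PROCESS;
my $log  = '';
open my $fh, '>', \$log or die "cannot open in-memory log: $!";
log_message( $fh, $mask, STATUS,  'GET',     "http://example.com/ 200" );
log_message( $fh, $mask, HEADERS, 'HEADERS', "Host: example.com" );  # skipped
log_message( $fh, $mask, PROCESS, 'FORK',    "child 1234\nchild 1235" );
close $fh;
print $log;
```

The multiline PROCESS message produces two log lines, each carrying the timestamp and the FORK prefix.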
EXPORTED SYMBOLS
No symbols are exported by default. The :log tag exports all the logging constants.
BUGS
This module does not work under Windows. I can't see why, and I do not have a development platform on that system. Patches and explanations are very welcome.
The Date: header is duplicated.
This is still beta software: expect some interfaces to change as I receive feedback from users.
AUTHOR
Philippe "BooK" Bruhat, <book@cpan.org>.
The module has its own web page at http://http-proxy.mongueurs.net/ complete with older versions and repository snapshots.
THANKS
Many people helped me during the development of this module, on mailing lists, on IRC or over a beer in a pub...
So, in no particular order, thanks to the libwww-perl team for such a terrific suite of modules, Michael Schwern (tips for testing while forking), the Paris.pm folks (forking processes, chunked encoding) and my growing user base... ;-)
COPYRIGHT
This module is free software; you can redistribute it or modify it under the same terms as Perl itself.