NAME

HTTP::Proxy - A pure Perl HTTP proxy

SYNOPSIS

use HTTP::Proxy;

# initialisation
my $proxy = HTTP::Proxy->new( port => 3128 );

# alternate initialisation
my $proxy = HTTP::Proxy->new;
$proxy->port( 3128 ); # the classical accessors are here!

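# you can also use your own UserAgent
# (LWP::RobotUA requires an agent name and a contact address;
# the values below are placeholders)
use LWP::RobotUA;
my $agent = LWP::RobotUA->new( 'my-robot/0.1', 'me@example.com' );
$proxy->agent( $agent );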

# this is a MainLoop-like method
$proxy->start;

DESCRIPTION

This module implements an HTTP proxy, using an HTTP::Daemon to accept client connections, and an LWP::UserAgent to ask for the requested pages.

The most interesting feature of this proxy object is its ability to filter the HTTP requests and responses through user-defined filters.

METHODS

Constructor

Accessors

The HTTP::Proxy object has several accessors.

Called without arguments, the accessor returns the current value. Called with a single argument, it sets the current value and returns the previous one, in case you want to keep it.

If you call a read-only accessor with a parameter, this parameter will be ignored.
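The get/set convention described above can be sketched in a few lines of plain Perl. This is only an illustration: My::Settings is a hypothetical stand-in class, and nothing here is taken from the HTTP::Proxy source.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT HTTP::Proxy itself.
package My::Settings;

sub new {
    my ( $class, %args ) = @_;
    return bless { port => 8080, conn => 0, %args }, $class;
}

# Read-write accessor: always returns the value held before the call,
# and stores the new value when an argument is given.
sub port {
    my $self = shift;
    my $old  = $self->{port};
    $self->{port} = shift if @_;
    return $old;
}

# Read-only accessor: any argument is silently ignored.
sub conn { return $_[0]{conn} }

package main;

my $settings = My::Settings->new;
my $previous = $settings->port(3128);    # sets the port, returns the old one
my $current  = $settings->port;          # plain read
my $conn     = $settings->conn(42);      # the argument has no effect
```

Returning the previous value makes it easy to change a setting temporarily and restore it afterwards.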

The defined accessors are (in alphabetical order):

agent

The LWP::UserAgent object used internally to connect to remote sites.

conn (read-only)

The number of connections processed by this HTTP::Proxy instance.

control

The default hostname for controlling the proxy (see CONTROL). The default is "proxy", which corresponds to the URL http://proxy/ when requested through the proxy on its listening port.

daemon

The HTTP::Daemon object used to accept incoming connections. (You rarely need this.)

host

The proxy HTTP::Daemon host (default: 'localhost').

logfh

A filehandle to a logfile (default: *STDERR).

logmask( [$mask] )

Be verbose in the logs (default: NONE).

Here are the various elements that can be added to the mask:

NONE    - Log only errors
STATUS  - Requested URL, response status and total number of connections processed
PROCESS - Subprocess information (fork, wait, etc.)
HEADERS - Full request and response headers
FILTER  - Filter information
ALL     - Log all of the above

If you only want status and process information, you can use:

$proxy->logmask( STATUS | PROCESS );

Note that the logging constants are not exported by default; they are exported by the :log tag. They can also be exported one by one.
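To show how such a mask combines, here is a small self-contained sketch. The numeric values and the should_log() helper are assumptions made for illustration; the real constants come from HTTP::Proxy and may have different values.

```perl
use strict;
use warnings;

# Hypothetical values, chosen so that each flag is a distinct bit.
use constant {
    NONE    => 0,
    STATUS  => 1,
    PROCESS => 2,
    HEADERS => 4,
    FILTER  => 8,
};
use constant ALL => STATUS | PROCESS | HEADERS | FILTER;

# Hypothetical helper: true when $level is enabled in $mask.
sub should_log {
    my ( $mask, $level ) = @_;
    return ( $mask & $level ) ? 1 : 0;
}

my $mask = STATUS | PROCESS;    # status and process information only
```

Because each flag occupies its own bit, flags can be combined with | and tested with & without interfering with each other.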

maxchild

The maximum number of child processes the HTTP::Proxy object will spawn to handle client requests (default: 16).

maxconn

The maximum number of TCP connections the proxy will accept before returning from start(). 0 (the default) means never stop accepting connections.

maxserve

The maximum number of requests the proxy will serve in a single connection. (same as MaxRequestsPerChild in Apache)

port

The proxy HTTP::Daemon port (default: 8080).

timeout

The timeout used by the internal LWP::UserAgent (default: 60).

url (read-only)

The URL where the proxy can be reached.

The start() method

This method works like Tk's MainLoop: you hand over control to the HTTP::Proxy object you created and configured.

If maxconn is not zero, start() will return after accepting at most that many connections.

Filters

You can alter the way the default HTTP::Proxy works by plugging in callbacks at different stages of the request/response handling.

When a request is received by the HTTP::Proxy object, it is filtered through a standard filter that transforms this request according to RFC 2616 (by adding the Via: header, and a few other transformations).

The response is also filtered in the same manner. There is a total of four filter chains: request-headers, request-body, response-headers and response-body.

You can add your own filters to the default ones with the push_headers_filter() and the push_body_filter() methods. Both methods work more or less the same way: they push a filter on the corresponding filter stack.

$proxy->push_body_filter( response => $coderef );

The name of the method called gives the headers/body part while the named parameter gives the request/response part.

It is possible to push the same coderef on the request and response stacks, as in the following example:

$proxy->push_headers_filter( request => $coderef, response => $coderef );

Additional named parameters can be passed to restrict when a filter applies. They are:

mime   - the MIME type (for a response-body filter)
method - the request method
scheme - the URI scheme         
host   - the URI authority (host:port)
path   - the URI path

The filters are applied only when all the parameters match the request or the response. All these named parameters have default values, which are:

mime   => 'text/*'
method => 'GET, POST, HEAD'
scheme => 'http'
host   => ''
path   => ''

The mime parameter is a glob-like string, with a required / character and a * as a wildcard. Thus, */* matches all responses with a Content-Type: header, and "" those with no Content-Type: header. To match any response (with or without a Content-Type: header), use undef.

The mime parameter is only meaningful with the response-body filter stack. It is ignored if passed to any other filter stack.
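The three cases of the mime parameter (a glob, the empty string, and undef) can be sketched as follows. The mime_matcher() helper is hypothetical and is not the code used by the proxy; it also assumes that the caller passes the bare media type, without any "; charset=..." parameter.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a mime glob into a matching closure.
# The closure receives the bare media type, or undef when the
# response has no Content-Type: header.
sub mime_matcher {
    my ($glob) = @_;

    # undef: match any response, with or without a Content-Type
    return sub { 1 } if !defined $glob;

    # empty string: match only responses without a Content-Type
    return sub { !defined( $_[0] ) || $_[0] eq '' } if $glob eq '';

    # otherwise escape the glob, then let each * match within one part
    my $pattern = quotemeta $glob;
    $pattern =~ s{\\\*}{[^/]*}g;
    my $re = qr/\A$pattern\z/;
    return sub { defined( $_[0] ) && $_[0] =~ $re };
}

my $text    = mime_matcher('text/*');
my $typed   = mime_matcher('*/*');
my $untyped = mime_matcher('');
my $any     = mime_matcher(undef);
```

Here $text accepts text/html but not image/png, while $typed accepts both and rejects responses without a Content-Type.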

The method and scheme parameters are strings consisting of comma-separated values. The host and path parameters are regular expressions.

A match routine is compiled by the proxy and used to check if a particular request or response must be filtered through a particular filter.
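As a rough illustration of what such a compiled match routine might look like, the sketch below builds a closure from the method, scheme, host and path parameters. compile_match() and the hash-based request are assumptions made for this example; the proxy's own routine works on real request and response objects and is not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: build a closure that checks a request
# (here a plain hash reference) against the named parameters.
sub compile_match {
    my %arg = (
        method => 'GET, POST, HEAD',    # documented defaults
        scheme => 'http',
        host   => '',
        path   => '',
        @_,
    );

    # method and scheme are comma-separated lists of allowed values
    my %method = map { $_ => 1 } split /\s*,\s*/, $arg{method};
    my %scheme = map { $_ => 1 } split /\s*,\s*/, $arg{scheme};

    # host and path are regular expressions; the (?:) wrapper makes
    # an empty pattern match everything
    my $host_re = qr/(?:$arg{host})/;
    my $path_re = qr/(?:$arg{path})/;

    return sub {
        my ($req) = @_;
        return $method{ $req->{method} }
            && $scheme{ $req->{scheme} }
            && $req->{host} =~ $host_re
            && $req->{path} =~ $path_re ? 1 : 0;
    };
}

my $match = compile_match( method => 'GET', host => 'example\.com$' );
my $hit   = $match->(
    { method => 'GET', scheme => 'http', host => 'www.example.com', path => '/' }
);
```

Compiling the parameters once into a closure keeps the per-request check down to two hash lookups and two regular expression matches.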

The signature for the "headers" filters is:

sub header_filter { my ( $headers, $message ) = @_; ... }

where $headers is an HTTP::Headers object, and $message is either an HTTP::Request or an HTTP::Response object.

The signature for the "body" filters is:

sub body_filter { my ( $dataref, $message, $protocol ) = @_; ... }

$dataref is a reference to the chunk of data received.

Note that this subroutine signature looks a lot like that of the callbacks of LWP::UserAgent (except that $message is either an HTTP::Request or an HTTP::Response object).
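Because a body filter receives a reference to the data, it modifies the chunk in place and its return value is not needed. The sketch below shows the idea with a hand-rolled filter stack; run_body_filters() is hypothetical, and undef is passed for $message and $protocol since no real objects are involved.

```perl
use strict;
use warnings;

# A stack of body filters, using the signature described above.
my @body_filters = (
    sub {
        my ( $dataref, $message, $protocol ) = @_;
        $$dataref =~ s/PERL/Perl/g;    # edit the chunk in place
    },
    sub {
        my ($dataref) = @_;
        $$dataref =~ s/\s+\z//;        # strip trailing whitespace
    },
);

# Hypothetical driver: pass the same chunk through each filter in turn.
sub run_body_filters {
    my ( $dataref, $message, $protocol ) = @_;
    $_->( $dataref, $message, $protocol ) for @body_filters;
    return;
}

my $chunk = "PERL is fun   ";
run_body_filters( \$chunk, undef, undef );
```

Each filter sees the data as left by the previous one, so the order in which filters are pushed matters.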

Here are a few example filters:

# fixes a common typo ;-)
# but chances are that this will modify a correct URL
$proxy->push_body_filter( response => sub { ${$_[0]} =~ s/PERL/Perl/g } );

# mess up trace requests
$proxy->push_headers_filter(
    method   => 'TRACE',
    response => sub {
        my $headers = shift;
        $headers->header( X_Trace => "Something's wrong!" );
    },
);

# a simple anonymiser
$proxy->push_headers_filter(
    mime    => undef,
    request => sub {
        $_[0]->remove_header(qw( User-Agent From Referer Cookie ));
    },
    response => sub {
        $_[0]->remove_header(qw( Set-Cookie ));
    },
);

IMPORTANT: If you use your own LWP::UserAgent, you must install it before your calls to push_headers_filter() or push_body_filter(), or the match routine will make wrong assumptions about the schemes your agent supports.

push_headers_filter( type => coderef, %args )

push_body_filter( type => coderef, %args )

log( $level, $prefix, $message )

Adds $message at the end of logfh, if $level matches logmask. The log() method also prints a timestamp.

The output looks like:

[Thu Dec  5 12:30:12 2002] $prefix $message

If $message is a multiline string, several log lines will be output, each starting with $prefix.
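The line format can be sketched as below. format_log() is a hypothetical helper, and repeating the timestamp on every line of a multiline message is an assumption; the text above only guarantees that each line starts with $prefix.

```perl
use strict;
use warnings;

# Hypothetical helper: build the log lines for one message.
# $time is optional and defaults to the current time.
sub format_log {
    my ( $prefix, $message, $time ) = @_;
    my $stamp = '[' . scalar( localtime( defined $time ? $time : time ) ) . ']';

    # one output line per line of the message
    return map { "$stamp $prefix $_\n" } split /\n/, $message;
}

my @lines = format_log( 'FILTER', "first line\nsecond line" );
print @lines;
```

Splitting the message keeps every line of the log file independently searchable by prefix.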

EXPORTED SYMBOLS

No symbols are exported by default. The :log tag exports all the logging constants.

BUGS

This does not work under Windows, but I can't see why, and do not have a development platform under that system. Patches and explanations are very welcome.

The Date: header is duplicated.

This is still beta software; expect some interfaces to change as I receive feedback from users.

AUTHOR

Philippe "BooK" Bruhat, <book@cpan.org>.

The module has its own web page at http://http-proxy.mongueurs.net/ complete with older versions and repository snapshot.

THANKS

Many people helped me during the development of this module, either on mailing lists, IRC or over a beer in a pub...

So, in no particular order, thanks to the libwww-perl team for such a terrific suite of modules, Michael Schwern (tips for testing while forking), the Paris.pm folks (forking processes, chunked encoding) and my growing user base... ;-)

COPYRIGHT

This module is free software; you can redistribute it or modify it under the same terms as Perl itself.