NAME

HTTP::Proxy - A pure Perl HTTP proxy

SYNOPSIS

use HTTP::Proxy;

# initialisation
my $proxy = HTTP::Proxy->new( port => 3128 );

# alternate initialisation
my $proxy = HTTP::Proxy->new;
$proxy->port( 3128 ); # the classical accessors are here!

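# you can also use your own UserAgent
# (LWP::RobotUA requires an agent name and a contact address;
# the two values below are placeholders)
use LWP::RobotUA;
my $agent = LWP::RobotUA->new( 'my-proxy/0.1', 'me@example.com' );
$proxy->agent( $agent );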

# this is a MainLoop-like method
$proxy->start;

DESCRIPTION

This module implements an HTTP proxy, using an HTTP::Daemon to accept client connections, and an LWP::UserAgent to fetch the requested pages.

METHODS

Constructor

Accessors

HTTP::Proxy has several accessors. They are all AUTOLOADed.

Called without arguments, an accessor returns the current value. Called with a single argument, it sets the current value and returns the previous one, in case you want to keep it.

If you call a read-only accessor with a parameter, this parameter will be ignored.
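The get/set behaviour described above can be illustrated with a small standalone sketch. The package My::Sketch and its %READ_ONLY table are hypothetical and exist only for this example; this is not the code of HTTP::Proxy, just one way AUTOLOADed accessors with these semantics might be written.

```perl
use strict;
use warnings;

# Hypothetical package, for illustration only: this is NOT the code
# of HTTP::Proxy, just one way to obtain the semantics described above.
package My::Sketch;

our $AUTOLOAD;

# attribute name => true if the attribute is read-only (hypothetical table)
my %READ_ONLY = ( port => 0, verbose => 0, conn => 1 );

sub new {
    my ( $class, %args ) = @_;
    return bless { port => 8080, verbose => 0, conn => 0, %args }, $class;
}

sub AUTOLOAD {
    my $self = shift;
    ( my $name = $AUTOLOAD ) =~ s/.*:://;
    return if $name eq 'DESTROY';
    die "Unknown accessor: $name\n" unless exists $READ_ONLY{$name};

    my $previous = $self->{$name};

    # a parameter passed to a read-only accessor is silently ignored
    $self->{$name} = shift if @_ && !$READ_ONLY{$name};

    # getter: the current value; setter: the value before the change
    return $previous;
}

package main;

my $proxy = My::Sketch->new( port => 3128 );
my $old   = $proxy->port(8888);    # sets 8888, returns the previous value
my $conn  = $proxy->conn(42);      # read-only: 42 is ignored

print "old port: $old, new port: ", $proxy->port(), "\n";
print "conn: ", $proxy->conn(), "\n";
```

Returning the previous value from the setter makes it easy to change a setting temporarily and restore it afterwards.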

The defined accessors are (in alphabetical order):

agent

The LWP::UserAgent object used internally to connect to remote sites.

conn (read-only)

The number of connections processed by this HTTP::Proxy instance.

control

The default hostname for controlling the proxy (see CONTROL). The default is "proxy", which corresponds to the URL http://proxy/, where port is the listening port of the proxy.

daemon

The HTTP::Daemon object used to accept incoming connections. (You rarely need this.)

host

The proxy HTTP::Daemon host (default: 'localhost').

logfh

A filehandle to a logfile (default: *STDERR).

maxchild

The maximum number of child processes the HTTP::Proxy object will spawn to handle client requests (default: 16).

maxconn

The maximum number of connections the proxy will accept before returning from start(). 0 (the default) means never stop accepting connections.

port

The proxy HTTP::Daemon port (default: 8080).

url (read-only)

The URL where the proxy can be reached.

verbose

Be verbose in the logs (default: 0).

Here are the various log levels:

0 - All errors
1 - Requested URL, response status and total number of connections processed
2 -
3 - Subprocess information (fork, wait, etc.)
4 -
5 - Full request and response headers are logged as well

The start() method

This method works like Tk's MainLoop: you hand over control to the HTTP::Proxy object you created and configured.

If maxconn is not zero, start() will return after accepting at most that many connections.

Callbacks

You can alter the way the default HTTP::Proxy works by plugging in callbacks at different stages of the request/response handling.

When a request is received by the HTTP::Proxy object, it is passed through a standard filter that transforms the request according to RFC 2616 (by adding the Via: header, and a few other transformations).

The response is filtered in the same manner. There are four filter chains in total: request-headers, request-body, response-headers and response-body.

You can add your own filters to the default ones with the push_headers_filter() and the push_body_filter() methods. Both methods work more or less the same way: they push a filter onto the corresponding filter stack.

$proxy->push_body_filter( response => $coderef );

The name of the method called selects the headers or body part, while the named parameter selects the request or response part.

It is possible to push the same coderef on the request and response stacks, as in the following example:

$proxy->push_headers_filter( request => $coderef, response => $coderef );

Named parameters can be added. They are:

mime   - the MIME type (for a response-body filter)
method - the request method
scheme - the URI scheme
host   - the URI authority (host:port)
path   - the URI path

The filters are applied only when all the parameters match the request or the response. All these named parameters have default values, which are:

mime   => 'text/*'
method => 'GET, POST, HEAD'
scheme => 'http'
host   => ''
path   => ''

The mime parameter is a glob-like string, with a required / character and * as a wildcard. Thus, */* matches all responses with a Content-Type: header, and "" matches those with no Content-Type: header. To match any response (with or without a Content-Type: header), use undef.

The mime parameter is only meaningful with the response-body filter stack. It is ignored if passed to any other filter stack.

The method and scheme parameters are strings consisting of comma-separated values. The host and path parameters are regular expressions.

A match routine is compiled by the proxy and used to check if a particular request or response must be filtered through a particular filter.
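As an illustration of the matching rules above, here is a minimal standalone sketch of how such a match routine might be compiled. The function compile_match() and the hash layout used to describe a message are hypothetical; the proxy's real implementation may differ. For simplicity the sketch always honours the mime parameter and expects a bare media type without parameters such as charset.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTTP::Proxy: builds a match routine
# from the named parameters described above.
sub compile_match {
    my %arg = (
        mime   => 'text/*',
        method => 'GET, POST, HEAD',
        scheme => 'http',
        host   => '',
        path   => '',
        @_,
    );

    # glob-like MIME pattern: '*' is the only wildcard
    my $mime_re;
    if ( defined $arg{mime} && $arg{mime} ne '' ) {
        my $glob = join '.*', map { quotemeta($_) } split /\*/, $arg{mime}, -1;
        $mime_re = qr/\A$glob\z/;
    }

    # comma-separated lists become lookup tables
    my %method = map { $_ => 1 } split /\s*,\s*/, $arg{method};
    my %scheme = map { $_ => 1 } split /\s*,\s*/, $arg{scheme};

    # host and path are regular expressions ('' matches everything)
    my ( $host, $path ) = @arg{qw( host path )};
    my $host_re = qr/(?:$host)/;
    my $path_re = qr/(?:$path)/;

    return sub {
        my %msg = @_;
        if ( defined $arg{mime} ) {
            # '' only matches messages without a Content-Type
            return 0 if $arg{mime} eq '' && defined $msg{mime};
            return 0
              if $mime_re && !( defined $msg{mime} && $msg{mime} =~ $mime_re );
        }
        return 0 unless $method{ $msg{method} } && $scheme{ $msg{scheme} };
        return 0 unless $msg{host} =~ $host_re && $msg{path} =~ $path_re;
        return 1;
    };
}

my %request = (
    method => 'GET',
    scheme => 'http',
    host   => 'www.example.com:80',
    path   => '/index.html',
);
my $default = compile_match();
my $any     = compile_match( mime => undef, host => 'example\.com' );

my @results = (
    $default->( %request, mime => 'text/html' ),    # default mime is text/*
    $default->( %request, mime => 'image/png' ),    # wrong MIME type
    $any->(%request),                               # no Content-Type at all
    $default->( %request, method => 'TRACE', mime => 'text/plain' ),
);
print "@results\n";
```

Compiling the parameters once into a closure means the per-message check is limited to a few hash lookups and regular expression matches.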

The signature for the "headers" filters is:

sub header_filter { my ( $headers, $message) = @_; ... }

where $headers is an HTTP::Headers object, and $message is either an HTTP::Request or an HTTP::Response object.

The signature for the "body" filters is:

sub body_filter { my ( $dataref, $message, $protocol ) = @_; ... }

$dataref is a reference to the chunk of data received.

Note that this subroutine signature looks a lot like that of the callbacks of LWP::UserAgent (except that $message is either an HTTP::Request or an HTTP::Response object).
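Because a body filter receives a reference to the chunk, it changes the data in place rather than returning it. The standalone sketch below shows the idea outside of the proxy: @body_filters and filter_chunk() are hypothetical stand-ins for the proxy's response-body stack, and $message and $protocol are simply passed as undef.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proxy's response-body filter stack.
my @body_filters = (
    sub { ${ $_[0] } =~ s/PERL/Perl/g },
    sub { ${ $_[0] } =~ s/\bfoo\b/bar/g },
);

# Apply each filter in turn to one chunk of data.
# Every filter receives a reference and modifies the chunk in place.
sub filter_chunk {
    my ( $chunk, $message, $protocol ) = @_;
    $_->( \$chunk, $message, $protocol ) for @body_filters;
    return $chunk;
}

my $out = filter_chunk( "I love PERL and foo", undef, undef );
print "$out\n";
```

Since filters see one chunk at a time, a pattern that happens to straddle two chunks would not be matched by filters as simple as these.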

Here are a few example filters:

# fixes a common typo ;-)
# but chances are that this will modify a correct URL
$proxy->push_body_filter( response => sub { ${$_[0]} =~ s/PERL/Perl/g } );

# mess up trace requests
$proxy->push_headers_filter(
    method   => 'TRACE',
    response => sub {
        my $headers = shift;
        $headers->header( X_Trace => "Something's wrong!" );
    },
);

# a simple anonymiser
$proxy->push_headers_filter(
    mime    => undef,
    request => sub {
        $_[0]->remove_header(qw( User-Agent From Referer Cookie ));
    },
    response => sub {
        $_[0]->remove_header(qw( Set-Cookie ));
    },
);

IMPORTANT: If you use your own LWP::UserAgent, you must install it before your calls to push_headers_filter() or push_body_filter(), or the match method will make wrong assumptions about the schemes your agent supports.

push_headers_filter( type => coderef, %args )
push_body_filter( type => coderef, %args )
log( $level, $prefix, $message )

Appends $message to logfh if $level is less than or equal to verbose. The log() method also prints a timestamp.

The output looks like:

[Thu Dec  5 12:30:12 2002] $prefix $message

If $message is a multiline string, several log lines will be output, each starting with $prefix.
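The output format can be reproduced with a short standalone sketch. The function sketch_log() is hypothetical and takes the filehandle and verbosity as explicit arguments instead of reading the logfh and verbose accessors. It assumes that a message is written when its level does not exceed the verbosity, which is consistent with the log levels listed under verbose.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the log() method; $logfh and $verbose play
# the roles of the logfh and verbose accessors.
sub sketch_log {
    my ( $logfh, $verbose, $level, $prefix, $message ) = @_;

    # assumption: higher verbosity lets more detailed messages through
    return if $level > $verbose;

    # scalar localtime gives a string like "Thu Dec  5 12:30:12 2002"
    my $stamp = '[' . scalar( localtime() ) . ']';

    # one log line per line of the message, each with stamp and prefix
    print {$logfh} "$stamp $prefix $_\n" for split /\n/, $message;
    return;
}

# log into an in-memory filehandle so the result can be inspected
open my $fh, '>', \my $buffer or die "Cannot open in-memory file: $!";
sketch_log( $fh, 1, 0, 'ERROR', "first line\nsecond line" );
sketch_log( $fh, 1, 3, 'FORK',  "not logged at verbose 1" );
close $fh;

print $buffer;
```

With verbose set to 1, only the two ERROR lines are written; the level 3 message is dropped.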

BUGS

I've heard that some Unix systems do not support calling accept() in a child process when the socket was opened by the parent (especially when several child processes call accept() at the same time).

It looks like it's the case under Windows. Expect the prefork system to change soon.

TODO

* Support Windows systems

* Provide an interface for logging

* Provide control over the proxy through special URLs

AUTHOR

Philippe "BooK" Bruhat, <book@cpan.org>.

THANKS

Many people helped me during the development of this module, either on mailing-lists, irc, or over a beer in a pub...

So, in no particular order, thanks to Michael Schwern (testing while forking) and Eric 'echo' Cholet (preforked processes).

COPYRIGHT

This module is free software; you can redistribute it or modify it under the same terms as Perl itself.