NAME
HTTP::Any - a common interface for HTTP clients (LWP, AnyEvent::HTTP, Curl)
SYNOPSIS
use HTTP::Any::...
use ...
sub do_http {
    ...
    HTTP::Any::...
}
my $opt = { ... };
my $cb = sub {
    my ($is_success, $body, $headers, $redirects) = @_;
    ...
};
do_http($url, $opt, $cb);
MOTIVATION
LWP, AnyEvent::HTTP and Curl each have their own advantages, disadvantages and peculiarities. The HTTP::Any modules were created while investigating the strengths and weaknesses of these HTTP clients. They let you switch between the clients quickly, so you can use the best one for each particular case.
DESCRIPTION
IMPORT
I recommend wrapping HTTP::Any in a separate module of your own and using that module from every part of your project.
Why not simply load HTTP::Any with a one-line import wherever it is needed? Because a wrapper gives more flexibility and lets you replace the modules used underneath, for example using LWP::RobotUA instead of LWP::UserAgent.
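For example, a project-wide wrapper could look like the following minimal sketch. The package name My::HTTP, the robot name and the contact address are hypothetical. LWP::RobotUA is the replacement for LWP::UserAgent mentioned in the previous paragraph, and HTTP::Any::LWP::do_http is called with the user agent as its first argument, as in the LWP example below. The client modules are loaded with require inside do_http so the wrapper compiles before the backend is installed; that lazy loading is a choice made for this sketch, not something HTTP::Any requires.

```perl
# lib/My/HTTP.pm: a hypothetical project-wide wrapper around HTTP::Any
package My::HTTP;

use strict;
use warnings;
use Exporter 'import';

our @EXPORT_OK = ('do_http');

# A single robot user agent, created on first use. Reusing it keeps
# LWP::RobotUA's robots.txt cache and per-host delays across requests.
my $ua;

sub do_http {
    # Load the client modules only when a request is actually made.
    require LWP::RobotUA;
    require HTTP::Any::LWP;

    $ua ||= LWP::RobotUA->new(
        agent => 'MyBot/0.1',          # hypothetical robot name
        from  => 'bot@example.com',    # hypothetical contact address
    );

    # Same call shape as the LWP example: user agent first, then ($url, $opt, $cb).
    return HTTP::Any::LWP::do_http($ua, @_);
}

1;
```

Code elsewhere in the project then only needs use My::HTTP 'do_http'; and calls do_http($url, $opt, $cb). Switching to another client means editing this one file.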
LWP
use LWP;
use HTTP::Any::LWP;
sub do_http {
    my $ua = LWP::UserAgent->new;
    HTTP::Any::LWP::do_http($ua, @_);
}
AnyEvent
use EV;
use AnyEvent::HTTP;
use HTTP::Any::AnyEvent;
sub do_http {
    HTTP::Any::AnyEvent::do_http(\&http_request, @_);
}
Curl
use Net::Curl::Easy;
use HTTP::Any::Curl;
sub do_http {
    my ($url, $opt, $cb) = @_;
    my $easy = Net::Curl::Easy->new();
    HTTP::Any::Curl::do_http(undef, $easy, $url, $opt, $cb);
}
Curl with Multi
use Net::Curl::Easy;
use Net::Curl::Multi;
use Net::Curl::Multi::EV;
use HTTP::Any::Curl;
my $multi = Net::Curl::Multi->new();
my $curl_ev = Net::Curl::Multi::EV::curl_ev($multi);
sub do_http {
    my ($url, $opt, $cb) = @_;
    my $easy = Net::Curl::Easy->new();
    HTTP::Any::Curl::do_http($curl_ev, $easy, $url, $opt, $cb);
}
CALL
my $opt = { ... };
my $cb = sub {
    my ($is_success, $body, $headers, $redirects) = @_;
    ...
};
do_http($url, $opt, $cb);
where:
- url: URL as a string
- opt: hash ref of options and headers
- cb: callback function that receives the result
options
- referer: Referer URL
- agent: User agent name
- timeout: Timeout in seconds
- gzip: Adds an 'Accept-Encoding' header with the value gzip to the request and decodes the compressed response. If you do not want the response decoded, set the 'Accept-Encoding' header yourself in the 'headers' option instead.
- headers: A hash ref of HTTP headers:
{ 'Accept' => '*/*', ... }
- Enables cookie support. An empty string ("") enables session cookies without saving them. Any other value is passed through as is: a hash ref (LWP, AnyEvent::HTTP) or a file name (Curl).
- persistent: 1 or 0. Try to create or reuse a persistent connection. When not specified, the client's default applies: for Curl, the inverse of CURLOPT_FORBID_REUSE; for AnyEvent::HTTP, its persistent option.
- proxy: HTTP and SOCKS proxy.
proxy => "$host:$port" or proxy => "$scheme://$host:$port", where scheme is one of: http, socks (same as socks5), socks5, socks4.
Install LWP::Protocol::socks to use a SOCKS proxy with LWP.
Use AnyEvent::HTTP::Socks instead of AnyEvent::HTTP for a SOCKS proxy.
- max_size: Size limit for the response content, in bytes.
Note: when the accept_encoding option is used and the max_size limit is triggered, the current behavior is: HTTP::Any::Curl returns the partial result, HTTP::Any::LWP returns "", and HTTP::Any::AnyEvent returns "".
This behavior may change in the future.
When the max_size limit is triggered, a 'client-aborted' header with the value 'max_size' is added.
- body: Data for the POST method.
A string, or a CODE ref that returns the data as successive strings and returns undef at the end of the body data.
N.B. A CODE ref is not supported with AnyEvent::HTTP (as of v2.21).
- method: When method is "POST", a POST request is sent with the body option as its data, and a 'Content-Type' header with the value 'application/x-www-form-urlencoded' is added.
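The options above can be combined in a single $opt hash ref. The following is a minimal sketch. form_urlencode, make_body_reader and hit_max_size are hypothetical helpers written for this example and are not part of HTTP::Any. form_urlencode assumes its inputs are byte strings that are already UTF-8 encoded, and it follows the application/x-www-form-urlencoded convention of encoding spaces as '+'.

```perl
use strict;
use warnings;

# Hypothetical helper: encode form fields as application/x-www-form-urlencoded.
# Assumes keys and values are byte strings (already UTF-8 encoded).
sub form_urlencode {
    my (%fields) = @_;
    my $enc = sub {
        my ($s) = @_;
        # Keep alphanumerics, * - . _ and space; percent-encode everything else.
        $s =~ s/([^A-Za-z0-9*\-._ ])/sprintf('%%%02X', ord $1)/ge;
        $s =~ tr/ /+/;    # spaces become '+'
        return $s;
    };
    return join '&', map { $enc->($_) . '=' . $enc->($fields{$_}) } sort keys %fields;
}

# Hypothetical helper: a producer for the CODE ref form of 'body'.
# Each call returns the next chunk; undef marks the end of the data.
# (Not usable with AnyEvent::HTTP, per the note on 'body'.)
sub make_body_reader {
    my @chunks = @_;
    return sub { return shift @chunks };
}

# Hypothetical helper: true when the response was cut off by max_size.
sub hit_max_size {
    my ($headers) = @_;
    return (($headers->{'client-aborted'} // '') eq 'max_size');
}

my $opt = {
    method   => 'POST',
    body     => form_urlencode(query => 'perl http', page => 2),  # "page=2&query=perl+http"
    agent    => 'MyAgent/1.0',            # hypothetical user agent name
    timeout  => 10,                       # seconds
    gzip     => 1,
    max_size => 1024 * 1024,              # bytes
    headers  => { 'Accept' => '*/*' },
};
```

To stream the POST data instead, body could be set to make_body_reader(@chunks). Inside the finish callback, hit_max_size($headers) tells a truncated response apart from a complete one.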
finish callback function
my $cb = sub {
    my ($is_success, $body, $headers, $redirects) = @_;
    ...
};
where:
- is_success: True when the HTTP status code is 2xx.
- body: Response body. When an on_header callback is defined, body is undef.
- headers: A hash ref of HTTP headers (with lowercase names) plus additional info: Status, Reason, URL.
- redirects: Headers of the previous (redirect) responses, from last to first.
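A finish callback might use these arguments as in the sketch below. It assumes that $redirects is an array ref of header hashes shaped like $headers (including Status and URL), or undef when no redirect happened. The text above does not spell out that shape. The @log array is only there to make the sketch observable.

```perl
use strict;
use warnings;

# Collected output; only here to make the sketch observable.
my @log;

my $cb = sub {
    my ($is_success, $body, $headers, $redirects) = @_;

    # Status, Reason and URL are stored alongside the (lowercase) HTTP headers.
    my $line = "$headers->{Status} $headers->{Reason} $headers->{URL}";

    # Assumption: $redirects is an array ref of header hashes, last hop first,
    # or undef when there were no redirects.
    for my $hop (@{ $redirects || [] }) {
        $line .= " <- $hop->{Status} $hop->{URL}";
    }
    push @log, $line;

    return unless $is_success;    # non-2xx: nothing more to do here

    # $body is undef when an on_header callback was set.
    my $type = $headers->{'content-type'} // 'unknown type';
    push @log, sprintf 'got %d bytes of %s', length($body // ''), $type;
};
```

The callback is passed as the third argument: do_http($url, $opt, $cb).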
on_header callback function
When specified, this callback is called once all response headers have been received.
$opt->{on_header} = sub {
    my ($is_success, $headers, $redirects) = @_;
    ...
};
on_body callback function
When specified, this callback is called for each chunk of the response body.
$opt->{on_body} = sub {
    my ($body) = @_; # body chunk
    ...
};
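Together, on_header and on_body make it possible to stream a large response to a file instead of holding it in memory. The following is a minimal sketch. An in-memory filehandle stands in for a real file, and the variable names are hypothetical. With on_header set, the finish callback receives undef as its body. The explicit true return values are a defensive assumption, because the text above does not say whether a false return value aborts the request.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real output file here.
open my $out, '>', \my $buffer or die "open: $!";

my ($expected, $received) = (undef, 0);
my $opt = { timeout => 30 };

$opt->{on_header} = sub {
    my ($is_success, $headers, $redirects) = @_;
    # Header names are lowercase; content-length may be absent.
    $expected = $headers->{'content-length'};
    return 1;    # defensive: in case a backend treats false as "abort"
};

$opt->{on_body} = sub {
    my ($chunk) = @_;
    print {$out} $chunk or die "write: $!";    # write each chunk as it arrives
    $received += length $chunk;
    return 1;    # defensive, as above
};
```

The finish callback can then compare $received with $expected to check whether the whole body arrived.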
NOTES
Turn off the persistent option when downloading pages from many different sites.
Use a libcurl build with asynchronous DNS resolution via c-ares.
AUTHOR
Nick Kostyria <kni@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2013 by Nick Kostyria
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.