NAME
WWW::Pastebin::Base::Retrieve - base class for modules which implement retrieving of pastes from pastebins
SYNOPSIS
package WWW::Pastebin::PhpfiCom::Retrieve;
use strict;
use warnings;
use base 'WWW::Pastebin::Base::Retrieve';
use URI;
use HTML::TokeParser::Simple;
use HTML::Entities;

sub _make_uri_and_id {
    # here we get whatever user passed to retrieve()
    # and we need to return the URI pointing to the paste and its ID
    my ( $self, $id ) = @_;

    # /g, so leading whitespace, the URI prefix and trailing whitespace
    # are all stripped rather than only the first match
    $id =~ s{ ^\s+ | (?:http://)? (?:www\.)? phpfi\.com/(?=\d+) | \s+$ }{}gxi;

    return $self->_set_error(
        q|Doesn't look like a correct ID or URI to the paste|
    ) if $id =~ /\D/;

    return ( URI->new("http://www.phpfi.com/$id"), $id );
}

sub _get_was_successful {
    # the default version of this sub simply does
    # $self->results( $self->_parse( $content ) ), which is fine for most
    # pastebins; phpfi.com needs a second request for the raw content
    my ( $self, $content ) = @_;

    my $results_ref = $self->_parse( $content );
    return
        unless defined $results_ref;

    my $content_uri = $self->uri->clone;
    $content_uri->query_form( download => 1 );
    my $content_response = $self->ua->get( $content_uri );

    if ( $content_response->is_success ) {
        $results_ref->{content} = $self->content( $content_response->content );
        return $self->results( $results_ref );
    }
    else {
        return $self->_set_error(
            'Network error: ' . $content_response->status_line
        );
    }
}

sub _parse {
    # this is the "core": this sub parses out the content of
    # the paste and returns the data
    my ( $self, $content ) = @_;

    my $parser = HTML::TokeParser::Simple->new( \$content );

    my %data;
    my %nav = (
        content => '',
        map { $_ => 0 }
            qw(get_info level get_lang is_success get_content check_404)
    );

    while ( my $t = $parser->get_token ) {
        if ( $t->is_start_tag('td') ) {
            $nav{get_info}++;
            $nav{check_404}++;
            $nav{level} = 1;
        }
        # blah blah, blah do some parsin'
        # if you want to see the full example see the 'examples' directory
        # of this distribution
        elsif ( $nav{get_lang} == 1
            and $t->is_start_tag('option')
            and defined $t->get_attr('selected')
            and defined $t->get_attr('value')
        ) {
            $data{lang} = $t->get_attr('value');
            $nav{is_success} = 1;
            last;
        }
    }

    return $self->_set_error('This paste does not seem to exist')
        if $nav{content} =~ /entry \d+ not found/i;

    return $self->_set_error("Parser error! Level == $nav{level}")
        unless $nav{is_success};

    $data{ $_ } = decode_entities( delete $data{ $_ } )
        for grep { $_ ne 'content' } keys %data;

    return \%data;
}

package main;

my $paster = WWW::Pastebin::PhpfiCom::Retrieve->new;

$paster->retrieve('http://phpfi.com/302683')
    or die $paster->error;

print "Paste content is:\n$paster\n";
DESCRIPTION
This module is a base class for modules which provide an interface to fetch pastes from various pastebin sites. How useful this module will be to you depends entirely on the pastebin site you want to interface with. The synopsis shows a version of the WWW::Pastebin::PhpfiCom::Retrieve module (with the parser trimmed down), which requires a bit more work than most pastebin sites.
PROVIDED METHODS
new
retrieve
error
content
results
ua
uri
id
Private methods:
_make_uri_and_id
_parse
_get_was_successful
_set_error
Also, the content() method is overloaded for interpolation, so users of your module can interpolate the object in a string to obtain the contents of the retrieved paste.
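This kind of interpolation is what Perl's core overload pragma provides. A minimal sketch of the idea, using a hypothetical Local::Paste class rather than the base class itself (whose exact overload declaration is not reproduced here):

```perl
use strict;
use warnings;

package Local::Paste;    # hypothetical class, for illustration only

# Stringify the object to its content; "fallback => 1" lets Perl derive
# other string operations (eq, concatenation, ...) from the stringification.
use overload '""' => sub { shift->content }, fallback => 1;

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

# Combined accessor/mutator, in the style described for content()
sub content {
    my $self = shift;
    $self->{content} = shift if @_;
    return $self->{content};
}

package main;

my $paste = Local::Paste->new( content => "print 'hi';" );
print "Paste content is:\n$paste\n";
```

With fallback => 1, comparisons such as eq also go through the stringified content.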
METHODS YOU NEED TO OVERRIDE
In general, the smallest module would provide the _make_uri_and_id() and _parse() methods. The _parse() method would set the content() data accessor or, on failure, set error() by using return $self->_set_error('Some error');
Functionality of private methods is described below. Functionality of public methods is described in the "DOCUMENTATION FOR YOUR MODULE" section.
PRIVATE METHODS
_make_uri_and_id
sub _make_uri_and_id {
    # here we get whatever user passed to retrieve()
    # and we need to return the URI pointing to the paste and its ID
    my ( $self, $id ) = @_;

    # /g, so leading whitespace, the URI prefix and trailing whitespace
    # are all stripped rather than only the first match
    $id =~ s{ ^\s+ | (?:http://)? (?:www\.)? phpfi\.com/(?=\d+) | \s+$ }{}gxi;

    return $self->_set_error(
        q|Doesn't look like a correct ID or URI to the paste|
    ) if $id =~ /\D/;

    return ( URI->new("http://www.phpfi.com/$id"), $id );
}
The _make_uri_and_id() method will be called internally by the object when the user calls the retrieve() method. Its @_ will contain the object followed by the same arguments the user passed to retrieve(). Note: the base class checks the first argument for defined()ness and length() before calling the _make_uri_and_id() method.
This method must return a list of two elements: the first must be a URI object pointing to the page containing the paste, and the second must be the ID of the paste. These will be assigned to the uri() and id() public methods. If the argument does not look like a valid ID or URI, return $self->_set_error(...) instead, as the example above does.
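The normalisation step can be tried in isolation. The sketch below lifts the phpfi.com substitution from the example into a plain function (the name make_uri_and_id is hypothetical) that returns a URI string instead of a URI object, so it runs with core Perl only. Note the /g flag: without it, s/// removes only the first match, so an input with both leading whitespace and a URI prefix would be rejected.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of _make_uri_and_id(); returns
# ( $uri_string, $id ) on success or an empty list on failure.
sub make_uri_and_id {
    my ($id) = @_;
    # /g lets every alternative match, not just the first one found
    $id =~ s{ ^\s+ | (?:http://)? (?:www\.)? phpfi\.com/(?=\d+) | \s+$ }{}gxi;
    return if $id =~ /\D/;    # the real method returns $self->_set_error(...)
    return ( "http://www.phpfi.com/$id", $id );
}

for my $input ( '302683', ' http://phpfi.com/302683 ',
                'http://www.phpfi.com/302683', 'phpfi.com/abc' ) {
    my @got = make_uri_and_id($input);
    printf "%-32s => %s\n", "'$input'", @got ? "@got" : '(error)';
}
```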
_get_was_successful
sub _get_was_successful {
    # the default version of this sub simply does
    # $self->results( $self->_parse( $content ) ), which is fine for most
    # pastebins; phpfi.com needs a second request for the raw content
    my ( $self, $content ) = @_;

    my $results_ref = $self->_parse( $content );
    return
        unless defined $results_ref;

    my $content_uri = $self->uri->clone;
    $content_uri->query_form( download => 1 );
    my $content_response = $self->ua->get( $content_uri );

    if ( $content_response->is_success ) {
        $results_ref->{content} = $self->content( $content_response->content );
        return $self->results( $results_ref );
    }
    else {
        return $self->_set_error(
            'Network error: ' . $content_response->status_line
        );
    }
}
With many pastebins you won't even have to touch the _get_was_successful() method. It defaults to:
sub _get_was_successful {
    my ( $self, $content ) = @_;
    return $self->results( $self->_parse( $content ) );
}
It is called inside the retrieve() method once the LWP::UserAgent object has successfully retrieved the page of the pastebin. This method is provided in case you need to make more requests, as was the case with the http://phpfi.com/ pastebin shown in the "SYNOPSIS".
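To see the two-request flow without touching the network, the sketch below swaps LWP::UserAgent and HTTP::Response for tiny hypothetical stubs (Local::StubUA, Local::StubResponse) and builds the download URI with string concatenation rather than URI's query_form(). Only the shape of _get_was_successful() follows the SYNOPSIS; the Local::TwoStep class and its accessors are scaffolding assumed for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for HTTP::Response
package Local::StubResponse;
sub new         { my ( $class, %args ) = @_; return bless {%args}, $class }
sub is_success  { return $_[0]{code} == 200 }
sub content     { return $_[0]{content} }
sub status_line { return "$_[0]{code} $_[0]{msg}" }

# Hypothetical stand-in for LWP::UserAgent: only the ?download=1 URI "exists"
package Local::StubUA;
sub new { return bless {}, shift }
sub get {
    my ( $self, $uri ) = @_;
    return $uri =~ /[?&]download=1\b/
        ? Local::StubResponse->new( code => 200, msg => 'OK', content => "raw paste\n" )
        : Local::StubResponse->new( code => 404, msg => 'Not Found' );
}

# Hypothetical pastebin class with just enough accessors to run the flow
package Local::TwoStep;
sub new { return bless { ua => Local::StubUA->new, uri => 'http://www.phpfi.com/302683' }, shift }
sub ua  { return $_[0]{ua} }
sub uri { return $_[0]{uri} }
sub content { my $s = shift; $s->{content} = shift if @_; return $s->{content} }
sub results { my $s = shift; $s->{results} = shift if @_; return $s->{results} }
sub error   { my $s = shift; $s->{error}   = shift if @_; return $s->{error} }
sub _set_error { $_[0]->error( $_[1] ); return }
sub _parse     { return { lang => 'perl' } }    # stub: pretend the page was parsed

sub _get_was_successful {
    my ( $self, $content ) = @_;
    my $results_ref = $self->_parse($content) or return;

    # second request for the raw paste, as in the SYNOPSIS
    my $response = $self->ua->get( $self->uri . '?download=1' );
    return $self->_set_error( 'Network error: ' . $response->status_line )
        unless $response->is_success;

    $results_ref->{content} = $self->content( $response->content );
    return $self->results($results_ref);
}

package main;

my $paster  = Local::TwoStep->new;
my $results = $paster->_get_was_successful('<html>...</html>');
print "lang=$results->{lang}, content=$results->{content}";
```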
_parse
# See "SYNOPSIS" or the script in the 'examples' directory for an example
The _parse() method is what will be called upon successful retrieval of the page with the paste. Here you would normally parse out anything you need, set the content() accessor/mutator (see the "DOCUMENTATION FOR YOUR MODULE" section) and return a reference to the data you've parsed out; the return value will be available to the user via the results() method. On failure, return $self->_set_error('...') so that retrieve() fails with that error.
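As a rough sketch of that contract, the _parse() below works on a made-up page layout (a <pre class="paste"> element and a <span class="lang"> element; an assumption, not any real pastebin's markup) and uses regexes plus a hand-rolled entity decoder so that it runs with core Perl only. A real module would use an HTML parser such as HTML::TokeParser::Simple and HTML::Entities, as the SYNOPSIS does. The content(), error() and _set_error() subs here are stand-ins for what the base class provides.

```perl
use strict;
use warnings;

package Local::ExampleRetrieve;    # hypothetical module name

sub new { return bless {}, shift }

# Stand-ins for accessors the base class provides
sub content { my $s = shift; $s->{content} = shift if @_; return $s->{content} }
sub error   { my $s = shift; $s->{error}   = shift if @_; return $s->{error} }
sub _set_error { my ( $self, $err ) = @_; $self->error($err); return }

# Decode just the few entities this made-up markup uses (single pass,
# so "&amp;lt;" correctly becomes "&lt;")
my %entity = ( lt => '<', gt => '>', quot => '"', amp => '&' );
sub _decode { my $s = shift; $s =~ s/&(lt|gt|quot|amp);/$entity{$1}/g; return $s }

sub _parse {
    my ( $self, $html ) = @_;

    return $self->_set_error('This paste does not seem to exist')
        if $html =~ /paste \d+ not found/i;

    my ($raw) = $html =~ m{<pre class="paste">(.*?)</pre>}s
        or return $self->_set_error('Parser error! No paste body found');
    my ($lang) = $html =~ m{<span class="lang">([^<]*)</span>};

    my %data = ( lang => defined $lang ? _decode($lang) : 'plain' );
    $data{content} = $self->content( _decode($raw) );
    return \%data;    # becomes results() via _get_was_successful()
}

package main;

my $paster = Local::ExampleRetrieve->new;
my $html   = '<span class="lang">perl</span>'
           . '<pre class="paste">if ($x &lt; 1) { print &quot;ok&quot; }</pre>';
my $data   = $paster->_parse($html) or die $paster->error;
print "[$data->{lang}] $data->{content}\n";
```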
_set_error
do_stuff()
    or return $self->_set_error('blah');
The _set_error() method is not something you would normally override: it is just a handy method that sets the error to whatever is passed as the first argument and then does a bare return; (so the caller gets undef in scalar context or an empty list in list context). When a second, defined argument is passed, the first argument will be treated as an HTTP::Response object and the error will be constructed as 'Network error: ' . $first_arg->status_line
The default _set_error() method looks like this:
sub _set_error {
    my ( $self, $error_or_response_obj, $is_net_error ) = @_;
    if ( defined $is_net_error ) {
        $self->error( 'Network error: ' . $error_or_response_obj->status_line );
    }
    else {
        $self->error( $error_or_response_obj );
    }
    return;
}
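Because the default _set_error() ends in a bare return;, a caller that does "or return $self->_set_error(...)" fails cleanly in either context. The sketch below copies the default body shown above into a hypothetical Local::ErrDemo package, with a stand-in error() accessor and a stub response object in place of HTTP::Response, and checks both contexts plus the network-error branch:

```perl
use strict;
use warnings;

package Local::StubResponse;    # hypothetical stand-in for HTTP::Response
sub new         { return bless { line => $_[1] }, $_[0] }
sub status_line { return $_[0]{line} }

package Local::ErrDemo;         # hypothetical class
sub new   { return bless {}, shift }
sub error { my $s = shift; $s->{error} = shift if @_; return $s->{error} }

# Same body as the default _set_error shown above
sub _set_error {
    my ( $self, $error_or_response_obj, $is_net_error ) = @_;
    if ( defined $is_net_error ) {
        $self->error( 'Network error: ' . $error_or_response_obj->status_line );
    }
    else {
        $self->error($error_or_response_obj);
    }
    return;
}

# Typical caller: "do_stuff() or return $self->_set_error(...)"
sub fetch {
    my ( $self, $ok ) = @_;
    $ok or return $self->_set_error( Local::StubResponse->new('404 Not Found'), 1 );
    return { content => 'hello' };
}

package main;

my $obj    = Local::ErrDemo->new;
my $scalar = $obj->fetch(0);    # undef in scalar context
my @list   = $obj->fetch(0);    # empty list in list context
print defined $scalar ? "defined\n" : "undef\n";
print scalar(@list), " elements; error: ", $obj->error, "\n";
```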
DOCUMENTATION FOR YOUR MODULE
This section describes the functionality of public methods and is presented in a copy/paste friendly format so you could save yourself some time writing up docs for your module. The word "EXAMPLE" is used in places you need to edit, but make sure to proof-read the whole thing anyway.
=head1 NAME

WWW::Pastebin::EXAMPLE::Retrieve - a module to retrieve pastes from EXAMPLE website

=head1 SYNOPSIS

    my $paster = WWW::Pastebin::EXAMPLE::Retrieve->new;

    $paster->retrieve('http://EXAMPLE')
        or die $paster->error;

    print "Paste content is:\n$paster\n";

=head1 DESCRIPTION

The module provides an interface to retrieve pastes from the EXAMPLE website via
Perl.

=head1 CONSTRUCTOR

=head2 C<new>

    my $paster = WWW::Pastebin::EXAMPLE::Retrieve->new;

    my $paster = WWW::Pastebin::EXAMPLE::Retrieve->new(
        timeout => 10,
    );

    my $paster = WWW::Pastebin::EXAMPLE::Retrieve->new(
        ua => LWP::UserAgent->new(
            timeout => 10,
            agent   => 'PasterUA',
        ),
    );

Constructs and returns a brand new juicy WWW::Pastebin::EXAMPLE::Retrieve
object. Takes two arguments, both I<optional>. Possible arguments are
as follows:

=head3 C<timeout>

    ->new( timeout => 10 );

B<Optional>. Specifies the C<timeout> argument of L<LWP::UserAgent>'s
constructor, which is used for retrieving. B<Defaults to:> C<30> seconds.

=head3 C<ua>

    ->new( ua => LWP::UserAgent->new( agent => 'Foos!' ) );

B<Optional>. If the C<timeout> argument is not enough for your needs
of mutilating the L<LWP::UserAgent> object used for retrieving, feel free
to specify the C<ua> argument, which takes an L<LWP::UserAgent> object
as a value. B<Note:> the C<timeout> argument to the constructor will
not do anything if you specify the C<ua> argument as well. B<Defaults to:>
a plain boring default L<LWP::UserAgent> object with its C<timeout> argument
set to whatever C<WWW::Pastebin::EXAMPLE::Retrieve>'s C<timeout> argument is
set to, and its C<agent> argument set to mimic Firefox.

=head1 METHODS

=head2 C<retrieve>

    my $results_ref = $paster->retrieve('http://EXAMPLE/301425')
        or die $paster->error;

    my $results_ref = $paster->retrieve('EXAMPLE301425')
        or die $paster->error;

Instructs the object to retrieve a paste specified in the argument. Takes
one mandatory argument which can be either a full URI to the paste you
want to retrieve or just its ID.

On failure returns either C<undef> or an empty list, depending on the context,
and the reason for the error will be available via the C<error()> method.
On success returns a hashref with the following keys/values:

EXAMPLE
EXAMPLE
EXAMPLE

=head2 C<error>

    $paster->retrieve('EXAMPLE')
        or die $paster->error;

On failure C<retrieve()> returns either C<undef> or an empty list, depending
on the context, and the reason for the error will be available via the C<error()>
method. Takes no arguments, returns an error message explaining the failure.

=head2 C<id>

    my $paste_id = $paster->id;

Must be called after a successful call to C<retrieve()>. Takes no arguments,
returns the paste ID of the last retrieved paste, regardless of whether
an ID or a URI was given to C<retrieve()>.

=head2 C<uri>

    my $paste_uri = $paster->uri;

Must be called after a successful call to C<retrieve()>. Takes no arguments,
returns a L<URI> object with the URI pointing to the last retrieved paste,
regardless of whether an ID or a URI was given to C<retrieve()>.

=head2 C<results>

    my $last_results_ref = $paster->results;

Must be called after a successful call to C<retrieve()>. Takes no arguments,
returns the exact same hashref the last call to C<retrieve()> returned.
See C<retrieve()> method for more information.

=head2 C<content>

    my $paste_content = $paster->content;

    print "Paste content is:\n$paster\n";

Must be called after a successful call to C<retrieve()>. Takes no arguments,
returns the actual content of the paste. B<Note:> this method is overloaded
for interpolation in this module. Thus you can simply interpolate the
object in a string to get the contents of the paste.

=head2 C<ua>

    my $old_LWP_UA_obj = $paster->ua;

    $paster->ua( LWP::UserAgent->new( timeout => 10, agent => 'foos' ) );

Returns the L<LWP::UserAgent> object currently used for retrieving
pastes. Takes one optional argument which must be an L<LWP::UserAgent>
object, and the object you specify will be used in any subsequent calls
to C<retrieve()>.

=head1 SEE ALSO

L<LWP::UserAgent>, L<URI>

SEE ALSO
WWW::Pastebin::Base::Create, LWP::UserAgent, URI
AUTHOR
Zoffix Znet, <zoffix at cpan.org>
(http://zoffix.com, http://haslayout.net)
BUGS
Please report any bugs or feature requests to bug-www-pastebin-base-retrieve at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=WWW-Pastebin-Base-Retrieve. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc WWW::Pastebin::Base::Retrieve
You can also look for information at:
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=WWW-Pastebin-Base-Retrieve
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
COPYRIGHT & LICENSE
Copyright 2008 Zoffix Znet, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.