NAME
HTTP::GetImages
DESCRIPTION
Recover and locally store images from the web, including those linked by anchors and image maps.
Version 0.2+ also gets images from the HREF/SRC attributes of A (anchor) elements and image-map AREA elements.
Version 0.23+ allows a minimum size to be set for retrieved images.
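The kind of scan described above can be sketched with HTML::TokeParser, which is one of the listed dependencies. This is an illustration only, not the module's own parsing code: the helper name candidate_urls is hypothetical, and it simply collects IMG SRC, A HREF and AREA HREF values without yet filtering them by image extension.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: collect candidate image URLs from IMG SRC,
# A HREF and image-map AREA HREF attributes. Not HTTP::GetImages' own code.
sub candidate_urls {
    my ($html) = @_;
    # A reference to a plain scalar is parsed as the literal document.
    my $p = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my @urls;
    # get_tag skips ahead to the next start tag of one of the named types
    # and returns [$tag, \%attr, \@attrseq, $text]; names are lowercased.
    while (my $tag = $p->get_tag('img', 'a', 'area')) {
        my ($name, $attr) = @$tag;
        my $url = $name eq 'img' ? $attr->{src} : $attr->{href};
        push @urls, $url if defined $url;
    }
    return @urls;
}

my $html = <<'HTML';
<img src="a.gif">
<a href="b.jpg">big picture</a>
<map name="m"><area shape="rect" coords="0,0,9,9" href="c.png"></map>
HTML

print "$_\n" for candidate_urls($html);
```

A real fetcher would then resolve these against the page's base URL and keep only those whose names match the image extensions.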
SYNOPSIS
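use strict;
use warnings;
use HTTP::GetImages;

# Process two sites, ignoring two specific documents:
my $fetcher = HTTP::GetImages->new(
    '/image/save/dir',
    ['http://www.getthese.com/all','http://get.this/'],
    ['http://www.somewhere/ignorethis.html','http://and.this.html']
);

# Process only the listed URLs ('ALL' ignores every other HTML document),
# require URLs to contain the given root, and skip images under $minsize:
my $minsize = 1024;    # example value; units assumed to be bytes
$fetcher = HTTP::GetImages->new(
    '/image/save/dir',
    ['http://www.getthese.com/all','http://get.this/'],
    ['ALL'],
    'http://www.getthese.com/all/useTHISasROOT/',
    $minsize
);

print "\nFailed these URLs:-\n";
foreach (keys %{$fetcher->{FAILED}}) { print "\t$_\n" }
print "\nIgnored these URLs:-\n";
foreach (keys %{$fetcher->{IGNORED}}) { print "\t$_\n" }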
DEPENDENCIES
strict;
warnings;
Carp;
LWP::UserAgent;
HTTP::Request;
HTML::TokeParser;
PACKAGE GLOBAL VARIABLES
$CHAT
Set to above zero if you would like a real-time report on STDERR. Defaults to off.
$EXTENSIONS_RE
A regular expression alternation ('or' list) of image extensions to match.
It is applied at the end of a filename, after a dot, and is case-insensitive.
Defaults to (jpg|jpeg|bmp|gif|png|xbm|xmp).
$NEWNAMES
Set to above zero to save files with new names; defaults to using original names.
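As a rough illustration of how an $EXTENSIONS_RE-style pattern behaves, the sketch below applies the documented default to a few filenames. The exact expression the module uses internally is an assumption here: a literal dot, then one of the extensions, anchored at the end of the name, ignoring case. The helper looks_like_image is hypothetical, and the pattern is held in a lexical variable rather than the package global.

```perl
use strict;
use warnings;

# Lexical copy of the documented default; in the module itself this is
# the package global $HTTP::GetImages::EXTENSIONS_RE.
my $EXTENSIONS_RE = '(jpg|jpeg|bmp|gif|png|xbm|xmp)';

# Assumed matching rule: a literal dot, one of the extensions,
# anchored at the end of the name, case-insensitive.
sub looks_like_image {
    my ($name) = @_;
    return $name =~ /\.$EXTENSIONS_RE$/i ? 1 : 0;
}

for my $name (qw(photo.JPG logo.png index.html archive.gif.zip)) {
    printf "%-16s %s\n", $name, looks_like_image($name) ? 'image' : 'not image';
}
```

Because the match is anchored at the end, a name such as archive.gif.zip is not treated as an image even though it contains ".gif".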
CONSTRUCTOR METHOD new
Besides the class reference, accepts, in order:
1. the path to the directory in which to store images (no trailing slash necessary);
2. a reference to an array of URLs to process;
3. a reference to an array of URLs to ignore. If one of these is ALL, all HTML documents not in the array of URLs to process are ignored; if one of these is NONE, no documents are ignored;
4. optionally, the minimum path a URL must contain;
5. optionally, the minimum size an image must be to be saved.
Returns a blessed hash, keys of which are:
DONE
a hash, keys of which are the original URLs of the images, values being the local filenames.
FAILED
a hash, keys of which are the failed URLs, values being short reasons.
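The ignore rules for the third argument, and the "must contain" test for the fourth, can be sketched as two small predicates. Both helpers (is_ignored, within_root) are hypothetical and only mirror the wording above; the sketch treats every URL as an HTML document, assumes "contain" means a plain substring match, and checks NONE before ALL if both are present, none of which the documentation specifies.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented rules for the constructor's
# third argument (not the module's own code):
#   ALL  - ignore every document not in the list of URLs to process
#   NONE - ignore nothing
#   else - ignore exactly the URLs listed
sub is_ignored {
    my ($url, $process, $ignore) = @_;
    my %ignore  = map { $_ => 1 } @$ignore;
    my %process = map { $_ => 1 } @$process;
    return 0 if $ignore{NONE};
    if ($ignore{ALL}) {
        return $process{$url} ? 0 : 1;
    }
    return $ignore{$url} ? 1 : 0;
}

# Hypothetical check for the fourth argument, assuming "contain" means a
# plain substring match; with no root given, every URL passes.
sub within_root {
    my ($url, $root) = @_;
    return 1 unless defined $root;
    return index($url, $root) >= 0 ? 1 : 0;
}

my @process = ('http://www.getthese.com/all', 'http://get.this/');
print is_ignored('http://get.this/',        \@process, ['ALL']), "\n";   # prints 0
print is_ignored('http://elsewhere/x.html', \@process, ['ALL']), "\n";   # prints 1
```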
PRIVATE METHOD _save_img
Accepts the directory in which to store the image, the image's URL (the same image is never stored twice) and the image data itself.
Returns the path at which the image was saved.
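A minimal sketch of a _save_img-style routine follows, assuming a hash that records each URL already saved (playing the role of DONE), original filenames (the $NEWNAMES default), and an optional minimum size measured in bytes. The name save_img_sketch, its extra arguments, and the filename rule (last path segment of the URL) are all hypothetical; the private method's real signature is only the three arguments described above, and this sketch does not handle two URLs that share a filename.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for the private _save_img. %$done maps
# original URL => local path, like the DONE hash. The filename rule
# and the byte-based size check are assumptions.
sub save_img_sketch {
    my ($dir, $url, $src, $done, $minsize) = @_;
    return $done->{$url} if exists $done->{$url};        # never store twice
    return if defined $minsize && length($src) < $minsize;
    my ($name) = $url =~ m{([^/?#]+)(?:[?#].*)?$};       # last path segment
    $name = 'image' unless defined $name;
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "Cannot write $path: $!";
    binmode $fh;                                          # image data is binary
    print {$fh} $src;
    close $fh or die "Cannot close $path: $!";
    $done->{$url} = $path;
    return $path;
}

my $dir = tempdir(CLEANUP => 1);
my %done;
my $first  = save_img_sketch($dir, 'http://x/img/a.gif', 'GIF89a-bytes', \%done);
my $second = save_img_sketch($dir, 'http://x/img/a.gif', 'other bytes',  \%done);
print "$first\n", ($first eq $second ? "stored once\n" : "stored twice\n");
```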
SEE ALSO
Everything and everyone listed above under DEPENDENCIES.
AUTHOR
Lee Goddard (LGoddard@CPAN.org) 05/05/2001 16:08
COPYRIGHT
Copyright 2000-2001 Lee Goddard.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.