NAME

HTTP::GetImages

DESCRIPTION

Retrieve and locally store images from the web, including those linked by anchor and image map.

Version 0.2+ also gets images referenced by the HREF/SRC attributes of anchor elements and image-map AREA elements.

Version 0.23+ allows a minimum size to be set for the images retrieved.

SYNOPSIS

use HTTP::GetImages;

# Process two URLs, ignoring two named documents.
my $self = new HTTP::GetImages (
	'/image/save/dir',
	['http://www.getthese.com/all','http://get.this/'],
	['http://www.somewhere/ignorethis.html','http://and.this.html']
);

# Ignore every document not listed for processing; also pass a root path
# and a minimum image size ($minsize must be set beforehand).
$self = new HTTP::GetImages (
	'/image/save/dir',
	['http://www.getthese.com/all','http://get.this/'],
	['ALL'],
	'http://www.getthses.com/all/useTHISasROOT/',
	$minsize
);

print "\nFailed these URLs:-\n";
foreach (keys %{$self->{FAILED}})	{ print "\t$_\n" }

print "\nIgnored these URLs:-\n";
foreach (keys %{$self->{IGNORED}})	{ print "\t$_\n" }

DEPENDENCIES

strict;
warnings;
Carp;
LWP::UserAgent;
HTTP::Request;
HTML::TokeParser;

PACKAGE GLOBAL VARIABLES

$CHAT

Set to a value above zero if you'd like a real-time report to STDERR. Defaults to off.

$EXTENSIONS_RE

A regular expression 'or' list (an alternation) of image extensions to match.

It is applied at the end of a filename, after a point, and is insensitive to case.

Defaults to (jpg|jpeg|bmp|gif|png|xbm|xmp).
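As a rough illustration of the matching described above, the sketch below applies a pattern of the same shape to a few filenames. The variable $extensions_re and the helper looks_like_image are hypothetical names used only for this example, and the module's own matching code may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $EXTENSIONS_RE, using the documented default.
my $extensions_re = '(jpg|jpeg|bmp|gif|png|xbm|xmp)';

# True if the name ends in a point followed by one of the listed
# extensions, compared without regard to case.
sub looks_like_image {
	my ($name) = @_;
	return $name =~ /\.$extensions_re\z/i ? 1 : 0;
}

for my $name ('photo.JPG', 'logo.png', 'index.html', 'jpg') {
	printf "%-12s %s\n", $name, looks_like_image($name) ? 'image' : 'skipped';
}
```

Under this assumption, a bare name such as jpg is skipped because no point precedes the extension.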

$NEWNAMES

Set to a value above zero to save files with new names; defaults to using the original names.

CONSTRUCTOR METHOD new

Besides the class reference, accepts:

  1. the path to the directory in which to store images (no trailing oblique necessary);

  2. a reference to an array of URLs to process;

  3. a reference to an array of URLs to ignore;

    If one of these is ALL, then all HTML documents not in the referenced array of URLs to process are ignored. If one of these is NONE, no documents are ignored.

  4. the minimum path the URL must contain;

  5. the minimum size an image must be to be saved.

Returns a blessed hash, keys of which are:

  DONE

    a hash, keys of which are the original URLs of the images, values being the local filenames.

  FAILED

    a hash, keys of which are the failed URLs, values being short reasons.
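The returned hash can be read like any other hash reference. The following is a minimal sketch that builds a plain-text report from a hash of the shape described above. The $result data, including the URLs, the local path and the failure reason, is invented for illustration, and the report helper is a hypothetical name rather than part of the module.

```perl
use strict;
use warnings;

# Hypothetical result, shaped like the hash described above; in real use
# this would be the object returned by the constructor.
my $result = {
	DONE   => { 'http://www.getthese.com/all/a.gif' => '/image/save/dir/a.gif' },
	FAILED => { 'http://get.this/b.png' => 'not found' },
};

# Build a plain-text report of saved and failed image URLs.
sub report {
	my ($r) = @_;
	my $out = "Saved:\n";
	$out .= "\t$_ -> $r->{DONE}{$_}\n" for sort keys %{ $r->{DONE} || {} };
	$out .= "Failed:\n";
	$out .= "\t$_ ($r->{FAILED}{$_})\n" for sort keys %{ $r->{FAILED} || {} };
	return $out;
}

print report($result);
```

Sorting the keys only keeps the report order stable between runs; the hash itself carries no ordering.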

PRIVATE METHOD _save_img

Accepts the dir in which to store the image, the image's URL (won't store same image twice) and the actual image source.

Returns the path the image was saved at.
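To make the contract above concrete, here is a small sketch of a routine with the same inputs and return value. It is not the module's implementation: the name save_img_sketch, the %saved lookup used to avoid storing a URL twice, and the choice of the last URL segment as the filename are all assumptions made for this example.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my %saved;	# URL => local path, so the same URL is not stored twice

# Hypothetical sketch: write $data under $dir, named after the last URL segment.
sub save_img_sketch {
	my ($dir, $url, $data) = @_;
	return $saved{$url} if exists $saved{$url};	# already stored
	my ($name) = $url =~ m{([^/]+)\z};
	die "No filename in URL: $url\n" unless defined $name;
	my $path = File::Spec->catfile($dir, $name);
	open my $fh, '>', $path or die "Cannot write $path: $!\n";
	binmode $fh;	# image data is binary
	print {$fh} $data;
	close $fh or die "Cannot close $path: $!\n";
	return $saved{$url} = $path;
}

# Placeholder bytes stand in for real image source.
my $dir  = tempdir(CLEANUP => 1);
my $path = save_img_sketch($dir, 'http://get.this/pics/logo.gif', 'GIF89a...');
print "Saved to $path\n";
```

Calling the routine again with the same URL returns the stored path without rewriting the file.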

SEE ALSO

Everything and everyone listed above under DEPENDENCIES.

AUTHOR

Lee Goddard (LGoddard@CPAN.org) 05/05/2001 16:08

COPYRIGHT

Copyright 2000-2001 Lee Goddard.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
