NAME
WWW::PkgFind - Spiders given URL(s) downloading wanted files
SYNOPSIS
    my $Pkg = new WWW::PkgFind("foobar");

    $Pkg->depth(3);
    $Pkg->active_urls("ftp://ftp.somesite.com/pub/joe/foobar/");
    $Pkg->wanted_regex("patch-2\.6\..*gz", "linux-2\.6\.\d+\.tar\.bz2");
    $Pkg->set_create_queue("/testing/packages/QUEUE");
    $Pkg->retrieve();
DESCRIPTION
WWW::PkgFind spiders the given active URL(s) to a configurable depth, downloading files whose names match at least one wanted regular expression and none of the not-wanted expressions. Retrieval respects each site's robots.txt rules, and downloaded files can optionally be registered in a symlink queue directory for later processing.
FUNCTIONS
new([$pkg_name], [$agent_desc])
Creates a new WWW::PkgFind object
package_name()
Gets/sets the package name
depth()
Gets/sets how many levels deep to spider below each active URL.
wanted_regex()
Gets/sets the regular expression(s) that a file name must match to be considered wanted.
not_wanted_regex()
Gets/sets the regular expression(s) for files that should be skipped even if they also match a wanted regex.
rename_regex()
Gets/sets the regular expression(s) used to rename files as they are retrieved.
active_urls()
Gets/sets the URL(s) to be spidered.
robot_urls()
Gets/sets the cache of sites whose robots.txt rules have already been checked.
files()
processed()
set_create_queue($dir)
Specifies that the retrieve() routine should also create a symlink queue in the specified directory.
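As a rough sketch of what that queue amounts to (not the module's own code), the following creates a single queue entry by hand; the paths and the duplicate check are illustrative only.

    use strict;
    use warnings;
    use File::Spec;
    use File::Basename qw(basename);

    # Illustrative paths only; the real source is wherever retrieve()
    # stored the downloaded file.
    my $downloaded = "/testing/packages/foobar/linux-2.6.20.tar.bz2";
    my $queue_dir  = "/testing/packages/QUEUE";

    my $link = File::Spec->catfile($queue_dir, basename($downloaded));

    # A queue entry is just a symlink pointing back at the real file;
    # skip it if an entry by that name is already present.
    unless (-e $link) {
        symlink($downloaded, $link)
            or warn "Could not symlink $downloaded into $queue_dir: $!\n";
    }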
set_debug($debug)
Sets the debug level. Set it to 0 or undef to turn debugging off.
want_file($file)
Checks the file name against the regular expressions in the Pkg hash. Returns 1 (true) if the file matches at least one wanted regex and none of the not-wanted regexes. If the file matches a not-wanted regex, it returns 0 (false). If it has no clue what the file is (no regex matches at all), it returns undef (false).
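The following is a minimal sketch of that decision, not the module's actual implementation; it assumes the wanted and not-wanted patterns are stored as two list-valued fields whose names here are made up for illustration.

    use strict;
    use warnings;

    # Sketch of the decision described above.  WANTED_REGEX and
    # NOT_WANTED_REGEX are illustrative field names, not the module's.
    sub want_file_sketch {
        my ($pkg, $file) = @_;

        # Any match against a not-wanted pattern rejects the file outright.
        for my $re (@{ $pkg->{NOT_WANTED_REGEX} || [] }) {
            return 0 if $file =~ /$re/;
        }

        # A match against at least one wanted pattern accepts it.
        for my $re (@{ $pkg->{WANTED_REGEX} || [] }) {
            return 1 if $file =~ /$re/;
        }

        # Otherwise we have no clue what the file is.
        return undef;
    }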
get_file($url, $dest)
Retrieves the given URL, returning true if the file was successfully obtained and placed at $dest, false if something prevented this from happening.
get_file also checks for and respects robot rules, updating the $rules object as needed and caching the URLs it has checked in %robot_urls. $robot_urls{$url} will be >0 if a robots.txt was found and parsed, <0 if no robots.txt was found, and undef if the URL has not yet been checked.
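A sketch of that flow, using the CPAN WWW::RobotRules and LWP::Simple modules, might look like the following; the agent string is a placeholder and the cache here is keyed per site root rather than per URL.

    use strict;
    use warnings;
    use LWP::Simple qw(get);
    use WWW::RobotRules;
    use URI;

    my $rules = WWW::RobotRules->new('ExampleAgent/1.0');   # placeholder agent
    my %robot_urls;                                          # per-site cache

    sub allowed_by_robots {
        my ($url) = @_;
        my $uri   = URI->new($url);
        my $site  = $uri->scheme . '://' . $uri->host_port;

        if (!defined $robot_urls{$site}) {
            my $robots_url = "$site/robots.txt";
            my $content    = get($robots_url);
            if (defined $content) {
                $rules->parse($robots_url, $content);
                $robot_urls{$site} = 1;    # robots.txt found and parsed
            }
            else {
                $robot_urls{$site} = -1;   # no robots.txt at this site
            }
        }

        # With no robots.txt, or rules that permit it, the URL may be fetched.
        return $robot_urls{$site} < 0 || $rules->allowed($url);
    }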
retrieve()
Spiders the active URLs, downloading each file that want_file() accepts and, if a queue directory has been set with set_create_queue(), creating a symlink there for each retrieved file.
AUTHOR
Bryce Harrington <bryce@osdl.org>
COPYRIGHT
Copyright (C) 2006 Bryce Harrington. All Rights Reserved.
This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.