NAME

WebFetch - Perl module to download and save information from the Web

SYNOPSIS

use WebFetch;

DESCRIPTION

The WebFetch module is a framework for downloading information from the web and for saving or re-displaying it. It provides a generalized interface for saving to a file while keeping the previous version as a backup. It is mainly intended for use in a cron job that acquires periodically updated information.
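The save-with-backup behavior described above can be sketched in plain Perl as follows. This is a minimal illustration, not WebFetch's own implementation. The subroutine name save_with_backup and the ".bak" suffix for the previous version are assumptions made for this example. The sketch writes the new content to a temporary file in the same directory, copies the current file aside as the backup, and then renames the temporary file into place, so the target file exists at every step.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper (an assumption, not part of the WebFetch API):
# save $content to $path, keeping the previous version as "$path.bak".
sub save_with_backup {
    my ($path, $content) = @_;

    # Write to a temporary file in the same directory so that the final
    # rename() stays on one filesystem and replaces the target atomically.
    my ($fh, $tmp) = tempfile("webfetch-XXXXXX", DIR => dirname($path), UNLINK => 0);
    print {$fh} $content or die "write to $tmp failed: $!";
    close($fh) or die "close of $tmp failed: $!";

    # tempfile() creates files with mode 0600; relax it so that a web
    # server can read the result.
    chmod(0644, $tmp) or die "chmod of $tmp failed: $!";

    # Preserve the current version, if any, as the backup copy.
    if (-e $path) {
        copy($path, "$path.bak") or die "backup of $path failed: $!";
    }

    # Move the new file into place.
    rename($tmp, $path) or die "rename $tmp -> $path failed: $!";
    return 1;
}
```

A call such as save_with_backup("$dir/news.html", $html) from a cron-driven script leaves the newest content in news.html and the previous run's content in news.html.bak.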

WebFetch allows the user to specify a source and a destination, as well as the input and output formats. More input and output formats can be added by writing new Perl modules to the WebFetch API.

The currently-provided input formats are Atom, RSS, WebFetch "SiteNews" files and raw Perl data structures.

The currently-provided output formats are RSS, WebFetch "SiteNews" files, the Perl Template Toolkit, and export into a TWiki site.
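One way to picture how separate modules supply the input and output formats is a lookup table from a format name to a module that is loaded at run time. The sketch below is hypothetical: the format keywords, the %FORMAT_MODULE table and the format_module/load_format subroutines are invented for illustration, and WebFetch's real mechanism for selecting modules may differ. Only the module names are taken from the SEE ALSO list.

```perl
use strict;
use warnings;

# Hypothetical table mapping format keywords to the modules that handle
# them. The keywords are illustrative; the module names appear in SEE ALSO.
my %FORMAT_MODULE = (
    input => {
        atom       => 'WebFetch::Input::Atom',
        sitenews   => 'WebFetch::Input::SiteNews',
        perlstruct => 'WebFetch::Input::PerlStruct',
    },
    output => {
        tt   => 'WebFetch::Output::TT',
        dump => 'WebFetch::Output::Dump',
    },
);

# Return the module name for a direction ("input" or "output") and a
# format keyword, or undef if none is registered.
sub format_module {
    my ($direction, $format) = @_;
    my $table = $FORMAT_MODULE{$direction} or return;
    return $table->{ lc $format };
}

# Load the module for a format at run time. Dies if the format is unknown
# or the module is not installed.
sub load_format {
    my ($direction, $format) = @_;
    my $module = format_module($direction, $format)
        or die "no $direction module for format '$format'\n";
    (my $file = "$module.pm") =~ s{::}{/}g;
    require $file;
    return $module;
}
```

Adding a format under this scheme means writing a new module and adding one table entry; the code that drives a fetch does not change.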

Some modules that were specific to pre-RSS/Atom web syndication formats have been deprecated. They can still be found in the CPAN archive in WebFetch 0.10, but they are no longer compatible with changes in the current WebFetch API.

INSTALLATION

After unpacking the module sources from the tar file, run

perl Makefile.PL

make

make install

Or from a CPAN shell you can simply type "install WebFetch" and it will download, build and install it for you.

If you need help setting up a separate area to install the modules (i.e. if you don't have write permission where perl keeps its modules) then see the Perl FAQ.

To begin using the WebFetch modules, you will need to test your fetch operations manually, put them into a crontab, and then use server-side includes (SSI) or a similar server configuration to include the files in a live web page.

MANUALLY TESTING A FETCH OPERATION

Select a directory to serve as the storage area for files created by WebFetch. This is an important administrative decision: keep the volatile, automatically generated files in their own directory so they stay separate from manually maintained files.

Choose the specific WebFetch-derived modules that do the work you want, and see their individual manual pages for details on command-line arguments. Run them manually to test them before committing them to a crontab.

SETTING UP CRONTAB ENTRIES

If needed, see the manual pages for crontab(1), crontab(5) and any web sites or books on Unix system administration.

Because WebFetch command lines are usually very long, you may prefer to write one or more front-end scripts to keep the crontab entries short.

Try not to run crontab entries too often. Be aware of whether the site you're accessing has any resource constraints, and of how often its information gets updated. If the site asks users not to access a feed more often than a certain interval, respect it. (Violators are easy to spot in server logs.) If in doubt, fetch every 30 minutes until more information becomes available.
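A front-end script can also enforce a minimum interval itself, so that running it by hand or from an overly frequent crontab entry still respects the feed owner's limits. The following is a hypothetical sketch: the stamp-file approach, the fetch_allowed name and its 30-minute default are assumptions for illustration, not WebFetch features. The optional third argument lets the caller supply the current time, which also makes the logic easy to test.

```perl
use strict;
use warnings;

# Hypothetical guard (not a WebFetch feature): allow a fetch only if at
# least $min_seconds have passed since the stamp file was last updated,
# and if so, record $now as the time of this fetch.
sub fetch_allowed {
    my ($stamp_file, $min_seconds, $now) = @_;
    $min_seconds //= 30 * 60;    # assumed default: 30 minutes
    $now         //= time;

    if (-e $stamp_file) {
        my $last = (stat $stamp_file)[9];    # mtime of the previous fetch
        return 0 if $now - $last < $min_seconds;
    }

    # Create or update the stamp file and set its mtime to $now.
    open(my $fh, '>', $stamp_file) or die "cannot write $stamp_file: $!";
    close($fh);
    utime($now, $now, $stamp_file) or die "cannot touch $stamp_file: $!";
    return 1;
}
```

In a front-end script, a line such as exit 0 unless fetch_allowed("$dir/.last-fetch"); placed before the actual fetch turns an overly frequent cron run into a harmless no-op.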

WebFetch FUNCTIONS

The following function definitions assume $obj is a blessed reference to a module that is derived from (inherits from) WebFetch.
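To make "derived from WebFetch" concrete in Perl terms, the sketch below defines a subclass with the parent pragma and creates a blessed object of it. So that it runs without WebFetch installed, it uses a small stand-in WebFetch package. The stand-in's constructor, the module name WebFetch::Input::Example and its fetch method are hypothetical and do not describe the real WebFetch API.

```perl
use strict;
use warnings;

# Stand-in base class (hypothetical) so this sketch runs on its own. With
# the real module installed, drop this package and write
# "use parent 'WebFetch';" in the subclass instead.
package WebFetch;
sub new {
    my ($class, %params) = @_;
    my $self = { %params };
    return bless $self, $class;
}

# Hypothetical WebFetch-derived module.
package WebFetch::Input::Example;
use parent -norequire, 'WebFetch';

# Hypothetical method: return the records this module "fetched"
# (a fixed list here, for illustration only).
sub fetch {
    my ($self) = @_;
    return [ { title => 'Example item', url => $self->{source} } ];
}

package main;

# $obj is a blessed reference to a class that inherits from WebFetch.
my $obj = WebFetch::Input::Example->new(source => 'https://example.com/feed');
```

Because the subclass inherits from WebFetch, $obj->isa('WebFetch') is true, and any method the subclass does not define is looked up in the base class.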

WRITING WebFetch-DERIVED MODULES

The easiest way to make a new WebFetch-derived module is to start from the module closest to your fetch operation and modify it. Make sure to change all of the following:

Please consider contributing any useful changes back to the WebFetch project at maint@webfetch.org.

ACKNOWLEDGEMENTS

WebFetch was written by Ian Kluft. Send patches, bug reports, suggestions and questions to maint@webfetch.org.

Some changes in versions 0.12-0.13 (Aug-Sep 2009) were made for and sponsored by Twiki Inc (formerly TWiki.Net).

LICENSE

WebFetch is Open Source software licensed under the GNU General Public License Version 3. See https://www.gnu.org/licenses/gpl-3.0-standalone.html.

SEE ALSO

WebFetch::Input::PerlStruct, WebFetch::Input::SiteNews, WebFetch::Input::Atom, WebFetch::RSS, WebFetch::Input::Dump, WebFetch::Output::TT, WebFetch::Output::Dump, https://github.com/ikluft/WebFetch, http://www.perl.org/