NAME
WebFetch Embedding API - how to embed WebFetch into other programs
DESCRIPTION
As of version 0.10 of WebFetch, the Embedding API (application programming interface) was added. WebFetch used to be mostly for use for saving captured web content to files. The purpose of the Embedding API is to allow any Perl program to run a WebFetch module and capture its output for use within that program.
Once the information is available to other programs, there are as many possibilities as Perl allows for what to do with it. So the Embedding API delivers on the implied promise that came from packaging WebFetch as Perl5 modules instead of scripts. Instead of just saving the information to files, now it's available within a program too.
Conversion of WebFetch 0.09 Modules to the Embedding API
Old WebFetch 0.09 and earlier modules continue to work in what is called "0.09 compatibility mode". Actually, the old strucutres are still there so it's more than just compatibility - the new Embedding API is triggered by the presence of new variables which contain more information.
The following modules were converted to the new API in WebFetch 0.10: CNETnews, CNNsearch, COLA, Freshmeat, SiteNews, and Slashdot.
The following modules were converted to the new API in WebFetch 0.11: DebianNews General.
The remaining modules have not been converted yet and operate in 0.09 compatibility mode: DebianNews, General, LinuxDevNet, LinuxTelephony, ListSubs, PerlStruct, 32BitsOnline and YahooBiz. Upcoming WebFetch releases will convert more until all the modules are done.
Modules that users have written for WebFetch 0.09 and earlier which are not (yet) distributed with WebFetch should also continue to work unmodified. If they don't, that would be a bug and should be reported.
How the Embedding API Works
In WebFetch 0.09 and earlier, each module was responsible for saving its own information to a file, for all file formats that they supported. Now the module saves to some predefined variables. If the module was called from WebFetch's core command-line routines, the core will handle saving the files in all requested formats. If the module was called from another program, now it has the option to look inside the returned data and do its own processing with it.
A WebFetch module which implements the Embedding API must define the following variables, which are defined in more detail in the WebFetch core module documentation:
- $obj->{data}
-
a hash reference which contains more data-related variables
- $obj->{actions}
-
a hash reference which defines things that the WebFetch core must do with the data (i.e. where and how to save it.) Entry may have hash keys like "html", "xml", "wf" (WebFetch native format), and "rdf" (XML Resource Definition Format) or other names if the proper handler functions are defined. (Your module can define its own handler function and then use it to output your information. See WebFetch::SiteNews for an example.)
- $obj->{data}{fields}
-
an array reference which contains a list of field names in order. This is part of the definition of the data captured and returned from the module.
- $obj->{data}{wk_names}
-
a hash reference which uses hash keys of "well known names", field names which WebFetch can attribute specific functions to (like title, url, etc.) and hash values of the names of the fields defined in $obj->{data}{fields} that contain that data in this data set.
- $obj->{data}{records}
-
an array reference containing more array references - a two-dimensional table of records containing fields of data. Each record is one unit/entry of information captured. Each field within a record corresponds by position within the array to the names in $obj->{data}{fields}.
Programs which need to use this data can read it directly from the records and fields in $obj->{data}{records}. Some sample code is shown in the WebFetch FAQ at http://www.webfetch.org/
A module which inherits from WebFetch and fails to provide these variables is assumed to be a WebFetch 0.09 module. Any module in WebFetch 0.09 compatibiltily mode is responsible to save its own HTML or other files.