NAME
WebSource - giving machine access to the Web
DESCRIPTION
WebSource provides a general, normalized framework for accessing data made available via the Web. Access to a part of the Web is defined as a task, which is built by composing subtasks for query building, fetching, extraction and filtering.
SYNOPSIS
use WebSource;

my $source  = WebSource->new(wsd => $description);
my @results = $source->query($query);
or
$source->set_query($query);
while (my $result = $source->next_result()) {
  ...
}
ABSTRACT
WebSource was originally a generic wrapper around a Web source: given an XML description of a source, it lets you query the source and retrieve its results. The format of the queries and results, however, remains source-dependent.
It is now configurable enough to perform complex tasks on the Web, such as fetching, extracting and filtering data. Each complex task is described by an XML task description file (a WebSource description) and is decomposed into simple subtasks of different flavors.
Existing subtask flavors are:
- extract: input an XML::LibXML::Document; output an XML::LibXML::Node. Applies an XPath expression to the document and returns the set of matching nodes.
- fetch: input a URL (or an XML::LibXML::Node containing a URL); output an XML::LibXML::Document.
- format: input an XML::LibXML::Document; output a string.
- filter: input anything; output the items that pass the filter (not necessarily all of its input).
- external: uses an external Perl module as a subtask, which allows highly configurable tasks to be defined. Input and output depend on the external module.
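The way a task chains its subtasks can be pictured with a minimal pure-Perl sketch. It does not use WebSource's API: the subtasks below are hypothetical closures standing in for query building, fetching, extraction and filtering (no network access, no XML::LibXML), and run_task is a hypothetical helper that feeds each subtask's output into the next one and keeps at most $max_results results.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: each subtask takes one item and returns a list of
# zero or more items for the next subtask. No real fetching or XPath is done.
my @subtasks = (
    # "query building": turn a query hash into a URL string
    sub { my ($q) = @_; return "http://example.com/search?q=$q->{term}" },
    # "fetch": pretend to download the URL and return a document
    sub { my ($url) = @_; return +{ url => $url, items => [qw(alpha beta gamma)] } },
    # "extract": return each extracted item as a separate result
    sub { my ($doc) = @_; return @{ $doc->{items} } },
    # "filter": keep only items starting with 'a' or 'b'
    sub { my ($item) = @_; return $item =~ /^[ab]/ ? ($item) : () },
);

# Feed the output of each subtask into the next one (eagerly),
# then keep at most $max_results results.
sub run_task {
    my ($max_results, $input, @tasks) = @_;
    my @items = ($input);
    for my $task (@tasks) {
        @items = map { $task->($_) } @items;
    }
    splice(@items, $max_results) if defined $max_results && @items > $max_results;
    return @items;
}

my @results = run_task(10, { term => 'perl' }, @subtasks);
print "$_\n" for @results;    # prints alpha and beta, one per line
```

Unlike this eager sketch, next_result suggests that WebSource hands results out one at a time; the sketch only illustrates the composition of subtasks.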
METHODS
- $source = WebSource->new(wsd => $wsd);
Creates a new WebSource object working with the given WebSource description.
The following named parameters can be given:
wsd
Use a generic engine with the given source description file.
max_results
Do not output more than max_results results.
- $source->push($item);
Passes $item as initial data to the first subtask.
- $source->query($query);
Builds a query hash from the given parameters and pushes it to the first subtask.
- $source->set_max_results($count);
Sets the maximum number of results to output to $count.
- $source->next_result();
Returns the next result of the task.
- $source->parameters;
Returns a hash of the task's initial parameters.
- $source->option_spec;
Returns the specification of the source's options, translated for Getopt::Mixed.
- $source->set_option($opt,$val);
Sets the source-specific option $opt to the value $val.
- $source->apply_options
Handles nodes of the form <ws:attribute name="aname" value="oname" /> by adding to the parent node an attribute named aname whose value is that of the option named oname. The ws:attribute node is then removed.
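The ws:attribute rewrite performed by apply_options can be sketched on a toy tree. This is a hypothetical illustration, not WebSource's implementation: nodes are plain hashes ({ name, attrs, children }) instead of XML::LibXML nodes, the apply_options function here is a stand-alone stand-in for the method, and the element name and option names in the example tree are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for apply_options on a toy tree of hashes:
# each <ws:attribute name="aname" value="oname"/> child sets attribute
# "aname" on its parent to the value of option "oname", then is removed.
sub apply_options {
    my ($node, $options) = @_;
    my @kept;
    for my $child (@{ $node->{children} }) {
        if ($child->{name} eq 'ws:attribute') {
            my $aname = $child->{attrs}{name};     # attribute to add
            my $oname = $child->{attrs}{value};    # option supplying its value
            $node->{attrs}{$aname} = $options->{$oname};
            # the ws:attribute node itself is not kept
        } else {
            apply_options($child, $options);       # handle nested nodes too
            push @kept, $child;
        }
    }
    $node->{children} = \@kept;
    return $node;
}

# Hypothetical element and option names, for illustration only.
my $tree = {
    name => 'element', attrs => {}, children => [
        { name => 'ws:attribute', attrs => { name => 'lang', value => 'language' }, children => [] },
    ],
};
apply_options($tree, { language => 'en' });
print "$tree->{attrs}{lang}\n";                # prints "en"
print scalar(@{ $tree->{children} }), "\n";    # prints "0"
```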
SEE ALSO
ws-query, WebSource::Extract, WebSource::Fetch, WebSource::Filter, etc.