NAME

WebSource - giving machines access to the Web

DESCRIPTION

WebSource provides a general, normalized framework for accessing data made available via the Web. Access to a subpart of the Web is defined as a task. This task is built by composing query-building, extraction, fetching and filtering subtasks.

SYNOPSIS

$source = WebSource->new(wsd => $description);
@results = $source->query($query);
or
$result = $source->set_query($query);
while($result = $source->next_result()) {
  ...
}

ABSTRACT

WebSource was originally a generic wrapper around a Web source. Given an XML description of a source, it allows you to query the source and retrieve its results. The format of the query and of the results remains source dependent, however.

It is now configurable enough to perform complex tasks on the Web, such as fetching, extracting and filtering data on the Web. Each complex task is described by an XML task description file (a WebSource description). The task is decomposed into simple subtasks of different flavors.

Existing subtask flavors are:

extract

Input: an XML::LibXML::Document. Output: an XML::LibXML::Node. Applies an XPath expression to the document and returns the set of matching nodes.

fetch

Input: a URL (or an XML::LibXML::Node containing a URL). Output: an XML::LibXML::Document.

format

Input: an XML::Document. Output: a string.

filter

Input: anything. Output: anything (but not all of the input).

external

This type of subtask uses an external Perl module as a task, which allows highly configurable tasks to be defined. Input and output both depend on the external module.
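The way subtasks of different flavors chain together can be pictured with a minimal sketch. It is an illustration only and does not reflect WebSource's actual internals: the run_pipeline function, the closures and the %fake_web table are all hypothetical names, and the "fetch" step reads from a canned table instead of the network.

```perl
use strict;
use warnings;

# Hypothetical model: a subtask is a code reference that takes one input
# item and returns zero or more output items. Not WebSource's real API.
sub run_pipeline {
    my ($subtasks, @items) = @_;
    for my $subtask (@$subtasks) {
        # Feed every current item through the subtask and collect all outputs.
        @items = map { $subtask->($_) } @items;
    }
    return @items;
}

# Stand-in for the Web: maps a URL to a canned "document" string.
my %fake_web = (
    'http://example.org/list' => 'alpha,beta,gamma',
);

# fetch: URL in, document out (nothing if the URL is unknown).
my $fetch = sub { my ($url) = @_; exists $fake_web{$url} ? ($fake_web{$url}) : () };
# extract: document in, several pieces out.
my $extract = sub { my ($doc) = @_; split /,/, $doc };
# filter: item in, the same item out or nothing at all.
my $filter = sub { my ($item) = @_; $item =~ /^[ab]/ ? ($item) : () };
# format: item in, string out.
my $format = sub { my ($item) = @_; uc $item };

my @results = run_pipeline(
    [ $fetch, $extract, $filter, $format ],
    'http://example.org/list',
);
print "$_\n" for @results;    # prints ALPHA then BETA
```

Modelling each subtask as "one item in, a list of items out" lets a filter drop items by returning an empty list and lets an extract step fan out into several results, without the pipeline driver knowing which flavor it is running.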

METHODS

$source = WebSource->new(wsd => $wsd);

Creates a new WebSource object working with the given WebSource description.

The following named parameters can be given:

wsd

Use a generic engine with the given source description file

max_results

Do not output more than max_results results

$source->push($item);

Pass the initial data to the first subtask

$source->query($query);

Builds a query hash from the given parameters and pushes it into the task

$source->set_max_results($count);

Set the maximum number of results to output to $count

$source->next_result();

Returns the next result of the task

$source->parameters;

Returns a hash of the initial task's parameters

$source->option_spec;

Returns the spec of the options translated for Getopt::Mixed

$source->set_option($opt,$val)

Sets the source-specific option $opt to the value $val

$source->apply_options

Handles nodes of type <ws:attribute name="aname" value="oname" /> by adding an attribute named aname, whose value is the value of the option named oname, to the parent node. The ws:attribute node is then removed.
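The transformation described for apply_options can be sketched with XML::LibXML as follows. This is a standalone illustration under stated assumptions, not the module's own code: the namespace URI urn:example:ws, the ws:fetch parent element, the hl attribute name and the %options hash are all hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumption: the real namespace URI bound to the ws prefix is not given
# in this document, so a placeholder URI is used here.
my $ns = 'urn:example:ws';

# Hypothetical option values, as they might have been set via set_option.
my %options = ( lang => 'en' );

my $doc = XML::LibXML->load_xml( string => <<"XML" );
<ws:fetch xmlns:ws="$ns"><ws:attribute name="hl" value="lang"/></ws:fetch>
XML

my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( ws => $ns );

for my $node ( $xpc->findnodes('//ws:attribute') ) {
    my $parent = $node->parentNode;
    my $aname  = $node->getAttribute('name');     # attribute to create
    my $oname  = $node->getAttribute('value');    # option to read
    # Copy the option value onto the parent, if the option is set.
    $parent->setAttribute( $aname, $options{$oname} )
        if defined $options{$oname};
    # The ws:attribute node is removed once it has been handled.
    $parent->removeChild($node);
}

print $doc->documentElement->toString, "\n";
```

After the loop, the ws:fetch element carries an hl attribute with the value en and no longer has a ws:attribute child.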

SEE ALSO

ws-query, WebSource::Extract, WebSource::Fetch, WebSource::Filter, etc.
