NAME
DataFlow::Node::HTMLFilter - An HTML filtering node
VERSION
version 0.91.05
SYNOPSIS
use DataFlow::Node::HTMLFilter;
my $filter_html = DataFlow::Node::HTMLFilter->new(
search_xpath => '//td',
result_type => 'HTML',
);
my $filter_value = DataFlow::Node::HTMLFilter->new(
search_xpath => '//td',
result_type => 'VALUE',
);
my $input = <<EOM;
<html><body>
<table>
<tr><td>Line 1</td><td>L1, Column 2</td></tr>
<tr><td>Line 2</td><td>L2, Column 2</td></tr>
</table>
</body></html>
EOM
$filter_html->input( $input );
# @result == '<td>Line 1</td>', ... '<td>L2, Column 2</td>'
$filter_value->input( $input );
# @result == q{Line 1}, ... q{L2, Column 2}
DESCRIPTION
This node type provides a filter for HTML content. Each input item is treated as HTML content and is filtered using HTML::TreeBuilder::XPath.
ATTRIBUTES
search_xpath
This attribute is an XPath string used to filter down the HTML content. The search_xpath attribute is mandatory.
result_type
This attribute is a string, but its value must be one of: HTML, VALUE, NODE. The default is HTML.
HTML
The result will be the HTML content specified by search_xpath.
VALUE
The result will be the literal value enclosed by the tag and/or attribute specified by search_xpath.
NODE
The result will be a list of HTML::Element objects, as returned by the findnodes method of the HTML::TreeBuilder::XPath class.
Most people will probably use HTML or VALUE, but this option is also provided in case someone wants to manipulate the HTML elements directly.
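The three result types can be illustrated by calling HTML::TreeBuilder::XPath directly. This is only a sketch of what each type roughly corresponds to, assuming that HTML maps to as_HTML on each matched element and VALUE maps to findvalues; the node's actual implementation may differ.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = '<html><body><table>'
         . '<tr><td>Line 1</td><td>L1, Column 2</td></tr>'
         . '<tr><td>Line 2</td><td>L2, Column 2</td></tr>'
         . '</table></body></html>';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# NODE: the matched elements themselves
my @nodes = $tree->findnodes('//td');

# HTML (assumed mapping): each matched element serialized back to markup
my @markup = map { $_->as_HTML } @nodes;

# VALUE (assumed mapping): the text value of each match
my @values = $tree->findvalues('//td');

print 'NODE:  ', scalar(@nodes), " element(s)\n";
print "HTML:  $_\n"  for @markup;
print "VALUE: $_\n" for @values;

# Release the parse tree once the strings have been extracted
$tree->delete;
```

Note that the objects in @nodes are only usable while the tree is alive, which is one reason the string-based result types are the more convenient choice in most flows.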
ref_result
This attribute is a boolean that signals whether the result list should be added to the output queue as a list of items or as a reference to an array of items. The default is 0 (false).
There is a semantic subtlety here: if ref_result is 1 (true), then one HTML item (input) generates at most one ArrayRef item (output), i.e. it is effectively a one-to-one mapping. On the other hand, if ref_result is kept as 0 (false), one HTML item may produce any number of result items, i.e. it is a one-to-many mapping.
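The difference between the two mappings can be sketched in plain Perl. The emit_results helper below is hypothetical and not part of DataFlow; it only mimics how one input's matches would be placed on the output queue under each setting, assuming that an input with no matches yields no output at all.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DataFlow): returns the items that a
# single input would contribute to the output queue.
sub emit_results {
    my ( $ref_result, @matches ) = @_;
    return () unless @matches;    # assumption: no match, no output item
    return $ref_result ? ( [@matches] ) : @matches;
}

my @cells = ( 'Line 1', 'L1, Column 2' );

my @flat = emit_results( 0, @cells );    # one-to-many: each match is an item
my @ref  = emit_results( 1, @cells );    # one-to-one: a single ArrayRef item
my @none = emit_results( 1 );            # nothing matched: zero items

printf "ref_result=0: %d item(s)\n", scalar @flat;
printf "ref_result=1: %d item(s) holding %d value(s)\n",
    scalar @ref, scalar @{ $ref[0] };
printf "no matches:   %d item(s)\n", scalar @none;
```

With ref_result enabled, a downstream node can tell which values came from the same HTML document, since they arrive together in one array reference.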
METHODS
The interface for DataFlow::Node::HTMLFilter is the same as that of DataFlow::Node, plus the accessor methods for the attributes described above.
DEPENDENCIES
DataFlow::Node
HTML::TreeBuilder::XPath
HTML::Element
INCOMPATIBILITIES
None reported.
BUGS AND LIMITATIONS
Please report any bugs or feature requests to bug-dataflow@rt.cpan.org, or through the web interface at http://rt.cpan.org.
AUTHOR
Alexei Znamensky <russoz@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2011 by Alexei Znamensky.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.