NAME

DataFlow::Node::HTMLFilter - An HTML filtering node

VERSION

version 0.91.05

SYNOPSIS

    use DataFlow::Node::HTMLFilter;

    my $filter_html = DataFlow::Node::HTMLFilter->new(
        search_xpath => '//td',
        result_type  => 'HTML',
    );

    my $filter_value = DataFlow::Node::HTMLFilter->new(
        search_xpath => '//td',
        result_type  => 'VALUE',
    );

    my $input = <<EOM;
    <html><body>
      <table>
        <tr><td>Line 1</td><td>L1, Column 2</td></tr>
        <tr><td>Line 2</td><td>L2, Column 2</td></tr>
      </table>
    </body></html>
    EOM

    $filter_html->input( $input );
    my @result = $filter_html->output;
    # @result == '<td>Line 1</td>', ... '<td>L2, Column 2</td>'

    $filter_value->input( $input );
    @result = $filter_value->output;
    # @result == q{Line 1}, ... q{L2, Column 2}

DESCRIPTION

This node type provides a filter for HTML content. Each input item is treated as a piece of HTML content, parsed, and filtered using HTML::TreeBuilder::XPath.
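The filtering applied to a single item can be sketched directly with HTML::TreeBuilder::XPath. This is a minimal sketch, not the module's own code: filter_html_item is a hypothetical helper, and it assumes that VALUE corresponds to findvalues and that HTML corresponds to calling as_HTML on each node returned by findnodes.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# filter_html_item is a hypothetical helper, not part of DataFlow.
# It parses one HTML item and returns the matches for $xpath,
# shaped according to $result_type (HTML, VALUE or NODE).
sub filter_html_item {
    my ( $html, $xpath, $result_type ) = @_;
    $result_type = 'HTML' unless defined $result_type;    # documented default

    my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

    # NODE: hand back the matched elements themselves. $tree->delete
    # would also tear these nodes down, so it is skipped here.
    return $tree->findnodes($xpath) if $result_type eq 'NODE';

    my @result;
    if ( $result_type eq 'VALUE' ) {
        # Text value of each match, forced into plain strings.
        @result = map { "$_" } $tree->findvalues($xpath);
    }
    else {
        # HTML markup of each matched element.
        @result = map { $_->as_HTML } $tree->findnodes($xpath);
    }

    $tree->delete;    # release the parse tree
    return @result;
}
```

Parsing inside the helper keeps each item independent, which matches the description of every item being filtered on its own.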

ATTRIBUTES

search_xpath

This attribute is an XPath expression used to select the parts of the HTML content to keep. The search_xpath attribute is mandatory.
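To illustrate the kind of expression search_xpath can hold, the sketch below evaluates two XPath expressions directly with HTML::TreeBuilder::XPath, outside of DataFlow: one selects elements and one selects attributes. The sample markup is made up for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_content(<<'EOM');
<html><body>
  <table>
    <tr><td>Line 1</td><td><a href="/one">L1, Column 2</a></td></tr>
    <tr><td>Line 2</td><td><a href="/two">L2, Column 2</a></td></tr>
  </table>
</body></html>
EOM

# An element path: the first cell of every row.
my @first_cells = map { "$_" } $tree->findvalues('//tr/td[1]');

# An attribute path: the value of every href attribute.
my @hrefs = map { "$_" } $tree->findvalues('//a/@href');

$tree->delete;    # release the parse tree
```

An attribute path like the second one is most useful together with result_type VALUE, since attribute matches are not HTML elements.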

result_type

This attribute is a string whose value must be one of HTML, VALUE or NODE. The default is HTML.

HTML

The result will be the HTML markup of each element matched by search_xpath.

VALUE

The result will be the literal value of each match of search_xpath: the text enclosed by a matched tag, or the value of a matched attribute.

NODE

The result will be a list of HTML::Element objects, as returned by the findnodes method of the HTML::TreeBuilder::XPath class.

Most people will probably use HTML or VALUE, but this option is also provided in case someone wants to manipulate the HTML elements directly.
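As a sketch of what direct manipulation can look like, the code below calls findnodes itself (as the NODE description says the module does) and then uses ordinary HTML::Element methods on each match. The markup and the choice of adding a rel attribute are only illustrative.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_content(
    '<p><a href="/a">First</a> and <a href="/b">Second</a></p>');

my @links;
for my $node ( $tree->findnodes('//a') ) {
    # Each $node is an HTML::Element, so its usual methods apply.
    $node->attr( rel => 'nofollow' );    # modify the element in place
    push @links, {
        tag  => $node->tag,              # element name
        href => $node->attr('href'),     # attribute lookup
        text => $node->as_text,          # text content
        html => $node->as_HTML,          # markup, including the new attribute
    };
}

$tree->delete;    # release the parse tree once the data is extracted
```

Everything needed is copied out of the nodes before delete is called, because delete also destroys the matched elements.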

ref_result

This attribute is a boolean that signals whether the results should be added to the output queue as individual items, or as a single reference to an array of items. The default is 0 (false).

There is a semantic subtlety here: if ref_result is 1 (true), each HTML item (input) generates at most one ArrayRef item (output), i.e. a one-to-one mapping. On the other hand, if ref_result is kept as 0 (false), a single HTML item may produce any number of output items, i.e. a one-to-many mapping.
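The difference between the two mappings can be modelled in plain Perl. In this sketch, matches_for and outputs_for are hypothetical helpers, and the "matches" are simply the words of each input string standing in for XPath results; only the shaping of the output follows the description of ref_result.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the filter: the "matches" found in one
# input item are simply its whitespace-separated words.
sub matches_for {
    my ($item) = @_;
    return split ' ', $item;
}

# Hypothetical model of ref_result: what one input item contributes
# to the output queue.
sub outputs_for {
    my ( $item, $ref_result ) = @_;
    my @matches = matches_for($item);

    # ref_result false: every match becomes its own output item.
    return @matches unless $ref_result;

    # ref_result true: at most one item, an ArrayRef holding all matches
    # (nothing at all when there were no matches).
    return @matches ? ( [@matches] ) : ();
}

my @inputs      = ( 'a b', '', 'c' );
my @one_to_many = map { outputs_for( $_, 0 ) } @inputs;    # 3 items
my @one_to_one  = map { outputs_for( $_, 1 ) } @inputs;    # 2 ArrayRefs
```

With ref_result enabled, the boundaries between input items stay visible downstream, since each ArrayRef holds the matches of exactly one input.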

METHODS

The interface for DataFlow::Node::HTMLFilter is the same as that of DataFlow::Node, plus the accessor methods for the attributes described above.

DEPENDENCIES

DataFlow::Node

HTML::TreeBuilder::XPath

HTML::Element

INCOMPATIBILITIES

None reported.

BUGS AND LIMITATIONS

Please report any bugs or feature requests to bug-dataflow@rt.cpan.org, or through the web interface at http://rt.cpan.org.

AUTHOR

Alexei Znamensky <russoz@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2011 by Alexei Znamensky.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.