NAME
DataFlow::Proc::HTMLFilter - A HTML filtering processor
VERSION
version 1.112100
SYNOPSIS
use DataFlow::Proc::HTMLFilter;
my $filter_html = DataFlow::Proc::HTMLFilter->new(
search_xpath => '//td',
result_type => 'HTML',
);
my $filter_value = DataFlow::Proc::HTMLFilter->new(
search_xpath => '//td',
result_type => 'VALUE',
);
my $input = <<EOM;
<html><body>
<table>
<tr><td>Line 1</td><td>L1, Column 2</td>
<tr><td>Line 2</td><td>L2, Column 2</td>
</table>
</html></body>
EOM
$filter_html->process( $input );
# @result == '<td>Line 1</td>', ... '<td>L2, Column 2</td>'
$filter_value->process( $input );
# @result == q{Line 1}, ... q{L2, Column 2}
DESCRIPTION
This processor type provides a filter for HTML content. Each item will be considered as a HTML content and will be filtered using HTML::TreeBuilder::XPath.
ATTRIBUTES
search_xpath
This attribute is a XPath string used to filter down the HTML content. The search_xpath
attribute is mandatory.
result_type
This attribute is a string, but its value must be one of: HTML
, VALUE
, NODE
. The default is HTML
.
HTML
The result will be the HTML content specified by
search_xpath
.VALUE
The result will be the literal value enclosed by the tag and/or attribute specified by
search_xpath
.NODE
The result will be a list of HTML::Element objects, as returned by the
findnodes
method of HTML::TreeBuilder::XPath class.
Most people will probably use HTML
or VALUE
, but this option is also provided in case someone wants to manipulate the HTML elements directly.
ref_result
This attribute is a boolean, and it signals whether the result list should be added as a list of items to the output queue, or as a reference to an array of items. The default is 0 (false).
There is a semantic subtlety here: if ref_result
is 1 (true), then one HTML item (input) may generate one or zero ArrayRef item (output), i.e. it is a one-to-one mapping. On the other hand, by keeping ref_result
as 0 (false), one HTML item may produce any number of items as result, i.e. it is a one-to-many mapping.
SUPPORT
Perldoc
You can find documentation for this module with the perldoc command.
perldoc DataFlow::Proc::HTMLFilter
Websites
The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.
Search CPAN
The default CPAN search engine, useful to view POD in HTML format.
AnnoCPAN
The AnnoCPAN is a website that allows community annonations of Perl module documentation.
CPAN Ratings
The CPAN Ratings is a website that allows community ratings and reviews of Perl modules.
CPAN Forum
The CPAN Forum is a web forum for discussing Perl modules.
CPANTS
The CPANTS is a website that analyzes the Kwalitee ( code metrics ) of a distribution.
http://cpants.perl.org/dist/overview/DataFlow-Proc-HTMLFilter
CPAN Testers
The CPAN Testers is a network of smokers who run automated tests on uploaded CPAN distributions.
http://www.cpantesters.org/distro/D/DataFlow-Proc-HTMLFilter
CPAN Testers Matrix
The CPAN Testers Matrix is a website that provides a visual way to determine what Perls/platforms PASSed for a distribution.
http://matrix.cpantesters.org/?dist=DataFlow-Proc-HTMLFilter
You can email the author of this module at RUSSOZ at cpan.org
asking for help with any problems you have.
Internet Relay Chat
You can get live help by using IRC ( Internet Relay Chat ). If you don't know what IRC is, please read this excellent guide: http://en.wikipedia.org/wiki/Internet_Relay_Chat. Please be courteous and patient when talking to us, as we might be busy or sleeping! You can join those networks/channels and get help:
irc.perl.org
You can connect to the server at 'irc.perl.org' and join this channel: #sao-paulo.pm then talk to this person for help: russoz.
Bugs / Feature Requests
Please report any bugs or feature requests by email to bug-dataflow-proc-htmlfilter at rt.cpan.org
, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DataFlow-Proc-HTMLFilter. You will be automatically notified of any progress on the request by the system.
Source Code
The code is open to the world, and available for you to hack on. Please feel free to browse it and play with it, or whatever. If you want to contribute patches, please send me a diff or prod me to pull from your repository :)
https://github.com/russoz/DataFlow-Proc-HTMLFilter
git clone https://github.com/russoz/DataFlow-Proc-HTMLFilter
AUTHOR
Alexei Znamensky <russoz@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2011 by Alexei Znamensky.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
BUGS AND LIMITATIONS
No bugs have been reported.
Please report any bugs or feature requests through the web interface at http://rt.cpan.org.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.