NAME
HTML::Feature - an extractor of feature sentences from HTML
SYNOPSIS
    use strict;
    use HTML::Feature;

    my $f = HTML::Feature->new(
        enc_type  => 'utf-8',
        ret_num   => 10,
        max_bytes => 5000,
        min_bytes => 1,
    );
    my $data = $f->extract( url => 'http://www.perl.com' );

    # print result data
    print $data->{title}, "\n";
    print $data->{description}, "\n";
    for ( @{ $data->{block} } ) {
        print $_->{score}, "\n";
        print $_->{contents}, "\n";
    }
DESCRIPTION
This module extracts feature blocks from an HTML document. It does not rely on general techniques such as morphological analysis; instead, it identifies feature blocks through simple statistical processing. Because of this, it should be easy to apply to documents written in almost any language.
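To give a feel for what language-independent "simple statistics" can look like, here is a minimal sketch in plain Perl. It is NOT HTML::Feature's actual algorithm: the score_blocks function, its link-density heuristic, and its input format (text blocks already extracted from HTML nodes, each with the portion of its text found inside links) are all hypothetical. The ret_num, min_bytes and max_bytes arguments only mirror the names of the module's options for readability. Sizes are measured in bytes, so no word segmentation or dictionary for any particular language is needed.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical illustration only: this is NOT HTML::Feature's algorithm.
# Each input block is a hashref { text => ..., link_text => ... } holding the
# plain text of one HTML node and the part of that text found inside links.
sub score_blocks {
    my (%args) = @_;
    my $ret_num   = $args{ret_num}   // 1;
    my $min_bytes = $args{min_bytes} // 0;
    my $max_bytes = $args{max_bytes};    # undef means "no upper limit"

    my @scored;
    for my $blk ( @{ $args{blocks} } ) {
        # Measure size in UTF-8 bytes: no word segmentation, so any language works.
        my $bytes      = length encode( 'UTF-8', $blk->{text} );
        my $link_bytes = length encode( 'UTF-8', $blk->{link_text} // '' );

        # Skip nodes that are too small or too large to be a feature block.
        next if $bytes < $min_bytes;
        next if defined $max_bytes && $bytes > $max_bytes;

        # Link-heavy nodes (menus, footers) score low: count only non-link bytes.
        push @scored, { score => $bytes - $link_bytes, contents => $blk->{text} };
    }

    # Highest score first; keep at most $ret_num blocks.
    @scored = sort { $b->{score} <=> $a->{score} } @scored;
    my $n = $ret_num < @scored ? $ret_num : scalar @scored;
    return [ @scored[ 0 .. $n - 1 ] ];
}

# Usage: a navigation menu, a content paragraph, and a tiny fragment.
my $top = score_blocks(
    blocks => [
        { text => 'Home | About | Contact', link_text => 'HomeAboutContact' },
        { text => 'Perl is a highly capable, feature-rich programming language.' },
        { text => 'x' },
    ],
    ret_num   => 1,
    min_bytes => 10,
);
print $top->[0]{score}, ' ', $top->[0]{contents}, "\n";
```

The content paragraph wins because almost none of its bytes are link text, while the menu is mostly links and the one-byte fragment falls below min_bytes. A real implementation would also need an HTML parser (SEE ALSO lists HTML::TreeBuilder) to turn a page into such blocks.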
METHODS
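- new([options])
  Creates a new HTML::Feature object configured with the given options (see OPTIONS).
- extract(url => $url | string => $string)
  Extracts feature blocks from the document at $url, or from the HTML source in $string. Returns a hash reference holding the document's title and description, plus a block entry: an array reference of feature blocks, each a hash reference with a score and its contents.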
OPTIONS
    # All options can be passed to the constructor.
    my $f = HTML::Feature->new(
        # Number of blocks to return (default: 1).
        ret_num   => 1,
        # Upper limit, in bytes, of a node to analyze (default: '').
        max_bytes => 5000,
        # Lower limit, in bytes, of a node to analyze (default: '').
        min_bytes => 10,
        # Character encoding to use, e.g. 'euc-jp'. If none is specified,
        # results are returned as UTF-8-flagged Perl strings (default: '').
        enc_type  => 'euc-jp',
    );
SEE ALSO
HTML::TreeBuilder, Statistics::Lite, Encode::Detect
AUTHOR
Takeshi Miki <miki@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2007 Takeshi Miki
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.