NAME
HTML::Feature - extracts feature sentences from HTML
SYNOPSIS
use strict;
use HTML::Feature;
use Data::Dumper;
use Encode;
binmode STDOUT, ':encoding(utf-8)';
my $f = HTML::Feature->new;
my $data = $f->extract( url => 'http://www.perl.com' );
# print result data
my $boundary = "-" x 40;
print "\n";
print $boundary, "\n";
print "* TITLE:\n";
print $boundary, "\n";
print $data->{title}, "\n";
print "\n";
print $boundary, "\n";
print "* DESCRIPTION:\n";
print $boundary, "\n";
print $data->{description}, "\n";
my $i = 0;
for(@{$data->{block}}){
$i++;
print "\n";
print $boundary, "\n";
print "* CONTENTS-$i:\n";
print $boundary, "\n";
print $_->{contents}, "\n";
print $boundary, "\n";
print "SCORE:",$_->{score},"\n";
}
# print more details
print Dumper($data);
DESCRIPTION
This module extracts feature sentences (the main content) from an HTML document.
First, the HTML document is divided into multiple blocks at certain boundaries.
Each block is then evaluated individually.
The score of each block is determined by its size (the number of bytes) and by tag coefficients.
The tag coefficients are optional and can be set to arbitrary values.
Note that this module is not designed to extract feature sentences from a page that is mostly a list of links (for example, the top page of a portal site).
It may extract feature sentences well from a page that contains a reasonable amount of text (for example, a news page or a blog).
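The block scoring described above can be sketched roughly as follows. This is only an illustration of the idea, not the module's actual implementation: the choice of boundary tags (div, td, p), the regex-based tokenizer, and the helper names score_block and top_blocks are all assumptions made for this example. Each run of text contributes its byte length multiplied by the coefficients of the tags that enclose it.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical coefficients, shaped like the tag_score option.
my %tag_score = ( a => 0.90, b => 1.05, strong => 1.05,
                  h1 => 1.2, h2 => 1.1, h3 => 1.05 );

# Score one block: byte length of each text run, weighted by the
# coefficients of all tags currently open around it (assumed rule).
sub score_block {
    my ($html) = @_;
    my ( $score, @open ) = (0);
    while ( $html =~ m{<(/?)(\w+)[^>]*>|([^<]+)}g ) {
        if ( defined $3 ) {                      # a run of text
            ( my $text = $3 ) =~ s/^\s+|\s+$//g;
            my $factor = 1;
            $factor *= $tag_score{$_} // 1 for @open;
            $score += length( encode_utf8($text) ) * $factor;
        }
        elsif ( $1 eq '/' ) {                    # closing tag
            my $name = lc $2;
            for my $i ( reverse 0 .. $#open ) {
                if ( $open[$i] eq $name ) { splice @open, $i, 1; last }
            }
        }
        else {                                   # opening tag
            push @open, lc $2;
        }
    }
    return $score;
}

# Split on assumed boundary tags, score each block, and return the
# $ret_num highest-scoring blocks.
sub top_blocks {
    my ( $html, $ret_num ) = @_;
    my @sorted = sort { $b->{score} <=> $a->{score} }
                 grep { $_->{score} > 0 }
                 map  { +{ contents => $_, score => score_block($_) } }
                 split m{</?(?:div|td|p)\b[^>]*>}i, $html;
    my $last = $ret_num < @sorted ? $ret_num - 1 : $#sorted;
    return @sorted[ 0 .. $last ];
}

my $html = '<div><a href="/">Home</a> <a href="/news">News</a></div>'
         . '<div><h1>Perl 5 released</h1>A long paragraph of body text.</div>';
print "$_->{score}\t$_->{contents}\n" for top_blocks( $html, 3 );
```

With these coefficients, a block made up of links is penalized (a is below 1) while headings and emphasized text are boosted, which matches the remark that link-list pages are a poor fit.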
METHODS
- new([options])
Creates an object using the given options.
- extract(url => $url | string => $string)
Returns the feature blocks together with the TITLE and DESCRIPTION.
OPTIONS
# default values can be overridden by passing options to the constructor
my $f = HTML::Feature->new(
# set the coefficient for each tag
tag_score => {
a => 0.90,
b => 1.05,
strong => 1.05,
h1 => 1.2,
h2 => 1.1,
h3 => 1.05
},
# set the number of blocks to return
ret_num => 3,
# set the candidate character encodings to guess from
suspects_enc => ['euc-jp', 'shiftjis', '7bit-jis']
);
SEE ALSO
HTML::TokeParser, HTML::Entities, Encode::Guess
AUTHOR
Takeshi Miki <miki@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2007 Takeshi Miki
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.