NAME

HTML::Feature - extracts feature sentences from HTML

SYNOPSIS

use strict;
use HTML::Feature;

my $f = HTML::Feature->new(ret_num => 10);
my $data = $f->extract( url => 'http://www.perl.com' );

# print result data

print $data->{title}, "\n";
print $data->{description}, "\n";

for(@{$data->{block}}){
    print $_->{score}, "\n";
    print $_->{contents}, "\n";
}

DESCRIPTION

This module extracts feature blocks from an HTML document. It does not rely on general techniques such as "morphological analysis"; instead, it picks out the feature blocks with simpler statistical processing. As a result, it should be easy to apply to documents written in any language.

METHODS

new([options])

Creates a new object using the given options.

extract(url => $url | string => $string)

Returns the feature blocks (as references) together with the TITLE and DESCRIPTION of the document.
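
The signature above also allows an HTML string in place of a URL. The following is a minimal sketch of that usage, assuming the returned structure has the same layout as in the SYNOPSIS (title, description, and a list of blocks with score and contents). The helper names format_result and extract_from_string are hypothetical and are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the structure returned by extract() into text.
# Assumes the layout shown in the SYNOPSIS.
sub format_result {
    my ($data) = @_;
    my $title = defined $data->{title}       ? $data->{title}       : '';
    my $desc  = defined $data->{description} ? $data->{description} : '';
    my @lines = ( "Title: $title", "Description: $desc" );
    for my $block ( @{ $data->{block} || [] } ) {
        push @lines, sprintf( "[%s] %s", $block->{score}, $block->{contents} );
    }
    return join( "\n", @lines ) . "\n";
}

# Hypothetical helper: extract from an in-memory HTML string, not a URL.
sub extract_from_string {
    my ($html) = @_;
    require HTML::Feature;    # loaded lazily so format_result works alone
    my $f = HTML::Feature->new( ret_num => 3 );
    return $f->extract( string => $html );
}
```

With the module installed, print format_result( extract_from_string($html) ) would print the title, the description, and up to three scored blocks.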

OPTIONS

    # Values can be passed to the constructor.
    my $f = HTML::Feature->new(
        ret_num => 1,
        # Number of blocks to return (default is '1').
        max_bytes => 5000,
        # Upper limit, in bytes, on the size of a node to analyze (default is '').
        min_bytes => 10,
        # Lower limit, in bytes, on the size of a node to analyze (default is '').
        enc_type => 'euc-jp',
        # An arbitrary character encoding. If none is specified, the result is a string with the UTF-8 flag on (default is '').
        look_fine => 1,
        # Return the data in a "look fine" form (default is '').
    );

SEE ALSO

HTML::TreeBuilder, Statistics::Lite, Encode::Guess

AUTHOR

Takeshi Miki <miki@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2007 Takeshi Miki

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.
