NAME

HTML::Feature - an extractor of feature sentences from HTML

SYNOPSIS

    use strict;
    use HTML::Feature;

    my $f = HTML::Feature->new(
        ret_num => 10,    # default is 1
    );
    my $data = $f->extract( url => 'http://www.perl.com' );

    # print result data

    print $data->{title}, "\n";
    print $data->{description}, "\n";

    for(@{$data->{block}}){
        print $_->{score}, "\n";
        print $_->{contents}, "\n";
    }

DESCRIPTION

This module extracts feature blocks from an HTML document. It does not rely on general-purpose techniques such as morphological analysis. Instead, it uses simpler statistical processing to pick out the feature blocks, so it should be easy to apply to documents written in any language.

METHODS

new([options])

Creates an object using the given options.

extract(url => $url | string => $string)

Returns the feature blocks (as references) together with the TITLE and DESCRIPTION.
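
The signature above also allows an HTML string to be passed in place of a URL. The following is a minimal sketch of that usage, assuming the result has the same layout as in the SYNOPSIS (title, description and a list of blocks). The sample HTML and the variable names are made up for illustration, and no particular output is implied.

```perl
use strict;
use warnings;
use HTML::Feature;

# Hypothetical sample document, used only for illustration.
my $html = <<'HTML';
<html>
<head><title>Sample page</title></head>
<body>
<div>Home | About | Contact</div>
<div><p>This is the main article text. It is deliberately longer than the
navigation links so that it stands out as the body of the page.</p></div>
</body>
</html>
HTML

# Ask for up to three blocks instead of the default of one.
my $f    = HTML::Feature->new( ret_num => 3 );
my $data = $f->extract( string => $html );

# Result layout assumed from the SYNOPSIS: title plus a list of blocks.
print $data->{title}, "\n" if defined $data->{title};

for my $block ( @{ $data->{block} || [] } ) {
    print $block->{score}, "\t", $block->{contents}, "\n";
}
```

Passing a string can be convenient when the page has already been fetched elsewhere, or when testing without network access.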

OPTIONS

    # options can be passed to the constructor
    my $f = HTML::Feature->new(

        ret_num => 1,
        # Number of blocks to return (default is 1).

        max_bytes => '5000',
        # Upper limit on the number of bytes of a node to analyze (default is '').

        min_bytes => '10',
        # Lower limit on the number of bytes of a node to analyze (default is '').

        enc_type => 'euc-jp',
        # Any character encoding. If none is specified, the result is returned
        # in the encoding that carries the UTF-8 flag (default is '').

        look_fine => '1',
        # Return data in "look fine" form (default is '').
    );

SEE ALSO

HTML::TreeBuilder, Statistics::Lite, Encode::Guess

AUTHOR

Takeshi Miki <miki@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2007 Takeshi Miki

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.
