NAME

HTML::Feature - extracts feature sentences from HTML

SYNOPSIS

    use strict;
    use HTML::Feature;

    my $f = HTML::Feature->new(ret_num => 10);
    my $data = $f->extract( url => 'http://www.perl.com' );

    # print result data

    print $data->{title}, "\n";
    print $data->{description}, "\n";

    for(@{$data->{block}}){
        print $_->{score}, "\n";
        print $_->{contents}, "\n";
    }

DESCRIPTION

This module extracts feature blocks from an HTML document. It deliberately avoids general-purpose techniques such as morphological analysis and instead picks out the feature blocks with simple statistical processing. As a result, it should be easy to apply to documents in any language.

METHODS

new([options])

Creates a new HTML::Feature object configured with the given options (see OPTIONS).

extract(url => $url | string => $string)

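Returns a hash reference holding the document's title and description together with the extracted feature blocks (an array reference under the block key). The input is either a URL to fetch (url) or a string of HTML (string).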
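As a rough illustration of working with the value returned by extract, the sketch below formats a result into printable lines. The helper format_result is hypothetical and not part of HTML::Feature, and the sample data is made up; only the keys (title, description, block, score, contents) are taken from the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Feature): turn the hash reference
# returned by extract() into printable lines. The keys used here
# (title, description, block, score, contents) are those shown in the SYNOPSIS.
sub format_result {
    my ($data) = @_;
    my @lines = (
        "title: "       . ( defined $data->{title}       ? $data->{title}       : '' ),
        "description: " . ( defined $data->{description} ? $data->{description} : '' ),
    );
    # One line per feature block: score in brackets, then the block text.
    for my $block ( @{ $data->{block} || [] } ) {
        push @lines, sprintf( "[%s] %s", $block->{score}, $block->{contents} );
    }
    return @lines;
}

# Made-up sample shaped like the SYNOPSIS result, so the helper can be
# tried without the module or network access. Real data would come from
# something like:
#   my $data = HTML::Feature->new( ret_num => 2 )->extract( string => $html );
my $sample = {
    title       => 'Example page',
    description => 'An example description',
    block       => [
        { score => 12.5, contents => 'First block of text.' },
        { score => 3,    contents => 'Second block of text.' },
    ],
};

print "$_\n" for format_result($sample);
```

Guarding against a missing title, description, or block list is a defensive choice in this sketch; whether extract can actually return undefined values for them is not stated in this documentation.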

OPTIONS

    # Options can be passed to the constructor
    my $f = HTML::Feature->new(
        ret_num => 1,
        # number of blocks to return (default is '1').
        max_bytes => 5000,
        # upper limit, in bytes, on the size of a node to analyze (default is '').
        min_bytes => 10,
        # lower limit, in bytes, on the size of a node to analyze (default is '').
        enc_type => 'euc-jp',
        # character encoding to use; if not specified, text is handled as strings with the UTF-8 flag on (default is '').
        look_fine => 1,
        # return data in "look fine" form (default is '').
    );

SEE ALSO

HTML::TreeBuilder, Statistics::Lite, Encode::Guess

AUTHOR

Takeshi Miki <miki@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2007 Takeshi Miki

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.
