NAME
HTML::Summary - module for generating a summary from a web page.
SYNOPSIS
    use HTML::Summary;
    use HTML::TreeBuilder;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse( $document );
    $tree->eof;    # tell the parser that the document is complete

    my $summarizer = HTML::Summary->new(
        LENGTH   => 200,
        USE_META => 1,
    );

    my $summary = $summarizer->generate( $tree );

    $summarizer->option( 'USE_META' => 1 );
    my $length = $summarizer->option( 'LENGTH' );

    if ( $summarizer->meta_used( ) )
    {
        # do something
    }
DESCRIPTION
The HTML::Summary module produces summaries from the textual content of web pages. It does so using the location heuristic, which determines the value of a given sentence based on its position and status within the document. For example, headings, section titles and opening paragraph sentences may be favoured over other textual content. The sentences are scored and sorted, and the highest-scoring ones are then formatted and output (in their original order) until the desired summary length is reached.
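The score, sort and output steps described above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the module's actual implementation: the helper name summarize_by_location, the naive regex sentence splitter and the scoring weights (earlier paragraphs score higher, paragraph-opening sentences get a bonus) are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTML::Summary.
# Takes a maximum length and a list of plain-text paragraphs.
sub summarize_by_location {
    my ( $max_length, @paragraphs ) = @_;

    # 1. Split paragraphs into sentences, remembering document order.
    my @sentences;
    for my $p ( 0 .. $#paragraphs ) {
        # Naive splitter (assumption): break after ., ! or ? plus whitespace.
        my @parts = grep { length } split /(?<=[.!?])\s+/, $paragraphs[$p];
        for my $i ( 0 .. $#parts ) {
            push @sentences, {
                text  => $parts[$i],
                order => scalar @sentences,
                # 2. Score by position (assumed weights): earlier paragraphs
                #    score higher, paragraph-opening sentences get a bonus.
                score => 1 / ( 1 + $p ) + ( $i == 0 ? 1 : 0 ),
            };
        }
    }

    # 3. Sort by score, best first; ties keep document order.
    my @ranked = sort {
        $b->{score} <=> $a->{score} || $a->{order} <=> $b->{order}
    } @sentences;

    # 4. Take sentences until the next one would exceed the length budget.
    #    length() counts characters, which equals bytes for ASCII text.
    my @chosen;
    my $used = 0;
    for my $s (@ranked) {
        my $cost = length( $s->{text} ) + ( @chosen ? 1 : 0 );  # +1 for the joining space
        last if $used + $cost > $max_length;
        push @chosen, $s;
        $used += $cost;
    }

    # 5. Output the chosen sentences in their original order.
    return join ' ', map { $_->{text} }
        sort { $a->{order} <=> $b->{order} } @chosen;
}

# Prints "Perl is a language. Summaries are short."
print summarize_by_location(
    45,
    'Perl is a language. It is old. It has CPAN.',
    'Summaries are short. They omit detail.',
), "\n";
```

Sorting twice is deliberate in this sketch: the first sort decides which sentences are kept, the second restores reading order so that the result still reads like the source document.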
METHODS
- $summarizer = HTML::Summary->new( $attr => $value );
-
Constructor. Possible attributes are:
- VERBOSE
-
Generate verbose messages to STDERR.
- LENGTH
-
Maximum length of summary (in bytes). Default is 500.
- USE_META
-
Flag to tell summarizer whether to use the content of the <META> tag in the page header, if one is present, instead of generating a summary from the body text. Note that if the USE_META flag is set, this overrides the LENGTH flag - in other words, the summary provided by the <META> tag is returned in full, even if it is longer than LENGTH bytes. Default is 0 (no).
- $summary = $summarizer->generate( $tree );
-
Takes an HTML::Element object, and generates a summary from it.
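- $value = $summarizer->option( $name [, $value] );
-
Get / set HTML::Summary configuration options. As in the SYNOPSIS, calling it with an option name and a value sets that option, while calling it with only an option name returns the current value.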
- $used = $summarizer->meta_used( );
-
Returns 1 if the META tag description was used to generate the summary.
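
The interplay between option( ), USE_META, LENGTH and meta_used( ) described above can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the HTML::Summary source: the package name My::SummarizerSketch is hypothetical, its generate( ) takes an already extracted META description and body text rather than an HTML::Element object, and the body summary is a crude truncation rather than the location heuristic.

```perl
use strict;
use warnings;

# Hypothetical stand-in class: it mimics the documented behaviour only and
# is NOT the HTML::Summary implementation.
package My::SummarizerSketch;

sub new {
    my ( $class, %attr ) = @_;
    # Documented defaults: LENGTH 500, USE_META 0.
    my $self = { LENGTH => 500, USE_META => 0, VERBOSE => 0, %attr, _meta_used => 0 };
    return bless $self, $class;
}

# Combined getter/setter, called as in the SYNOPSIS:
#   option( 'LENGTH' )         returns the current value
#   option( 'LENGTH' => 200 )  sets a new value
sub option {
    my ( $self, $name, $value ) = @_;
    $self->{$name} = $value if @_ > 2;
    return $self->{$name};
}

# Assumed inputs: the META description (or undef) and the body text, both
# already extracted from the page by the caller.
sub generate {
    my ( $self, $meta_description, $body_text ) = @_;
    if ( $self->{USE_META} && defined $meta_description ) {
        $self->{_meta_used} = 1;
        return $meta_description;    # returned in full; LENGTH is ignored
    }
    $self->{_meta_used} = 0;
    # Crude truncation, for illustration only.
    return substr( $body_text, 0, $self->{LENGTH} );
}

sub meta_used { return $_[0]{_meta_used} }

package main;

my $sketch = My::SummarizerSketch->new( LENGTH => 10, USE_META => 1 );
my $meta   = 'A description longer than ten bytes.';

# USE_META is set, so the META text comes back in full despite LENGTH => 10.
print $sketch->generate( $meta, 'Body text of the page.' ), "\n";
print $sketch->meta_used ? "meta used\n" : "body used\n";

# Switch the flag off at run time; the body text is now summarized instead.
$sketch->option( 'USE_META' => 0 );
print $sketch->generate( $meta, 'Body text of the page.' ), "\n";
```

With the real module, the same check is what the SYNOPSIS does: call meta_used( ) after generate( ) to find out which source the summary came from.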
SEE ALSO
HTML::TreeBuilder, Text::Sentence, Lingua::JA::Jcode, Lingua::JA::Jtruncate
AUTHORS
Neil Bowers <neilb@cre.canon.co.uk>, Tony Rose <tgr@cre.canon.co.uk> and Ave Wrigley <wrigley@cre.canon.co.uk>
COPYRIGHT
Copyright (c) 1997 Canon Research Centre Europe (CRE). All rights reserved. This script and any associated documentation or files cannot be distributed outside of CRE without express prior permission from CRE.