NAME
Lingua::ZH::Summarize - Summarizing bodies of Chinese text
SYNOPSIS
use Lingua::ZH::Summarize;
print summarize( $text ); # Easy, no? :-)
print summarize( $text, maxlength => 500 ); # Summary of at most 500 bytes
print summarize( $text, wrap => 75 ); # Wrap output to 75 columns
DESCRIPTION
This is a simple module which makes an unscientific effort at summarizing Chinese text. It recognizes simple patterns which look like statements, abridges them, and concatenates them into something vaguely resembling a summary. It needs more work on large bodies of text, but it seems to have a decent effect on small inputs at the moment.
Lingua::ZH::Summarize exports one function, summarize(), which takes the text to summarize as its first argument, and any number of optional directives in name => value form. The options it'll take are:
- maxlength
  Specifies the maximum length, in bytes, of the generated summary.
- wrap
  Prettyprints the summary output by wrapping it to the number of columns which you specify. This requires the Lingua::ZH::Wrap module.
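Since the directives are plain name => value pairs, it should be possible to pass both in one call. The following is a minimal sketch rather than part of the module: summarize_file() is a hypothetical helper, and it assumes that summarize() returns the summary as a string and that the file's raw bytes are already in an encoding the module can handle.

```perl
use strict;
use warnings;
use Lingua::ZH::Summarize;

# Hypothetical helper (not provided by the module): read a whole file
# and hand its contents to summarize() along with any directives.
sub summarize_file {
    my ( $path, %opts ) = @_;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $text = do { local $/; <$fh> };    # slurp the file as raw bytes
    close $fh;

    # Assumption: no decoding is needed before calling summarize().
    return summarize( $text, %opts );
}

# Example call, with a hypothetical file name:
#   print summarize_file( 'article.txt', maxlength => 300, wrap => 60 );
```

Passing wrap here needs Lingua::ZH::Wrap to be installed, as noted above. Omit that directive if only the length limit is wanted.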
Needless to say, this is a very simple scheme and not a universally effective one, but it's good enough for a first draft, and I'll bang on it more later. Like I said, it's not a scientific approach to the problem, but it's better than nothing.
SEE ALSO
Lingua::ZH::Toke, Lingua::ZH::Wrap, Lingua::EN::Summarize
ACKNOWLEDGEMENTS
Algorithm adapted from the Lingua::EN::Summarize module by Dennis Taylor, <dennis@funkplanet.com>.
AUTHORS
Autrijus Tang <autrijus@autrijus.org>
COPYRIGHT
Copyright 2003 by Autrijus Tang <autrijus@autrijus.org>.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.