NAME
Text::Identify::BoilerPlate - Remove repeated text
VERSION
Version 0.1
SYNOPSIS
Finds boilerplate text (lines that are repeated across documents) in a list of plain-text files. Only sets of consecutive repeated lines at the start and end of a document are considered boilerplate text.
use Text::Identify::BoilerPlate;
my @files = ('file1', 'file2', 'file3');
rem_boilerplate(@files);
New files are written that contain everything except the boilerplate text.
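The rule above is only described in prose. The following is a minimal sketch of that rule in Perl, not the module's actual implementation. The function `strip_boilerplate` and its `$min_docs` threshold are hypothetical. The sketch works on in-memory lines rather than files, and it assumes a line counts as "repeated" once it appears in at least two documents.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's own code: a sketch of the rule in the
# SYNOPSIS. Each document is an array ref of (already chomped) lines.
# Assumption: a line is "repeated" if it occurs in at least $min_docs
# distinct documents (default 2).
sub strip_boilerplate {
    my ($docs, $min_docs) = @_;
    $min_docs //= 2;

    # Count in how many distinct documents each line occurs.
    my %doc_count;
    for my $doc (@$docs) {
        my %seen;
        $doc_count{$_}++ for grep { !$seen{$_}++ } @$doc;
    }
    my $is_repeated = sub { ($doc_count{ $_[0] } // 0) >= $min_docs };

    my @result;
    for my $doc (@$docs) {
        my ($first, $last) = (0, $#$doc);
        # Drop the consecutive run of repeated lines at the start...
        $first++ while $first <= $last && $is_repeated->($doc->[$first]);
        # ...and at the end; repeated lines in the middle are kept.
        $last-- while $last >= $first && $is_repeated->($doc->[$last]);
        push @result, [ @$doc[ $first .. $last ] ];
    }
    return @result;
}

my @docs = (
    [ 'Header', 'Alpha', 'Shared', 'More alpha', 'Footer' ],
    [ 'Header', 'Shared', 'Beta', 'Footer' ],
    [ 'Header', 'Gamma', 'Footer' ],
);
my @clean = strip_boilerplate(\@docs);
print join(' | ', @$_), "\n" for @clean;
# Alpha | Shared | More alpha
# Beta
# Gamma
```

Counting distinct documents per line, rather than raw occurrences, keeps a line that repeats only within a single document from being treated as boilerplate. The "Shared" line shows the start/end restriction: it is removed from the second document, where it directly follows the header, but kept in the first, where it sits between unique lines.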
EXPORT
FUNCTIONS
rem_boilerplate
AUTHOR
Lars Nygaard, <lars.nygaard@inl.uio.no>
BUGS
The program should be bug-free, but it still needs extensive testing and tweaking before its simple algorithm can give consistently high-quality results.
Please report any bugs or feature requests to bug-text-identify-boilerplate@rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Identify-BoilerPlate. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
ACKNOWLEDGEMENTS
COPYRIGHT & LICENSE
Copyright 2005 Lars Nygaard, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.