NAME
extract_reuters.plx - parse Reuters 21578 corpus into individual files
SYNOPSIS
./extract_reuters.plx /path/to/expanded/reuters/archive
DESCRIPTION
This script extracts the TITLE and BODY of each item in the Reuters 21578 corpus into an individual file. It expects the location of the decompressed archive to be passed as a command-line argument.
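The Perl sketch below shows one way such an extraction could work. It is not the code of extract_reuters.plx. It assumes that the archive unpacks to the SGML files reut2-000.sgm through reut2-021.sgm, and that each file holds many <REUTERS> records with optional <TITLE> and <BODY> elements. The helper names (unescape_sgml, extract_items, extract_corpus), the articles/ output directory, and the NNNNN.txt file naming are all hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: undo the basic SGML escapes.
# Numeric character references (e.g. &#3;) are simply dropped here.
sub unescape_sgml {
    my ($text) = @_;
    $text =~ s/&#\d+;//g;
    $text =~ s/&lt;/</g;
    $text =~ s/&gt;/>/g;
    $text =~ s/&amp;/&/g;    # last, so "&amp;lt;" becomes "&lt;", not "<"
    return $text;
}

# Return one hashref { title => ..., body => ... } per <REUTERS> record.
# Records without a TITLE or BODY get an empty string for that field.
sub extract_items {
    my ($sgml) = @_;
    my @items;
    while ( $sgml =~ m{<REUTERS\b[^>]*>(.*?)</REUTERS>}gs ) {
        my $record = $1;
        my ($title) = $record =~ m{<TITLE>(.*?)</TITLE>}s;
        my ($body)  = $record =~ m{<BODY>(.*?)</BODY>}s;
        push @items, {
            title => defined $title ? unescape_sgml($title) : '',
            body  => defined $body  ? unescape_sgml($body)  : '',
        };
    }
    return @items;
}

# Hypothetical driver: read every reut2-NNN.sgm file in $corpus_dir and
# write each item to $out_dir/NNNNN.txt as "TITLE\n\nBODY\n".
sub extract_corpus {
    my ( $corpus_dir, $out_dir ) = @_;
    opendir( my $dh, $corpus_dir ) or die "Can't open '$corpus_dir': $!";
    my @sgm_files = sort { $a cmp $b } grep { /^reut2-\d{3}\.sgm$/ } readdir $dh;
    closedir $dh;
    unless ( -d $out_dir ) {
        mkdir $out_dir or die "Can't create '$out_dir': $!";
    }
    my $count = 0;
    for my $file (@sgm_files) {
        my $path = File::Spec->catfile( $corpus_dir, $file );
        open( my $in, '<:raw', $path ) or die "Can't open '$path': $!";
        my $sgml = do { local $/; <$in> };    # slurp the whole file
        close $in;
        for my $item ( extract_items($sgml) ) {
            $count++;
            my $out_path =
              File::Spec->catfile( $out_dir, sprintf( '%05d.txt', $count ) );
            open( my $out, '>:raw', $out_path )
              or die "Can't write '$out_path': $!";
            print {$out} "$item->{title}\n\n$item->{body}\n";
            close $out or die "Can't close '$out_path': $!";
        }
    }
    return $count;
}

# Run only when a corpus path is given, so the subs above can be reused.
if (@ARGV) {
    my $count = extract_corpus( $ARGV[0], 'articles' );
    print "Wrote $count files\n";
}
```

Invoked with /path/to/expanded/reuters/archive as its argument, the sketch would write articles/00001.txt, articles/00002.txt, and so on. Regular expressions are adequate here only because <REUTERS> records are not nested. General SGML would call for a real parser.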
AUTHOR
Marvin Humphrey <marvin at rectangular dot com>.
COPYRIGHT AND LICENSE
Copyright 2006 Marvin Humphrey
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.