NAME

extract_reuters.plx - parse the Reuters-21578 corpus into individual files

SYNOPSIS

./extract_reuters.plx /path/to/expanded/reuters/archive

DESCRIPTION

This script extracts the TITLE and BODY of each item in the Reuters-21578 corpus and writes each item to its own file. It expects the location of the decompressed archive as a command-line argument.
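The extraction described above can be sketched roughly as follows. This is a minimal illustration and not the script's actual source. It assumes the archive's files end in .sgm and wrap each item in a REUTERS element that contains TITLE and BODY elements. The sub names extract_articles and write_articles, the default output directory "articles", and the output file naming are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return one { title, body } hashref per <REUTERS>
# element found in a chunk of SGML text. Items that lack either a TITLE
# or a BODY are skipped.
sub extract_articles {
    my ($sgml) = @_;
    my @articles;
    while ( $sgml =~ m{(<REUTERS\b.*?</REUTERS>)}gs ) {
        my $item = $1;
        my ($title) = $item =~ m{<TITLE>(.*?)</TITLE>}s or next;
        my ($body)  = $item =~ m{<BODY>(.*?)</BODY>}s   or next;
        push @articles, { title => $title, body => $body };
    }
    return @articles;
}

# Hypothetical driver: read every .sgm file in $source_dir and write one
# text file per article into $out_dir (default "articles", an assumption).
sub write_articles {
    my ( $source_dir, $out_dir ) = @_;
    $out_dir = 'articles' unless defined $out_dir;
    if ( !-d $out_dir ) {
        mkdir $out_dir or die "Can't create '$out_dir': $!";
    }

    opendir( my $dh, $source_dir ) or die "Can't open '$source_dir': $!";
    my @sgm_files = sort grep { /\.sgm\z/ } readdir $dh;
    closedir $dh;

    my $count = 0;
    for my $file (@sgm_files) {
        # Read and write raw bytes; no character encoding is assumed.
        open( my $in, '<:raw', "$source_dir/$file" )
            or die "Can't read '$source_dir/$file': $!";
        my $sgml = do { local $/; <$in> };
        close $in;

        for my $article ( extract_articles($sgml) ) {
            my $out_path = sprintf( "%s/article%05d.txt", $out_dir, ++$count );
            open( my $out, '>:raw', $out_path )
                or die "Can't write '$out_path': $!";
            print {$out} "$article->{title}\n\n$article->{body}\n";
            close $out or die "Can't close '$out_path': $!";
        }
    }
    return $count;
}

# Only touch the filesystem when a source directory is given.
write_articles(@ARGV) if @ARGV;
```

Saved under a hypothetical name such as sketch.plx, it would be invoked the same way as in the SYNOPSIS: ./sketch.plx /path/to/expanded/reuters/archive. The regex approach treats the SGML as plain text and leaves character entities such as &lt; undecoded.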

AUTHOR

Marvin Humphrey < marvin at rectangular dot com >.

COPYRIGHT AND LICENSE

Copyright 2006 Marvin Humphrey

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.