NAME

SWISH::Filters::Doc2txt - Perl extension for filtering MSWord documents with Swish-e

DESCRIPTION

This is a plug-in module that uses the "catdoc" program to convert MS Word documents to text for indexing by Swish-e. "catdoc" can be downloaded from:

http://www.ice.ru/~vitus/catdoc/ver-0.9.html

The program "catdoc" must be installed and in your PATH before running Swish-e.
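
One way to confirm the PATH requirement before indexing is a small standalone check. The sketch below is not part of this module; find_in_path is a hypothetical helper name, and the check assumes a Unix-like system where the executable is named exactly "catdoc".

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of SWISH::Filters::Doc2txt):
# return the full path of the first executable file named $program
# found in a PATH directory, or undef if there is none.
sub find_in_path {
    my ($program) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $program );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $catdoc = find_in_path('catdoc');
print defined $catdoc
    ? "catdoc found at $catdoc\n"
    : "catdoc not found in PATH\n";
```

Running this in the same environment that launches Swish-e (for example, the same cron job or shell) matters, because PATH can differ between an interactive shell and a scheduled job.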

BUGS

This filter does not specify input or output character encodings. This will change in the future to allow use of the user_data to set the encoding.

A minor optimization during spidering (i.e., when documents are in memory instead of on disk) would be to use an open2() call to let catdoc read from stdin instead of from a file.
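
The open2() idea could look roughly like the following. This is a minimal sketch, not the module's current behavior: filter_via_stdin is a hypothetical name, and it assumes that catdoc processes standard input when no file name is given. The demonstration line uses the running perl as a stand-in command so the sketch can be tried without catdoc installed.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: pipe an in-memory document through an external
# command and return everything the command writes to stdout.
sub filter_via_stdin {
    my ( $doc, @command ) = @_;

    # open2 takes the read handle first, then the write handle.
    my $pid = open2( my $from_child, my $to_child, @command );
    binmode $to_child;
    binmode $from_child;

    print {$to_child} $doc;
    close $to_child;    # signal EOF so the child can finish

    my $text = do { local $/; <$from_child> };    # slurp all output
    close $from_child;
    waitpid $pid, 0;    # reap the child to avoid a zombie

    return $text;
}

# Stand-in demonstration: upper-case the input using perl itself.
print filter_via_stdin( "hello\n", $^X, '-pe', '$_ = uc' );

# Assumed usage with catdoc, where $doc_bytes holds the Word document:
# my $text = filter_via_stdin( $doc_bytes, 'catdoc' );
```

Writing the whole document before reading any output can deadlock if the child fills its output pipe while input is still pending, so a production version would need to interleave reads and writes (for example with select) or write from a separate process.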

AUTHOR

Bill Moseley

SEE ALSO

SWISH::Filter