NAME
Text::Extract::Word - Extract text from Word files
SYNOPSIS
use Text::Extract::Word qw(get_all_text);
my $text = get_all_text("test1.doc");
DESCRIPTION
This simple module extracts the textual contents of a Word file. The code was ported from Java code that was originally part of the Apache POI project, but extensive changes were made internally.
FUNCTIONS
get_all_text($filename)
The only function exported by this module. Given a file name, it returns the text contents of the Word file as UTF-8 encoded text.
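As a rough illustration of calling get_all_text from a script, the sketch below extracts the text of a file named on the command line and reports a word count. The helper count_words is hypothetical and not part of this module. How get_all_text behaves on a missing or unreadable file is not documented here, so the call is wrapped in eval as a precaution. Only a count is printed, which avoids any assumption about output encoding layers.

```perl
use strict;
use warnings;
use Text::Extract::Word qw(get_all_text);

# Hypothetical helper (not part of Text::Extract::Word):
# counts whitespace-separated words in a string.
sub count_words {
    my ($text) = @_;
    return 0 unless defined $text;
    my @words = split ' ', $text;   # ' ' also skips leading whitespace
    return scalar @words;
}

# Process a file only when one is given on the command line.
if (@ARGV) {
    my $file = $ARGV[0];

    # Assumption: failure may surface as an exception or as undef,
    # so both cases are handled.
    my $text = eval { get_all_text($file) };
    die "Could not extract text from $file: " . ($@ || 'no text returned') . "\n"
        unless defined $text;

    printf "%s: %d words\n", $file, count_words($text);
}
```

Saved under a name of your choice, the script would be run with the path to a Word file as its only argument.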
BUGS
Support for legacy Word: the module does not extract text from files created by Word version 6 or earlier.
SEE ALSO
OLE::Storage also has a script, lhalw (Let's Have a Look at Word), which extracts text from Word files. Text::Extract::Word is simply a much smaller module with lighter dependencies, using OLE::Storage_Lite for its storage management.
AUTHOR
Stuart Watt, stuart@morungos.com
COPYRIGHT
Copyright (c) 2010 Stuart Watt. All rights reserved.