NAME
Image::OCR::Tesseract - read an image with tesseract and get output
SYNOPSIS
    use Image::OCR::Tesseract 'get_ocr';
    my $image = './hi.jpg';
    my $text = get_ocr($image);
DESCRIPTION
This is a simple wrapper for tesseract. Tesseract expects a TIFF file, so get_ocr() converts your image to a temporary TIFF if it is not one already. That way you don't have to worry about your image format for OCR.
Tesseract writes its output to a text file. get_ocr() reads that file, erases it, and returns the output to you.
get_ocr()
Argument is the absolute path to the image file. An optional second argument is the absolute path to a temp directory. If you don't have write access to the directory the image resides in, provide a directory you do have write access to.
Returns the text content as read by tesseract.
Does not clean up after itself if DEBUG is on.
Warns if there is no output.
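The calling convention above can be sketched as follows. This is only an illustration: read_image_text() is a hypothetical helper, not part of this module, and treating whitespace-only output as "nothing read" is an assumption of the sketch. The optional $ocr code ref defaults to get_ocr() and is there only so the helper can be tried without tesseract installed.

```perl
use strict;
use warnings;
use Cwd 'abs_path';
use File::Temp 'tempdir';

# Hypothetical helper, not part of Image::OCR::Tesseract.
# $ocr is an optional code ref with the same calling convention as
# get_ocr(); it defaults to the real get_ocr(), loaded lazily.
sub read_image_text {
    my ($image, $ocr) = @_;

    return unless defined $image && -f $image;

    # get_ocr() expects absolute paths.
    my $abs_image = abs_path($image);

    # A scratch directory we can certainly write to, removed at exit.
    my $abs_tmp = tempdir(CLEANUP => 1);

    $ocr ||= do {
        require Image::OCR::Tesseract;
        \&Image::OCR::Tesseract::get_ocr;
    };

    my $text = $ocr->($abs_image, $abs_tmp);

    # Assumption: missing or whitespace-only output means nothing was read.
    return unless defined $text && $text =~ /\S/;
    return $text;
}
```

Calling read_image_text('./hi.jpg') in scalar context would then give the recognized text, or undef if the file is missing or nothing was read.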
_tesseract()
Argument is the absolute path to a TIFF file. Returns the text output. If the image contains no text, or if tesseract fails, returns an empty string. Also warns if tesseract fails.
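The behaviour described above can be approximated like this. It is a sketch, not the module's own code: tesseract_text() is a hypothetical stand-in for _tesseract(), and it assumes the command-line form "tesseract IMAGE BASE", which writes its result to BASE.txt. The $bin parameter is an addition of the sketch so that the failure path can be exercised.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _tesseract(); the module's real code may differ.
sub tesseract_text {
    my ($abs_tif, $bin) = @_;
    $bin = 'tesseract' unless defined $bin;

    # Assumption: "tesseract IMAGE BASE" writes its result to BASE.txt.
    my $base = $abs_tif;
    my $txt  = "$base.txt";

    # List form of system() avoids the shell, so odd file names are safe.
    if (system($bin, $abs_tif, $base) != 0) {
        warn "$bin failed for $abs_tif\n";
        return '';
    }

    open my $fh, '<', $txt or return '';
    my $text = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    unlink $txt;                         # erase tesseract's output file

    return defined $text ? $text : '';
}
```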
TESSERACT NOTES
Tesseract is an open source OCR engine. For an image to be read properly by tesseract, it must be an 8 bit per pixel TIFF image file. This module creates a temporary 8 bit per pixel copy of your target image and passes that to tesseract.
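One way such a temporary file could be produced is with ImageMagick's convert. The sketch below is an assumption, not the module's actual conversion step: both helper names are hypothetical, and the flags are simply one combination that yields an 8-bit grayscale, uncompressed TIFF.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the ImageMagick command, as a list, that
# would write an 8-bit grayscale, uncompressed TIFF copy of $src to $dst.
sub tiff_convert_command {
    my ($src, $dst) = @_;
    return ('convert', $src,
            '-colorspace', 'Gray',   # one channel
            '-depth',      '8',      # 8 bits per sample
            '-compress',   'none',
            $dst);
}

# Runs the command without a shell; returns true on success.
sub convert_to_tiff {
    my ($src, $dst) = @_;
    return system(tiff_convert_command($src, $dst)) == 0;
}
```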
INSTALLING TESSERACT
Included in this package is t/tesseract_install_helper.pl, which checks for the packages needed.
Installing tesseract can be tricky. You will need gcc-c++ and automake installed on your system.
Once you have automake and gcc-c++, you should be able to install.
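A check for these build tools might look like the sketch below. It is not the code of t/tesseract_install_helper.pl. Both function names are hypothetical, and looking for a g++ executable assumes that this is what the gcc-c++ package installs on your system.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical check; not the code of t/tesseract_install_helper.pl.
# Returns the full path of $name if it is an executable in PATH, else undef.
sub find_in_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Assumption: the gcc-c++ package provides an executable named g++.
sub missing_build_tools {
    return grep { !defined find_in_path($_) } qw(g++ automake);
}
```

An empty list from missing_build_tools() suggests both tools are available.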
SVN
You may be able to simply install the SVN version of Tesseract by using:
    svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
    cd tesseract-ocr
    ./runautoconf
    mkdir build-directory
    cd build-directory
    ../configure
    make
    make install
For more, see the Google project on OCR, which uses tesseract.
GOCR
Another great OCR engine is gocr, but it is not suited for the purpose of reading text from images. gocr is great if you need to tweak what you are reading, and for other specialized purposes.
SEE ALSO
tesseract, gocr, convert, ocr
DEBUG
Set the debug flag on:
    $Image::OCR::Tesseract::DEBUG = 1;
A temporary file is created. If DEBUG is on, the file is not deleted and its path is printed to STDERR.
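If you only want debugging for a single call, Perl's local can restore the flag afterwards. A minimal sketch follows; with_ocr_debug() is a hypothetical helper, not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: runs $code with the module's DEBUG flag switched on,
# then restores the previous value, even if $code dies.
sub with_ocr_debug {
    my ($code) = @_;
    no warnings 'once';    # the flag may be named only here
    local $Image::OCR::Tesseract::DEBUG = 1;
    return $code->();
}
```

For example, my $text = with_ocr_debug(sub { get_ocr($image) }); keeps the temporary file for that one call only.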
AUTHOR
Leo Charre leocharre at cpan dot org
COPYRIGHT
Copyright (c) 2007 Leo Charre. All rights reserved.
LICENSE
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, i.e., under the terms of the "Artistic License" or the "GNU General Public License".
DISCLAIMER
This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the "GNU General Public License" for more details.