NAME

Image::OCR::Tesseract - read an image with tesseract and get output

SYNOPSIS

use Image::OCR::Tesseract 'get_ocr';

my $image = './hi.jpg';

my $text = get_ocr($image);

DESCRIPTION

This is a simple wrapper for tesseract.

Tesseract expects a TIFF file. If your file is not a TIFF, get_ocr() converts it to a temporary TIFF first, so you don't have to worry about your image format for OCR.

Tesseract writes its output to a text file. get_ocr() erases that file and returns the output to you.
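The steps described above (convert to a temporary TIFF if needed, run tesseract, read and erase its text file) can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's actual source: the helper names is_tiff() and ocr_via_tiff() are hypothetical, ImageMagick's convert is assumed to be the converter, and both convert and tesseract are assumed to be on your PATH.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# True if the path already looks like a TIFF, judged by extension only.
sub is_tiff {
    my ($path) = @_;
    return $path =~ /\.tiff?\z/i ? 1 : 0;
}

# Hypothetical helper: OCR one image and return its text ('' on failure).
sub ocr_via_tiff {
    my ($image) = @_;
    my $dir  = tempdir(CLEANUP => 1);    # scratch dir, removed at exit
    my $tiff = $image;

    unless (is_tiff($image)) {
        # Assumption: ImageMagick's `convert` does the format conversion.
        $tiff = File::Spec->catfile($dir, 'page.tif');
        system('convert', $image, $tiff) == 0
            or do { warn "convert failed for $image\n"; return '' };
    }

    # `tesseract IN OUTBASE` writes its result to OUTBASE.txt.
    my $base = File::Spec->catfile($dir, 'out');
    system('tesseract', $tiff, $base) == 0
        or do { warn "tesseract failed for $tiff\n"; return '' };

    open my $fh, '<', "$base.txt" or return '';
    local $/;                            # slurp the whole file
    my $text = <$fh>;
    close $fh;
    unlink "$base.txt";                  # erase tesseract's text file
    return defined $text ? $text : '';
}

# Run only when an image path is given on the command line.
print ocr_via_tiff($ARGV[0]) if @ARGV;
```

In practice you would call get_ocr() instead; the sketch only shows why no leftover files remain after a normal run.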

This is part of the PDF::OCR package.

get_ocr()

The argument is the absolute path to an image file. An optional second argument is the absolute path to a temp directory, for when you can't write to /tmp; the default is /tmp. Returns the text content as read by tesseract. Does not clean up after itself if DEBUG is on.

Warns if there is no output.
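As a minimal sketch of the two-argument form, the wrapper below checks both paths before handing them to get_ocr(). The name ocr_with_tmpdir() is hypothetical, and treating empty or whitespace-only output as "no text" is an assumption about how you may want to handle the warning case. The module is loaded with require inside the wrapper only so that the path checks run first; importing get_ocr as in the SYNOPSIS works just as well.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical wrapper: validate the paths, then call get_ocr().
# Returns undef when the inputs are unusable or no text was read.
sub ocr_with_tmpdir {
    my ($image, $tmpdir) = @_;

    # get_ocr() is documented to take absolute paths.
    $image  = File::Spec->rel2abs($image);
    $tmpdir = File::Spec->rel2abs($tmpdir);

    unless (-f $image) {
        warn "no such image: $image\n";
        return;
    }
    unless (-d $tmpdir && -w $tmpdir) {
        warn "temp dir not writable: $tmpdir\n";
        return;
    }

    # Loaded here so the checks above run even before the module is needed.
    require Image::OCR::Tesseract;
    my $text = Image::OCR::Tesseract::get_ocr($image, $tmpdir);

    # Assumption: empty or whitespace-only output counts as "no text".
    return (defined $text && $text =~ /\S/) ? $text : undef;
}

# Usage: perl ocr.pl IMAGE TMPDIR
if (@ARGV == 2) {
    my $text = ocr_with_tmpdir(@ARGV);
    print defined $text ? $text : "(no text recognised)\n";
}
```

Passing a directory you own, rather than relying on /tmp, is the case the optional argument exists for.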

_tesseract()

The argument is the absolute path to a TIFF file. Returns the text output. If the image contains no text or tesseract fails, returns an empty string. A tesseract failure also produces a warning.

SEE ALSO

tesseract

gocr

DEBUG

Set the debug flag on:

$Image::OCR::Tesseract::DEBUG = 1;

A temporary file is created. If DEBUG is on, the file is not deleted and its path is printed to STDERR.

AUTHOR

Leo Charre leocharre at cpan dot org

COPYRIGHT

Copyright (c) 2007 Leo Charre. All rights reserved.

LICENSE

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, i.e., under the terms of the "Artistic License" or the "GNU General Public License".

DISCLAIMER

This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the "GNU General Public License" for more details.