NAME

PDF::OCR2::Page

DESCRIPTION

Extract the text of a single-page PDF document: both the text embedded in the document and, if the page contains images, the text recovered from those images via Tesseract OCR.

METHODS

new()

Argument is a hash reference. It must contain the key abs_pdf, holding the path to the PDF file. Throws an exception if abs_pdf is not provided or the file does not exist on disk.
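A minimal usage sketch, assuming the module is installed. The path /tmp/page1.pdf is a placeholder and not part of the original documentation. Because the constructor throws on a missing or nonexistent file, the call is wrapped in eval.

```perl
use strict;
use warnings;

# Hypothetical path, for illustration only.
my $abs_pdf = '/tmp/page1.pdf';

my $page;
my $ok = eval {
    require PDF::OCR2::Page;
    $page = PDF::OCR2::Page->new({ abs_pdf => $abs_pdf });
    1;
};

if ($ok) {
    print "Loaded page from ", $page->abs_pdf, "\n";
}
else {
    # Reached if the module is not installed, or if new() threw
    # because abs_pdf was missing or the file is not on disk.
    warn "Could not create page object: $@";
}
```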

abs_pdf()

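Argument is the path to a PDF file representing one page. The file must exist on disk. This is a Perl set/get accessor: called with an argument it sets the path, called without one it returns the current value.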

abs_images()

Returns an array reference of images, or a list in list context. Uses PDF::GetImages to extract them, which is slow.
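The context-sensitive return described above can be illustrated with a small stand-in class. This is only a sketch of the calling convention: My::PageSketch and the image paths are hypothetical, and this is not the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the real PDF::OCR2::Page code.
package My::PageSketch;

sub new {
    my ($class, $arg) = @_;
    return bless { images => $arg->{images} || [] }, $class;
}

# Return a flat list in list context, an array reference otherwise.
sub abs_images {
    my $self = shift;
    return wantarray ? @{ $self->{images} } : $self->{images};
}

package main;

my $page = My::PageSketch->new({ images => ['/tmp/a.pbm', '/tmp/b.pbm'] });

my $aref = $page->abs_images;    # scalar context: array reference
my @list = $page->abs_images;    # list context: flat list

printf "%d images via aref, %d via list\n", scalar @$aref, scalar @list;
```

Since the real method is documented as slow, it is probably worth storing its result in a variable rather than calling it repeatedly.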

CLASS VARIABLES

Defaults shown.

If true, check the PDF for correctness by loading it with PDF::API2.

$PDF::OCR2::Page::CHECK_PDF = 0;

If true, do not clean up temporary files when the object is destroyed (DESTROY).

$PDF::OCR2::Page::NO_TRASH_CLEANUP = 0;

If true, turn debugging output on.

$PDF::OCR2::Page::DEBUG = 0;
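A short sketch of overriding these defaults from calling code. It assumes the variables are ordinary package variables that are read at run time, and it sets them after the module has been loaded so that the module's own defaults do not overwrite them.

```perl
use strict;
use warnings;
no warnings 'once';    # the variables belong to another package

# Load the module first, if it is available, so its defaults are
# already in place before they are overridden below.
eval { require PDF::OCR2::Page; 1 }
    or warn "PDF::OCR2::Page not installed; settings below have no effect\n";

$PDF::OCR2::Page::CHECK_PDF        = 1;    # check the PDF with PDF::API2
$PDF::OCR2::Page::NO_TRASH_CLEANUP = 1;    # keep temporary files on DESTROY
$PDF::OCR2::Page::DEBUG            = 1;    # debugging output on
```

Keeping temporary files together with debugging output may help when investigating why OCR produced unexpected text for a page.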

AUTHOR

Leo Charre leocharre at cpan dot org