NAME
PDF::OCR2::Page
DESCRIPTION
Extracts the text of a one-page PDF document. Text embedded in the PDF itself is extracted directly, and if the page contains images, their text is extracted with Tesseract OCR.
METHODS
new()
Argument is a hash ref, which must contain abs_pdf, the path to the PDF file. Throws an exception if abs_pdf is not provided or the file does not exist on disk.
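The contract of new() can be illustrated with a small, self-contained Perl sketch. PageSketch is a hypothetical stand-in class written for this example, not the module's real code. It reproduces only the documented rules (hash ref argument, required abs_pdf, file must exist on disk) and shows a caller trapping the exception with eval.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in that mirrors the documented contract of new();
# it is NOT the actual PDF::OCR2::Page source.
package PageSketch;
use Carp qw(croak);

sub new {
    my ($class, $arg) = @_;
    # The argument must be a hash ref ...
    ref($arg) eq 'HASH' or croak('argument to new() must be a hash ref');
    # ... that contains abs_pdf ...
    my $abs_pdf = $arg->{abs_pdf}
        or croak('missing required abs_pdf argument');
    # ... naming a file that exists on disk.
    -f $abs_pdf or croak("abs_pdf '$abs_pdf' is not a file on disk");
    return bless { abs_pdf => $abs_pdf }, $class;
}

package main;

# Caller side: trap the exception with eval and inspect $@.
my $page = eval { PageSketch->new({ abs_pdf => '/no/such/page.pdf' }) };
print "new() failed: $@" unless $page;
```

With the real module, the same eval pattern applies around PDF::OCR2::Page->new.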
abs_pdf()
Perl setget method. With an argument, sets the path to a PDF file representing one page; the file must be on disk. Returns the current path.
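A "setget" method stores a value when called with an argument and returns the current value either way. The sketch below uses a hypothetical SetGetSketch class, not PDF::OCR2::Page itself, and, as an assumption based on "Must be on disk", rejects paths that are not files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical class showing the setget accessor pattern; NOT the
# actual PDF::OCR2::Page implementation.
package SetGetSketch;
use Carp qw(croak);

sub new { return bless {}, shift }

# With an argument: validate and store it. Always: return the current value.
sub abs_pdf {
    my $self = shift;
    if (@_) {
        my $path = shift;
        -f $path or croak("'$path' is not on disk");
        $self->{abs_pdf} = $path;
    }
    return $self->{abs_pdf};
}

package main;

# A throwaway file stands in for a one-page PDF.
my (undef, $tmp_pdf) = tempfile(SUFFIX => '.pdf', UNLINK => 1);

my $obj = SetGetSketch->new;
$obj->abs_pdf($tmp_pdf);                     # set
print 'current: ', $obj->abs_pdf, "\n";      # get
```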
abs_images()
Returns an array ref of images in scalar context, or a list in list context. The images are extracted with PDF::GetImages, which is slow.
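Returning an array ref in scalar context and a list in list context is the usual wantarray idiom. The hypothetical abs_images_sketch() below shows only that calling convention; its image paths are placeholders, not output of PDF::GetImages.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch of the calling convention documented for abs_images():
# array ref in scalar context, plain list in list context.
# The paths are placeholders; the real method obtains them via PDF::GetImages.
sub abs_images_sketch {
    my @images = ('/tmp/page-img-0.png', '/tmp/page-img-1.png');
    return wantarray ? @images : \@images;
}

my $aref   = abs_images_sketch();    # scalar context: array ref
my @images = abs_images_sketch();    # list context: list of paths
printf "%d images via aref, %d via list\n", scalar @$aref, scalar @images;
# prints: 2 images via aref, 2 via list
```

Since extraction is described as slow, it may be worth calling abs_images() once and keeping the returned array ref.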
CLASS VARIABLES
Defaults shown.
Check the PDF with PDF::API2 (for correctness, etc.).
$PDF::OCR2::Page::CHECK_PDF = 0;
Do not clean up trash when the object is destroyed (DESTROY).
$PDF::OCR2::Page::NO_TRASH_CLEANUP = 0;
Turn debugging on.
$PDF::OCR2::Page::DEBUG = 0;
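The class variables are ordinary package variables, so they can be assigned directly or overridden temporarily with local. In this sketch the module itself is not loaded, and the comment about DESTROY assumes the cleanup flag is read at the moment an object is destroyed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# A real script would say "use PDF::OCR2::Page;" here. It is left out so
# the sketch runs without the module, so Perl would otherwise warn that
# these package variables are "used only once".
no warnings 'once';

# Program-wide settings: validate PDFs with PDF::API2 and turn debugging on.
$PDF::OCR2::Page::CHECK_PDF = 1;
$PDF::OCR2::Page::DEBUG     = 1;

# Documented default: clean up trash on DESTROY.
$PDF::OCR2::Page::NO_TRASH_CLEANUP = 0;

{
    # Skip cleanup only within this block; local restores the old
    # value when the block exits.
    local $PDF::OCR2::Page::NO_TRASH_CLEANUP = 1;
    # ... create page objects here and let them go out of scope before
    # the block ends, so their DESTROY sees the local value (assumed) ...
    print "inside block: $PDF::OCR2::Page::NO_TRASH_CLEANUP\n";
}
print "after block: $PDF::OCR2::Page::NO_TRASH_CLEANUP\n";
# prints: inside block: 1
#         after block: 0
```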
AUTHOR
Leo Charre leocharre at cpan dot org