NAME

PDF::Extract - Extracting sub PDF documents from a multipage PDF document

SYNOPSIS

 use PDF::Extract;
 $pdf=new PDF::Extract;
 $pdf->servePDFExtract( PDFDoc=>"c:/Docs/my.pdf", PDFPages=>"1-3 31-36" );

or

 use PDF::Extract;
 $pdf = new PDF::Extract( PDFDoc=>'C:/my.pdf' );
 $pdf->getPDFExtract( PDFPages=>"@PDFPages" );
 print "Content-Type: text/plain\n\n<pre>",  $pdf->getPDFExtractOut;
 print $pdf->getPDFExtractError;

DESCRIPTION

PDF Extract is a group of methods that allow the user to quickly grab pages as a new PDF document from a pre-existing PDF document.

 With PDF::Extract a new PDF document can be:
  • assigned to a scalar variable with getPDFExtract.

  • saved to disk with savePDFExtract.

  • printed to STDOUT as a PDF web document with servePDFExtract.

  • cached and served for a faster PDF web document service with fastServePDFExtract.

These four main methods can be called with or without arguments. The methods will not work unless they know the location of the original PDF document and the pages to extract. There are no default values.

METHODS

new PDF::Extract

Creates a new Extract object with empty state information, ready to process input and output data. new can be called with a hash of arguments.

 new PDF::Extract( PDFDoc=>"c:/Docs/my.pdf", PDFPages=>"1-3 31-36" )

This will cause a new PDF document to be generated unless there is an error. Extract->new() simply calls getPDFExtract() if there is an argument.

getPDFExtract

This method is the main workhorse of the package. It does all the PDF processing and sets PDFExtractError if it is unable to create a new PDF document. PDFDoc and PDFPages must be set, either in this call or beforehand, for it to work. It returns the new PDF document as a string, or an empty string if there is an error.

To create an array of PDF documents, each consisting of a single page, from a multipage PDF document:

  $pdf = new PDF::Extract( PDFDoc=>'C:/my.pdf' );
  # the loop ends when getPDFExtract returns an empty string
  while ( $pdf[$i]=$pdf->getPDFExtract( PDFPages=>++$i ) );

The lowest valid page number for PDFPages is 1. A value of 0 will produce no output and raise an error. An error will be raised if the PDFPages value does not correspond to any pages.
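A more explicit variant of the single-page loop can be sketched as follows. The helper name split_into_pages is hypothetical and not part of PDF::Extract; it only assumes an object with a getPDFExtract method that returns an empty string on error, as described above, and it treats the first empty result as "no more pages".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::Extract): collect one single-page
# PDF string per page. $pdf is any object offering getPDFExtract, for
# example a PDF::Extract object that already has PDFDoc set.
sub split_into_pages {
    my ($pdf) = @_;
    my @pages;
    for (my $n = 1; ; $n++) {          # page numbers start at 1
        my $page = $pdf->getPDFExtract( PDFPages => $n );
        # An empty string signals an error; here it is assumed to mean
        # that page $n does not exist, so the loop stops.
        last unless defined $page && length $page;
        push @pages, $page;
    }
    return @pages;
}
```

Assuming PDF::Extract is installed, it could be called like this, with $pages[0] holding page 1:

  my @pages = split_into_pages( PDF::Extract->new( PDFDoc=>'C:/my.pdf' ) );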

savePDFExtract

This method saves its output to whatever PDFExtractCache is set to. The new file name will be an amalgam of the original filename, the page numbers of the extracted pages separated with an underscore "_" and the .pdf file type suffix.

  $pdf = new PDF::Extract;
  $pdf->savePDFExtract(PDFPages=>"1 3-5", PDFDoc=>'C:/my.pdf', PDFExtractCache=>"C:/myCache" );

The saved PDF location and file name will be "C:/myCache/my_1_3_4_5.pdf".
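The naming rule can be illustrated with a small stand-alone sketch. The helpers expand_pages and extract_filename are hypothetical and only mirror the documented example; the module's own implementation may differ, for instance in how it treats pages that are requested but not found.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: expand a page specification such as "1 3-5"
# into the list (1, 3, 4, 5).
sub expand_pages {
    my ($spec) = @_;
    my @pages;
    for my $part (split ' ', $spec) {
        if ($part =~ /^(\d+)-(\d+)$/) {
            push @pages, int($1) .. int($2);   # a range such as 3-5
        }
        elsif ($part =~ /^\d+$/) {
            push @pages, int($part);           # a single page
        }
        else {
            die "Unrecognised page specification: $part\n";
        }
    }
    die "Page numbers start at 1\n" if grep { $_ < 1 } @pages;
    return @pages;
}

# Hypothetical helper: build the file name from the original document
# path and the page specification.
sub extract_filename {
    my ($doc, $spec) = @_;
    my ($base) = fileparse($doc, qr/\.pdf$/i);   # "C:/my.pdf" -> "my"
    return join('_', $base, expand_pages($spec)) . '.pdf';
}

print extract_filename('C:/my.pdf', '1 3-5'), "\n";   # prints my_1_3_4_5.pdf
```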

servePDFExtract

This method serves its output to STDOUT with the correct header for a PDF document served on the web. The served file's name will be an amalgam of the original filename, the page numbers of the extracted pages separated with an underscore "_" and the .pdf file type suffix. If there is an error then an error page will be served.

  $pdf = PDF::Extract->new;
  $pdf->servePDFExtract( PDFDoc=>'C:/my.pdf', PDFPages=>1);

The file name of the served file will be "my_1.pdf".
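For readers who want to serve a string returned by getPDFExtract themselves, the following is a minimal sketch of a CGI response for a PDF. The helper pdf_response is hypothetical, and the headers shown are the usual ones for inline PDF delivery; the exact headers that servePDFExtract emits are not specified in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: format a minimal CGI response for PDF data.
# These headers are an assumption, not necessarily what PDF::Extract sends.
sub pdf_response {
    my ($filename, $pdf_data) = @_;
    return "Content-Type: application/pdf\r\n"
         . "Content-Disposition: inline; filename=\"$filename\"\r\n"
         . "Content-Length: " . length($pdf_data) . "\r\n"
         . "\r\n"                       # blank line ends the headers
         . $pdf_data;
}
```

When printing the result, STDOUT should be put into binary mode first so the PDF bytes are not altered:

  binmode STDOUT;
  print pdf_response( 'my_1.pdf', $pdf->getPDFExtractOut );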

fastServePDFExtract

This method serves its output to STDOUT with the correct header for a PDF document served on the web. The served file's name will be an amalgam of the original filename, the page numbers of the extracted pages separated with an underscore "_" and the .pdf file type suffix. This method also checks to see if the PDF document requested is in the cache folder, as set with PDFExtractCache. If it exists then this file is served instead of processing a new PDF document. If there is an error then an error page will be served.

  $pdf = new PDF::Extract(PDFExtractCache=>"C:/myCache" );
  $pdf->fastServePDFExtract( PDFDoc=>'C:/my.pdf', PDFPages=>1);

The file name of the served file will be "my_1.pdf".
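The cache check described for fastServePDFExtract follows a common "look in the cache, otherwise generate and store" pattern. The sketch below shows that pattern in isolation; fetch_cached is a hypothetical helper and not the module's internal code, and the generator is passed in as a code reference so the example does not depend on PDF::Extract.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the cached file's contents if it exists,
# otherwise call $generate->(), store the result, and return it.
sub fetch_cached {
    my ($cache_dir, $filename, $generate) = @_;
    my $path = File::Spec->catfile($cache_dir, $filename);

    if (-e $path) {
        open my $in, '<:raw', $path or die "Cannot read $path: $!";
        local $/;                       # slurp the whole file
        my $data = <$in>;
        close $in;
        return $data;
    }

    my $data = $generate->();
    return '' unless defined $data && length $data;   # do not cache failures

    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} $data;
    close $out or die "Cannot close $path: $!";
    return $data;
}
```

With PDF::Extract, the generator could be a closure around getPDFExtract:

  $data = fetch_cached( 'C:/myCache', 'my_1.pdf',
      sub { $pdf->getPDFExtract( PDFDoc=>'C:/my.pdf', PDFPages=>1 ) } );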

getPDFExtractError

 $pdf->getPDFExtractError;

This method returns an error message if there is one. An error is set if the output from any other method is an empty string.

The error message consists of a short description, a file name and the line number where the error was detected.
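Since every method signals failure with an empty string, callers may want to turn that into an exception. A minimal sketch, assuming only the getPDFExtract and getPDFExtractError methods documented here; the wrapper name extract_or_die is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical wrapper: run an extraction and die with the object's own
# error message if nothing was produced.
sub extract_or_die {
    my ($pdf, %args) = @_;
    my $out = $pdf->getPDFExtract(%args);
    if (!defined $out || $out eq '') {
        my $err = $pdf->getPDFExtractError || 'unknown error';
        die "PDF extraction failed: $err\n";
    }
    return $out;
}
```

For example:

  $out = eval { extract_or_die( $pdf, PDFDoc=>'C:/my.pdf', PDFPages=>"1-3" ) }
      or warn $@;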

getPDFExtractDoc

 $pdf->getPDFExtractDoc;

This method returns the last original PDF document accessed by getPDFExtract, savePDFExtract, servePDFExtract and fastServePDFExtract. getPDFExtractDoc will return an empty string if there was an error.

getPDFExtractOut

 $pdf->getPDFExtractOut;

This method returns the last PDF document processed by getPDFExtract, savePDFExtract, servePDFExtract and fastServePDFExtract. getPDFExtractOut will return an empty string if there was an error.

getPDFExtractCachePath

 $pdf->getPDFExtractCachePath;

This method returns the path to the PDF document cache. This value is required by savePDFExtract and fastServePDFExtract method calls. getPDFExtractCachePath will return an empty string if there was an error in setting the value.

setPDFExtractCachePath

 $pdf->setPDFExtractCachePath("C:/myCache");

This method sets the path to the PDF document cache. This value is required by savePDFExtract and fastServePDFExtract method calls. setPDFExtractCachePath will return an empty string if there was an error in setting the value.

getPDFExtractFound

 $pdf->getPDFExtractFound;

This method returns a string representing the pages that were selected and found within the original PDF document. getPDFExtractFound will return an empty string if there was an error in setting the value.

AUTHOR

Noel Sharrock <mailto:nsharrok@lgmedia.com.au>

PDF::Extract's home page http://www.lgmedia.com.au/PDF/Extract.asp

COPYRIGHT

Copyright (c) 2003 by Noel Sharrock. All rights reserved.

LICENSE

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, i.e., under the terms of the "Artistic License" or the "GNU General Public License".

The C library at the core of this Perl module can additionally be redistributed and/or modified under the terms of the "GNU Library General Public License".

DISCLAIMER

This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the "GNU General Public License" for more details.
