NAME
SelectPdf::PdfToTextClient - Pdf To Text Conversion with SelectPdf Online API. Extract text from PDF. Search PDF.
SYNOPSIS
Extract text from PDF
use JSON;
use SelectPdf;
print "This is SelectPdf-$SelectPdf::VERSION.\n";
my $test_url = "https://selectpdf.com/demo/files/selectpdf.pdf";
my $test_pdf = "Input.pdf";
my $local_file = "Test.txt";
my $apiKey = "Your API key here";
eval {
    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    print "Starting pdf to text ...\n";
    # set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
    $client
        ->setStartPage(1) # start page (processing starts from here)
        ->setEndPage(0) # end page (set 0 to process the file until the end)
        ->setOutputFormat(0) # set output format - 0 (Text), 1 (Html)
    ;
    # convert local pdf to local text file
    $client->getTextFromFileToFile($test_pdf, $local_file);
    # extract text from local pdf to memory
    # my $text = $client->getTextFromFile($test_pdf);
    # print $text;
    # convert pdf from public url to local text file
    # $client->getTextFromUrlToFile($test_url, $local_file);
    # extract text from pdf from public url to memory
    # my $text = $client->getTextFromUrl($test_url);
    # print $text;
    print "Finished! Number of pages processed: " . $client->getNumberOfPages() . ".\n";
    # get API usage
    my $usageClient = SelectPdf::UsageClient->new($apiKey);
    my $usage = $usageClient->getUsage(0);
    print("Usage: " . encode_json($usage) . "\n");
    print("Conversions remaining this month: " . $usage->{"available"} . "\n");
};
# eval traps any exception thrown by the client and leaves the message in $@
if ($@) {
    print "An error occurred: $@\n";
}
Search PDF
use JSON;
use SelectPdf;
print "This is SelectPdf-$SelectPdf::VERSION.\n";
my $test_url = "https://selectpdf.com/demo/files/selectpdf.pdf";
my $test_pdf = "Input.pdf";
my $apiKey = "Your API key here";
eval {
    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    print "Starting search pdf ...\n";
    # set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
    $client
        ->setStartPage(1) # start page (processing starts from here)
        ->setEndPage(0) # end page (set 0 to process the file until the end)
        ->setOutputFormat(0) # set output format - 0 (Text), 1 (Html)
    ;
    # search local pdf
    my $results = $client->searchFile($test_pdf, "pdf", "True", "True");
    # search pdf from public url
    # my $results = $client->searchUrl($test_url, "pdf", "True", "True");
    # $results is an array reference; count its elements in scalar context
    my $count = scalar @{$results};
    print("Number of search results: " . $count . "\n");
    print("Results: " . encode_json($results) . "\n");
    print "Finished! Number of pages processed: " . $client->getNumberOfPages() . ".\n";
    # get API usage
    my $usageClient = SelectPdf::UsageClient->new($apiKey);
    my $usage = $usageClient->getUsage(0);
    print("Usage: " . encode_json($usage) . "\n");
    print("Conversions remaining this month: " . $usage->{"available"} . "\n");
};
# eval traps any exception thrown by the client and leaves the message in $@
if ($@) {
    print "An error occurred: $@\n";
}
For more details and the full list of parameters, see the Pdf To Text API page (https://selectpdf.com/pdf-to-text-api/).
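The search example above serializes the returned list with encode_json. As a minimal sketch of inspecting such a list entry by entry, the helper below prints every key/value pair it finds without assuming which fields an entry contains. The name describe_results is hypothetical, and the sample data with its PageNumber, X and Y keys is invented for illustration; the real field names should be taken from the Pdf To Text API page.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a reference to an array of hashes into
# printable lines, without assuming which keys each hash contains.
sub describe_results {
    my ($results) = @_;
    my @lines;
    my $index = 0;
    for my $entry (@{$results}) {
        $index++;
        my @pairs = map {
            my $value = $entry->{$_};
            # Nested values (arrays, hashes) are shown by type only.
            "$_=" . (ref($value) ? ref($value) : (defined $value ? $value : 'undef'))
        } sort keys %{$entry};
        push @lines, "Result $index: " . join(", ", @pairs);
    }
    return @lines;
}

# Invented sample data standing in for the value returned by searchFile().
my $sample = [
    { PageNumber => 1, X => 72,  Y => 100 },
    { PageNumber => 3, X => 150, Y => 420 },
];

print "Number of search results: " . scalar(@{$sample}) . "\n";
print "$_\n" for describe_results($sample);
```

With a real client, the array reference returned by searchFile() or searchUrl() would be passed in place of $sample.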
METHODS
new( $apiKey )
Construct the Pdf To Text Client.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
Parameters:
- $apiKey: API Key.
getTextFromFile( $inputPdf )
Get the text from the specified local PDF file.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $text = $client->getTextFromFile($inputPdf);
Parameters:
- $inputPdf: Path to a local PDF file.
Returns:
- Extracted text.
getTextFromFileToFile( $inputPdf, $outputFilePath )
Get the text from the specified local PDF file and write it to the specified text file.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
$client->getTextFromFileToFile($inputPdf, $outputFilePath);
Parameters:
- $inputPdf: Path to a local PDF file.
- $outputFilePath: Path of the output file where the resulting text will be written.
getTextFromFileAsync( $inputPdf )
Get the text from the specified local PDF file with an asynchronous call.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $text = $client->getTextFromFileAsync($inputPdf);
Parameters:
- $inputPdf: Path to a local PDF file.
Returns:
- Extracted text.
getTextFromFileToFileAsync( $inputPdf, $outputFilePath )
Get the text from the specified local PDF file with an asynchronous call and write it to the specified text file.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
$client->getTextFromFileToFileAsync($inputPdf, $outputFilePath);
Parameters:
- $inputPdf: Path to a local PDF file.
- $outputFilePath: Path of the output file where the resulting text will be written.
getTextFromUrl( $url )
Get the text from the PDF file at the specified URL.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $text = $client->getTextFromUrl($url);
Parameters:
- $url: Address of the PDF file.
Returns:
- Extracted text.
getTextFromUrlToFile( $url, $outputFilePath )
Get the text from the PDF file at the specified URL and write it to the specified text file.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
$client->getTextFromUrlToFile($url, $outputFilePath);
Parameters:
- $url: Address of the PDF file.
- $outputFilePath: Path of the output file where the resulting text will be written.
getTextFromUrlAsync( $url )
Get the text from the PDF file at the specified URL with an asynchronous call.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $text = $client->getTextFromUrlAsync($url);
Parameters:
- $url: Address of the PDF file.
Returns:
- Extracted text.
getTextFromUrlToFileAsync( $url, $outputFilePath )
Get the text from the PDF file at the specified URL with an asynchronous call and write it to the specified text file.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
$client->getTextFromUrlToFileAsync($url, $outputFilePath);
Parameters:
- $url: Address of the PDF file.
- $outputFilePath: Path of the output file where the resulting text will be written.
searchFile( $inputPdf, $textToSearch, $caseSensitive, $wholeWordsOnly )
Search for specific text in a local PDF document. The pages included in the search are set with the setStartPage() and setEndPage() methods.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $results = $client->searchFile($inputPdf, $textToSearch);
Parameters:
- $inputPdf: Path to a local PDF file.
- $textToSearch: Text to search for.
- $caseSensitive: Whether the search is case sensitive.
- $wholeWordsOnly: Whether the search matches whole words only.
Returns:
- List with text positions in the current PDF document.
searchFileAsync( $inputPdf, $textToSearch, $caseSensitive, $wholeWordsOnly )
Search for specific text in a local PDF document with an asynchronous call. The pages included in the search are set with the setStartPage() and setEndPage() methods.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $results = $client->searchFileAsync($inputPdf, $textToSearch);
Parameters:
- $inputPdf: Path to a local PDF file.
- $textToSearch: Text to search for.
- $caseSensitive: Whether the search is case sensitive.
- $wholeWordsOnly: Whether the search matches whole words only.
Returns:
- List with text positions in the current PDF document.
searchUrl( $url, $textToSearch, $caseSensitive, $wholeWordsOnly )
Search for specific text in the PDF document at the specified URL. The pages included in the search are set with the setStartPage() and setEndPage() methods.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $results = $client->searchUrl($url, $textToSearch);
Parameters:
- $url: Address of the PDF file.
- $textToSearch: Text to search for.
- $caseSensitive: Whether the search is case sensitive.
- $wholeWordsOnly: Whether the search matches whole words only.
Returns:
- List with text positions in the current PDF document.
searchUrlAsync( $url, $textToSearch, $caseSensitive, $wholeWordsOnly )
Search for specific text in the PDF document at the specified URL with an asynchronous call. The pages included in the search are set with the setStartPage() and setEndPage() methods.
my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $results = $client->searchUrlAsync($url, $textToSearch);
Parameters:
- $url: Address of the PDF file.
- $textToSearch: Text to search for.
- $caseSensitive: Whether the search is case sensitive.
- $wholeWordsOnly: Whether the search matches whole words only.
Returns:
- List with text positions in the current PDF document.
setCustomParameter( $parameterName, $parameterValue )
Set a custom parameter. Do not use this method unless advised by SelectPdf.
Parameters:
- $parameterName: Parameter name.
- $parameterValue: Parameter value.
Returns:
- Reference to the current object.
setTimeout( $timeout )
Set the maximum amount of time (in seconds) for this job. The default value is 30 seconds. Use a larger value (up to 120 seconds is allowed) for documents that take a long time to load or process.
Parameters:
- $timeout: Timeout in seconds.
Returns:
- Reference to the current object.
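The limits above (a 30 second default and a 120 second maximum) can be enforced before calling setTimeout(). The following is a minimal sketch; the helper name normalize_timeout is hypothetical, and falling back to the default for missing or invalid input is a design choice of this sketch rather than documented library behaviour.

```perl
use strict;
use warnings;

# Limits taken from the description above: default 30 seconds, at most 120.
use constant {
    DEFAULT_TIMEOUT => 30,
    MAX_TIMEOUT     => 120,
};

# Hypothetical helper: return a timeout that is safe to pass to setTimeout().
sub normalize_timeout {
    my ($requested) = @_;
    # Missing, non-numeric or non-positive input falls back to the default.
    return DEFAULT_TIMEOUT
        unless defined $requested && $requested =~ /^\d+$/ && $requested > 0;
    # Values above the documented maximum are capped.
    return $requested > MAX_TIMEOUT ? MAX_TIMEOUT : $requested;
}

print "Requested 300 -> " . normalize_timeout(300) . "\n";
print "Requested 45 -> " . normalize_timeout(45) . "\n";
print "Requested nothing -> " . normalize_timeout(undef) . "\n";

# With a real client: $client->setTimeout(normalize_timeout($seconds));
```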
setStartPage( $startPage )
Set the start page number. The default value is 1 (first page of the document).
Parameters:
- $startPage: Start page number (1-based).
Returns:
- Reference to the current object.
setEndPage( $endPage )
Set the end page number. The default value is 0 (process until the last page of the document).
Parameters:
- $endPage: End page number (1-based).
Returns:
- Reference to the current object.
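The two page settings combine into a range: start pages are 1-based, and an end page of 0 means "until the last page". A minimal sketch of checking a range before sending a request is shown below. The helper name page_range_is_valid is hypothetical, and rejecting an end page smaller than the start page is an assumption of this sketch, not documented API behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: check a page range against the rules described above.
# Returns 1 for an acceptable range and 0 otherwise.
sub page_range_is_valid {
    my ($start, $end) = @_;
    # The start page is 1-based, so it must be a whole number >= 1.
    return 0 unless defined $start && $start =~ /^\d+$/ && $start >= 1;
    # The end page must be a whole number; 0 means "until the last page".
    return 0 unless defined $end && $end =~ /^\d+$/;
    return 1 if $end == 0;
    # Assumption: a non-zero end page should not precede the start page.
    return $end >= $start ? 1 : 0;
}

for my $range ([1, 0], [2, 5], [5, 2], [0, 3]) {
    my ($start, $end) = @{$range};
    printf "start=%d end=%d => %s\n", $start, $end,
        page_range_is_valid($start, $end) ? "ok" : "rejected";
}

# With a real client: $client->setStartPage($start)->setEndPage($end);
```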
setUserPassword( $userPassword )
Set the PDF user password.
Parameters:
- $userPassword: PDF user password.
Returns:
- Reference to the current object.
setTextLayout( $textLayout )
Set the text layout. The default value is 0 (Original).
Parameters:
- $textLayout: The text layout. Possible values: 0 (Original), 1 (Reading).
Returns:
- Reference to the current object.
setOutputFormat( $outputFormat )
Set the output format. The default value is 0 (Text).
Parameters:
- $outputFormat: The output format. Possible values: 0 (Text), 1 (Html).
Returns:
- Reference to the current object.
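Because setTextLayout() and setOutputFormat() take bare numeric codes, named constants can make calling code easier to read. The sketch below is one possible arrangement: the constant names, the extension_for_format helper and the SELECTPDF_API_KEY environment variable are all hypothetical choices made for this example. The API call itself only runs when that variable is set, so the rest of the script works without the module or a key.

```perl
use strict;
use warnings;

# Names for the documented numeric values.
use constant {
    LAYOUT_ORIGINAL => 0,
    LAYOUT_READING  => 1,
    FORMAT_TEXT     => 0,
    FORMAT_HTML     => 1,
};

# Hypothetical helper: choose a file extension that matches the output format.
sub extension_for_format {
    my ($format) = @_;
    return $format == FORMAT_HTML ? ".html" : ".txt";
}

my $format = FORMAT_HTML;
my $output = "Test" . extension_for_format($format);
print "Output file: $output\n";

# The conversion runs only when an API key is supplied through the
# SELECTPDF_API_KEY environment variable (a name chosen for this sketch).
if (my $apiKey = $ENV{SELECTPDF_API_KEY}) {
    require SelectPdf;
    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    # Each setter returns the client, so the calls can be chained.
    $client
        ->setTextLayout(LAYOUT_READING)
        ->setOutputFormat($format);
    $client->getTextFromFileToFile("Input.pdf", $output);
}
```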