NAME

SelectPdf::PdfToTextClient - Pdf To Text Conversion with SelectPdf Online API. Extract text from PDF. Search PDF.

SYNOPSIS

Extract text from PDF

use JSON;
use SelectPdf;

print "This is SelectPdf-$SelectPdf::VERSION.\n";

my $test_url = "https://selectpdf.com/demo/files/selectpdf.pdf";
my $test_pdf = "Input.pdf";
my $local_file = "Test.txt";
my $apiKey = "Your API key here";

eval {
    my $client = SelectPdf::PdfToTextClient->new($apiKey);

    print "Starting pdf to text ...\n";

    # set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
    $client
        ->setStartPage(1) # start page (processing starts from here)
        ->setEndPage(0) # end page (set 0 to process the file until the end)
        ->setOutputFormat(0) # set output format - 0 (Text), 1 (Html)
    ;

    # convert local pdf to local text file
    $client->getTextFromFileToFile($test_pdf, $local_file);

    # extract text from local pdf to memory
    # my $text = $client->getTextFromFile($test_pdf);
    # print $text;

    # convert pdf from public url to local text file
    # $client->getTextFromUrlToFile($test_url, $local_file);

    # extract text from pdf from public url to memory
    # my $text = $client->getTextFromUrl($test_url);
    # print $text;

    print "Finished! Number of pages processed: " . $client->getNumberOfPages() . ".\n";

    # get API usage
    my $usageClient = SelectPdf::UsageClient->new($apiKey);
    my $usage = $usageClient->getUsage(0);
    print("Usage: " . encode_json($usage) . "\n");
    print("Conversions remaining this month: " . $usage->{"available"} . "\n");
};

if ($@) {
    print "An error occurred: $@\n";
}

Search PDF

use JSON;
use SelectPdf;

print "This is SelectPdf-$SelectPdf::VERSION.\n";

my $test_url = "https://selectpdf.com/demo/files/selectpdf.pdf";
my $test_pdf = "Input.pdf";
my $apiKey = "Your API key here";

eval {
    my $client = SelectPdf::PdfToTextClient->new($apiKey);

    print "Starting search pdf ...\n";

    # set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
    $client
        ->setStartPage(1) # start page (processing starts from here)
        ->setEndPage(0) # end page (set 0 to process the file until the end)
        ->setOutputFormat(0) # set output format - 0 (Text), 1 (Html)
    ;

    # search local pdf
    my $results = $client->searchFile($test_pdf, "pdf", "True", "True");

    # search pdf from public url
    # my $results = $client->searchUrl($test_url, "pdf", "True", "True");

    my $count = scalar @{$results};
    print("Number of search results: " . $count . "\n");
    print("Results: " . encode_json($results) . "\n");

    print "Finished! Number of pages processed: " . $client->getNumberOfPages() . ".\n";

    # get API usage
    my $usageClient = SelectPdf::UsageClient->new($apiKey);
    my $usage = $usageClient->getUsage(0);
    print("Usage: " . encode_json($usage) . "\n");
    print("Conversions remaining this month: " . $usage->{"available"} . "\n");
};

if ($@) {
    print "An error occurred: $@\n";
}

For more details and the full list of parameters, see the Pdf To Text API documentation at https://selectpdf.com/pdf-to-text-api/.

METHODS

new( $apiKey )

Construct the Pdf To Text Client.

my $client = SelectPdf::PdfToTextClient->new($apiKey);

Parameters:

- $apiKey: API Key.

getTextFromFile( $inputPdf )

Extract the text from the specified local PDF file.

my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $text = $client->getTextFromFile($inputPdf);

Parameters:

- $inputPdf: Path to a local PDF file.

Returns:

- Extracted text.

getTextFromFileToFile( $inputPdf, $outputFilePath )

Extract the text from the specified local PDF file and write it to the specified text file.

my $client = SelectPdf::PdfToTextClient->new($apiKey);
$client->getTextFromFileToFile($inputPdf, $outputFilePath);

Parameters:

- $inputPdf: Path to a local PDF file.

- $outputFilePath: Path of the output file where the extracted text will be written.

getTextFromFileAsync( $inputPdf )

Extract the text from the specified local PDF file using an asynchronous call.

my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $text = $client->getTextFromFileAsync($inputPdf);

Parameters:

- $inputPdf: Path to a local PDF file.

Returns:

- Extracted text.

getTextFromFileToFileAsync( $inputPdf, $outputFilePath )

Extract the text from the specified local PDF file using an asynchronous call and write it to the specified text file.

my $client = SelectPdf::PdfToTextClient->new($apiKey);
$client->getTextFromFileToFileAsync($inputPdf, $outputFilePath);

Parameters:

- $inputPdf: Path to a local PDF file.

- $outputFilePath: Path of the output file where the extracted text will be written.

getTextFromUrl( $url )

Extract the text from the PDF file at the specified URL.

my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $text = $client->getTextFromUrl($url);

Parameters:

- $url: Address of the PDF file.

Returns:

- Extracted text.

getTextFromUrlToFile( $url, $outputFilePath )

Extract the text from the PDF file at the specified URL and write it to the specified text file.

my $client = SelectPdf::PdfToTextClient->new($apiKey);
$client->getTextFromUrlToFile($url, $outputFilePath);

Parameters:

- $url: Address of the PDF file.

- $outputFilePath: Path of the output file where the extracted text will be written.

getTextFromUrlAsync( $url )

Extract the text from the PDF file at the specified URL using an asynchronous call.

my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $text = $client->getTextFromUrlAsync($url);

Parameters:

- $url: Address of the PDF file.

Returns:

- Extracted text.

getTextFromUrlToFileAsync( $url, $outputFilePath )

Extract the text from the PDF file at the specified URL using an asynchronous call and write it to the specified text file.

my $client = SelectPdf::PdfToTextClient->new($apiKey);
$client->getTextFromUrlToFileAsync($url, $outputFilePath);

Parameters:

- $url: Address of the PDF file.

- $outputFilePath: Path of the output file where the extracted text will be written.

searchFile( $inputPdf, $textToSearch, $caseSensitive, $wholeWordsOnly )

Search for the given text in a local PDF document. The pages included in the search are set with the setStartPage() and setEndPage() methods.

my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $results = $client->searchFile($inputPdf, $textToSearch);

Parameters:

- $inputPdf: Path to a local PDF file.

- $textToSearch: Text to search.

- $caseSensitive: Whether the search is case sensitive.

- $wholeWordsOnly: Whether the search matches whole words only.

Returns:

- List with text positions in the current PDF document.
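
The returned list can be inspected without knowing the layout of its entries. The following is a minimal sketch, assuming the return value is an array reference (as the SYNOPSIS treats it); describe_results is a hypothetical helper, not part of SelectPdf, and it uses the core JSON::PP module instead of the JSON module shown in the SYNOPSIS.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: turn the array reference returned by searchFile()
# into printable lines, one JSON document per match. An undefined result
# is treated as an empty list.
sub describe_results {
    my ($results) = @_;
    my @matches = @{ $results // [] };
    # canonical() sorts hash keys so the output is stable between runs
    my $json = JSON::PP->new->canonical(1)->allow_nonref(1);
    return map { $json->encode($_) } @matches;
}
```

With a configured client, something like `print "$_\n" for describe_results($client->searchFile($inputPdf, $textToSearch));` would print one line per match.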

searchFileAsync( $inputPdf, $textToSearch, $caseSensitive, $wholeWordsOnly )

Search for the given text in a local PDF document using an asynchronous call. The pages included in the search are set with the setStartPage() and setEndPage() methods.

my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $results = $client->searchFileAsync($inputPdf, $textToSearch);

Parameters:

- $inputPdf: Path to a local PDF file.

- $textToSearch: Text to search.

- $caseSensitive: Whether the search is case sensitive.

- $wholeWordsOnly: Whether the search matches whole words only.

Returns:

- List with text positions in the current PDF document.

searchUrl( $url, $textToSearch, $caseSensitive, $wholeWordsOnly )

Search for the given text in the PDF document at the specified URL. The pages included in the search are set with the setStartPage() and setEndPage() methods.

my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $results = $client->searchUrl($url, $textToSearch);

Parameters:

- $url: Address of the PDF file.

- $textToSearch: Text to search.

- $caseSensitive: Whether the search is case sensitive.

- $wholeWordsOnly: Whether the search matches whole words only.

Returns:

- List with text positions in the current PDF document.

searchUrlAsync( $url, $textToSearch, $caseSensitive, $wholeWordsOnly )

Search for the given text in the PDF document at the specified URL using an asynchronous call. The pages included in the search are set with the setStartPage() and setEndPage() methods.

my $client = SelectPdf::PdfToTextClient->new($apiKey);
my $results = $client->searchUrlAsync($url, $textToSearch);

Parameters:

- $url: Address of the PDF file.

- $textToSearch: Text to search.

- $caseSensitive: Whether the search is case sensitive.

- $wholeWordsOnly: Whether the search matches whole words only.

Returns:

- List with text positions in the current PDF document.

setCustomParameter( $parameterName, $parameterValue )

Set a custom parameter. Do not use this method unless advised by SelectPdf.

Parameters:

- $parameterName: Parameter name.

- $parameterValue: Parameter value.

Returns:

- Reference to the current object.

setTimeout( $timeout )

Set the maximum amount of time (in seconds) allowed for this job. The default value is 30 seconds. Use a larger value (up to 120 seconds) for documents that take a long time to process.

Parameters:

- $timeout: Timeout in seconds.

Returns:

- Reference to the current object.
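
Since values above 120 seconds are not allowed, a caller may want to cap a computed timeout before passing it on. The following is a minimal sketch; set_capped_timeout and MAX_TIMEOUT are names made up for this illustration, and the helper works with any object that exposes the setTimeout() setter described above.

```perl
use strict;
use warnings;

# Upper limit taken from the setTimeout() description.
use constant MAX_TIMEOUT => 120;

# Hypothetical helper: cap the requested timeout at MAX_TIMEOUT seconds
# before handing it to the client. Returns whatever setTimeout() returns
# (documented as the client itself), so calls can still be chained.
sub set_capped_timeout {
    my ($client, $seconds) = @_;
    $seconds = MAX_TIMEOUT if $seconds > MAX_TIMEOUT;
    return $client->setTimeout($seconds);
}
```

For example, `set_capped_timeout($client, 300)` would configure a timeout of 120 seconds, while `set_capped_timeout($client, 60)` passes 60 through unchanged.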

setStartPage( $startPage )

Set the start page number. The default value is 1 (the first page of the document).

Parameters:

- $startPage: Start page number (1-based).

Returns:

- Reference to the current object.

setEndPage( $endPage )

Set the end page number. The default value is 0 (process until the last page of the document).

Parameters:

- $endPage: End page number (1-based), or 0 to process until the last page.

Returns:

- Reference to the current object.
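
The two page setters are typically used together to restrict processing to a range. The following is a minimal sketch; set_page_range is a hypothetical helper, and the argument checks are this sketch's own (the library may validate differently). It relies only on the documented behaviour that both setters return the client.

```perl
use strict;
use warnings;

# Hypothetical helper: apply an inclusive, 1-based page range to a client.
# Passing 0 as $last keeps the documented "until the last page" behaviour.
sub set_page_range {
    my ($client, $first, $last) = @_;
    die "start page must be 1 or greater\n" if $first < 1;
    die "end page must be 0 or not less than the start page\n"
        if $last != 0 && $last < $first;
    # Both setters return the client, so the calls can be chained.
    return $client->setStartPage($first)->setEndPage($last);
}
```

A call such as `set_page_range($client, 2, 5)` would limit a following getTextFromFile() or searchFile() call to pages 2 through 5.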

setUserPassword( $userPassword )

Set the PDF user password (the password required to open a protected document).

Parameters:

- $userPassword: PDF user password.

Returns:

- Reference to the current object.
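
Because the setter returns the client, the password can be supplied in the same statement as the extraction call. The following is a minimal sketch; get_protected_text is a hypothetical helper that uses only the setUserPassword() and getTextFromFile() methods documented here.

```perl
use strict;
use warnings;

# Hypothetical helper: set the user password, then extract the text of a
# protected local PDF. Relies on setUserPassword() returning the client.
sub get_protected_text {
    my ($client, $inputPdf, $userPassword) = @_;
    return $client->setUserPassword($userPassword)->getTextFromFile($inputPdf);
}
```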

setTextLayout( $textLayout )

Set the text layout. The default value is 0 (Original).

Parameters:

- $textLayout: The text layout. Possible values: 0 (Original), 1 (Reading).

Returns:

- Reference to the current object.

setOutputFormat( $outputFormat )

Set the output format. The default value is 0 (Text).

Parameters:

- $outputFormat: The output format. Possible values: 0 (Text), 1 (Html).

Returns:

- Reference to the current object.
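
The numeric values accepted by setTextLayout() and setOutputFormat() are easier to read in calling code when they are given names. The following is a minimal sketch; the constant names and the configure_html_reading helper are this sketch's own and are not defined by SelectPdf.

```perl
use strict;
use warnings;

# Names for the documented numeric values (names chosen for this sketch).
use constant {
    LAYOUT_ORIGINAL => 0,
    LAYOUT_READING  => 1,
    FORMAT_TEXT     => 0,
    FORMAT_HTML     => 1,
};

# Hypothetical helper: request HTML output with the Reading text layout.
# Both setters return the client, so the calls can be chained.
sub configure_html_reading {
    my ($client) = @_;
    return $client->setTextLayout(LAYOUT_READING)->setOutputFormat(FORMAT_HTML);
}
```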