
NAME

SelectPdf::PdfToTextClient - Pdf To Text Conversion with SelectPdf Online API. Extract text from PDF. Search PDF.

SYNOPSIS

Extract text from PDF

    use JSON;
    use SelectPdf;

    print "This is SelectPdf-$SelectPdf::VERSION.\n";

    my $test_url = "https://selectpdf.com/demo/files/selectpdf.pdf";
    my $test_pdf = "Input.pdf";
    my $local_file = "Test.txt";
    my $apiKey = "Your API key here";

    eval {
        my $client = SelectPdf::PdfToTextClient->new($apiKey);

        print "Starting pdf to text ...\n";

        # set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
        $client
            ->setStartPage(1) # start page (processing starts from here)
            ->setEndPage(0) # end page (set 0 to process the file until the end)
            ->setOutputFormat(0) # set output format - 0 (Text), 1 (Html)
        ;

        # convert local pdf to local text file
        $client->getTextFromFileToFile($test_pdf, $local_file);

        # extract text from local pdf to memory
        # my $text = $client->getTextFromFile($test_pdf);
        # print $text;

        # convert pdf from public url to local text file
        # $client->getTextFromUrlToFile($test_url, $local_file);

        # extract text from pdf from public url to memory
        # my $text = $client->getTextFromUrl($test_url);
        # print $text;

        print "Finished! Number of pages processed: " . $client->getNumberOfPages() . ".\n";

        # get API usage
        my $usageClient = SelectPdf::UsageClient->new($apiKey);
        my $usage = $usageClient->getUsage(0);
        print("Usage: " . encode_json($usage) . "\n");
        print("Conversions remaining this month: " . $usage->{"available"} . "\n");
    };

    if ($@) {
        print "An error occurred: $@\n";
    }

Search PDF

    use JSON;
    use SelectPdf;

    print "This is SelectPdf-$SelectPdf::VERSION.\n";

    my $test_url = "https://selectpdf.com/demo/files/selectpdf.pdf";
    my $test_pdf = "Input.pdf";
    my $apiKey = "Your API key here";

    eval {
        my $client = SelectPdf::PdfToTextClient->new($apiKey);

        print "Starting search pdf ...\n";

        # set parameters - see full list at https://selectpdf.com/pdf-to-text-api/
        $client
            ->setStartPage(1) # start page (processing starts from here)
            ->setEndPage(0) # end page (set 0 to process the file until the end)
            ->setOutputFormat(0) # set output format - 0 (Text), 1 (Html)
        ;

        # search local pdf
        my $results = $client->searchFile($test_pdf, "pdf", "True", "True");

        # search pdf from public url
        # my $results = $client->searchUrl($test_url, "pdf", "True", "True");

        my $count = scalar @{$results}; # number of entries in the result list
        print("Number of search results: " . $count . "\n");
        print("Results: " . encode_json($results) . "\n");

        print "Finished! Number of pages processed: " . $client->getNumberOfPages() . ".\n";

        # get API usage
        my $usageClient = SelectPdf::UsageClient->new($apiKey);
        my $usage = $usageClient->getUsage(0);
        print("Usage: " . encode_json($usage) . "\n");
        print("Conversions remaining this month: " . $usage->{"available"} . "\n");
    };

    if ($@) {
        print "An error occurred: $@\n";
    }

For more details and a full list of parameters, see the Pdf To Text API documentation at https://selectpdf.com/pdf-to-text-api/.

METHODS

new( $apiKey )

Construct the Pdf To Text Client.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);

Parameters:

- $apiKey: API Key.

getTextFromFile( $inputPdf )

Extract the text from the specified local PDF file.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    my $text = $client->getTextFromFile($inputPdf);

Parameters:

- $inputPdf: Path to a local PDF file.

Returns:

- Extracted text.

getTextFromFileToFile( $inputPdf, $outputFilePath )

Get the text from the specified pdf and write it to the specified text file.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    $client->getTextFromFileToFile($inputPdf, $outputFilePath);

Parameters:

- $inputPdf: Path to a local PDF file.

- $outputFilePath: Path of the output file where the resulting text will be written.

getTextFromFileAsync( $inputPdf )

Extract the text from the specified local PDF file with an asynchronous call.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    my $text = $client->getTextFromFileAsync($inputPdf);

Parameters:

- $inputPdf: Path to a local PDF file.

Returns:

- Extracted text.

getTextFromFileToFileAsync( $inputPdf, $outputFilePath )

Get the text from the specified pdf with an asynchronous call and write it to the specified text file.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    $client->getTextFromFileToFileAsync($inputPdf, $outputFilePath);

Parameters:

- $inputPdf: Path to a local PDF file.

- $outputFilePath: Path of the output file where the resulting text will be written.

getTextFromUrl( $url )

Extract the text from the PDF at the specified URL.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    my $text = $client->getTextFromUrl($url);

Parameters:

- $url: Address of the PDF file.

Returns:

- Extracted text.

getTextFromUrlToFile( $url, $outputFilePath )

Extract the text from the PDF at the specified URL and write it to the specified text file.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    $client->getTextFromUrlToFile($url, $outputFilePath);

Parameters:

- $url: Address of the PDF file.

- $outputFilePath: Path of the output file where the resulting text will be written.

getTextFromUrlAsync( $url )

Extract the text from the PDF at the specified URL with an asynchronous call.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    my $text = $client->getTextFromUrlAsync($url);

Parameters:

- $url: Address of the PDF file.

Returns:

- Extracted text.

getTextFromUrlToFileAsync( $url, $outputFilePath )

Extract the text from the PDF at the specified URL with an asynchronous call and write it to the specified text file.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    $client->getTextFromUrlToFileAsync($url, $outputFilePath);

Parameters:

- $url: Address of the PDF file.

- $outputFilePath: Path of the output file where the resulting text will be written.

searchFile( $inputPdf, $textToSearch, $caseSensitive, $wholeWordsOnly )

Search for specific text in a PDF document. The pages included in the search are set with the setStartPage() and setEndPage() methods.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    my $results = $client->searchFile($inputPdf, $textToSearch, $caseSensitive, $wholeWordsOnly);

Parameters:

- $inputPdf: Path to a local PDF file.

- $textToSearch: Text to search.

- $caseSensitive: Whether the search is case sensitive.

- $wholeWordsOnly: Whether the search matches whole words only.

Returns:

- List with text positions in the current PDF document.
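
The returned list can be walked like any Perl array reference. The following is a minimal sketch, assuming each entry is a hash or array reference (the synopsis only shows that the list as a whole can be passed to encode_json). format_matches is a hypothetical helper, not part of SelectPdf, and the core JSON::PP module stands in for the JSON module used in the synopsis.

```perl
use strict;
use warnings;
use JSON::PP;   # core module; provides the same kind of encoding as JSON

# Hypothetical helper (not part of SelectPdf): turn a search result list
# into one JSON string per match, with sorted keys for stable output.
sub format_matches {
    my ($results) = @_;
    my $json = JSON::PP->new->canonical;
    return map { $json->encode($_) } @{$results};
}

# Usage, assuming $client is a configured SelectPdf::PdfToTextClient:
#   my $results = $client->searchFile($inputPdf, $textToSearch,
#                                     $caseSensitive, $wholeWordsOnly);
#   print "$_\n" for format_matches($results);
```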

searchFileAsync( $inputPdf, $textToSearch, $caseSensitive, $wholeWordsOnly )

Search for specific text in a PDF document with an asynchronous call. The pages included in the search are set with the setStartPage() and setEndPage() methods.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    my $results = $client->searchFileAsync($inputPdf, $textToSearch, $caseSensitive, $wholeWordsOnly);

Parameters:

- $inputPdf: Path to a local PDF file.

- $textToSearch: Text to search.

- $caseSensitive: Whether the search is case sensitive.

- $wholeWordsOnly: Whether the search matches whole words only.

Returns:

- List with text positions in the current PDF document.

searchUrl( $url, $textToSearch, $caseSensitive, $wholeWordsOnly )

Search for specific text in the PDF document at the specified URL. The pages included in the search are set with the setStartPage() and setEndPage() methods.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    my $results = $client->searchUrl($url, $textToSearch, $caseSensitive, $wholeWordsOnly);

Parameters:

- $url: Address of the PDF file.

- $textToSearch: Text to search.

- $caseSensitive: Whether the search is case sensitive.

- $wholeWordsOnly: Whether the search matches whole words only.

Returns:

- List with text positions in the current PDF document.

searchUrlAsync( $url, $textToSearch, $caseSensitive, $wholeWordsOnly )

Search for specific text in the PDF document at the specified URL with an asynchronous call. The pages included in the search are set with the setStartPage() and setEndPage() methods.

    my $client = SelectPdf::PdfToTextClient->new($apiKey);
    my $results = $client->searchUrlAsync($url, $textToSearch, $caseSensitive, $wholeWordsOnly);

Parameters:

- $url: Address of the PDF file.

- $textToSearch: Text to search.

- $caseSensitive: Whether the search is case sensitive.

- $wholeWordsOnly: Whether the search matches whole words only.

Returns:

- List with text positions in the current PDF document.

setCustomParameter( $parameterName, $parameterValue )

Set a custom parameter. Do not use this method unless advised by SelectPdf.

Parameters:

- $parameterName: Parameter name.

- $parameterValue: Parameter value.

Returns:

- Reference to the current object.

setTimeout( $timeout )

Set the maximum amount of time (in seconds) for this job. The default value is 30 seconds. Use a larger value (up to 120 seconds) for PDF documents that take a long time to process.

Parameters:

- $timeout: Timeout in seconds.

Returns:

- Reference to the current object.
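
Because the documented ceiling is 120 seconds, a caller may want to clamp a requested timeout before passing it on. This is a minimal sketch: set_capped_timeout is a hypothetical helper, not part of SelectPdf, and it only assumes that $client offers the chainable setTimeout() described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SelectPdf): clamp the requested timeout
# to the documented maximum of 120 seconds, then apply it.
sub set_capped_timeout {
    my ($client, $seconds) = @_;
    my $capped = $seconds > 120 ? 120 : $seconds;
    return $client->setTimeout($capped);   # returns the client, so calls can be chained
}

# Usage, assuming $client is a SelectPdf::PdfToTextClient:
#   my $text = set_capped_timeout($client, 300)->getTextFromFile($inputPdf);
```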

setStartPage( $startPage )

Set the start page number. The default value is 1 (first page of the document).

Parameters:

- $startPage: Start page number (1-based).

Returns:

- Reference to the current object.

setEndPage( $endPage )

Set the end page number. The default value is 0 (process through the last page of the document).

Parameters:

- $endPage: End page number (1-based).

Returns:

- Reference to the current object.
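
The start and end page setters are typically used together. The sketch below checks a range before applying it; set_page_range is a hypothetical helper, not part of SelectPdf, and the validation rules are an assumption derived from the 1-based numbering and the special value 0 described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SelectPdf): validate a page range, then
# apply it through the chainable setStartPage() and setEndPage() setters.
sub set_page_range {
    my ($client, $first, $last) = @_;
    $last //= 0;    # 0 means "through the last page"
    die "start page must be 1 or greater\n" if $first < 1;
    die "end page must be 0 or not less than the start page\n"
        if $last != 0 && $last < $first;
    return $client->setStartPage($first)->setEndPage($last);
}

# Usage, assuming $client is a SelectPdf::PdfToTextClient:
#   my $text = set_page_range($client, 2, 5)->getTextFromFile($inputPdf);
```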

setUserPassword( $userPassword )

Set PDF user password.

Parameters:

- $userPassword: PDF user password.

Returns:

- Reference to the current object.
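
Since the setter returns the client, the password can be supplied in the same statement as the extraction call. A minimal sketch; extract_protected is a hypothetical helper, not part of SelectPdf, and it uses only the setUserPassword() and getTextFromFile() methods documented on this page.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SelectPdf): extract text from a
# password-protected local PDF by setting the user password first.
sub extract_protected {
    my ($client, $pdf, $password) = @_;
    return $client->setUserPassword($password)->getTextFromFile($pdf);
}

# Usage, assuming $client is a SelectPdf::PdfToTextClient:
#   my $text = extract_protected($client, "Input.pdf", $userPassword);
```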

setTextLayout( $textLayout )

Set the text layout. The default value is 0 (Original).

Parameters:

- $textLayout: The text layout. Possible values: 0 (Original), 1 (Reading).

Returns:

- Reference to the current object.

setOutputFormat( $outputFormat )

Set the output format. The default value is 0 (Text).

Parameters:

- $outputFormat: The output format. Possible values: 0 (Text), 1 (Html).

Returns:

- Reference to the current object.
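
When the output is written to a file, it can be convenient to pick the file extension from the chosen format. The following is only a sketch: convert_with_format and its extension mapping are hypothetical and not part of SelectPdf, while the format codes 0 (Text) and 1 (Html) are the ones listed above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SelectPdf): set the output format and
# write the result to "<basename>.txt" or "<basename>.html" accordingly.
sub convert_with_format {
    my ($client, $pdf, $basename, $format) = @_;
    my %ext_for = (0 => 'txt', 1 => 'html');   # documented format codes
    my $ext = $ext_for{$format}
        or die "unknown output format: $format\n";
    my $out = "$basename.$ext";
    $client->setOutputFormat($format)->getTextFromFileToFile($pdf, $out);
    return $out;   # path of the file that was requested
}

# Usage, assuming $client is a SelectPdf::PdfToTextClient:
#   my $path = convert_with_format($client, "Input.pdf", "Test", 1);
```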