NAME

CAM::PDF::PageText - Extract text from PDF page tree

SYNOPSIS

my $pdf = CAM::PDF->new($filename);
my $pageone_tree = $pdf->getPageContentTree(1);
print CAM::PDF::PageText->render($pageone_tree);

DESCRIPTION

This module attempts to extract sequential text from a PDF page. This is not a robust process, because PDF text is laid out graphically in arbitrary order. The module uses a few heuristics to guess which pieces of text belong next to each other, but it is easily fooled by, for example, subscripts, non-horizontal text, changes in font, and form fields.

All those disclaimers aside, it is useful for a quick dump of text from a simple PDF file.

LICENSE

Same as CAM::PDF

FUNCTIONS

$pkg->render($pagetree)

$pkg->render($pagetree, $verbose)

Turn a page content tree into a string. This is a class method that should be called like:

CAM::PDF::PageText->render($pagetree);
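
The call above handles a single page. As a minimal sketch, the same call can be looped over a whole document. The helper name extract_page_texts is hypothetical and not part of this module. The sketch assumes that CAM::PDF->new returns a false value and sets $CAM::PDF::errstr on failure, and that getPageContentTree may return undef for a page whose content cannot be parsed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;
use CAM::PDF::PageText;

# Hypothetical helper (not part of CAM::PDF): return the extracted text
# of every page in $filename, one string per page, in page order.
sub extract_page_texts {
    my ($filename) = @_;

    # Assumption: new() returns a false value and sets $CAM::PDF::errstr on failure.
    my $pdf = CAM::PDF->new($filename)
        or die "Cannot open $filename: $CAM::PDF::errstr\n";

    my @texts;
    for my $pagenum (1 .. $pdf->numPages()) {
        my $tree = $pdf->getPageContentTree($pagenum);

        # Assumption: an unparseable page yields undef; store an empty
        # string so that list indices still line up with page numbers.
        push @texts, $tree ? CAM::PDF::PageText->render($tree) : q{};
    }
    return @texts;
}

# Run the dump only when this file is executed directly as a script.
if (!caller) {
    my $filename = shift @ARGV or die "Usage: $0 file.pdf\n";
    print "$_\n" for extract_page_texts($filename);
}
```

Keeping one string per page, rather than one concatenated string, makes it easier to see which page confused the heuristics.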

AUTHOR

See CAM::PDF

