NAME

Biblio::Document::Parser::Utils - utility module for handling international characters and document conversion

DESCRIPTION

Biblio::Document::Parser::Utils provides some utility functions for handling international characters and for conversion of documents to plaintext.

SYNOPSIS

use Biblio::Document::Parser::Utils qw( normalise_multichars );

print normalise_multichars( $str );

METHODS

$str = normalise_multichars( $str )

Converts multi-character representations of international characters into single UTF-8 characters, e.g. ¨o => ö. These sequences appear in pdftotext output from PDFs generated by pdflatex.
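The following is a minimal sketch of one way such a conversion could work; it is not the module's actual implementation. The function name sketch_normalise and the %COMBINING table are hypothetical, and only two spacing diacritics are handled. It moves the diacritic behind the letter as a combining mark and lets the core Unicode::Normalize module compose the pair.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC );

# Hypothetical table (an assumption, not taken from the module):
# spacing diacritic => equivalent combining mark.
my %COMBINING = (
    "\x{A8}" => "\x{308}",    # DIAERESIS    => COMBINING DIAERESIS
    "\x{B4}" => "\x{301}",    # ACUTE ACCENT => COMBINING ACUTE ACCENT
);

# Rewrite "diacritic + letter" as "letter + combining mark", then
# compose each pair into a single precomposed character with NFC.
sub sketch_normalise {
    my ($str) = @_;
    $str =~ s/([\x{A8}\x{B4}])([A-Za-z])/$2$COMBINING{$1}/g;
    return NFC($str);
}
```

Under these assumptions, sketch_normalise("Schr\x{A8}odinger") returns the string with a single ö (U+00F6) in place of the two-character sequence. The input must be a decoded character string, not raw UTF-8 bytes.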

$content = ParaTools::Utils::get_content($location)

This function takes either a filename or a URL as a parameter and attempts to return a string containing the plaintext contents of the file. A hash of converters is provided in ParaTools/Utils.pm, which should be customised for your system.

For URLs, the file is first downloaded to a temporary directory and then converted, whereas local files are copied straight into the temporary directory. Because a copy is made in both cases, some care should be taken when handling very large files.
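The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in ParaTools/Utils.pm: the name sketch_get_content, the shape of the %CONVERTERS table and the pdftotext command line are all hypothetical, and HTTP::Tiny is used for the download only because it ships with Perl.

```perl
use strict;
use warnings;
use File::Basename qw( basename );
use File::Copy qw( copy );
use File::Temp qw( tempdir );
use HTTP::Tiny;

# Hypothetical converter table: file extension => command (as a list)
# that writes plaintext to STDOUT. Adjust for your own system.
my %CONVERTERS = (
    pdf => [ 'pdftotext', '-enc', 'UTF-8' ],
);

sub sketch_get_content {
    my ($location) = @_;
    my $dir  = tempdir( CLEANUP => 1 );    # removed when the program exits
    my $name = basename($location);
    $name =~ s/[?#].*//s;                  # drop any query string or fragment
    $name = 'download' unless length $name;
    my $local = "$dir/$name";

    if ( $location =~ m{^https?://}i ) {
        # URLs are downloaded into the temporary directory first.
        my $res = HTTP::Tiny->new->mirror( $location, $local );
        die "Download failed: $res->{status} $res->{reason}\n"
            unless $res->{success};
    }
    else {
        # Local files are copied straight into the temporary directory.
        copy( $location, $local ) or die "Copy failed: $!\n";
    }

    my ($ext) = $local =~ /\.([^.\/]+)$/;
    my $cmd = $CONVERTERS{ lc( $ext // '' ) };
    my $fh;
    if ($cmd) {
        # Run the converter and read its output ('-' means STDOUT here).
        open $fh, '-|', @$cmd, $local, '-' or die "Cannot run $cmd->[0]: $!\n";
    }
    else {
        # No converter registered: treat the file as plaintext already.
        open $fh, '<', $local or die "Cannot read $local: $!\n";
    }
    local $/;                              # slurp everything at once
    my $content = <$fh>;
    close $fh;
    return $content;                       # raw bytes; decode as needed
}
```

Because the whole file is copied and then slurped into memory, this sketch shares the limitation noted above for very large files.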

$escaped_url = ParaTools::Utils::url_escape($string)

A simple function that converts a string into an encoded URL (e.g. spaces become %20). It takes the unencoded URL as a parameter and returns the encoded version.
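As an illustration of this kind of encoding, here is a minimal sketch; it is not the module's actual implementation. The name sketch_url_escape and the choice of which characters to leave untouched are assumptions. It percent-encodes every byte that is neither an unreserved nor a reserved URL character, so the structure of the URL (scheme, slashes, query separators) is preserved.

```perl
use strict;
use warnings;

# Characters left as-is (an assumption for this sketch): letters, digits,
# the unreserved and reserved URL punctuation, and '%' so that sequences
# which are already escaped are not escaped a second time.
my $SAFE = q{A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%};

sub sketch_url_escape {
    my ($url) = @_;
    utf8::encode($url);    # operate on UTF-8 bytes, not characters
    $url =~ s/([^$SAFE])/sprintf '%%%02X', ord $1/ge;
    return $url;
}
```

With this sketch, sketch_url_escape("http://example.org/my file.pdf") returns "http://example.org/my%20file.pdf". Leaving '%' unescaped is a trade-off: it keeps existing escapes intact, but a literal percent sign in the input will not be encoded.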

AUTHOR

Tim Brody <tdb01r@ecs.soton.ac.uk>
Mike Jewell <moj@ecs.soton.ac.uk> (packaging)
