NAME
Text::Amuse::Preprocessor - Helpers for Text::Amuse document formatting.
VERSION
Version 0.32
SYNOPSIS
    use Text::Amuse::Preprocessor;
    my $pp = Text::Amuse::Preprocessor->new(
        input          => $infile,
        output         => $outfile,
        html           => 1,
        fix_links      => 1,
        fix_typography => 1,
        fix_nbsp       => 1,
        fix_footnotes  => 1,
    );
    $pp->process;
DESCRIPTION
This module applies some common fixes to muse files.
Without any option other than input and output (which are mandatory), the module only removes carriage returns, replaces character ligatures and characters which should not appear at all, and expands tabs to 4 spaces (no smart expanding).
LANGUAGE SUPPORT
The following languages are supported:
- english: smart quotes, dashes, and the common superscripts (like 11th)
- russian: smart quotes, dashes and non-breaking spaces
- spanish: smart quotes and dashes
- finnish: smart quotes and dashes
- swedish: smart quotes and dashes
- serbian: smart quotes and dashes
- croatian: smart quotes and dashes
- italian: smart quotes and dashes
- macedonian: smart quotes and dashes
- german: smart quotes and dashes
ACCESSORS
The following values are read-only and must be passed to the constructor.
Mandatory
input
Can be a string (with the input file path) or a reference to a scalar holding the text to process.
output
Can be a string (with the output file path) or a reference to a scalar which will receive the processed text.
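As an illustration of the scalar-reference form, here is a minimal sketch that processes a document entirely in memory. The sample text and the variable names are made up for the example, and the binmode call assumes the result comes back as a decoded character string.

```perl
use strict;
use warnings;
use Text::Amuse::Preprocessor;

# Hypothetical sample document. The #lang header tells the
# preprocessor which language rules to apply.
my $muse = <<'MUSE';
#title Sample
#lang en

This is a "quoted" phrase.
MUSE

my $processed;    # will receive the processed text

my $pp = Text::Amuse::Preprocessor->new(
    input          => \$muse,         # reference to the source text
    output         => \$processed,    # reference to the destination scalar
    fix_typography => 1,
);
$pp->process or die "Preprocessing failed\n";

# Assumption: the result is a character string, so encode it on output.
binmode STDOUT, ':encoding(UTF-8)';
print $processed;
```

Passing plain strings instead of references makes the module read from and write to files at those paths.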
Optional
html
Before doing anything else, convert the HTML input into a muse file. Although it is possible, doing the HTML import and the fixes in the same run is discouraged. Instead, create two objects: use the first to convert the HTML to muse, save the result somewhere, add the headers, then reprocess it with a second object and the required fixes.
Notably, the converted output has no header, so the language will not be detected.
Defaults to false.
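The recommended two-pass workflow could look roughly like the sketch below. The HTML snippet, the title and the choice of fixes are illustrative assumptions; only the constructor options come from this documentation.

```perl
use strict;
use warnings;
use Text::Amuse::Preprocessor;

# Hypothetical HTML input.
my $html = '<p>Some <em>imported</em> text.</p>';

# Pass 1: HTML to muse only, no fixes.
my $body;
Text::Amuse::Preprocessor->new(
    input  => \$html,
    output => \$body,
    html   => 1,
)->process or die "HTML import failed\n";

# Add the headers by hand, so that the language can be detected
# in the second pass (title and language here are examples).
my $muse = "#title Imported text\n#lang en\n\n" . $body;

# Pass 2: a fresh object applies the fixes to the muse text.
my $fixed;
Text::Amuse::Preprocessor->new(
    input          => \$muse,
    output         => \$fixed,
    fix_links      => 1,
    fix_typography => 1,
)->process or die "Fixing failed\n";
```

Keeping the passes separate leaves room to inspect or hand-edit the converted text before any fix is applied.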
fix_links
Find the links and add the markup if needed. Defaults to false.
fix_typography
Apply the typographical fixes. Defaults to false. This adds the "smart quotes" feature.
remove_nbsp
Remove all the non-breaking spaces in the document, unconditionally. This option does not conflict with fix_nbsp. If both are provided, the non-breaking spaces are first removed, then reinserted.
fix_nbsp
Add non-breaking spaces where appropriate for the document's language (see LANGUAGE SUPPORT).
fix_footnotes
Rearrange the footnotes if needed. Defaults to false.
debug
Don't unlink the temporary files, and be verbose.
METHODS
new(%options)
Constructor. Accepts the above options.
process
Process input according to the options passed and write the result into output. Return output on success, false otherwise.
html_to_muse
Can be called on the class. It invokes the html_to_muse function of Text::Amuse::Preprocessor::HTML on the argument and returns the converted chunk.
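For converting a fragment without building an object, a class-method call along these lines should be enough. The HTML fragment is a made-up example.

```perl
use strict;
use warnings;
use Text::Amuse::Preprocessor;

# Hypothetical HTML fragment to convert.
my $html = '<p>Hello <strong>world</strong></p>';

# Called on the class: no input/output options are needed.
my $chunk = Text::Amuse::Preprocessor->html_to_muse($html);

print $chunk;
```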
error
This is set only when processing footnotes. See Text::Amuse::Preprocessor::Footnotes documentation for the hashref returned when an error has been detected.
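One way to combine process and error is sketched below. The helper name fix_footnotes_or_report is hypothetical, and the error hashref is dumped as-is rather than assuming any of its keys, which are described in the Text::Amuse::Preprocessor::Footnotes documentation.

```perl
use strict;
use warnings;
use Data::Dumper;
use Text::Amuse::Preprocessor;

# Hypothetical helper: run the footnote fix on an in-memory document
# and return the processed text, or nothing on failure.
sub fix_footnotes_or_report {
    my ($muse) = @_;
    my $out;
    my $pp = Text::Amuse::Preprocessor->new(
        input         => \$muse,
        output        => \$out,
        fix_footnotes => 1,
    );
    return $out if $pp->process;

    # process returned false: look for footnote error details.
    if (my $error = $pp->error) {
        warn "Footnote problem:\n", Dumper($error);
    }
    else {
        warn "Processing failed for another reason\n";
    }
    return;
}

my $text = "#title Test\n#lang en\n\nBody text [1]\n\n[1] The note\n";
my $result = fix_footnotes_or_report($text);
print defined $result ? $result : "Nothing to print\n";
```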
tmpdir
Return the directory name used internally to hold the temporary files.
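When debugging, tmpdir can be paired with the debug option to find the intermediate files. A minimal sketch with a made-up sample document follows; the debug option also makes the module verbose, so expect extra output.

```perl
use strict;
use warnings;
use Text::Amuse::Preprocessor;

# Hypothetical sample document.
my $muse = "#title Test\n#lang en\n\nSome text\n";
my $out;

my $pp = Text::Amuse::Preprocessor->new(
    input  => \$muse,
    output => \$out,
    debug  => 1,    # keep the temporary files and be verbose
);
$pp->process or die "Preprocessing failed\n";

# The temporary files are left in place for inspection.
my $dir = $pp->tmpdir;
print "Temporary files are in $dir\n";
```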
AUTHOR
Marco Pessotto, <melmothx at gmail.com>
BUGS
Please report any bugs or feature requests to the author's email. If you find a bug, please provide a minimal muse file which reproduces the problem (so I can add it to the test suite).
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Text::Amuse::Preprocessor
Repository available at Gitorious: https://gitorious.org/text-amuse-preprocessor
SEE ALSO
The original documentation for the Emacs Muse markup can be found at: http://mwolson.org/static/doc/muse/Markup-Rules.html
The parser itself is Text::Amuse.
This distribution ships the following executables:
html-to-muse.pl (HTML to muse converter)
muse-check-footnotes.pl (footnote checker)
muse-rearrange-footnotes.pl (fix footnote numbering)
pod-to-muse.pl (POD to muse converter)
muse-preprocessor.pl (script which uses this module)
See the manpage or pass --help to the scripts for usage.
LICENSE
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.