NAME
HTML::Embellish - Typographically enhance HTML trees
VERSION
This document describes version 0.08 of HTML::Embellish, released August 18, 2012.
SYNOPSIS
use HTML::Embellish;
use HTML::TreeBuilder;
my $html = HTML::TreeBuilder->new_from_file(...);
embellish($html);
DESCRIPTION
HTML::Embellish adds typographical enhancements to HTML text. It converts certain ASCII characters to Unicode characters. It converts quotation marks and apostrophes into curly quotes. It converts hyphens into em-dashes. It inserts non-breaking spaces between the periods of an ellipsis. (It doesn't use the HORIZONTAL ELLIPSIS character (U+2026), because I like more space in my ellipses.)
INTERFACE
embellish($html, ...)
-
This subroutine (exported by default) is the main entry point. It's a shortcut for HTML::Embellish->new(...)->process($html). If you're going to process several trees with the same parameters, the object-oriented interface will be slightly more efficient.
$emb = HTML::Embellish->new(flag => value, ...)
-
This creates an HTML::Embellish object that will perform the specified enhancements. These are the (optional) flags that you can pass:
dashes
-
If true, converts sequences of hyphens into em-dashes. Two or three hyphens become one em-dash. Four hyphens become two em-dashes. Any other sequence of hyphens is left unchanged.
ellipses
-
If true, inserts non-breaking spaces between the periods making up an ellipsis. Also converts the space before an ellipsis that appears to end a sentence to a non-breaking space.
hellip
-
If true, converts the … character to 3 periods. (To insert non-breaking spaces between them, also set ellipses to true.) This defaults to the value of ellipses.
space_ellipses
-
If true, adds whitespace around ellipses when necessary. This defaults to the value of ellipses.
quotes
-
If true, converts quotation marks and apostrophes into curly quotes.
default
-
This is the default value used for flags that you didn't specify. It defaults to 1 (enabled). The main reason for using this flag is to disable any enhancements that might be introduced in future versions of HTML::Embellish.
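The hyphen rule described under the dashes flag can be sketched on a plain string as follows. This is only an illustration of the documented rule, not the module's actual implementation; convert_dashes is a hypothetical helper name that HTML::Embellish does not provide.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Embellish): applies the documented
# hyphen rule to a plain string.
sub convert_dashes {
    my ($text) = @_;
    my $mdash = "\x{2014}";    # EM DASH

    # Examine each maximal run of hyphens and replace it based on its length:
    # 2 or 3 hyphens give one em dash, 4 give two, anything else is kept.
    $text =~ s{(-+)}{
        my $n = length $1;
          $n == 2 || $n == 3 ? $mdash
        : $n == 4            ? $mdash x 2
        :                      $1
    }ge;

    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print convert_dashes("one - two -- three --- four ---- five -----"), "\n";
```

Matching the whole run at once keeps a sequence of five or more hyphens from being partially converted.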
$emb->process($html)
-
The process method enhances the content of the HTML::Element you pass in. You can pass the root element to process the entire tree, or any sub-element to process just that part of the tree. The tree is modified in-place; the return value is not meaningful.
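As a minimal sketch of the object-oriented interface, the following reuses one HTML::Embellish object for several trees. The sample HTML strings are invented for this example, and it assumes HTML::TreeBuilder's new_from_content, as_text, and delete methods from the HTML::Tree distribution.

```perl
use strict;
use warnings;
use HTML::Embellish;
use HTML::TreeBuilder;

# Enable only the dash conversion; "default => 0" turns off every flag
# that is not named explicitly.
my $emb = HTML::Embellish->new(default => 0, dashes => 1);

# Sample documents invented for this example.
my @sources = (
    '<html><body><p>Wait--what?</p></body></html>',
    '<html><body><p>She said "no"---twice.</p></body></html>',
);

my @results;
for my $source (@sources) {
    my $tree = HTML::TreeBuilder->new_from_content($source);
    $emb->process($tree);             # modifies $tree in place
    push @results, $tree->as_text;    # text content after enhancement
    $tree->delete;                    # release the tree when done
}

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @results;
```

Because quotes is not named and default is 0, the straight quotation marks in the second document should be left alone while the hyphen runs are converted.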
DIAGNOSTICS
First parameter of embellish must be an HTML::Element
-
You didn't pass a valid HTML::Element object to embellish.
HTML::Embellish->process must be passed an HTML::Element
-
You didn't pass a valid HTML::Element object to process.
Odd number of parameters passed to HTML::Embellish->new
-
HTML::Embellish->new takes parameters in KEY => VALUE style, so there must always be an even number of them.
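The sketch below shows one way these diagnostics might be trapped. It assumes they are raised as fatal exceptions (via die or croak), which this documentation does not state explicitly; the variable names are illustrative only.

```perl
use strict;
use warnings;
use HTML::Embellish;

# Assumption: each diagnostic is thrown as an exception, so eval can trap it.

# A single parameter is an odd count, which new should reject.
my $emb = eval { HTML::Embellish->new('dashes') };
my $new_error = $@;

# A plain string is not an HTML::Element, which embellish should reject.
my $ok = eval { embellish('not an element'); 1 };
my $embellish_error = $@;

print "new failed: $new_error"             if $new_error;
print "embellish failed: $embellish_error" if $embellish_error;
```

Copying $@ into a named variable immediately after each eval avoids losing the message if a later eval resets it.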
CONFIGURATION AND ENVIRONMENT
HTML::Embellish requires no configuration files or environment variables.
DEPENDENCIES
Requires the HTML::Tree distribution from CPAN (or some other module that implements the HTML::Element interface). Versions of HTML::Tree prior to 3.21 had some bugs involving Unicode characters and non-breaking spaces.
INCOMPATIBILITIES
None reported.
BUGS AND LIMITATIONS
I've experienced occasional segfaults when using this module with Perl 5.8.8. Since a pure-Perl module like this shouldn't be able to cause a segfault, I believe the issue is with Perl 5.8. I recommend using Perl 5.10 if at all possible, as the files that segfaulted under 5.8.8 worked fine with 5.10.
AUTHOR
Christopher J. Madsen <perl AT cjmweb.net>
Please report any bugs or feature requests to <bug-HTML-Embellish AT rt.cpan.org>
or through the web interface at http://rt.cpan.org/Public/Bug/Report.html?Queue=HTML-Embellish.
You can follow or contribute to HTML-Embellish's development at http://github.com/madsen/html-embellish.
COPYRIGHT AND LICENSE
This software is copyright (c) 2012 by Christopher J. Madsen.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENSE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.