NAME
Lingua::EN::Titlecase - Titlecasing of English words by traditional editorial rules.
VERSION
0.03
CAVEAT
Alpha software. I'm very interested in feedback! All interfaces, method names, and internal code are subject to change, or to being roundfiled in the BackPan.
Apologies for the current placeholders in this doc.
SYNOPSIS
use Lingua::EN::Titlecase;
my $tc = Lingua::EN::Titlecase->new("CAN YOU FIX A TITLE?");
print $tc->title(), $/;
$tc->title("and again but differently");
print $tc->title(), $/;
$tc->title("cookbook don't work, do she?");
print "$tc\n";
DESCRIPTION
Titlecasing in standard English usage is the initial capitalization of regular words minus inner articles, prepositions, and conjunctions.
This is one of those problems that is somewhat easy to solve for the general case but impossible to solve for all cases. Hence the lack of a module until now.
# allow for style/usage plugins...?
Simple techniques like--
$data =~ s/(\w+)/\u\L$1/g;
fail on words like "can't" and don't always take into account editorial rules or cases like--
- compound words -- Perl-like
- abbreviations -- USA
- mixedcase and proper names -- eBay: nEw KEyBOArD
NB: cases like iPod and eBay do not currently work properly, and there is not yet a hook to correct them manually. Both are planned for future versions.
- all caps -- SHOUT ME DOWN
Lingua::EN::Titlecase attempts to cater to the general cases and provide hooks to address the special.
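To make the "can't" failure mentioned above concrete, here is a minimal sketch in plain Perl. The second substitution is only an illustrative variation (an assumption, not the module's code) that lets apostrophes sit inside a word. Neither version knows anything about articles, prepositions, or proper names.

```perl
use strict;
use warnings;

# The naive approach: \w+ stops at the apostrophe, so "can't" is
# treated as two words, "can" and "t".
my $naive = "you can't do that";
$naive =~ s/(\w+)/\u\L$1/g;
print "$naive\n";     # You Can'T Do That

# Illustrative variation (an assumption, not the module's code):
# allow apostrophes inside a word.
my $better = "you can't do that";
$better =~ s/(\w+(?:'\w+)*)/\u\L$1/g;
print "$better\n";    # You Can't Do That
```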
INTERFACE
- Lingua::EN::Titlecase->new()
- $tc->new
The string to be titlecased can be set three ways: as the single argument to new, as the "original" hash element to new, or with the title method.
$tc->new("this is what should be titlecased");
$tc->new(original => "no, this is");
$tc->title("i beg to differ");
The last is to be able to reuse the Titlecase object.
Lingua::EN::Titlecase objects stringify to their titlecased string if one has been set, and to the ref of the object otherwise.
- $tc->original
Returns the original string.
- $tc->title
Sets the original string and returns the titlecased version. Both can be done in one call.
print $tc->title("did you get that thing i sent?");
- $tc->titlecase
Returns the titlecased string. Croaks if there is no original set via the constructor or the method title.
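The stringification behaviour described under new above can be sketched with Perl's core overload pragma. Everything here is hypothetical: the class name Hypothetical::Title, the as_string method, and the crude word-by-word casing are stand-ins for illustration and are not taken from Lingua::EN::Titlecase.

```perl
use strict;
use warnings;

{
    package Hypothetical::Title;    # illustrative class, not Lingua::EN::Titlecase
    use overload '""' => \&as_string, fallback => 1;

    sub new {
        my ($class, $string) = @_;
        return bless { original => $string }, $class;
    }

    sub as_string {
        my ($self) = @_;
        # No string set: fall back to the plain reference string.
        return overload::StrVal($self) unless defined $self->{original};
        # Crude stand-in for real titlecasing: capitalize every word.
        return join ' ', map { ucfirst(lc $_) } split ' ', $self->{original};
    }
}

my $with    = Hypothetical::Title->new("CAN YOU FIX A TITLE?");
my $without = Hypothetical::Title->new();

print "$with\n";       # Can You Fix A Title?
print "$without\n";    # the reference string, e.g. Hypothetical::Title=HASH(0x...)
```

Note that the stand-in capitalizes the article "A", which proper titlecasing would leave lowercase; only the overload mechanics are the point here.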
STRATEGIES
One of the hardest parts of properly titlecasing input is knowing if part of it is already correct and should not be clobbered. E.g.--
Old MacDonald had a farm
Is partly right and the proper name MacDonald should be left alone. Lowercasing the whole string and then title casing would yield--
Old Macdonald Had a Farm
So, to determine when to flatten a title to lowercase before processing, we check the ratio of mixedcase and the ratio of caps.
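The ratio check can be sketched as follows. This is a guess at the general idea, not the module's implementation: should_flatten is a hypothetical helper, and both the definition of a "mixedcase" letter (an uppercase letter after the first character of a word that also contains lowercase) and the exact comparison operators are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::EN::Titlecase.
# Returns 1 if the title looks too mangled to trust its existing case.
sub should_flatten {
    my ($title, $mixed_threshold, $uc_threshold) = @_;

    my $letters = () = $title =~ /[[:alpha:]]/g;
    return 0 unless $letters;

    my $upper = () = $title =~ /[[:upper:]]/g;

    my $mixed = 0;
    for my $word (split ' ', $title) {
        next unless $word =~ /[[:lower:]]/;    # all-caps words are not "mixed"
        my $tail  = substr $word, 1;           # ignore an initial capital
        my $count = () = $tail =~ /[[:upper:]]/g;
        $mixed += $count;
    }

    return 1 if $mixed / $letters > $mixed_threshold;
    return 1 if $upper / $letters >= $uc_threshold;
    return 0;
}

for my $title ("Old MacDonald had a farm", "nEw KEyBOArD", "SHOUT ME DOWN") {
    printf "%-26s %s\n", $title,
        should_flatten($title, .30, .90) ? "flatten first" : "keep existing case";
}
```

With the thresholds of .30 and .90 quoted in this document, the sketch keeps "Old MacDonald had a farm" as-is (one mixedcase letter in twenty) and flattens the other two titles.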
- $tc->mixed_threshold
Set/get. The ratio of mixedcase to letters which triggers lowercasing the whole string before trying to titlecase. The built-in threshold to clobber is .30.
# example
- $tc->uc_threshold
Same as mixed but for "all" caps. Default threshold is .90.
# example
- $tc->mixedcase
Scalar context returns count of mixedcase letters found. All caps and initial caps are not counted. List context returns the letters. E.g.--
my $tc = Lingua::EN::Titlecase->new();
$tc->title("tHaT pROBABly Will nevEr BE CorrectlY hanDled");
printf "%d, %s\n", scalar($tc->mixedcase), join(" ", $tc->mixedcase);
Yields--
11, H T R O B A B E C Y D
This is useful for determining if a string is overly mixed. Substrings like "pH" crop up now and then but they should never compose a high percentage of a properly cased title.
- $tc->wc
"Word" count. Scalar context returns count of "words." List returns them.
- $tc->lowercase
Count/list of lowercase letters found.
- $tc->mixedcase
Count/list of mixedcase letters found.
- $tc->uppercase
Count/list of uppercase letters found.
- $tc->whitespace
Count/list of whitespace -- \s+ -- found.
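The count-in-scalar, list-in-list behaviour of the accessors above is the usual Perl wantarray pattern. A minimal sketch, assuming nothing about the module's internals (uppercase_letters is a hypothetical standalone function, not the module's uppercase method):

```perl
use strict;
use warnings;

# Hypothetical function, not the module's method: returns the uppercase
# letters in list context and their count in scalar context.
sub uppercase_letters {
    my ($string) = @_;
    my @found = $string =~ /[[:upper:]]/g;
    return wantarray ? @found : scalar @found;
}

my $count = uppercase_letters("SHOUT me Down");
my @list  = uppercase_letters("SHOUT me Down");

printf "%d, %s\n", $count, join(" ", @list);    # 6, S H O U T D
```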
DIAGNOSTICS
TODO
Dictionary hook to allow BIG lists of proper names and lc to be applied.
Handle hyphens; user hooks.
Smart apostrophe, utf8, entities?
Recipes. Including TT2 "plugin" recipe.
Take out Class::Accessor...? For having it all in one place, checking args, and slight speed gain.
Bigger test suite.
RECIPES
Mini-scripts to test strings or accomplish custom configuration goals.
CONFIGURATION AND ENVIRONMENT
Lingua::EN::Titlecase requires no configuration files or environment variables.
DEPENDENCIES
Perl 5.6 or better to support POSIX regex classes.
INCOMPATIBILITIES
None reported.
BUGS AND LIMITATIONS
This is alpha-software. No bugs have been reported.
Please report any bugs or feature requests to bug-lingua-en-titlecase@rt.cpan.org, or through the web interface at http://rt.cpan.org.
AUTHOR
Ashley Pond V <ashley@cpan.org>
LICENCE AND COPYRIGHT
Copyright (c) 2007, Ashley Pond V <ashley@cpan.org>. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.