NAME
Unicode::Wrap - Unicode Line Breaking
SYNOPSIS
use Unicode::Wrap;
my $wrapper = Unicode::Wrap->new( line_length => 75 );
my $wrapped = $wrapper->wrap($long_string); # Unwrapped string
my $rewrapped = $wrapper->rewrap($long_string); # Remove newlines first
ABSTRACT
This module implements UAX#14: Line Breaking Properties. It goes through a text string, classifies each character and computes a length for each. When the line gets too long, a break is inserted where appropriate.
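The loop described above can be sketched in a few lines. This is a simplified illustration, not the module's implementation: the hypothetical greedy_wrap function treats a space as the only break opportunity instead of applying the UAX#14 classes, and its optional length callback stands in for the per-character length computation.

```perl
use strict;
use warnings;

# Simplified stand-in for the UAX#14 classifier: only a space counts as a
# break opportunity. greedy_wrap is a hypothetical name, not part of the module.
sub greedy_wrap {
    my ($text, $line_length, $length_of) = @_;
    $length_of ||= sub { 1 };    # default: every character has length 1

    my @lines;
    my $line       = '';
    my $width      = 0;          # accumulated length of $line
    my $last_break = -1;         # index in $line of the last space seen

    for my $char (split //, $text) {
        $line  .= $char;
        $width += $length_of->($char);
        $last_break = length($line) - 1 if $char eq ' ';

        if ($width > $line_length && $last_break >= 0) {
            # Emit everything before the space and carry the rest forward.
            push @lines, substr($line, 0, $last_break);
            $line  = substr($line, $last_break + 1);
            $width = 0;
            $width += $length_of->($_) for split //, $line;
            $last_break = -1;
        }
    }
    push @lines, $line if length $line;
    return join "\n", @lines;
}

print greedy_wrap('the quick brown fox jumps', 10), "\n";
```

With a limit of 10 this prints three lines: "the quick", "brown fox" and "jumps". A run of text with no space simply grows past the limit, which mirrors the behaviour described for an unset emergency_break.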
DESCRIPTION
The following methods are available:
- new(parameters)
This constructs a new wrapping object. Parameters:
- line_length
Specifies the length of a line (in whatever units you want to use)
- emergency_break
If set, and there are no breaking opportunities before the line_length is reached, an 'emergency' break will be inserted at this position. Generally this should be set to line_length (or 1, since it won't be used until line_length is reached anyway).
If emergency_break is not set, no emergency breaks will be inserted, which could result in some really long lines if no line-breaking opportunity exists.
- length_lookup
This should contain a coderef to your own 'length' implementation. It will be passed the character in question and the classification of that character. It should return the length of the character in your chosen unit.
This may also contain a simple hashref, keyed on the character, with values consisting of the length of that character.
- wrap($text, ...)
This takes a chunk of text, normalizes the newlines (but preserves them) and attempts to wrap it per UAX#14. More than one block of text can be passed, but each block is wrapped independently of the others.
- rewrap($text, ...)
This does the same thing as wrap, except that newlines are normalized to spaces before wrapping. This might be used if you already have a paragraph of text that you want to re-wrap.
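As an illustration of the constructor parameters described above, here is a minimal sketch of a length_lookup coderef that measures text in terminal columns. The callback name $column_width, the two-column rule for wide characters, and the sample values are assumptions made for this example. The form of the classification argument is not specified here, so the sketch ignores it. The wrapper itself is only built when Unicode::Wrap is installed.

```perl
use strict;
use warnings;

# Hypothetical length callback: East Asian wide and fullwidth characters
# count as two columns, everything else as one. The second argument is the
# character's classification, which this sketch does not use.
my $column_width = sub {
    my ($char, $class) = @_;
    return $char =~ /[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]/
        ? 2
        : 1;
};

# Build a wrapper only if the module is available, so the sketch also runs
# on a system without Unicode::Wrap.
if (eval { require Unicode::Wrap; 1 }) {
    my $wrapper = Unicode::Wrap->new(
        line_length     => 20,             # measured in columns here
        emergency_break => 20,             # break long runs at the limit
        length_lookup   => $column_width,
    );

    binmode STDOUT, ':encoding(UTF-8)';
    print $wrapper->wrap("A long line mixing Latin text and \x{4E2D}\x{6587} characters"), "\n";
}
```

Setting emergency_break to the same value as line_length follows the recommendation given for that parameter.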
BUGS
- This module is slow. It's a pure-Perl implementation that goes through an expensive classification process per character.
- Some classification rules may not be complete. These are noted with 'TODO' in the code.
- Combining marks should "inherit" the breaking properties of the character they are combined with. If a character normally allows a break after it, that opportunity needs to be transferred to the combining mark, so that the break occurs after the combined result.
- Tests are not very complete.
SEE ALSO
AUTHOR
David NESTING <david@fastolfe.net>
Copyright (c) 2003 David Nesting. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.