NAME
Text::KnuthPlass - Breaks paragraphs into lines using the TeX algorithm
SYNOPSIS
use Text::KnuthPlass;
my $typesetter = Text::KnuthPlass->new();
my @lines = $typesetter->typeset($paragraph);
...
To use with plain text:
for (@lines) {
for (@{$_->{nodes}}) {
if ($_->isa("Text::KnuthPlass::Box")) { print $_->value }
elsif ($_->isa("Text::KnuthPlass::Glue")) { print " " }
}
if ($_->{nodes}[-1]->is_penalty) { print "-" }
print "\n";
}
To use with PDF::Builder (or PDF::API2):
my $text = $page->text;
$text->font($font, 12);
$text->lead(13.5);
my $t = Text::KnuthPlass->new(
measure => sub { $text->advancewidth(shift) },
linelengths => [235]
);
my @lines = $t->typeset($paragraph);
my $y = 500;
for my $line (@lines) {
my $x = 50;
for my $node (@{$line->{nodes}}) {
$text->translate($x,$y);
if ($node->isa("Text::KnuthPlass::Box")) {
$text->text($node->value);
$x += $node->width;
} elsif ($node->isa("Text::KnuthPlass::Glue")) {
$x += $node->width + $line->{ratio} *
($line->{ratio} < 0 ? $node->shrink : $node->stretch);
}
}
if ($line->{nodes}[-1]->is_penalty) { $text->text("-") }
$y -= $text->lead();
}
METHODS
new
The constructor takes a number of options. The most important ones are:
- measure
A subroutine reference to determine the width of a piece of text. This defaults to length(shift), which is what you want if you're typesetting plain monospaced text. You will need to change this to plug into your font metrics if you're doing something graphical.
- linelengths
This is an array of line lengths. For instance, [30,40,50] will typeset a triangle-shaped piece of text with three lines. What if the text spills over to more than three lines? In that case, the final value in the array is used for all further lines. So to typeset an ordinary block-shaped column of text, you only need specify an array with one value: the default is [78].
- tolerance
How much leeway we have in leaving wider spaces than the algorithm would prefer.
- hyphenator
An object which hyphenates words. If you have Text::Hyphen installed (highly recommended) then a Text::Hyphen object is instantiated by default; if not, an object of the class Text::KnuthPlass::DummyHyphenator is instantiated, which simply finds no hyphenation points at all. So to turn hyphenation off, set hyphenator => Text::KnuthPlass::DummyHyphenator->new()
To typeset non-English text, pass in an object which responds to the hyphenate method, returning a list of hyphen positions. (See Text::Hyphen for the interface.)
There are other options for fine-tuning the output. If you know your way around TeX, dig into the source to find out what they are.
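The measure and linelengths options can be illustrated without the module itself. In the minimal sketch below, %char_width, measure_proportional and line_length_for are hypothetical names, and the character widths are made-up values rather than real font metrics. The helper only mirrors the documented rule that the final array value is reused for all further lines; it is not the module's internal code.

```perl
use strict;
use warnings;

# Hypothetical per-character widths (made-up numbers, not real font metrics).
my %char_width    = (i => 1, l => 1, m => 3, w => 3);
my $default_width = 2;

# A measure callback receives one piece of text and returns its width.
sub measure_proportional {
    my ($text) = @_;
    my $width = 0;
    $width += $char_width{$_} // $default_width for split //, $text;
    return $width;
}

# Hypothetical helper: the length allowed for a given line, counting from 1.
# Lines beyond the end of the array reuse the final value.
sub line_length_for {
    my ($linelengths, $line_number) = @_;
    my $index = $line_number - 1;
    $index = $#{$linelengths} if $index > $#{$linelengths};
    return $linelengths->[$index];
}

print measure_proportional("will"), "\n";                   # 3 + 1 + 1 + 1 = 6
print line_length_for([30, 40, 50], $_), "\n" for 1 .. 5;   # 30 40 50 50 50
```

A callback of this shape would be passed as measure => \&measure_proportional; with the default of [78], every line gets the same length.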
typeset
This is the main interface to the algorithm, made up of the constituent parts below. It takes a paragraph of text and returns a list of lines if suitable breakpoints could be found.
The list has the following structure:
(
{ nodes => \@nodes, ratio => $ratio },
{ nodes => \@nodes, ratio => $ratio },
...
)
The node list in each element will be a list of objects. Each object will be either Text::KnuthPlass::Box, Text::KnuthPlass::Glue or Text::KnuthPlass::Penalty. See below for more on these.
The ratio is the amount of stretch or shrink which should be applied to each glue element in this line. The corrected width of each glue node should be:
$node->width + $line->{ratio} *
($line->{ratio} < 0 ? $node->shrink : $node->stretch);
Each box, glue or penalty node has a width attribute. Boxes have values, which are the text which went into them; glue has stretch and shrink to determine how much it should vary in width. That should be all you need for basic typesetting; for more, see the source, and see the original Knuth-Plass paper in "Digital Typography".
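As a worked check of the corrected glue width formula above, the following sketch wraps it in a hypothetical helper, adjusted_glue_width (not part of the module), and applies it to made-up glue values. A negative ratio scales the shrink; any other ratio scales the stretch.

```perl
use strict;
use warnings;

# Corrected glue width, following the formula in the text:
# negative ratios scale the shrink, all others scale the stretch.
# This helper is illustrative and is not provided by Text::KnuthPlass.
sub adjusted_glue_width {
    my ($width, $stretch, $shrink, $ratio) = @_;
    return $width + $ratio * ($ratio < 0 ? $shrink : $stretch);
}

# Assumed glue values: natural width 4, stretch 6, shrink 2.
print adjusted_glue_width(4, 6, 2,  0.5), "\n";   # 4 + 0.5 * 6 = 7
print adjusted_glue_width(4, 6, 2, -0.5), "\n";   # 4 - 0.5 * 2 = 3
print adjusted_glue_width(4, 6, 2,  0),   "\n";   # natural width: 4
```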
Why typeset rather than something like linesplit? Per "ACKNOWLEDGEMENTS", this code is ported from the JavaScript product typeset.
This method is a thin wrapper around the three methods below.
break_text_into_nodes
This turns a paragraph into a list of box/glue/penalty nodes. It's fairly basic, and designed to be overridden in a subclass. It should also support multiple justification styles (centering, ragged right, etc.) but this will come in a future release; right now, it just does full justification.
If you are doing clever typography or using non-Western languages you may find that you will want to break text into nodes yourself, and pass the list of nodes to the methods below, instead of using this method.
break
This implements the main body of the algorithm; it turns a list of nodes (produced from the above method) into a list of breakpoint objects.
breakpoints_to_lines
And this takes the breakpoints and the nodes, and assembles them into lines.
glueclass
penaltyclass
For subclassers.
AUTHOR
Originally written by Simon Cozens, <simon at cpan.org>.
Since 2020, maintained by Phil Perry.
ACKNOWLEDGEMENTS
This module is a Perl translation of Bram Stein's "Typeset" JavaScript Knuth-Plass implementation. Any bugs, however, are probably my fault.
BUGS
Please report any bugs or feature requests to the issues section of https://github.com/PhilterPaper/Text-KnuthPlass.
Do NOT under ANY circumstances open a PR (Pull Request) to report a bug. It is a waste of both your and our time and effort. Open a regular ticket (issue), and attach a Perl (.pl) program illustrating the problem, if possible. If you believe that you have a program patch, and offer to share it as a PR, we may give the go-ahead. Unsolicited PRs may be closed without further action.
COPYRIGHT & LICENSE
Copyright (c) 2011 Simon Cozens.
Copyright (c) 2020-2021 Phil M Perry.
This program is released under the following license: Perl, GPL