NAME
Prima::Drawable::Glyphs - helper routines for bi-directional text input and complex scripts output
SYNOPSIS
use Prima qw(Application);
$::application-> begin_paint;
$::application-> text_shape_out('אפס123', 0,0);
$::application-> end_paint;
# displayed as: 123ספא
DESCRIPTION
The class implements an abstraction over a set of glyphs that can be rendered to represent text strings. Objects of the class are created and returned from Prima::Drawable::text_shape calls, see more in "text_shape" in Prima::Drawable. An object is a blessed array reference that can contain either two or four packed arrays of 16-bit integers, representing, respectively, a set of glyph indexes, a set of character indexes, a set of glyph advances, and a set of glyph position offsets per glyph. Additionally, the class implements several sets of helper routines that address common tasks when displaying glyph-based strings.
Structure
Each array is an instance of Prima::array, an efficient plain memory structure that provides a standard Perl interface over a string scalar filled with fixed-width integers.
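To show what a string scalar filled with fixed-width integers looks like, here is a minimal sketch that uses Perl's core pack and unpack. It does not use Prima::array's own methods, and the values are arbitrary. The S template stands for unsigned 16-bit integers and the s template for signed ones. They mirror, respectively, the glyphs/indexes/advances arrays and the positions array described below.

```perl
use strict;
use warnings;

# Illustration only: packed 16-bit integers in plain scalars, built with
# core pack/unpack rather than with Prima::array itself.
my $glyphs    = pack 'S*', 36, 71, 71, 0;   # unsigned 16-bit, arbitrary values
my $positions = pack 's*', 0, -3, 2, 5;     # signed 16-bit X/Y pairs

# Every item takes exactly two bytes, so item N starts at byte offset 2*N.
my $count = length($glyphs) / 2;
my $third = unpack 'S', substr($glyphs, 2 * 2, 2);
my @xy    = unpack 's*', $positions;

print "count=$count third=$third xy=@xy\n";   # prints count=4 third=71 xy=0 -3 2 5
```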
The following methods provide read-only access to these arrays:
- glyphs
-
Contains a set of unsigned 16-bit integers, each a glyph number in the font that was used when shaping the text. These glyph numbers are only applicable to that font. In vector fonts, zero usually serves as the default glyph when shaping cannot map a character. In bitmap fonts, the default is usually the defaultChar.
This array is recognized as a special case when passed to text_out or get_text_width, which can process it without the other arrays. In this case, though, no special advances or glyph positions are taken into account.
A glyph does not necessarily map to a single character, and quite often it does not, even in English left-to-right text. For example, character combinations like "ff", "fi", and "fl" can be mapped to single ligature glyphs. When right-to-left (RTL) text direction is taken into account, the glyph positions may change, too. See indexes below, which addresses the mapping of glyphs to characters.
- indexes
-
Contains a set of unsigned 16-bit integers, each an offset into the text that was used in shaping. Each entry thus points to the first character in the text that maps to the corresponding glyph.
A single glyph can represent more than one character, as in the "ff" ligature example above. Some cases also involve more than one character per more than one glyph, as in Indic scripts. In these cases it is easier to operate neither by character offsets nor by glyph offsets, but by clusters. Each cluster is an individual syntax unit that contains one or more characters per one or more glyphs.
In addition to the text offset, each index value can be flagged with the to::RTL bit, signifying that the character in question has RTL direction. Letters of RTL languages are not the only characters with this attribute set. Spaces in these languages normally carry the RTL bit too, and sometimes numbers do as well. Explicit direction control characters from the U+20XX block can cause any character to be assigned, or not assigned, the RTL bit.
The array has an extra item added to its end: the length of the text that was used in the shaping. This makes it easy to calculate cluster lengths in characters, especially for the last cluster, since the difference between adjacent indexes is basically the cluster length.
The array is not used for text drawing or calculation, but only for conversion between character, glyph, and cluster coordinates (see Coordinates below).
- advances
-
Contains a set of unsigned 16-bit integers, each a pixel distance indicating how much space the glyph occupies. Where the advances array is not present, or is filled by the advances option in text_shape, each value is basically the sum of the a, b, and c widths of the glyph. However, depending on the shaping input, these values can differ.
Combining graphemes are one such case. Text consisting of two characters, "A" and the combining grave accent U+0300, should be drawn as a single "À" symbol. The font, however, may not have that single glyph, only the two individual glyphs "A" and "`". The grave glyph has its own advance for standalone usage, which should be ignored here. This is achieved by setting the advance of the "`" to zero.
The array content is respected by text_out and get_text_width, and it can be changed at will to produce gaps in the text quite easily. For example, Prima::Edit uses this to display tab characters as spaces with 8x advance.
- positions
-
Contains a set of pairs of signed 16-bit integers, each an X and Y pixel offset for a glyph. As in the previous example with the "À" symbol, the grave glyph "`" may be positioned differently on the vertical axis, for example in the "À" and "à" graphemes.
The array is respected by text_out (but not by get_text_width).
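The following sketch shows how the extra trailing item of the indexes array makes cluster lengths in characters easy to compute. It is a hypothetical helper, not part of Prima. It assumes that the to::RTL flag is the top bit (0x8000) of each 16-bit index, whereas real code should mask with to::RTL itself. It also assumes that all glyphs of a cluster carry the cluster's first character offset.

```perl
use strict;
use warnings;

# Assumption for this sketch only: the RTL flag is the top bit of a 16-bit
# index. Real code should mask with Prima's to::RTL constant instead.
use constant RTL_FLAG => 0x8000;

# Hypothetical helper: takes an indexes list (one entry per glyph, followed
# by the extra trailing text length) and returns a hash reference mapping
# each cluster's first character offset to its length in characters.
sub cluster_char_lengths {
    my @indexes     = @_;
    my $text_length = pop @indexes;                  # the extra trailing item
    my %seen;
    my @starts = sort { $a <=> $b }
                 grep { !$seen{$_}++ }
                 map  { $_ & ~RTL_FLAG } @indexes;   # strip the RTL bit
    my %length;
    for my $i (0 .. $#starts) {
        # the next cluster start, or the text length for the last cluster
        my $next = $i < $#starts ? $starts[$i + 1] : $text_length;
        $length{ $starts[$i] } = $next - $starts[$i];
    }
    return \%length;
}

# "office" shaped as o, ffi-ligature, c, e: four glyphs, six characters
my $lengths = cluster_char_lengths(0, 1, 4, 5, 6);
print join(', ', map { "$_=>$lengths->{$_}" } sort { $a <=> $b } keys %$lengths), "\n";
# prints 0=>1, 1=>3, 4=>1, 5=>1
```

Sorting the offsets first means the same helper also works when RTL runs make the indexes appear in descending order.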
Coordinates
In addition to natural character coordinates, where each index is an offset that can be used directly in the substr perl function, this class offers two additional coordinate systems that help abstract the object data for display and navigation.
The glyph coordinate system is a rather straightforward counterpart of the character coordinates, where each number is an offset in the glyphs array. Similarly, these offsets can be used to address individual glyphs, indexes, advances, and positions. However, they are awkward to use for tasks such as selecting a grapheme with the mouse, or breaking a set of glyphs in such a way that no grapheme is split. Such tasks are easier to manage in the cluster coordinate system.
The cluster coordinates are a virtually superimposed set of offsets, where each offset corresponds to a set of one or more characters displayed by one or more glyphs. Most of the useful functions below operate in this system.
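The hypothetical helper below gives a rough picture of how glyph offsets relate to cluster offsets. It numbers clusters in display order, starting a new cluster whenever the character offset changes from one glyph to the next. This is not Prima's glyph2cluster. It assumes that glyphs belonging to one cluster are adjacent and share the same starting offset, and that the to::RTL flag is the top bit (0x8000) of each index.

```perl
use strict;
use warnings;

# Assumption for this sketch only: the RTL flag is the top bit of a 16-bit
# index. Real code should mask with Prima's to::RTL constant instead.
use constant RTL_FLAG => 0x8000;

# Hypothetical helper: given per-glyph indexes in display order (without
# the trailing text length), return the cluster number of every glyph.
sub glyph_clusters {
    my @cluster;
    my ($n, $prev) = (-1, undef);
    for my $ix (@_) {
        my $offset = $ix & ~RTL_FLAG;
        # a new cluster begins when the character offset changes
        $n++ if !defined $prev || $offset != $prev;
        push @cluster, $n;
        $prev = $offset;
    }
    return @cluster;
}

# two glyphs sharing character offset 0 form a single cluster
my @clusters = glyph_clusters(0, 0, 2, 3);
print "@clusters\n";    # prints 0 0 1 2
```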
Selection
In practice, the most useful coordinates for implementing selection are character or cluster coordinates, not glyph ones. Character-based selection makes extraction or replacement of the selected text trivial. Cluster-based selection makes it easier to manipulate the selection itself (for example, with Shift+arrow keys).
The class supports both, by operating on selection maps or selection chunks, which represent the same information in different ways. For example, consider a number embedded in bidi text. For the sake of clarity I'll use Latin characters here. Let's have a text scalar containing these characters:
ABC123
where ABC is right-to-left text, so that, when rendered on screen, the string should be displayed as
123CBA
(and the indexes array, without the to::RTL flags and the trailing text length, is (3,4,5,2,1,0)).
Next, the user clicks the mouse between A and B (at text offset 1), then drags the mouse to the left, and finally stops between characters 2 and 3 (text offset 4). The resulting selection should then not be, as one might naively expect, this:
123CBA
__^^^_
but this instead:
123CBA
^^_^^_
because the character that logically follows C is 1, and the selected sub-text ranges from character 1 to character 4.
The class can encode such information in a map, i.e. an array of integers 1,1,0,1,1,0, where each entry is 1 or 0 depending on whether the cluster is selected or not. Alternatively, the same information can be encoded in chunks, or RLE sets, as the array 0,2,1,2,1. The first integer is the number of non-selected clusters to display, the second the number of selected clusters, the third the number of non-selected ones again, and so on. If the first cluster belongs to the selected chunk, the first integer in the result is set to 0.
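The map and chunk encodings described above can be sketched in plain Perl. Both helpers are hypothetical and independent of Prima's selection_map and selection_chunks. For simplicity, the selection is given as an inclusive range of character offsets. The clusters are represented only by their first character offsets in display order, taken from the ABC123 example.

```perl
use strict;
use warnings;

# Hypothetical helper: given the first character offset of every cluster in
# display order and an inclusive character range, build a selection map.
sub selection_map_sketch {
    my ($starts, $from, $to) = @_;
    return map { $_ >= $from && $_ <= $to ? 1 : 0 } @$starts;
}

# Hypothetical helper: run-length encode a selection map into chunks that
# alternate non-selected/selected counts, always starting with the
# non-selected count (0 if the first cluster is selected).
sub map_to_chunks {
    my @chunks = (0);
    my $state  = 0;                  # the current run is non-selected
    for my $bit (@_) {
        if ($bit != $state) {
            push @chunks, 0;         # start a new run
            $state = $bit;
        }
        $chunks[-1]++;
    }
    return @chunks;
}

# "ABC123" displayed as 123CBA: cluster starts in display order
my @map    = selection_map_sketch([3, 4, 5, 2, 1, 0], 1, 4);
my @chunks = map_to_chunks(@map);
print "@map | @chunks\n";    # prints 1 1 0 1 1 0 | 0 2 1 2 1
```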
Bidi input
When sending input to a widget in order to type in text, the otherwise trivial task of figuring out at which position the text should be inserted (or removed, for that matter) becomes interesting when there are characters with mixed direction.
For example, when the Latin text is AB and the cursor is positioned between A and B, it is indeed trivial to figure out that whenever the user types C, the result should become ACB. Likewise, when the text is LTR and both the text and the input are Arabic, the result is the same. A harder case is the text A1, where A again stands for an RTL character, so that the text is displayed as 1A because of RTL shaping. If the cursor is positioned between 1 (LTR) and A (RTL), it is not clear whether the new input should be appended after 1 to become A1C, or after A to become AC1.
There is no easy solution for this problem. Different programs approach it differently, and some go as far as providing two cursors, one for each direction. The class offers its own solution, which uses some primitive heuristics to detect whether the cursor belongs to the left or to the right glyph. This area could be improved, and any help from native users of RTL languages would be greatly appreciated.
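To make the ambiguity concrete, here is a hypothetical heuristic in plain Perl. It is not the heuristic used by cursor2offset, only an illustration of one possible approach. Each neighbour of the cursor suggests a character offset according to its own direction. When the two suggestions disagree, the suggestion from the neighbour with the preferred direction wins.

```perl
use strict;
use warnings;

# Hypothetical heuristic for illustration only; NOT Prima's cursor2offset.
# Clusters are listed in display order as [first_char_offset, char_count,
# is_rtl]; $at is the cursor position between clusters (0 = leftmost edge).
sub cursor_to_offset {
    my ($clusters, $at, $preferred_rtl) = @_;
    my $left  = $at > 0          ? $clusters->[$at - 1] : undef;
    my $right = $at < @$clusters ? $clusters->[$at]     : undef;

    my @candidates;    # [is_rtl, suggested character offset]
    if ($left) {
        my ($offset, $count, $rtl) = @$left;
        # left LTR neighbour: the cursor is logically after it;
        # left RTL neighbour: the cursor is logically before it
        push @candidates, [ $rtl ? 1 : 0, $rtl ? $offset : $offset + $count ];
    }
    if ($right) {
        my ($offset, $count, $rtl) = @$right;
        # right LTR neighbour: the cursor is logically before it;
        # right RTL neighbour: the cursor is logically after it
        push @candidates, [ $rtl ? 1 : 0, $rtl ? $offset + $count : $offset ];
    }
    return 0 unless @candidates;    # empty text
    return $candidates[0][1]
        if @candidates == 1 || $candidates[0][1] == $candidates[1][1];

    # the neighbours disagree: take the one with the preferred direction
    my $want = $preferred_rtl ? 1 : 0;
    my ($pick) = grep { $_->[0] == $want } @candidates;
    return ($pick // $candidates[0])->[1];
}

# logical text "A1", where A stands for an RTL character, is shown as "1A"
my @clusters = ([1, 1, 0], [0, 1, 1]);
my $text = 'A1';
substr($text, cursor_to_offset(\@clusters, 1, 1), 0, 'C');
print "$text\n";    # prints AC1
```

For the A1 example above, a preferred RTL direction yields offset 1 and thus AC1, while a preferred LTR direction yields offset 2 and thus A1C.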
API
- abc $CANVAS, $INDEX
-
Returns the a, b, and c metrics of the glyph at $INDEX.
- advances
-
Read-only accessor to advances array, see Structure above.
- clone
-
Clones the object.
- cluster2glyph $FROM, $LENGTH
-
Maps a range of clusters starting at $FROM with size $LENGTH into the corresponding range of glyphs. An undefined $LENGTH calculates the range from $FROM to the end of the object.
- cluster2index $CLUSTER
-
Returns the character offset of the first character in cluster $CLUSTER.
Note: the result may contain the to::RTL flag.
- cluster2range $CLUSTER
-
Returns the character offset of the first character in cluster $CLUSTER and the number of characters in the cluster.
- clusters
-
Returns an array of integers, each being the offset of the first character of a cluster.
- cursor2offset $AT_CLUSTER, $PREFERRED_RTL
-
Given a cursor positioned next to the cluster $AT_CLUSTER, runs simple heuristics to determine which character offset it corresponds to. $PREFERRED_RTL is used when the object data are not sufficient. See "Bidi input" above.
- def $CANVAS, $INDEX
-
Returns the d, e, and f metrics of the glyph at $INDEX.
- get_box $CANVAS
-
Returns the box metrics of the glyph object.
- get_sub $FROM, $LENGTH
-
Extracts and clones a new object that contains data from cluster offset $FROM, with cluster length $LENGTH.
- get_sub_box $CANVAS, $FROM, $LENGTH
-
Calculates the box metrics of a glyph string from the cluster $FROM with size $LENGTH.
- get_sub_width $CANVAS, $FROM, $LENGTH
-
Calculates the pixel width of a glyph string from the cluster $FROM with size $LENGTH.
- get_width $CANVAS, $WITH_OVERHANGS
-
Returns the width of the glyph object, with overhangs if requested.
- glyph2cluster $GLYPH
-
Returns the cluster that contains $GLYPH.
- glyphs
-
Read-only accessor to glyph indexes, see Structure above.
- glyph_lengths
-
Returns an array where each glyph position is set to the number of glyphs occupied by the cluster at this position.
- index2cluster $INDEX
-
Returns the cluster that contains the character offset $INDEX.
- indexes
-
Read-only accessor to indexes, see Structure above.
- index_lengths
-
Returns an array where each glyph position is set to the number of characters occupied by the cluster at this position.
- left_overhang
-
The first integer from the overhangs result.
- log2vis
-
Returns a map of integers where each character position corresponds to a glyph position. The name is a remnant of pure fribidi shaping, where log2vis and vis2log were mapper functions with the same functionality.
- n_clusters
-
Calculates how many clusters the object contains.
- new @ARRAYS
-
Creates a new object. Not used directly, but rather from inside text_shape calls.
- new_empty
-
Creates a new empty object.
- overhangs
-
Calculates two pixel widths for the overhangs at the beginning and at the end of the glyph string. This is used to emulate a get_text_width call with the to::AddOverhangs flag.
- positions
-
Read-only accessor to positions array, see Structure above.
- reverse
-
Creates a new object that has all arrays reversed. Used for calculating the pixel offset from the right end of a glyph string.
- right_overhang
-
The second integer from the overhangs result.
- selection2range $CLUSTER_START, $CLUSTER_END
-
Converts a cluster selection range into a text selection range.
- selection_chunks $START, $END
-
Given a text selection from position $START to $END, calculates a set of chunks, each representing a run of either selected or non-selected clusters.
- selection_diff $OLD, $NEW
-
Given two chunk lists in the format returned by selection_chunks, calculates the list of chunks affected by the selection change. This can be used for efficient repaints when the user interactively changes the text selection, so that only the changed regions are redrawn.
- selection_map $START, $END
-
Same as selection_chunks, but instead of RLE chunks returns a full array with one entry per cluster. Each entry is a boolean value indicating whether that cluster is to be displayed as selected or not.
- selection_walk $CHUNKS, $FROM, $TO = length, $SUB
-
Walks the selection chunks array, as returned by selection_chunks, between the $FROM and $TO clusters, and calls the provided $SUB->($offset, $length, $selected) for each chunk. Each call receives two integers, the chunk offset and length, and a boolean flag indicating whether the chunk is selected.
Can also be used on the result of selection_diff, in which case the $selected flag is irrelevant.
- sub_text_out $CANVAS, $FROM, $LENGTH, $X, $Y
-
Optimized version of $CANVAS->text_out( $self->get_sub($FROM, $LENGTH), $X, $Y ).
- sub_text_wrap $CANVAS, $FROM, $LENGTH, $WIDTH, $OPT, $TABS
-
Optimized version of $CANVAS->text_wrap( $self->get_sub($FROM, $LENGTH), $WIDTH, $OPT, $TABS ). The result is also converted to chunks.
- text_length
-
Returns the length of the text that was shaped and that produced the object.
- x2cluster $CANVAS, $X, $FROM, $LENGTH
-
Given a range of clusters starting at $FROM with size $LENGTH, calculates how many clusters would fit in the width $X.
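The idea behind selection_diff and selection_walk can be sketched in plain Perl. These helpers are hypothetical, not Prima's implementations. The diff here simply marks every cluster whose selection state changed and re-encodes the result as chunks, so that the walk can report the ranges that need a repaint.

```perl
use strict;
use warnings;

# Hypothetical helpers sketching the idea behind selection_diff and
# selection_walk; they are not Prima's implementations. Chunk lists start
# with a non-selected count, then alternate selected/non-selected counts.

sub chunks_to_map {
    my @map;
    my $flag = 0;
    for my $count (@_) {
        push @map, ($flag) x $count;
        $flag = $flag ? 0 : 1;
    }
    return @map;
}

sub map_to_chunks {
    my @chunks = (0);
    my $flag   = 0;
    for my $bit (@_) {
        if ($bit != $flag) { push @chunks, 0; $flag = $bit }
        $chunks[-1]++;
    }
    return @chunks;
}

# Flag every cluster whose selection state differs between two chunk lists,
# returning the result in the same chunk format (1 = needs a repaint).
sub chunks_diff {
    my ($old, $new) = @_;
    my @old_map = chunks_to_map(@$old);
    my @new_map = chunks_to_map(@$new);
    my $n = @old_map > @new_map ? scalar @old_map : scalar @new_map;
    my @changed;
    for my $i (0 .. $n - 1) {
        push @changed, (($old_map[$i] // 0) == ($new_map[$i] // 0)) ? 0 : 1;
    }
    return map_to_chunks(@changed);
}

# Call $cb->($offset, $length, $flag) for every non-empty chunk.
sub walk_chunks {
    my ($chunks, $cb) = @_;
    my ($offset, $flag) = (0, 0);
    for my $length (@$chunks) {
        $cb->($offset, $length, $flag) if $length;
        $offset += $length;
        $flag = $flag ? 0 : 1;
    }
}

# the selection changes from 1 1 0 1 1 0 to 1 1 1 0 0 0 (six clusters)
my @diff = chunks_diff([0, 2, 1, 2, 1], [0, 3, 3]);
my @repaint;
walk_chunks(\@diff, sub { push @repaint, "$_[0]+$_[1]" if $_[2] });
print "@diff | @repaint\n";    # prints 2 3 1 | 2+3
```

Only clusters 2 to 4 change state, so a widget following this approach would repaint a single range of three clusters instead of the whole string.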
AUTHOR
Dmitry Karasik, <dmitry@karasik.eu.org>.
SEE ALSO
examples/bidi.pl