NAME
Lingua::LO::Transform::Syllables - Segment Lao or mixed-script text into syllables.
FUNCTION
This module implements a purely regular-expression-based algorithm for segmenting Lao text into syllables. It is based on the algorithm described in Phissamay et al., "Syllabification of Lao Script for Line Breaking".
METHODS
new
new( text => $text, ... )
The constructor takes hash-style named arguments. The only one defined so far is text, whose value is the text to be segmented.
Note that text is first passed through "NFC" in "Unicode::Normalize" to obtain the Composed Normal Form. In pure Lao text, this affects only the decomposed form of LAO VOWEL SIGN AM, which is transformed from U+0EB2, U+0ECD to U+0EB3.
get_syllables
get_syllables()
Returns a list of the Lao syllables found in the text passed to the constructor. Any blanks, non-Lao parts and the like that are mixed into the text are silently dropped.
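A minimal usage sketch of the constructor and get_syllables as documented above. The sample string and the variable names are illustrative assumptions only, and the exact syllable boundaries returned depend on the module, so no output is shown here.

```perl
use strict;
use warnings;
use utf8;                              # this source file contains literal Lao text
use Lingua::LO::Transform::Syllables;

binmode STDOUT, ':encoding(UTF-8)';    # print wide characters without warnings

# Hypothetical mixed-script input: Lao words with an ASCII word and blanks in between.
my $text = "ສະບາຍດີ hello ຂອບໃຈ";

# The constructor takes named arguments; "text" is the only one documented.
my $segmenter = Lingua::LO::Transform::Syllables->new( text => $text );

# Per the description above, only Lao syllables are returned;
# "hello" and the blanks should be silently dropped.
my @syllables = $segmenter->get_syllables;

print join( ' | ', @syllables ), "\n";
```

If the non-Lao parts need to be preserved as well, get_fragments is the method to use instead.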
get_fragments
get_fragments()
Returns a complete segmentation of the text passed to the constructor as an array of hashes. Each hash has two keys: