NAME
Unicode::Util - Unicode-aware versions of built-in Perl functions
VERSION
This document describes Unicode::Util version 0.03.
SYNOPSIS
use feature 'say';
use Unicode::Util qw( graph_length code_length byte_length );
# grapheme cluster ю́: Cyrillic small letter yu + combining acute accent
my $grapheme = "\x{44E}\x{301}";
say graph_length($grapheme); # 1
say code_length($grapheme); # 2
say byte_length($grapheme); # 4
DESCRIPTION
This module provides additional versions of Perl’s built-in functions, tailored to work on three different units:
graph: Unicode extended grapheme clusters (graphemes)
code: Unicode codepoints
byte: 8-bit bytes (octets)
This is an early release and this module is likely to have major revisions. Only the length, chop, and reverse functions are currently implemented. See the "TODO" section for planned future additions.
FUNCTIONS
length
- graph_length($string)
Returns the length in graphemes of the given string. This is likely the number of “characters” that many people would count on a printed string, plus non-printing characters.
- code_length($string)
Returns the length in codepoints of the given string. This is likely the number of “characters” that many programmers and programming languages would count in a string.
- byte_length($string)
Returns the length in bytes of the given string encoded as UTF-8. This is the number of bytes that many computers would count when storing a string.
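The three units can be illustrated with core Perl alone. The following is a minimal sketch, not necessarily how this module is implemented: the sketch_* helper names are hypothetical, and it assumes the input is a decoded character string and that the Perl in use supports extended grapheme clusters through \X.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helpers for illustration only; not part of Unicode::Util.

sub sketch_graph_length {
    my ($string) = @_;
    # \X matches one extended grapheme cluster; count all matches.
    my $count = () = $string =~ /\X/g;
    return $count;
}

sub sketch_code_length {
    my ($string) = @_;
    # On a character string, the built-in length counts codepoints.
    return length $string;
}

sub sketch_byte_length {
    my ($string) = @_;
    # Encode to UTF-8 octets first, then count the octets.
    return length encode('UTF-8', $string);
}

# Cyrillic small letter yu + combining acute accent
my $grapheme = "\x{44E}\x{301}";

printf "%d %d %d\n",
    sketch_graph_length($grapheme),
    sketch_code_length($grapheme),
    sketch_byte_length($grapheme);    # prints "1 2 4"
```

Each of the two codepoints in this example takes two bytes in UTF-8, which is why the byte count is 4.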
chop
These do not modify the original value, unlike the built-in chop.
- graph_chop($string)
Returns the given string with the last grapheme chopped off.
- code_chop($string)
Returns the given string with the last codepoint chopped off.
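The difference between chopping a grapheme and chopping a codepoint, and the fact that the argument is left unmodified, can be sketched with core Perl. The sketch_* names below are hypothetical and this is only an approximation of the documented behavior, not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; not part of Unicode::Util.

sub sketch_graph_chop {
    my ($string) = @_;      # work on a copy so the caller's value is untouched
    $string =~ s/\X\z//;    # remove the final extended grapheme cluster
    return $string;
}

sub sketch_code_chop {
    my ($string) = @_;      # again a copy
    chop $string;           # built-in chop removes the last codepoint of the copy
    return $string;
}

# "a" followed by Cyrillic small letter yu + combining acute accent
my $text = "a\x{44E}\x{301}";

my $by_grapheme  = sketch_graph_chop($text);   # "a"
my $by_codepoint = sketch_code_chop($text);    # "a\x{44E}"

printf "%d %d %d\n",
    length $by_grapheme,
    length $by_codepoint,
    length $text;                              # prints "1 2 3"
```

Chopping by grapheme removes the base letter together with its combining accent, while chopping by codepoint removes only the accent; in both cases $text still holds all three codepoints afterwards.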
reverse
TODO
Evaluate the following core Perl functions and operators for potential addition to this module: split, substr, index, rindex, eq, ne, lt, gt, le, ge, cmp
AUTHOR
Nick Patch <patch@cpan.org>
COPYRIGHT AND LICENSE
© 2011–2012 Nick Patch
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.