NAME
Unicode::Properties - find out what properties a character has
SYNOPSIS
use utf8;
use Unicode::Properties 'uniprops';
my @prop_list = uniprops ('☺'); # Unicode smiley face
print "@prop_list\n";
produces output
Any Assigned Common InMiscellaneousSymbols
(This example is included as synopsis.pl in the distribution.)
You can then use, for example, \p{InMiscellaneousSymbols} to match this character in a regular expression.
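As a minimal sketch of that last point, the following uses only core Perl (it does not need Unicode::Properties itself) to pull the smiley out of a longer string. The sample string and the variable names are illustrative assumptions, not part of the distribution.

```perl
use strict;
use warnings;
use utf8;

# Illustrative sample text containing the smiley from the synopsis.
my $text = "Hello ☺ world";

# Collect every character that belongs to the Miscellaneous Symbols block.
my @symbols = $text =~ /(\p{InMiscellaneousSymbols})/g;

# Print only the count, to avoid "Wide character" warnings on output.
printf "Found %d symbol(s)\n", scalar @symbols;
```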
VERSION
This documents Unicode::Properties version 0.07 corresponding to git commit c1071fba77751ea891c887d26d8e3ea9ce91f631 released on Mon Jan 30 09:30:10 2017 +0900.
DESCRIPTION
Unicode::Properties provides a way to go from a character to its list of properties.
FUNCTIONS
uniprops
my @prop_list = uniprops ('☺'); # Unicode smiley face
Given a character, returns a list of the properties which the character has. This works by testing its argument against a \p{} regular expression for every property the module knows about, so it is not an efficient method.
use Unicode::Properties 'uniprops';
print join (',',uniprops('2')), "\n";
produces output
ASCII,Any,Assigned,Common,IDContinue,InBasicLatin
(This example is included as univer.pl in the distribution.)
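The testing approach described above can be sketched roughly as follows. This is not the module's source code: the function name props_of and the short candidate list are assumptions made for illustration, and the module's real list is much longer.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny candidate list (an assumption for this
# sketch); the module itself knows about many more property names.
my @candidates = qw(ASCII Any Assigned Common IDContinue InBasicLatin
                    Latin InMiscellaneousSymbols);

# Return the candidate properties which $char matches.
sub props_of
{
    my ($char) = @_;
    my @found;
    for my $name (@candidates) {
        my $pattern = '\p{' . $name . '}';
        # The eval guards against names which this Perl's Unicode
        # tables do not know; those are simply skipped.
        my $ok = eval { $char =~ /$pattern/ };
        push @found, $name if $ok;
    }
    return @found;
}

print join (',', props_of ('2')), "\n";
```

One regular expression is compiled and run per candidate name, which is why the cost grows with the length of the property list.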
matchchars
my @matching = matchchars ($property);
This returns a list of all the characters which match a particular property. If $property is not found in the list of possible Unicode properties, it is treated as a regular expression.
It can also return an array reference, as in this example, where the result is assigned to a scalar:
use utf8;
use FindBin '$Bin';
use Unicode::Properties ':all';
my $type = 'InCJKUnifiedIdeographs';
my $matching = matchchars ($type);
printf "There are %d characters of type %s.\n", scalar (@$matching), $type;
produces output
There are 20992 characters of type InCJKUnifiedIdeographs.
(This example is included as matchchars.pl in the distribution.)
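For readers curious how such a list could be built, here is a rough sketch that scans a range of code points and keeps those matching a property. The function chars_matching and its range arguments are hypothetical and are not part of this module's interface; the module may well do this differently.

```perl
use strict;
use warnings;

# Return a reference to an array of the characters between code points
# $first and $last which match the property $property.
sub chars_matching
{
    my ($property, $first, $last) = @_;
    my $pattern = '\p{' . $property . '}';
    my $re = qr/$pattern/;
    my @matching;
    # Older Perls may warn about non-characters such as U+FFFF.
    no warnings 'utf8';
    for my $code ($first .. $last) {
        # Skip the surrogate range, which does not hold real characters.
        next if $code >= 0xD800 && $code <= 0xDFFF;
        my $char = chr $code;
        push @matching, $char if $char =~ $re;
    }
    return \@matching;
}

my $type = 'InCJKUnifiedIdeographs';
# Only the Basic Multilingual Plane is scanned, to keep the sketch quick.
my $matching = chars_matching ($type, 0, 0xFFFF);
printf "Found %d characters of type %s.\n", scalar (@$matching), $type;
```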
VARIABLES
$unicode_version
$unicode_version is the version of Unicode supplied with your version of Perl, taken from "Unicode::UCD". To override the Unicode version and get properties for a different version of Unicode, set this to a desired value.
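To see which Unicode version your own Perl supplies, you can ask Unicode::UCD directly. This small sketch uses only the core module and does not touch Unicode::Properties; the variable name $version is just for illustration.

```perl
use strict;
use warnings;
use Unicode::UCD;

# UnicodeVersion returns a string such as "5.0.0".
my $version = Unicode::UCD::UnicodeVersion ();
print "This Perl supplies Unicode version $version\n";
```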
EXPORTS
"uniprops" and "matchchars" are exported on demand. The tag :all exports all the functions of the module.
DEPENDENCIES
- Unicode::UCD
Unicode::UCD (Unicode Character Database) is used to find the version of Unicode which your Perl supplies.
BUGS
- Data source
This module uses a list taken from the "perlunicode" documentation. It should use Perl's internals or the Unicode files to get the list.
- Outdated data
As of version 0.07, the Unicode data dates from an older version of Perl.
- Perl & Unicode version
Depending on your Perl and Unicode version, you'll get different results. For example, "Balinese" was added in Unicode version 5.0.0. If you are using an unpatched Perl 5.8.8, your Unicode version is 4.1.0, so you won't get "Balinese" in the results list.
Also, I don't know the behaviour of Unicode versions other than 4.1.0 and 5.0.0, so this module only covers those two. I couldn't get Perl 5.8.5 to install on my computer, so I've set the minimum version to 5.8.8 for this module.
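If you want to check whether your Perl knows a given property such as "Balinese", something like the following sketch may help. The helper property_known is hypothetical and is not provided by this module; it relies on Perl raising an error for an unknown property name, which the eval block catches.

```perl
use strict;
use warnings;
use Unicode::UCD;

# Return 1 if this Perl recognises the property $name, 0 otherwise.
sub property_known
{
    my ($name) = @_;
    my $pattern = '\p{' . $name . '}';
    # Compiling or running the match dies for an unknown property.
    return eval { 'a' =~ /$pattern/; 1 } ? 1 : 0;
}

printf "Unicode %s: Balinese is %s\n",
    Unicode::UCD::UnicodeVersion (),
    property_known ('Balinese') ? 'available' : 'not available';
```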
SEE ALSO
Other CPAN modules
- "uniprops" in Unicode::Tussle
This script was written because the author (Tom Christiansen, <TCHRIST>) was dissatisfied with Unicode::Properties. Unfortunately, it uses the same method as this module, of parsing the Perl documentation to get the information. The last time I tested it, it only worked for Perl versions 5.12 or 5.14, but that was about three years ago.
Information about Perl and Unicode
- Perl Unicode documentation
See perlunicode for Unicode documentation, and perluniprops for details of all the different properties. There is also a tutorial in perlunitut, and some more advice in perlunifaq.
- Other Unicode and Perl information
Tutorial on Perl and Unicode is a tutorial for people new to Unicode and Perl.
Get the Unicode value of a character in Perl explains how to get the Unicode value of a single character.
What characters match a regular expression? is a Perl script which shows what single characters match a particular regular expression, like \s or \p{InCJKUnifiedIdeographs}.
AUTHOR
Ben Bullock, <bkb@cpan.org>
COPYRIGHT & LICENCE
This package and associated files are copyright (C) 2011-2017 Ben Bullock.
You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.