NAME
uninames - show selected Unicode character descriptions
SYNOPSIS
uninames [options] criteria
Options must use double-dash form only:
--version print version information
--help this message
--man full manpage
--word patterns wrapped with \b ... \b
--bmp restrict matches to Basic Multilingual Plane
--astral restrict matches to above the Basic Multilingual Plane
--debug print debugging and exit
Args are otherwise patterns, all of which must be matched. Each match is case-insensitive if its pattern contains any lowercase letters. A single leading minus negates the match.
eg: uninames greek alpha
eg: uninames GREEK ALPHA tonos
eg: uninames LATIN LETTER -greek -WITH
DESCRIPTION
The uninames program searches the Unicode NamesList.txt file for character descriptions, showing all entries matching the selection criteria that were given as program arguments. Without any arguments, the entire file is displayed.
A typical entry looks like this:
007C VERTICAL LINE
= vertical bar
* used in pairs to indicate absolute value
x (latin letter dental click - 01C0)
x (hebrew punctuation paseq - 05C0)
x (divides - 2223)
x (light vertical bar - 2758)
So its official name is in all caps, but later parts of the description are in mixed case--usually lowercase. You can use this property to restrict which part of the entry you do or do not match. Note also that code points are given in hex, should you care to match them.
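To make the entry layout concrete, here is a minimal Python sketch of cutting the names file into entries like the one above. It is an illustration only, not the program's own code: the function name iter_entries is hypothetical, and it assumes each entry starts with a hex code point at the beginning of a line and continues over the indented lines that follow.

```python
import re

# Assumed layout: an entry starts with a hex code point in column one.
ENTRY_START = re.compile(r"^[0-9A-F]{4,6}\s")

def iter_entries(text):
    """Yield each entry (name line plus its indented lines) as one string."""
    current = []
    for line in text.splitlines():
        if ENTRY_START.match(line):
            if current:
                yield "\n".join(current)
            current = [line]
        elif current and line[:1] in (" ", "\t"):
            current.append(line)          # indented annotation line
        elif current:
            # Anything else (a blank line or a block heading) ends the entry.
            yield "\n".join(current)
            current = []
    if current:
        yield "\n".join(current)
```

Treating each entry as a single multi-line string is what lets one pattern match the official name while another matches an annotation line in the same entry.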
Although typical arguments are words, each argument is a regular expression. Word boundaries enclose each argument (only) if the --word option is given. If any lowercase letters occur in a given argument (except for regex escapes), that argument will be matched case insensitively.
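The rules for interpreting one argument can be sketched in Python as follows. This is a rough model under stated assumptions, not the program's actual implementation: the function name classify is hypothetical, regex escapes are taken to be a backslash plus one character, and the non-capturing group used for --word is a guess at how the \b wrapping is done.

```python
import re

def classify(arg, word=False):
    """Return (negated, pattern, ignore_case) for one command-line argument."""
    negated = arg.startswith("-")
    if negated:
        arg = arg[1:]                     # a single leading minus negates
    # Drop backslash escapes such as \s or \b before looking for lowercase,
    # so the letter inside an escape does not trigger case-insensitivity.
    without_escapes = re.sub(r"\\.", "", arg)
    ignore_case = re.search(r"[a-z]", without_escapes) is not None
    if word:
        arg = r"\b(?:" + arg + r")\b"     # --word: wrap in word boundaries
    return negated, arg, ignore_case
```

Under this model "GREEK" stays case-sensitive, "greek" does not, and "-WITH" becomes a case-sensitive pattern that must not match.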
Each pattern is compiled with /x, /m, and /s. Each entry in the names file is examined in succession, and if all criteria match, that entry is printed out, prefixed with its literal character and the code point's decimal value right before its hex value.
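A minimal Python sketch of this matching step follows, with re.VERBOSE, re.MULTILINE, and re.DOTALL standing in for /x, /m, and /s. It is an approximation for illustration, not the program's code: the names entry_matches and format_entry are hypothetical, each criterion is assumed to arrive as a (negated, pattern, ignore_case) tuple, and the single-space separators in the output prefix are a guess at the real layout.

```python
import re

# Python counterparts of Perl's /x, /m, and /s.
BASE_FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL

def entry_matches(entry, criteria):
    """True if every positive pattern matches and no negated pattern does."""
    for negated, pattern, ignore_case in criteria:
        flags = BASE_FLAGS | (re.IGNORECASE if ignore_case else 0)
        found = re.search(pattern, entry, flags) is not None
        if found == negated:
            return False
    return True

def format_entry(entry):
    """Prefix the entry with its literal character and decimal code point."""
    code_hex = entry.split(None, 1)[0]    # leading hex field, e.g. "007C"
    cp = int(code_hex, 16)
    return f"{chr(cp)} {cp} {entry}"      # separator spacing is assumed
```

Because of the /x-style flag, unescaped whitespace inside a pattern is ignored, which is why a pattern such as '^ \s+ = \s+' can be spaced out for readability; the /m-style flag lets its ^ anchor at the start of any line within the entry.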
Output is piped through the user's pager, or more if none is set.
EXAMPLES
Find entries matching both "greek" and "alpha", case insensitively:
$ uninames greek alpha
Find entries matching both "GREEK" and "ALPHA" case sensitively, and "tonos" case insensitively:
$ uninames GREEK ALPHA tonos
Find entries matching both "LATIN" and "LETTER" case sensitively, but not matching "greek" case insensitively nor "WITH" case sensitively:
$ uninames LATIN LETTER -greek -WITH
Find entries whose official name ends with "ETH" and are from the Basic Multilingual Plane:
$ uninames --bmp "ETH$"
Find entries containing "latin" case insensitively anywhere in the description at word boundaries, and which are not from the Basic Multilingual Plane:
$ uninames --word --astral latin
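The --bmp and --astral filters used in the last two examples amount to a range test on the code point, since the Basic Multilingual Plane covers U+0000 through U+FFFF. A small Python sketch of that test, with the hypothetical name plane_ok and no claim to mirror the program's own code:

```python
def plane_ok(code_hex, bmp=False, astral=False):
    """Apply the --bmp / --astral restriction to a hex code point string."""
    cp = int(code_hex, 16)
    if bmp and cp > 0xFFFF:
        return False                      # above the BMP, rejected by --bmp
    if astral and cp <= 0xFFFF:
        return False                      # inside the BMP, rejected by --astral
    return True
```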
Find entries with aliased names, except for those named "<control>":
$ uninames '^ \s+ = \s+' -'<control>'
Find entries marked as used in French:
$ uninames '\* .* French'
Find entries marked as used in either Spanish or Portuguese:
$ uninames '^ \s+ \* .* (Spanish|Portuguese)'
FILES
$privlib/unicore/NamesList.txt
PROGRAMS
less(1)
BUGS
It's hard to remember to type a double-dash for options.
If your system's idea of valid Unicode lags behind your font's, you may have to call less yourself, passing it -r so it displays the real characters instead of "<U+XXXX>".
May be subclever in inferring case sensitivity.
SEE ALSO
unichars, uniprops, perlunicode
Tim Bray's article discussing the astral planes http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
AUTHOR
Tom Christiansen <tchrist@perl.com>