NAME
App::Greple::charcode - greple module to annotate Unicode character data
SYNOPSIS
greple -Mcharcode [ module option -- ] [ command option ] ...
COMMAND OPTION
--no-annotate do not print annotation
--[no-]align align annotations
--align-all align to the same column for all lines
--align-side align to the longest line
PATTERNS
--composite find composite character (combining character sequence)
--precomposed find precomposed character
--combined find both composite and precomposed characters
--outstand find --combined and non-ASCII characters
--dt=type specify decomposition type
--surrogate find character in UTF-16 surrogate pair range
-p/-P prop find \p{prop} or \P{prop} characters
--ansicode find ANSI terminal control sequences
MODULE OPTION
--column[=#] display column number
--visible[=#] display invisible characters in visible form
--char[=#] display character itself
--width[=#] display width
--utf8[=#] display UTF-8 encoding
--utf16[=#] display UTF-16 encoding
--code[=#] display Unicode code point
--name[=#] display character name
--nfd[=#] display Unicode Normalization Form D
--nfc[=#] display Unicode Normalization Form C
--nfkd[=#] display Unicode Normalization Form KD
--nfkc[=#] display Unicode Normalization Form KC
--split[=#] annotate each character separately
--alignto[=#] align annotation to #
--config KEY[=VALUE],... set configuration parameters
greple -Mcc [ module option -- ] [ command option ] ...
-Mcc alias module for -Mcharcode
VERSION
Version 0.9909
DESCRIPTION
Greple module -Mcharcode (or -Mcc for short) displays
information about the matched characters. It can visualize Unicode
zero-width combining or hidden characters, which can be useful for
examining text containing visually indistinguishable or imperceptible
elements.
The following output, obtained by searching the source of this document
for non-ASCII characters (\P{ASCII}), shows that the character \N{VARIATION SELECTOR-15} follows the copyright sign. The same
character, presumably left over from editing, also follows an
ordinary ASCII t character.
$ greple -Mcharcode '\P{ASCII}' charcode.pm
┌─── 12 \x{fe0e} \N{VARIATION SELECTOR-15}
│ ┌─ 14 \x{a9} \N{COPYRIGHT SIGN}
│ ├─ 14 \x{fe0e} \N{VARIATION SELECTOR-15}
Copyright︎ ©︎ 2025 Kazumasa Utashiro.
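As a rough illustration of what this annotation reports, the following Python sketch (not the module's Perl implementation) walks a string, picks out non-ASCII characters the way \P{ASCII} does, and prints each one's position, code point, and Unicode name in the same \x{...} \N{...} notation. For simplicity it reports a character offset rather than the display column that greple shows, so its numbers will differ from the output above.

```python
import unicodedata

def annotate_non_ascii(text):
    """Return (index, hex code, name) for every non-ASCII character.

    Index is a character offset, not a display column as greple reports.
    """
    result = []
    for i, ch in enumerate(text):
        if ord(ch) > 0x7F:  # same set of characters as \P{ASCII}
            name = unicodedata.name(ch, "<unnamed>")
            result.append((i, f"{ord(ch):x}", name))
    return result

if __name__ == "__main__":
    line = "Copyright\ufe0e \u00a9\ufe0e 2025"
    for i, code, name in annotate_non_ascii(line):
        print(f"{i:3} \\x{{{code}}} \\N{{{name}}}")
```

The invisible VARIATION SELECTOR-15 shows up by name even though it occupies no space on screen, which is exactly the kind of hidden character this module is meant to expose.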
In Japanese, the nasal sound of the K-row (カ行) is sometimes written by adding a semi-voiced sound mark (handakuten) to the K-row character. Since Unicode does not define precomposed characters for these, each one is represented as the original character followed by a combining character. This module lets you see how that is done.
┌───────── 0 \x{30ab} \N{KATAKANA LETTER KA}
├───────── 0 \x{309a} \N{COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK}
│ ┌─────── 2 \x{30ad} \N{KATAKANA LETTER KI}
│ ├─────── 2 \x{309a} \N{COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK}
│ │ ┌───── 4 \x{30af} \N{KATAKANA LETTER KU}
│ │ ├───── 4 \x{309a} \N{COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK}
│ │ │ ┌─── 6 \x{30b1} \N{KATAKANA LETTER KE}
│ │ │ ├─── 6 \x{309a} \N{COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK}
│ │ │ │ ┌─ 8 \x{30b3} \N{KATAKANA LETTER KO}
│ │ │ │ ├─ 8 \x{309a} \N{COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK}
カ゚キ゚ク゚ケ゚コ゚
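Python's standard unicodedata module offers a quick way to confirm this outside greple (an illustration, not part of the module). KA followed by the combining voiced sound mark U+3099 composes under NFC into the precomposed GA (U+30AC), whereas KA followed by the semi-voiced mark U+309A has no precomposed form and stays two code points.

```python
import unicodedata

KA = "\u30ab"           # KATAKANA LETTER KA
VOICED = "\u3099"       # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
SEMI_VOICED = "\u309a"  # COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK

def describe(seq):
    """List the code points of seq after NFC normalization."""
    nfc = unicodedata.normalize("NFC", seq)
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in nfc]

if __name__ == "__main__":
    # KA + voiced mark composes into one precomposed character (GA).
    print(describe(KA + VOICED))
    # KA + semi-voiced mark has no precomposed form, so it stays two code points.
    print(describe(KA + SEMI_VOICED))
```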
COMMAND OPTIONS
-
--annotate, --no-annotate
Print annotation or not. Enabled by default, so use
--no-annotate to disable it.
-
--[no-]align
Align annotation or not. Default true.
-
--align-all
Align to the same column for all lines.
-
--align-side
Align to the longest line length, regardless of match position.
PATTERN OPTIONS
If multiple patterns are given to greple, it normally prints only
the lines that match all of the patterns. However, for the purposes
of this module, it is desirable to display lines that match any of
them, so the --need=1 option is specified by default.
If multiple patterns are specified, the strings matching each pattern will be displayed in a different color.
-
--composite
Search for composite characters (combining character sequence) composed of base and combining characters.
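A combining character sequence is a base character followed by one or more combining marks. Python's re module has no \p{...} property classes, so this sketch tests each character with unicodedata.category() instead and treats any character in the Mn, Mc, or Me categories as a combining mark. It illustrates the idea only; the module's own Perl pattern may delimit sequences differently.

```python
import unicodedata

def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def find_composite(text):
    """Return (start, sequence) for each base character followed by marks."""
    found = []
    i = 0
    while i < len(text):
        j = i + 1
        while j < len(text) and is_mark(text[j]):
            j += 1
        # Report only when the base is not itself a mark and at least one mark follows.
        if j > i + 1 and not is_mark(text[i]):
            found.append((i, text[i:j]))
        i = j
    return found

if __name__ == "__main__":
    sample = "e\u0301 \u30ab\u309a plain"
    for start, seq in find_composite(sample):
        print(start, [f"U+{ord(c):04X}" for c in seq])
```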
-
--precomposed
Search for precomposed characters (\p{Dt=Canonical}).
-
--combined
Find both composite and precomposed characters.
-
--dt=type, --decomposition-type=type
Specifies the Decomposition_Type. It can take three values: Canonical, Non_Canonical (NonCanon), or None.
-
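The Decomposition_Type of a character can be approximated from the mapping that Python's unicodedata.decomposition() returns: an untagged mapping is canonical (the class that --precomposed selects with \p{Dt=Canonical}), a mapping that starts with a tag such as <compat> or <font> is non-canonical, and an empty string means none. This is a sketch with one known gap: Hangul syllables decompose algorithmically, unicodedata.decomposition() does not report a mapping for them, and so the sketch classifies them as None where Perl treats them as Canonical.

```python
import unicodedata

def decomposition_type(ch):
    """Approximate Decomposition_Type as Canonical, Non_Canonical, or None."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "None"
    if mapping.startswith("<"):  # tagged mapping, e.g. "<compat> 0066 0069"
        return "Non_Canonical"
    return "Canonical"

def is_precomposed(ch):
    """Rough equivalent of \\p{Dt=Canonical} for a single character."""
    return decomposition_type(ch) == "Canonical"

if __name__ == "__main__":
    for ch in ["\u00e9", "\ufb01", "a"]:
        print(f"U+{ord(ch):04X}", decomposition_type(ch))
```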
--outstand
Matches outstanding characters, that is, non-ASCII combining characters.
-
--surrogate
Matches characters that UTF-16 encodes as a surrogate pair (U+10000 to U+10FFFF).
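Characters in this range do not fit in a single 16-bit code unit, so UTF-16 encodes each one as a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF). The Python sketch below computes the pair arithmetically and cross-checks it against the standard utf-16-be codec.

```python
def surrogate_pair(ch):
    """Return the (high, low) UTF-16 surrogates for a supplementary character."""
    cp = ord(ch)
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not encoded as a surrogate pair")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

def needs_surrogates(ch):
    """True if the character lies in the range --surrogate matches."""
    return ord(ch) >= 0x10000

if __name__ == "__main__":
    ch = "\U0001F600"  # GRINNING FACE
    high, low = surrogate_pair(ch)
    print(f"{high:04X} {low:04X}")
    # Cross-check with the codec: big-endian UTF-16 bytes of the same character.
    assert ch.encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```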
-
-p prop, -P prop
Shortcut for -E '\p{prop}' and -E '\P{prop}'. You will not be able to use greple's
-p option, but it probably won't be a problem. If you must use it, use --paragraph.
-
--ansicode
Search for ANSI terminal control sequences. Automatically disables
the name and code parameters and activates visible. Colorized output is disabled too.
To be precise, it searches for CSI control sequences as defined in ECMA-48. The pattern is defined as follows:
(?x)
# see ECMA-48 5.4 Control sequences
(?: \e\[ | \x9b )    # csi
[\x30-\x3f]*         # parameter bytes
[\x20-\x2f]*         # intermediate bytes
[\x40-\x7e]          # final byte
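For readers who want to reuse the same ECMA-48 CSI pattern outside Perl, here is a transcription into Python's re module using re.VERBOSE. Python regular expressions do not accept \e, so ESC is written as \x1b; otherwise the structure follows the pattern given above.

```python
import re

# CSI control sequence as described in ECMA-48 5.4.
CSI_RE = re.compile(r"""
    (?: \x1b\[ | \x9b )   # CSI: ESC [ or the 8-bit C1 control U+009B
    [\x30-\x3f]*          # parameter bytes
    [\x20-\x2f]*          # intermediate bytes
    [\x40-\x7e]           # final byte
""", re.VERBOSE)

def find_csi(text):
    """Return every CSI sequence in text, in order of appearance."""
    return CSI_RE.findall(text)

if __name__ == "__main__":
    colored = "\x1b[31mred\x1b[0m and \x1b[?25l"
    for seq in find_csi(colored):
        print(repr(seq))
```

Because the pattern has no capturing groups, findall() returns the whole matched sequences.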
MODULE OPTIONS and PARAMS
Module-specific options are specified between -Mcharcode and --.
greple -Mcharcode --config width,name=0 -- ...
Parameters can be set in two ways, one using the --config option
and the other using dedicated options. See the "CONFIGURATION"
section for more information.
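To make the KEY[=VALUE],... syntax concrete, the following Python sketch parses a --config argument into a dictionary. parse_config is a hypothetical helper, not part of the module; it assumes that a bare KEY means KEY=1 and keeps values as strings, and the module's real Perl parser may behave differently in edge cases.

```python
def parse_config(arg):
    """Parse 'KEY[=VALUE],...' into a dict (hypothetical helper).

    Assumption: a bare KEY is treated as KEY=1.
    """
    params = {}
    for item in arg.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty items such as a trailing comma
        key, sep, value = item.partition("=")
        params[key] = value if sep else "1"
    return params

if __name__ == "__main__":
    # Same argument as in the greple command line shown earlier.
    print(parse_config("width,name=0"))
```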
-
--config=params
Set configuration parameters.
-
column
-
--column[=#]
Show column number. Default 1.
-
visible
-
--visible[=#]
Display invisible characters in a visible string representation. Default 0.
-
char
-
--char[=#]
Show the character itself. Default 0.
-
width
-
--width[=#]
Show the display width. Default 0.
-
utf8
-
--utf8[=#]
Show the UTF-8 encoding in hex. Default 0.
-
utf16
-
--utf16[=#]
Show the UTF-16 encoding in hex. Default 0.
-
code
-
--code[=#]
Show the character code point in hex. Default 1.
-
nfd, nfc, nfkd, nfkc
-
--nfd[=#], --nfc[=#], --nfkd[=#], --nfkc[=#]
Show the character in Unicode Normalization Form D, C, KD, and KC, respectively. See Unicode::Normalize.
-
name
-
--name[=#]
Show the Unicode name of the character. Default 1.
-
split
-
--split[=#]
If a pattern matching multiple characters is given, annotate each character independently.
-
alignto=column
-
--alignto=column
Align annotation messages. Defaults to 1, which aligns to the rightmost column; 0 means no alignment; if a value of 2 or greater is given, it aligns to that column number.
column can be negative: if -1 is specified, align to the same column for all lines; if -2 is specified, align to the longest line length, regardless of match position.
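Several of the parameters above (code, name, utf8, utf16, and nfd, nfc, nfkd, nfkc) describe properties that can be computed with Python's standard library alone. This sketch gathers them for one character so the values are easy to compare; the dictionary keys simply mirror the parameter names and are not the module's output format. Display width is left out because the standard library has no direct equivalent.

```python
import unicodedata

def char_fields(ch):
    """Collect code point, name, encodings, and normal forms for one character."""
    return {
        "code": f"{ord(ch):x}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "utf8": ch.encode("utf-8").hex(),
        "utf16": ch.encode("utf-16-be").hex(),
        "nfd": unicodedata.normalize("NFD", ch),
        "nfc": unicodedata.normalize("NFC", ch),
        "nfkd": unicodedata.normalize("NFKD", ch),
        "nfkc": unicodedata.normalize("NFKC", ch),
    }

if __name__ == "__main__":
    for ch in ["\u00e9", "\ufb01", "\U0001F600"]:
        fields = char_fields(ch)
        # Show normal forms as code point lists so combining marks stay visible.
        for form in ("nfd", "nfc", "nfkd", "nfkc"):
            fields[form] = " ".join(f"{ord(c):04x}" for c in fields[form])
        print(fields)
```

Comparing the forms side by side makes the difference between canonical (NFD/NFC) and compatibility (NFKD/NFKC) normalization visible: the ligature U+FB01 survives NFD and NFC but becomes plain "fi" under NFKD and NFKC.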
CONFIGURATION
Configuration parameters can be set in several ways.
MODULE START FUNCTION
The start function of a module can be specified at the same time as the module declaration.
greple -Mcharcode::config(alignto=0)
greple -Mcharcode::config=alignto=80
PRIVATE MODULE OPTION
Module-specific options are specified between -Mcharcode and --.
greple -Mcharcode --config alignto=80 -- ...
greple -Mcharcode --alignto=80 -- ...
GENERIC MODULE OPTION
The module-specific --config option can also be given as the generic
command line option --charcode::config.
greple -Mcharcode --charcode::config alignto=80 ...
EXAMPLES
HOMOGLYPH
greple -Mcc -P ASCII --align-side --cm=S t/homoglyph
BOX DRAWINGS
perldoc -m App::ansicolumn::Border | greple -Mcc --code -- --outstand --mc=10,
AYNU ITAK
greple -Mcc --outstand --split t/ainu.txt
INSTALL
cpanm -n App::Greple::charcode
SEE ALSO
LICENSE
Copyright︎ ©︎ 2025 Kazumasa Utashiro.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
AUTHOR
Kazumasa Utashiro