NAME

Unicode::Confuse - Identify and replace Unicode confusables

SYNOPSIS

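use strict;
use warnings;
use utf8;
use Unicode::Confuse ':all';
# Encode output as UTF-8 to avoid "Wide character in print" warnings.
binmode STDOUT, ':encoding(UTF-8)';
if (confusable ('ρ')) {
    my $canonical = canonical ('ρ');
    print "'ρ' is confusable with $canonical.\n";
    my @similar = similar ($canonical);
    print "$canonical is also confusable with @similar.\n";
}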

produces output

'ρ' is confusable with p.
p is also confusable with p ρ ϱ р ⍴ ⲣ ｐ 𝐩 𝑝 𝒑 𝓅 𝓹 𝔭 𝕡 𝖕 𝗉 𝗽 𝘱 𝙥 𝚙 𝛒 𝛠 𝜌 𝜚 𝝆 𝝔 𝞀 𝞎 𝞺 𝟈.

(This example is included as synopsis.pl in the distribution.)

VERSION

This documents version 0.05 of Unicode-Confuse corresponding to git commit 4a97ea6b65f148a559b21ebde80b189257d16b1c released on Thu Apr 29 11:49:01 2021 +0900.

This Perl module incorporates Unicode Security Mechanisms for UTS #39 version 13.0.0 dated 2020-02-13, 01:38:49 GMT, copyright © 2020 Unicode®, Inc. For terms of use, see http://www.unicode.org/terms_of_use.html.

DESCRIPTION

This module offers functions for dealing with Unicode "confusables", characters which look similar to one another but are represented by different Unicode code points.

FUNCTIONS

canonical

my $canonical = canonical ($c);

If $c is a confusable, return the canonical form of $c. If $c is already its own canonical form, return $c. If $c is not a confusable, return the undefined value. "Canonical" here simply means the character used to represent the group of confusables in the Unicode data files.
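Because canonical maps every member of a group to the same representative, it can be applied character by character to reduce a whole string to a rough "skeleton". The following is a minimal sketch of that idea. The helpers skeleton and looks_alike are hypothetical names, not functions exported by Unicode::Confuse, and this per-character version is a simplification of the UTS #39 skeleton algorithm, which also applies NFD normalization.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Confuse ':all';
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: replace each confusable character with its canonical
# form and keep every other character unchanged. This is NOT the full UTS #39
# skeleton algorithm, which additionally normalizes to NFD.
sub skeleton
{
    my ($string) = @_;
    my $out = '';
    for my $char (split //, $string) {
        # canonical() returns undef for characters which are not confusables.
        my $canonical = canonical ($char);
        $out .= defined $canonical ? $canonical : $char;
    }
    return $out;
}

# Hypothetical helper: under this sketch, two strings look alike if their
# skeletons are equal.
sub looks_alike
{
    my ($x, $y) = @_;
    return skeleton ($x) eq skeleton ($y);
}

my ($original, $spoof) = ('paypal', 'ρaypal');
printf "%s vs %s: %s\n", $original, $spoof,
    looks_alike ($original, $spoof) ? 'look alike' : 'differ';
```

Given the synopsis above, where canonical ('ρ') is p, the string with a Greek rho should reduce to the same skeleton as plain "paypal".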

confusable

if (confusable ($c)) {
    # do something.
}

This returns a true or false value depending on whether $c is a confusable. This matches $c against a large regex in Unicode::Confuse::Regex.
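A minimal sketch of using confusable to report which characters in a string are confusables, together with their positions and code points. The helper find_confusables is a hypothetical name, not part of Unicode::Confuse; it only calls confusable on one character at a time.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Confuse ':all';
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: return a list of [position, character] pairs, one for
# each confusable character in $string. Positions are 0-based.
sub find_confusables
{
    my ($string) = @_;
    my @chars = split //, $string;
    my @found;
    for my $i (0 .. $#chars) {
        # confusable() returns a true or false value for a single character.
        if (confusable ($chars[$i])) {
            push @found, [$i, $chars[$i]];
        }
    }
    return @found;
}

for my $hit (find_confusables ('ρay')) {
    my ($pos, $char) = @$hit;
    printf "position %d: '%s' (U+%04X)\n", $pos, $char, ord ($char);
}
```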

similar

my @similar = similar ('p');

Return a list of confusables which are similar to the given input. If the input is not a confusable, an empty list is returned.

The first character in @similar is the canonical form, and the remaining characters are the other confusables associated with that canonical form. These remaining characters, if more than one, are sorted by code point.
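Since the canonical form always comes first, the list returned by similar can be split into the canonical form and the rest with a single list assignment. The sketch below does that, and also defines a hypothetical helper, lookalikes (not part of Unicode::Confuse), which returns the group with the input character itself removed.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Confuse ':all';
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: the look-alikes of $c, not counting $c itself.
sub lookalikes
{
    my ($c) = @_;
    # similar() gives the canonical form first, then the other members of the
    # group sorted by code point, or an empty list if $c is not a confusable.
    my @group = similar ($c);
    return grep { $_ ne $c } @group;
}

# The first element is the canonical form; the rest are the other members.
my ($canonical, @others) = similar ('p');
print "canonical form of the group: $canonical\n";
print scalar (@others), " other characters in the group\n";
print "look-alikes of p: ", join (' ', lookalikes ('p')), "\n";
```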

Example: obfuscate text

This example obfuscates strings by replacing each confusable letter with a character picked at random from that letter's list of confusables.

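use strict;
use warnings;
use utf8;
use Unicode::Confuse ':all';
# Encode output as UTF-8 to avoid "Wide character in print" warnings.
binmode STDOUT, ':encoding(UTF-8)';

sub obfuscate
{
    for (@_) {
        my @letters = split '', $_;
        my $out = '';
        my $ok;
        for my $letter (@letters) {
            # similar() returns an empty list if $letter is not a confusable.
            my @similar = similar ($letter);
            if (@similar) {
                $ok = 1;
                # Pick one member of the group at random. The group may
                # contain $letter itself, so some letters can stay unchanged.
                my $n = scalar (@similar);
                my $r = int (rand ($n));
                $out .= $similar[$r];
            }
            else {
                $out .= $letter;
            }
        }
        if (! $ok) {
            print "No confusables in '$_'.\n";
        }
        else {
            print "$_ -> $out\n";
        }
    }
}

obfuscate ('paypal', '月火水木金土日');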

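produces output like the following. Because the substitutes are picked at random, the exact output differs from run to run.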

paypal -> 𝘱𝒂𝐲𝓅𝖺𝜤
月火水木金土日 -> 月火水木金⼠⽇

(This example is included as obfuscate.pl in the distribution.)

DEPENDENCIES

File::Slurper: This is used by the parsing module Unicode::Confuse::Parse.
JSON::Parse: This is used to parse the JSON-formatted file of confusables distributed with the module.

BUGS

Unicode specifications: This module does not even attempt to implement the Unicode requirements for software that handles confusables. In other words, it makes no claim whatsoever to be "An implementation claiming conformance to this specification" as described in the text of the Unicode Consortium specification.
Data quality: The data in the Unicode confusables file is of mixed quality, with nearly identical or indistinguishable characters muddled together with things which are clearly quite different from one another.

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.

To install Unicode::Confuse, copy and paste the appropriate command into your terminal.

cpanm

cpanm Unicode::Confuse

CPAN shell

perl -MCPAN -e shell
install Unicode::Confuse

For more information on module installation, please visit the detailed CPAN module installation guide.
