NAME
Lingua::JA::Gairaigo::Fuzzy - variant spellings of foreign words in Japanese
SYNOPSIS
use utf8;
use Lingua::JA::Gairaigo::Fuzzy 'same_gairaigo';
my $x = 'メインフレーム';
my $y = 'メーンフレーム';
if (same_gairaigo ($x, $y)) {
    print "$x and $y may be the same word.\n";
}
produces output
メインフレーム and メーンフレーム may be the same word.
(This example is included as synopsis.pl in the distribution.)
DESCRIPTION
Given two Japanese gairaigo words (loanwords written in katakana), guess whether they are the same word. Japanese is somewhat inconsistent in how it writes foreign loanwords. For example, "motor" can be モーター or モータ from the English "motor", or モートル from the Dutch "motor". This module attempts to guess whether two loanwords refer to the same thing.
FUNCTIONS
same_gairaigo
my $same = same_gairaigo ('メイン', 'メーン');
This guesses whether the two words are the same. It catches things like the addition and removal of "ー", "・", and "ッ", the mixing of elements such as "ティ", "テー", "テイ", and "テ", and combinations like "コウ" and "コー". It returns a true value if the two words appear to be the same, and a false value otherwise.
As of version 0.08, the exact checks are not documented, so please consult the source code for the details.
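As a rough illustration of only the simplest check mentioned above (addition and removal of "ー", "・", and "ッ"), the following is a minimal sketch. It is not the module's algorithm: the helpers strip_optional_marks and crude_same are hypothetical names invented for this example, and the sketch uses core Perl only.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper, not part of Lingua::JA::Gairaigo::Fuzzy:
# delete the marks that are often added or omitted in loanwords
# (long-vowel mark, middle dot, small tsu).
sub strip_optional_marks {
    my ($word) = @_;
    (my $key = $word) =~ tr/ー・ッ//d;
    return $key;
}

# Hypothetical and much cruder stand-in for same_gairaigo:
# two words match if they are identical once the marks are removed.
sub crude_same {
    my ($x, $y) = @_;
    return strip_optional_marks ($x) eq strip_optional_marks ($y);
}

for my $pair (['モーター', 'モータ'], ['ベッド', 'ベット'], ['メイン', 'メーン']) {
    my ($x, $y) = @$pair;
    my $verdict = crude_same ($x, $y) ? 'same key' : 'different key';
    print "$x / $y: $verdict\n";
}
# モーター / モータ: same key
# ベッド / ベット: different key
# メイン / メーン: different key
```

The last pair shows the limit of this approach: treating "イ" and "ー" as interchangeable needs vowel-aware checks of the kind described above, which this sketch does not attempt.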
DEPENDENCIES
- Lingua::JA::Moji
"kana2romaji" in Lingua::JA::Moji is used to compute whether a particular word ends in one vowel or another.
- Text::Fuzzy
Text::Fuzzy is used to compare the two katakana words to see what similarities there may be between them.
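To give a feel for the kind of comparison an edit-distance library provides, here is a minimal pure-Perl Levenshtein sketch over katakana characters. It is a stand-in for illustration only: edit_distance is a hypothetical function written for this example, it does not call Text::Fuzzy, and it says nothing about how this module interprets the result.

```perl
use strict;
use warnings;
use utf8;
use List::Util 'min';
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical pure-Perl stand-in; Text::Fuzzy itself is not used here.
# Returns the number of single-character insertions, deletions and
# substitutions needed to turn $s into $t.
sub edit_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    # Distances from the empty prefix of $s to each prefix of $t.
    my @prev = (0 .. scalar @t);
    for my $i (1 .. scalar @s) {
        my @cur = ($i);
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min (
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution or match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

for my $pair (['メインフレーム', 'メーンフレーム'], ['モーター', 'モータ'], ['モーター', 'モートル']) {
    my ($x, $y) = @$pair;
    print "$x / $y: ", edit_distance ($x, $y), "\n";
}
# メインフレーム / メーンフレーム: 1
# モーター / モータ: 1
# モーター / モートル: 2
```

Note that "use utf8" matters here: without it the literals would be compared byte by byte, and the distances would no longer count katakana characters.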
HISTORY
This module started as a script to help check for duplicate entries in Jim Breen's online Japanese dictionaries; see http://www.edrdg.org.
Because this module is intended to deal with natural language, it cannot guarantee a correct answer. Bug reports containing test cases are very much appreciated.
AUTHOR
Ben Bullock, <bkb@cpan.org>
COPYRIGHT & LICENCE
This package and associated files are copyright (C) 2013-2017 Ben Bullock.
You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.