NAME

Data::Kanji::Kanjidic - parse the "kanjidic" kanji data file

SYNOPSIS

use Data::Kanji::Kanjidic 'parse_kanjidic';
my $kanji = parse_kanjidic ('/path/to/kanjidic');
for my $k (keys %$kanji) {
    print "$k has radical number $kanji->{$k}->{C}.\n";
}

FUNCTIONS

parse_kanjidic

my $kanjidic = parse_kanjidic ('kanjidic');

The input is the name of the Kanjidic file. The output is a hash reference. Its keys are the kanji, as Unicode strings. Each value is a hash reference representing one line of Kanjidic, the entry for the kanji in its key, with the keys described in "parse_entry".

This function assumes that the kanjidic file is encoded using the EUC-JP encoding.
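
As a rough illustration of what the EUC-JP assumption means, the following sketch decodes two EUC-JP bytes into a Perl Unicode string using the core Encode module. It does not call this module and is not necessarily how parse_kanjidic reads its file; the sample bytes are the EUC-JP form of the kanji 亜.

```perl
use strict;
use warnings;
use Encode 'decode';

# The two EUC-JP bytes for the kanji 亜 (JIS X 0208 code 3021).
my $euc_bytes = "\xB0\xA1";

# Decode the bytes into a Perl Unicode string of one character.
my $kanji = decode ('EUC-JP', $euc_bytes);

# Print the Unicode code point of the decoded character.
printf "U+%04X\n", ord $kanji;
```

A file saved in another encoding, such as UTF-8, would not decode to the expected kanji this way, so it would need converting to EUC-JP first.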

parse_entry

my %values = parse_entry ($line);

Parse one line of kanjidic.

The possible keys and values of the returned hash are as follows. Values are scalars unless otherwise mentioned.

B

Bushu (radical as defined by the Nelson kanji dictionary)

C

Classic radical (the usual radical)

DB

Japanese for Busy People textbook numbers

DC

The index numbers used in "The Kanji Way to Japanese Language Power" by Dale Crowley.

DF

"Japanese Kanji Flashcards", by Max Hodges and Tomoko Okazaki

DG

The index numbers used in the "Kodansha Compact Kanji Guide"

DH

The index numbers used in the 3rd edition of "A Guide To Reading and Writing Japanese" edited by Kenneth Henshall et al.

DJ

The index numbers used in "Kanji in Context" by Nishiguchi and Kono.

DK

The index numbers used by Jack Halpern in his Kanji Learners Dictionary

DM

The index numbers from the French-language version of "Remembering the kanji"

DO

The index numbers used in P.G. O'Neill's Essential Kanji

DR

The codes developed by Father Joseph De Roo, and published in his book "2001 Kanji" (Bonjinsha)

DS

The index numbers used in the early editions of "A Guide To Reading and Writing Japanese" edited by Florence Sakade.

DT

The index numbers used in the Tuttle Kanji Cards, compiled by Alexander Kask.

E

The numbers used in Kenneth Henshall's kanji book

F

Frequency of kanji

G

Year of elementary school

H

Number in Jack Halpern dictionary

I

The Spahn-Hadamitzky book number

IN

The Spahn-Hadamitzky kanji-kana book number

J

Japanese proficiency test level of the kanji

K

The index in the Gakken Kanji Dictionary (A New Dictionary of Kanji Usage)

L

Code from "Remembering the Kanji" by James Heisig

MN

Morohashi index number

MP

Morohashi volume/page

N

Nelson code from original Nelson dictionary

O

The numbers used in P.G. O'Neill's "Japanese Names"

This may take multiple values, so the value is an array reference.

P

SKIP code

Q

Four-corner code for the kanji

This may take multiple values, so the value is an array reference.

S

Stroke count

This may take multiple values, so the value is an array reference.

T

SPECIAL

U

Unicode

V

Nelson code from the "New Nelson" dictionary

This may take multiple values, so the value is an array reference.

W

Korean pronunciation

This may take multiple values, so the value is an array reference.

use Data::Kanji::Kanjidic 'parse_kanjidic';
use Lingua::KO::Munja ':all'; # 강남스타일
binmode STDOUT, ":utf8";
my $kanji = parse_kanjidic ($ARGV[0]);
for my $k (sort keys %$kanji) {
    my $w = $kanji->{$k}->{W};
    if ($w) {
        my @h = map {'"' . hangul2roman ($_) . '"'} @$w;
        print "$k is Korean ", join (", ", @h), "\n";
    }
}

X

Cross reference

XDR

De Roo cross-reference

This may take multiple values, so the value is an array reference.

XH

Cross-reference.

This may take multiple values, so the value is an array reference.

XI

Cross-reference.

XJ

Cross-reference.

This may take multiple values, so the value is an array reference.

XN

Nelson cross-reference

This may take multiple values, so the value is an array reference.

XO

Cross-reference.

Y

Pinyin pronunciation

This may take multiple values, so the value is an array reference.

ZBP

SKIP misclassification code of type "bp".

This may take multiple values, so the value is an array reference.

ZPP

SKIP misclassification code of type "pp".

This may take multiple values, so the value is an array reference.

ZRP

SKIP misclassification code of type "rp".

This may take multiple values, so the value is an array reference.

ZSP

SKIP misclassification code of type "sp".

This may take multiple values, so the value is an array reference.

kokuji

This has a true value (1) if the character is marked as a "kokuji" in Kanjidic.
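
A minimal sketch of filtering on this flag follows. To stay self-contained it uses a small hand-built hash in place of the output of parse_kanjidic; the three entries and their fields are illustrative stand-ins, not data read from Kanjidic.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ":utf8";

# Hand-built stand-in for the hash reference returned by
# parse_kanjidic. Only the "kokuji" and "S" keys are filled in.
my $kanji = {
    '働' => {kokuji => 1, S => [13]},
    '峠' => {kokuji => 1, S => [9]},
    '亜' => {S => [7]},
};

# Keep only the kanji whose entry has a true "kokuji" value.
my @kokuji = sort grep {$kanji->{$_}->{kokuji}} keys %$kanji;
print "Kokuji: @kokuji\n";
```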

english

This contains an array reference to the English-language meanings given in Kanjidic. It may be undefined, if there are no English-language meanings listed.

# The following insane program converts English into kanji.

# Call it like "english-to-kanji.pl /where/is/kanjidic english-text".

use Data::Kanji::Kanjidic 'parse_kanjidic';
use Convert::Moji 'make_regex';
my $kanji = parse_kanjidic ($ARGV[0]);
my %english;
for my $k (keys %$kanji) {
    my $english = $kanji->{$k}->{english};
    if ($english) {
        for (@$english) {
            push @{$english{$_}}, $k;
        }
    }
}
my $re = make_regex (keys %english);
open my $in, "<", $ARGV[1] or die $!;
while (<$in>) {
    s/\b($re)\b/$english{$1}[int rand (@{$english{$1}})]/ge;
    print;
}

onyomi

This is an array reference which contains the on'yomi (音読み) of the kanji. It may be undefined, if no on'yomi readings are listed. The on'yomi readings are in katakana, as per Kanjidic itself. They are encoded in Perl's internal Unicode encoding.

binmode STDOUT, ":utf8";
use Data::Kanji::Kanjidic 'parse_kanjidic';
use utf8;
my $kanji = parse_kanjidic ($ARGV[0]);
my %all_onyomi;
for my $k (keys %$kanji) {
    my $onyomi = $kanji->{$k}->{onyomi};
    if ($onyomi) {
        for my $o (@$onyomi) {
            push @{$all_onyomi{$o}}, $k;
        }
    }
}
for my $o (sort keys %all_onyomi) {
    if (@{$all_onyomi{$o}} > 1) {
        print "Same onyomi 「$o」 for 「@{$all_onyomi{$o}}」!\n";
    }
}

kunyomi

This is an array reference which contains the kun'yomi (訓読み) of the kanji. It may be undefined, if no kun'yomi readings are listed. The kun'yomi readings are in hiragana, as per Kanjidic itself. They are encoded in Perl's internal Unicode encoding.

nanori

This is an array reference which contains nanori (名乗り) readings of the character. It may be undefined, if no nanori readings are listed. The nanori readings are in hiragana, as per Kanjidic itself. They are encoded in Perl's internal Unicode encoding.

morohashi

This is a hash reference containing data on the kanji's location in the Morohashi 'Dai Kan-Wa Jiten' kanji dictionary. The hash reference has the following keys.

volume: The volume number of the character.
page: The page number of the character.
index: The index number of the character.

If there is no information, this remains unset.

For example, to print all the existing values:

use Data::Kanji::Kanjidic 'parse_kanjidic';
use FindBin;
binmode STDOUT, ":utf8";
my $kanji = parse_kanjidic ("$FindBin::Bin/../t/kanjidic-sample");
for my $k (sort keys %$kanji) {
    my $morohashi = $kanji->{$k}->{morohashi};
    if ($morohashi) {
        print "$k: volume $morohashi->{volume}, page $morohashi->{page}, index $morohashi->{index}.\n";
    }
}

For detailed explanations of these codes, see "Kanjidic".
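
Since some keys hold scalars and others hold array references, code that walks an entry generically has to handle both. The following is a minimal sketch of one way to do that. The helper "field_values" is hypothetical and not part of this module, and the entry is hand-built with illustrative values rather than parsed from Kanjidic.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Kanji::Kanjidic: return
# the values of a key as a list, whether the entry stores a scalar
# or an array reference. Missing keys give an empty list.
sub field_values
{
    my ($entry, $key) = @_;
    my $value = $entry->{$key};
    return () unless defined $value;
    return ref ($value) eq 'ARRAY' ? @$value : ($value);
}

# Hand-built entry in the documented shape, with illustrative values.
my %entry = (
    C => 7,                            # scalar
    S => [7],                          # array reference
    Y => ['ya4'],                      # array reference
    english => ['Asia', 'rank next'],  # array reference
);

for my $key (sort keys %entry) {
    print "$key: ", join (', ', field_values (\%entry, $key)), "\n";
}
```

The "morohashi" key holds a hash reference, so this helper would return that reference as a single value; a fuller version would need a separate case for it.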

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.
