NAME

Acme::GodoWord - Japanese tokenizer in the style of Master of Babel

SYNOPSIS

    use utf8; # string literals below become character strings (utf8 flag on)
    use Acme::GodoWord;
    
    my $message = 'あなたには見えない'; # "you cannot see it"
    
    # @tokens = qw( あなた には 見えない );
    my @tokens = Acme::GodoWord->tokenize( $message );
    
    # prints 【あなた」「には」「見えない】
    print Acme::GodoWord->babelize( $message );

DESCRIPTION

Acme::GodoWord is a Japanese tokenizer that formats its output in the style of Master of Babel.

Master of Babel is a kind of magic that appears in the novel Kara no Kyoukai (空の境界), where it is used by the magician KUROGIRI Satsuki (黒霧皐月, Godoword Mayday).

In the novel, speech in Master of Babel is written like this:

    【あなた」「には」「みえない】
    
    【ここ」「では」「みえない】

This module splits Japanese text into tokens and brackets them in the style shown above.
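
The bracket layout in the quotations above can be reproduced from a list of tokens with a plain join. The following is only a minimal sketch of that formatting step: format_babel is a hypothetical helper written for illustration, not a function provided by Acme::GodoWord, and the token list is supplied by hand rather than produced by a tokenizer.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper, NOT part of Acme::GodoWord: wraps a list of
# already-split tokens in the bracket style quoted from the novel.
sub format_babel {
    my @tokens = @_;
    return '' unless @tokens;

    # Open with 【, separate tokens with 」「, close with 】.
    return '【' . join( '」「', @tokens ) . '】';
}

# Encode wide characters as UTF-8 when printing.
binmode STDOUT, ':encoding(UTF-8)';

my $line = format_babel( 'あなた', 'には', 'みえない' );
print "$line\n";    # prints 【あなた」「には」「みえない】
```

How the module itself splits the text and builds the string may differ; the sketch only shows the output shape.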

Kara no Kyoukai (空の境界) is a novel written by NASU Kinoko (奈須きのこ). It was adapted into film in recent years.

FUNCTION

tokenize

    my @tokens = Acme::GodoWord->tokenize( $utf8_message );

This method tokenizes a Japanese message and returns the tokens as a list.

The message passed as the argument must have its utf8 flag on, that is, it must be a decoded character string rather than raw bytes.
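
As an illustration of the utf8 flag requirement, the sketch below shows one common way to obtain a flagged string from raw UTF-8 bytes, using the core Encode module. The variable names are placeholders, and the byte string is hard-coded here as a stand-in for data read from a file or socket; a literal written under "use utf8;" would work equally well.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for "あなた", as they would arrive from a file or
# socket that was read without an encoding layer.
my $bytes = "\xE3\x81\x82\xE3\x81\xAA\xE3\x81\x9F";

# decode() turns the bytes into a character string with the utf8 flag on.
my $utf8_message = decode( 'UTF-8', $bytes );

printf "bytes:   flag=%d length=%d\n",
    ( utf8::is_utf8($bytes) ? 1 : 0 ), length $bytes;
printf "decoded: flag=%d length=%d\n",
    ( utf8::is_utf8($utf8_message) ? 1 : 0 ), length $utf8_message;
```

The byte string reports flag 0 and length 9, while the decoded string reports flag 1 and length 3. The decoded form is the kind of value the methods documented here expect.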

babelize

    my $message = Acme::GodoWord->babelize( $utf8_message );

This method generates a Master of Babel string from a Japanese message.

The message passed as the argument must have its utf8 flag on, that is, it must be a decoded character string rather than raw bytes.

AUTHOR

Naoki Okamura (Nyarla) <nyarla[ at ]thotep.net>

SEE ALSO

Text::TinySegmenter

http://en.wikipedia.org/wiki/Nasu_Kinoko

http://en.wikipedia.org/wiki/Kara_no_Kyoukai

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.