NAME

Lingua::ZH::TaBE - Chinese processing via libtabe

VERSION

This document describes version 0.07 of Lingua::ZH::TaBE, released December 31, 2005.

SYNOPSIS

    use Lingua::ZH::TaBE;

    my $tabe = Lingua::ZH::TaBE->new;

    # Phrase splitter
    my @phrases = $tabe->split(
	"當我們在電腦中處理中文資訊時,相信其中最惱人的".
	"狀況之一,莫過於想打的字打不出來了。"
    );

    # Chaining various components
    print $tabe->Chu("道可道,非常道。")    # sentence
	->chunks->[2]	    # 非常道	    # chunk
	->tsis->[0]	    # 非常	    # phrase
	->zhis->[1]	    # 常	    # character
	->yins->[0]	    # ㄔㄤˊ	    # pronunciation
	->zuyins->[0];	    # ㄔ	    # phonetic symbols

DESCRIPTION

This module is a Perl interface to the TaBE (Taiwan and Big5 Encoding) library, a unified interface and library for dealing with Chinese words, phrases, sentences, and phonetic symbols; it is intended to be used as the foundation of Chinese text processing.

Lingua::ZH::TaBE provides an object-oriented interface (preferred), as well as a procedural interface consisting of all C functions in tabe.h.

Object-Oriented Interface

Lingua::ZH::TaBE

new( [tsi_db => $file, tsiyin_db => $file] )

Creates a libtabe handle and opens the databases. If tsi_db or tsiyin_db is not specified, the corresponding database is located automatically in the usual libtabe data directory.

split( $string [, $method] )

Splits the text in $string and returns a list of strings representing the words obtained. You may specify Complex or Backward as $method to use an alternate segmentation algorithm.
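The following is a minimal sketch of calling split() with each segmentation method. It assumes that libtabe expects Big5-encoded bytes (hence the hypothetical to_big5 and from_big5 helpers) and that $method is passed as a plain string; neither is confirmed by this document. The hypothetical segment() helper returns an empty list when the module or its databases are unavailable.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

# Assumption: libtabe works on Big5 bytes, so text is converted at the boundary.
sub to_big5   { my ($chars) = @_; return encode('big5-eten', $chars) }
sub from_big5 { my ($bytes) = @_; return decode('big5-eten', $bytes) }

# Segment $text with the given method ('Complex', 'Backward', or undef for
# the default). Returns an empty list if the module cannot be loaded or
# the databases cannot be opened.
sub segment {
    my ($text, $method) = @_;
    my @words = eval {
        require Lingua::ZH::TaBE;
        my $tabe = Lingua::ZH::TaBE->new;
        my @args = (to_big5($text));
        push @args, $method if defined $method;
        my @found = $tabe->split(@args);
        map { from_big5("$_") } @found;
    };
    return @words;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $method (undef, 'Complex', 'Backward') {
    my @words = segment('道可道,非常道。', $method);
    printf "%-8s %s\n", $method // 'default', join(' | ', @words);
}
```

Comparing the three output lines on the same sentence is a quick way to see where the segmentation algorithms disagree.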

Chu(), Chunk(), Tsi(), Zhi(), Yin(), ZuYin()

Constructors for the various levels of objects, each taking one argument for initialization.

Lingua::ZH::TaBE::Chu

chunks()

Lingua::ZH::TaBE::Chunk

tsis([$method])

Lingua::ZH::TaBE::Tsi

zhis()
yins()

Lingua::ZH::TaBE::Zhi

yins()
ToZhi()
ToZhiCode()
IsBig5Code()
ToPackedBig5Code()
LookupRefCount()

Lingua::ZH::TaBE::Yin

zuyins()
zhis()
ToYin()
ToZuYinSymbolSequence()

Lingua::ZH::TaBE::ZuYin

yin()
zhi()
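As an illustration of how the classes above nest, the sketch below walks a sentence down to its phrases. It assumes that input is Big5-encoded bytes, that each accessor returns an array reference (as the SYNOPSIS suggests), and that the objects stringify to their text; outline() is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

# Walk sentence -> chunks -> phrases and return one line per phrase.
# Returns an empty list if the module or its databases are unavailable.
sub outline {
    my ($text) = @_;
    my @lines = eval {
        require Lingua::ZH::TaBE;
        my $tabe = Lingua::ZH::TaBE->new;
        my $chu  = $tabe->Chu(encode('big5-eten', $text));
        my @out;
        for my $chunk (@{ $chu->chunks }) {
            for my $tsi (@{ $chunk->tsis }) {
                my $count = @{ $tsi->zhis };    # characters in this phrase
                push @out, decode('big5-eten', "$tsi") . " ($count characters)";
            }
        }
        @out;
    };
    return @lines;
}

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for outline('道可道,非常道。');
```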

Procedural Interface

All functions below belong to the Lingua::ZH::TaBE class; they are not exported by default, but may be imported explicitly, or implicitly via use Lingua::ZH::TaBE ':all'.

$TsiDB    = TsiDBOpen($type, $db_name, $flags);
$num      = TsiInfoLookupPossibleTsiYin($TsiDB, $Tsi);
$TsiYinDB = TsiYinDBOpen($type, $db_name, $flags);
$num      = ChuInfoToChunkInfo($Chu);
$num      = ChunkSegmentationSimplex($TsiDB, $Chunk);
$num      = ChunkSegmentationComplex($TsiDB, $Chunk);
$num      = ChunkSegmentationBackward($TsiDB, $Chunk);
$num      = TsiInfoLookupZhiYin($TsiDB, $Tsi);
$string   = YinLookupZhiList($Yin);
$string   = YinToZuYinSymbolSequence($Yin);
$yin      = ZuYinSymbolSequenceToYin($string);
$zhi      = ZuYinIndexToZuYinSymbol($ZuYin);
$zuyin    = ZuYinSymbolToZuYinIndex($Zhi);
$zuyin    = ZozyKeyToZuYinIndex($key);
$num      = ZhiIsBig5Code($Zhi);
$zhicode  = ZhiToZhiCode($Zhi);
$zhi      = ZhiCodeToZhi($zhicode);
$num      = ZhiCodeToPackedBig5Code($zhicode);
$num      = ZhiCodeLookupRefCount($zhicode);

Constants

All constants below belong to the Lingua::ZH::TaBE class; they are not exported by default, but may be imported explicitly, or implicitly via use Lingua::ZH::TaBE ':all'.

DB_TYPE_DB              0
DB_TYPE_LAST            1
DB_FLAG_OVERWRITE       0x01
DB_FLAG_CREATEDB        0x02
DB_FLAG_READONLY        0x04
DB_FLAG_NOSYNC          0x08
DB_FLAG_SHARED          0x10
DB_FLAG_NOUNPACK_YIN    0x20
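Because the DB_FLAG_* values are distinct bits, they can presumably be combined with bitwise OR to form a $flags argument. The sketch below defines local copies of a few values from the list above so that it runs without libtabe; real code would import the constants from Lingua::ZH::TaBE instead. has_flag() is a hypothetical helper.

```perl
use strict;
use warnings;

# Local copies of the documented values, for illustration only.
use constant {
    DB_TYPE_DB       => 0,
    DB_FLAG_READONLY => 0x04,
    DB_FLAG_NOSYNC   => 0x08,
    DB_FLAG_SHARED   => 0x10,
};

# Combine independent flags into one bit mask.
my $flags = DB_FLAG_READONLY | DB_FLAG_SHARED;

# True when every bit of $flag is set in $mask.
sub has_flag {
    my ($mask, $flag) = @_;
    return ($mask & $flag) == $flag;
}

printf "flags = 0x%02x\n", $flags;    # prints "flags = 0x14"
print "read-only\n" if has_flag($flags, DB_FLAG_READONLY);
print "no-sync\n"   if has_flag($flags, DB_FLAG_NOSYNC);
```

A mask built this way would then be passed as the $flags argument of TsiDBOpen or TsiYinDBOpen, with DB_TYPE_DB as $type.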

CAVEATS

The TsiYin family of functions is not yet complete.

SEE ALSO

ftp://xcin.linux.org.tw/pub/xcin/libtabe/devel/

http://libtabe.sourceforge.net/

AUTHORS

Audrey Tang <autrijus@autrijus.org>

COPYRIGHT

Copyright 2003, 2004, 2005 by Audrey Tang <autrijus@autrijus.org>.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html