NAME
Word::Segmenter::Chinese::Lite - Split Chinese into words
SYNOPSIS
use Word::Segmenter::Chinese::Lite qw(wscl_seg wscl_set_mode);

# Default mode ("dict"): dictionary-based segmentation
my @result = wscl_seg("中华人民共和国成立了oyeah");
foreach (@result)
{
    print $_, "\n";
}
# got:
# 中华人民共和国
# 成立
# 了
# oyeah

# Overlapping bigram mode
wscl_set_mode("obigram");
@result = wscl_seg("中华人民共和国成立了");
foreach (@result)
{
    print $_, "\n";
}
# got:
# 中华
# 华人
# 人民
# 民共
# 共和
# 和国
# 国成
# 成立
# 立了
# 了

# Unigram mode
wscl_set_mode("unigram");
@result = wscl_seg("中华人民共和国");
foreach (@result)
{
    print $_, "\n";
}
# got:
# 中
# 华
# 人
# 民
# 共
# 和
# 国
METHODS
wscl_set_mode($mode)
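Optional. Selects the segmentation mode.
The available modes are:
"dict" : Default. Dictionary-based segmentation; the module ships with its own dictionary.
"unigram" : Unigram segmentation (one character per token).
"obigram" : Overlapping bigram segmentation.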
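The two n-gram modes can be illustrated without the module. The following is a minimal sketch, not the module's implementation: unigram_sketch and obigram_sketch are hypothetical helper names, and the sketch assumes the input is a decoded character string containing only Chinese characters (it makes no attempt to keep ASCII runs such as "oyeah" together). The trailing single character in obigram_sketch mirrors the "obigram" output shown in the SYNOPSIS.

```perl
use strict;
use warnings;
use utf8;                               # string literals below are UTF-8 source text
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: one token per character.
sub unigram_sketch {
    my ($text) = @_;
    return split //, $text;
}

# Hypothetical helper: every pair of adjacent characters, followed by the
# final character on its own (as in the SYNOPSIS output for "obigram").
sub obigram_sketch {
    my ($text) = @_;
    my @chars = split //, $text;
    return () unless @chars;
    my @tokens = map { $chars[$_] . $chars[$_ + 1] } 0 .. $#chars - 1;
    push @tokens, $chars[-1];
    return @tokens;
}

print join(' ', obigram_sketch('中华人民共和国成立了')), "\n";
print join(' ', unigram_sketch('中华人民共和国')), "\n";
```

For a string of N characters, the overlapping bigram sketch yields N tokens (N-1 pairs plus the last character), while the unigram sketch yields one token per character.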
wscl_seg($chinese_article, $max_word_length)
Main method.
Takes the Chinese text to be split.
Returns a list of words.
$chinese_article -- must be UTF-8 encoded
$max_word_length -- Optional. Maximum word length.
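This document does not describe how "dict" mode uses $max_word_length. One common technique for dictionary segmentation with a length cap is forward maximum matching, sketched below. Everything here is an assumption for illustration: the toy dictionary %dict, the helper name fmm_sketch, the default cap of 7, and the choice to count length in characters are all hypothetical and are not taken from the module.

```perl
use strict;
use warnings;
use utf8;                               # string literals below are UTF-8 source text
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical toy dictionary; it stands in for the module's bundled one.
my %dict = map { $_ => 1 } ('中华人民共和国', '中华', '人民', '共和国', '成立');

# Forward maximum matching: at each position take the longest dictionary
# word of at most $max_len characters, falling back to a single character.
sub fmm_sketch {
    my ($text, $max_len) = @_;
    $max_len //= 7;                     # assumed default, not the module's
    my @words;
    my $pos = 0;
    my $n   = length $text;
    while ($pos < $n) {
        my $take  = 1;
        my $limit = $n - $pos < $max_len ? $n - $pos : $max_len;
        for (my $len = $limit; $len > 1; $len--) {
            if ($dict{ substr($text, $pos, $len) }) {
                $take = $len;
                last;
            }
        }
        push @words, substr($text, $pos, $take);
        $pos += $take;
    }
    return @words;
}

print join(' ', fmm_sketch('中华人民共和国成立了')), "\n";
print join(' ', fmm_sketch('中华人民共和国成立了', 3)), "\n";
```

With the default cap the sketch returns 中华人民共和国, 成立, 了. With a cap of 3 the seven-character entry can no longer match, so the same text comes back as 中华, 人民, 共和国, 成立, 了. A smaller cap therefore trades long dictionary entries for shorter ones and reduces the number of lookups per position.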
EXPORT
No functions are exported by default.
2. Added overlapping-bigram, bigram, and 1-gram algorithms.
AUTHOR
Chen Gang, <yikuyiku.com@gmail.com>
COPYRIGHT AND LICENSE
Copyright (C) 2014 by Chen Gang
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.16.2 or, at your option, any later version of Perl 5 you may have available.