NAME

Lingua::ZH::HanConvert - convert between Traditional and Simplified Chinese characters

SYNOPSIS

    #!perl -lw
    use Lingua::ZH::HanConvert qw(simple trad);
    use utf8;

    my $t = "國"; # Traditional character for "country", Unicode 22283 (U+570B)
    # or: my $t = v22283;

    print simple($t); # Simplified "country", 国 (Unicode 22269, U+56FD)

    my $s = "鱼"; # Simplified character for "fish", Unicode 40060 (U+9C7C)
    # or: my $s = v40060;

    print trad($s); # Traditional "fish", 魚 (Unicode 39770, U+9B5A)

REQUIRES

Perl 5.6

DESCRIPTION

In the 1950s, the Chinese government simplified over 2000 Chinese characters to help promote literacy. Taiwan and Hong Kong still use the traditional characters. The simplified characters are hard to read if you only know the traditional ones, and vice versa.

This module attempts to convert Chinese text between the two forms, using character-by-character transliteration.

Note that this module only handles Unicode text encoded as UTF-8. If you need to convert to or from the Big5 and GB character sets, then please look at Text::Iconv.
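As a rough sketch of how text in a legacy encoding could be brought into Unicode and back out again, the following uses the core Encode module (Perl 5.8 or later, which is newer than this module requires) rather than Text::Iconv. The helper name big5_to_gbk and the stand-in converter are hypothetical and are not part of this module; in real use the converter would be simple.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of Lingua::ZH::HanConvert:
# decode Big5 bytes, apply a converter, and encode the result as GBK.
# $convert is any code reference mapping a Unicode string to a Unicode string.
sub big5_to_gbk {
    my ($big5_bytes, $convert) = @_;
    my $text = decode('big5-eten', $big5_bytes);   # bytes -> Perl character string
    return encode('cp936', $convert->($text));     # characters -> GBK bytes
}

# Stand-in converter so the sketch runs without the module installed.
# It only knows one character: U+570B (traditional) -> U+56FD (simplified).
my $stub = sub { my $s = shift; $s =~ tr/\x{570B}/\x{56FD}/; return $s };

my $gbk = big5_to_gbk("\xB0\xEA", $stub);   # "\xB0\xEA" is U+570B in Big5
print unpack("H*", $gbk), "\n";             # hex dump of the GBK bytes
```

With the module installed, the same helper could be called as big5_to_gbk($bytes, \&simple) after importing simple. GBK (cp936) is used for output here on the assumption that it covers more characters than plain GB2312.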

simple takes a string, converts any traditional Chinese characters (such as 國, U+570B, meaning "country") to the corresponding simplified characters (such as 国, U+56FD, also meaning "country"), and returns the result. Characters which are not traditional Chinese do not change.

trad does the reverse; it converts any simplified Chinese characters to the corresponding traditional characters. Characters which are not simplified Chinese do not change.

BUGS, LIMITATIONS

Transliteration is not perfect. At the moment, this module only performs character-by-character transliteration, using the (one-to-one) mappings from the Unicode consortium's Unihan database. The converted text will contain errors, though it is generally good enough to be readable.
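To illustrate why a one-to-one, character-by-character mapping falls short, here is a minimal self-contained sketch. The table and the name trad_sketch are made up for illustration; they are not the module's data or code. The simplified character 发 stands for both 發 ("to emit") and 髮 ("hair"), so a table that can hold only one target per character has to get one of them wrong.

```perl
use strict;
use warnings;
use utf8;

# Tiny illustrative table; NOT the Unihan data the module actually uses.
my %to_trad = (
    "国" => "國",
    "鱼" => "魚",
    "头" => "頭",
    "发" => "發",   # 发 is also the simplified form of 髮; only one target fits here
);

# Hypothetical stand-in for trad(): replace each character independently,
# leaving characters that are not in the table unchanged.
sub trad_sketch {
    my ($text) = @_;
    return join '', map { exists $to_trad{$_} ? $to_trad{$_} : $_ } split //, $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print trad_sketch("头发"), "\n";   # "hair": the expected traditional form is 頭髮
```

Because the sketch looks at one character at a time, it has no way to tell that 发 follows 头 and should become 髮. A word-by-word approach could handle this case by looking up 头发 as a unit.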

The transliteration mappings could be improved; if anyone knows of another source of mappings then please let me know. Ideally, I'd like to see the module performing word-by-word transliteration, if suitable data sources were available. See http://www.basistech.com/articles/C2C.html for a discussion of transliteration issues.

The module may take several seconds to initialise. Each subroutine is slow the first time it is called, but faster on subsequent calls.

The characters in this documentation may not display correctly unless the program you are reading it with is Unicode-aware.

ACKNOWLEDGEMENTS

The data used by this module is taken from the Unicode consortium's Unihan database, available from ftp://ftp.unicode.org. Thanks to them for compiling the data.

AUTHOR

David Chan <david@sheetmusic.org.uk>

COPYRIGHT

Copyright (C) 2001, David Chan. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
