
NAME

Lingua::JA::Fold - fold Japanese text, and more...

SYNOPSIS

 use strict;
 use warnings;
 use utf8;
 use Lingua::JA::Fold;
 
 binmode STDOUT, ':encoding(UTF-8)';  # output() returns wide characters
 
 my $text = "ｱｲｳｴｵ\t漢字";
 my $obj = Lingua::JA::Fold->new($text);
 
 # replace each [TAB] with 4 [SPACE]s.
 $obj->tab2space(4);
 # convert half-width 'Kana' characters to full-width ones.
 $obj->kana_half2full;
 
 # fold the text so that each line holds at most 2 full-width characters.
 $obj->fold(2);
 
 # output the result
 print $obj->output;

DESCRIPTION

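This module folds (wraps) Japanese text. It also provides related utilities: measuring text width, expanding tabs, and converting half-width Kana to full-width.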

Japanese text (and, likewise, Chinese and Korean text) has a traditional convention for character width. Characters are displayed in one of two sizes: 'full-width' or 'half-width'. A full-width character is as wide as it is tall (a square), unlike Latin letters, which are usually narrower and vary in width. Roughly speaking, Latin letters and Arabic numerals count as half-width, and all other characters count as full-width. In Japanese text mixed with Latin letters and Arabic numerals, each character is therefore either full-width or half-width.

For these reasons, wrapping Japanese text is more complicated than wrapping text in which every character has the same width.

METHODS and FUNCTIONS

new($string)

The constructor (a class method). It creates a new object holding $string.

output

This object method returns the processed string as a Perl character string (Unicode wide characters).

fold($i)

This object method folds the string so that no line exceeds a width of $i, measured in full-width characters (a half-width character counts as half of a full-width one).
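The following is a minimal Perl sketch of width-aware folding, not the module's own implementation. It assumes a simplified width rule (ASCII and half-width Katakana are half-width, everything else is full-width) and uses hypothetical helper names (half_units, fold_line, fold_sketch). Widths are tracked in half-width units, so a limit of $i full-width characters becomes 2 * $i units.

```perl
use strict;
use warnings;
use utf8;

# Assumed width rule (hypothetical, simplified): ASCII and half-width
# Katakana (U+FF61..U+FF9F) take 1 unit; every other character takes 2.
sub half_units { return $_[0] =~ /[\x00-\x7F\x{FF61}-\x{FF9F}]/ ? 1 : 2 }

# Split one line (containing no "\n") into segments whose width
# is at most 2 * $i half-width units.
sub fold_line {
    my ($line, $i) = @_;
    my ($buf, $width, @seg) = ('', 0);
    for my $ch (split //, $line) {
        my $w = half_units($ch);
        # Start a new segment when this character would overflow the limit.
        if ($width + $w > 2 * $i && length $buf) {
            push @seg, $buf;
            ($buf, $width) = ('', 0);
        }
        $buf   .= $ch;
        $width += $w;
    }
    return (@seg, $buf);
}

# Fold every line of $text; existing line breaks are preserved.
sub fold_sketch {
    my ($text, $i) = @_;
    return join "\n", map { fold_line($_, $i) } split /\n/, $text, -1;
}

binmode STDOUT, ':encoding(UTF-8)';
print fold_sketch('ab漢字cd', 2), "\n";   # "ab漢" then "字cd"
```

Because "ab" takes only as much room as one full-width character, the first line can hold "ab漢" within a limit of 2.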

fold_ex($i)

This object method folds the string in the same way as fold(), with $i measured in full-width characters, and additionally applies the 'forbidden' line-breaking rule to certain marks. Its output therefore follows formal Japanese typesetting conventions more closely than that of fold().

The forbidden rule says that the following marks must not appear at the start of a line: 1) termination marks such as Ten ("、" or "，") and Maru ("。" or "．"); 2) closing marks (braces, parentheses, and brackets) such as ")", "}", "]", ">", etc.; 3) repeat marks. When such a mark would fall at the start of a line, it is moved to the end of the previous line instead.

The marks that this module actually detects as forbidden are:

 ’”、。〃々〉》」』】〕〟ゝゞヽヾ），．］｝

Note that these marks are all full-width Japanese characters.
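A minimal Perl sketch of the forbidden rule, assuming the same simplified width rule as a plain width-based fold; it is not the module's code. Each line is folded first, then any run of forbidden marks at the start of a segment is appended to the previous segment, so that line may run past the limit. The mark set is reconstructed from the list above, and the names fold_line and fold_ex_sketch are hypothetical.

```perl
use strict;
use warnings;
use utf8;

# Assumed width rule: ASCII and half-width Katakana = 1 unit, all else = 2.
sub half_units { return $_[0] =~ /[\x00-\x7F\x{FF61}-\x{FF9F}]/ ? 1 : 2 }

# Marks that must not start a line (reconstructed from the list above).
my $FORBIDDEN = qr/[’”、。〃々〉》」』】〕〟ゝゞヽヾ），．］｝]/;

# Split one line into segments of at most 2 * $i half-width units.
sub fold_line {
    my ($line, $i) = @_;
    my ($buf, $width, @seg) = ('', 0);
    for my $ch (split //, $line) {
        my $w = half_units($ch);
        if ($width + $w > 2 * $i && length $buf) {
            push @seg, $buf;
            ($buf, $width) = ('', 0);
        }
        $buf   .= $ch;
        $width += $w;
    }
    return (@seg, $buf);
}

sub fold_ex_sketch {
    my ($text, $i) = @_;
    my @out;
    for my $line (split /\n/, $text, -1) {
        my @seg   = fold_line($line, $i);
        my @fixed = (shift @seg);
        for my $s (@seg) {
            # Hang leading forbidden marks on the end of the previous segment.
            $fixed[-1] .= $1 if $s =~ s/^($FORBIDDEN+)//;
            push @fixed, $s if length $s;   # drop segments that became empty
        }
        push @out, @fixed;
    }
    return join "\n", @out;
}

binmode STDOUT, ':encoding(UTF-8)';
print fold_ex_sketch('あいう。えお', 3), "\n";   # "あいう。" then "えお"
```

A plain width-based fold would put "。" at the start of the second line; the sketch moves it back so the first line ends with it.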

fold_easy($i)

This object method folds the string so that each line holds at most $i characters, treating full-width and half-width characters alike (each counts as one). Easy to implement :)
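Since fold_easy() ignores width, it reduces to cutting each line every $i characters. A minimal Perl sketch, with a hypothetical name (fold_easy_sketch) and not taken from the module:

```perl
use strict;
use warnings;
use utf8;

sub fold_easy_sketch {
    my ($text, $i) = @_;
    die "line length must be positive\n" unless $i > 0;
    my @out;
    for my $line (split /\n/, $text, -1) {
        # Take up to $i characters at a time, regardless of their width.
        my @chunks = $line =~ /(.{1,$i})/g;
        push @out, @chunks ? @chunks : ('');   # keep empty lines
    }
    return join "\n", @out;
}

binmode STDOUT, ':encoding(UTF-8)';
print fold_easy_sketch('あいうabc', 2), "\n";   # "あい", "うa", "bc"
```

Note that "うa" mixes a full-width and a half-width character, yet counts as two characters here.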

length_half($text)

This exportable function returns the length of $text measured in half-width characters; a full-width character counts as two.

length_full($text)

This exportable function returns the length of $text measured in full-width characters.
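The two length functions can be sketched in Perl as follows. This assumes the same simplified width rule as above (ASCII and half-width Katakana are half-width, everything else is full-width); the module's own character table may differ, and the names length_half_sketch and length_full_sketch are hypothetical. In this sketch length_full may return a fraction such as 0.5.

```perl
use strict;
use warnings;
use utf8;

sub length_half_sketch {
    my ($text) = @_;
    # Count half-width characters (assumed: ASCII and half-width Katakana).
    my $half = () = $text =~ /[\x00-\x7F\x{FF61}-\x{FF9F}]/g;
    # Every remaining character is treated as full-width (2 units).
    return $half + 2 * (length($text) - $half);
}

sub length_full_sketch {
    my ($text) = @_;
    return length_half_sketch($text) / 2;
}

my $mixed = 'Perl入門';   # 4 half-width + 2 full-width characters
print length_half_sketch($mixed), "\n";   # 8
print length_full_sketch($mixed), "\n";   # 4
```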

tab2space($i)

This object method replaces each [TAB] character in the string with $i [SPACE] characters.
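As described, every tab becomes the same fixed run of $i spaces. A minimal Perl sketch of that behavior (the name tab2space_sketch is hypothetical):

```perl
use strict;
use warnings;

sub tab2space_sketch {
    my ($text, $i) = @_;
    my $spaces = ' ' x $i;            # a fixed run of $i spaces
    (my $out = $text) =~ s/\t/$spaces/g;
    return $out;
}

print tab2space_sketch("a\tbc\td", 4), "\n";   # "a    bc    d"
```

This differs from tab-stop expansion (for example, the core module Text::Tabs and its expand function), which pads each tab to the next multiple of the tab width, so the number of spaces depends on the column.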

kana_half2full

This object method converts half-width 'Kana' characters in the string to their full-width equivalents.
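One way to sketch this conversion in Perl is Unicode compatibility normalization (NFKC from the core module Unicode::Normalize), applied only to runs of half-width Katakana and punctuation (U+FF61..U+FF9F). NFKC maps each such character to its full-width form and merges a following voiced mark, so "ｶﾞ" becomes "ガ". This is a swapped-in technique, not necessarily how the module works, and the name kana_half2full_sketch is hypothetical.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFKC);

sub kana_half2full_sketch {
    my ($text) = @_;
    # Normalize only half-width Kana runs; ASCII and other characters
    # are left untouched (plain NFKC would also change e.g. full-width ASCII).
    $text =~ s/([\x{FF61}-\x{FF9F}]+)/NFKC($1)/ge;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print kana_half2full_sketch('ｶﾞｲﾄﾞ ABC'), "\n";   # ガイド ABC
```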

SEE ALSO

module: utf8
module: Encode

NOTES

This module works in a Unicode/UTF-8 environment, so Perl 5.8 or later is required. Input text must be UTF-8 encoded. Put the utf8 pragma (use utf8;) in your source code so that UTF-8 string literals are read as characters.
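When text arrives as UTF-8 octets (for example, read from a file or a socket) rather than as a literal covered by use utf8, the core Encode module can turn it into a character string and back. A minimal sketch; whether new() expects octets or already-decoded characters is not spelled out above, so decoding before processing is an assumption here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "漢字" as raw UTF-8 octets, as it might be read from a file.
my $octets = "\xE6\xBC\xA2\xE5\xAD\x97";

# Decode to a Perl character string before processing...
my $chars = decode('UTF-8', $octets);
printf "octets: %d, characters: %d\n", length $octets, length $chars;
# prints "octets: 6, characters: 2"

# ...and encode back to UTF-8 octets before writing the result out.
my $back = encode('UTF-8', $chars);
```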

AUTHOR

Masanori HATA <lovewing@dream.big.or.jp> (Saitama, JAPAN)

COPYRIGHT

Copyright (c) 2003-2004 Masanori HATA. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
