NAME
CP932IBM::R2 - provides minimal CP932IBM I/O subroutines by short name
SYNOPSIS
use CP932IBM::R2;
@result = mbeach($utf8str)
$result = mbtr($utf8str, 'ABC', 'XYZ', 'cdsr')
$result = iolen($utf8str)
$result = iomid($utf8expr, $offset_as_cp932x, $length_as_cp932x, $utf8replacement)
@result = ioget(FILEHANDLE)
$result = ioput(FILEHANDLE, @utf8str)
$result = ioputf(FILEHANDLE, $utf8format, @utf8list)
@result = iosort(@utf8str)
$result = $utf8str =~ $mb{qr/$utf8regex/imsxogc}
$result = $utf8str =~ s<$mb{qr/before/imsxo}><after>egr
MBCS SUBROUTINES for SCRIPTING
It is useful to have regular expressions in a Perl script operate on UTF-8 code points rather than on single octets. The following subroutines and the tied hash variable %mb provide these UTF-8 semantics.
------------------------------------------------------------------------------------------------------------------------------------------
Acts as SBCS              Acts as MBCS
Octet in Script           Octet in Script                         Note and Limitations
------------------------------------------------------------------------------------------------------------------------------------------
// or m// or qr//         $mb{qr/$utf8regex/imsxogc}              does not support the metasymbol \X (grapheme match)
                                                                  does not support code point ranges (like "[A-Z]")
                                                                  does not support POSIX character classes (like [:alpha:])
                                                                  does not support named characters
                                                                  (such as \N{GREEK SMALL LETTER EPSILON}, \N{greek:epsilon}, or \N{epsilon})
                                                                  does not support character properties (like \p{PROP} and \P{PROP})

                                                                  Special Escapes in Regex  Supported Perl Version
                                                                  ------------------------------------------------------------------------
                                                                  $mb{qr/ \x{Unicode} /}    since perl 5.006
                                                                  $mb{qr/ [^ ... ] /}       since perl 5.008 ** CAUTION ** perl 5.006 cannot do this
                                                                  $mb{qr/ \h /}             since perl 5.010
                                                                  $mb{qr/ \v /}             since perl 5.010
                                                                  $mb{qr/ \H /}             since perl 5.010
                                                                  $mb{qr/ \V /}             since perl 5.010
                                                                  $mb{qr/ \R /}             since perl 5.010
                                                                  $mb{qr/ \N /}             since perl 5.012
------------------------------------------------------------------------------------------------------------------------------------------
s/before/after/imsxoegr   s<$mb{qr/before/imsxo}><after>egr
------------------------------------------------------------------------------------------------------------------------------------------
split(//,$_)              mbeach($utf8str)                        splits $utf8str into its individual characters (as CP932IBM encoding)
------------------------------------------------------------------------------------------------------------------------------------------
tr/// or y///             mbtr($utf8str, 'ABC', 'XYZ', 'cdsr')    does not support code point ranges (like "tr/A-Z/a-z/")
------------------------------------------------------------------------------------------------------------------------------------------
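The difference between octet semantics and code point semantics can be illustrated with core Perl alone. The sketch below is not the module's implementation: split_utf8_chars is a hypothetical helper, introduced only for illustration, that cuts a string of UTF-8 octets into one element per code point. It shows why split(//, $_) is not a substitute for a character-aware subroutine such as mbeach().

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CP932IBM::R2): split a string of UTF-8
# octets into one element per code point. Matching stops at the first octet
# sequence that is not shaped like well-formed UTF-8.
sub split_utf8_chars {
    my ($octets) = @_;
    return $octets =~ /\G(?:
          [\x00-\x7F]                   # 1 octet  (ASCII)
        | [\xC2-\xDF][\x80-\xBF]        # 2 octets
        | [\xE0-\xEF][\x80-\xBF]{2}     # 3 octets
        | [\xF0-\xF4][\x80-\xBF]{3}     # 4 octets
    )/xg;
}

# UTF-8 octets for U+3042 (HIRAGANA LETTER A), "A", U+3044 (HIRAGANA LETTER I)
my $str = "\xE3\x81\x82A\xE3\x81\x84";

my @octets = split(//, $str);           # SBCS view: one element per octet
my @chars  = split_utf8_chars($str);    # MBCS view: one element per code point

printf "%d octets, %d characters\n", scalar(@octets), scalar(@chars);
# prints "7 octets, 3 characters"
```

The pattern only checks the general shape of each sequence; it is deliberately loose and does not reject every invalid encoding (overlong forms, surrogates).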
MBCS SUBROUTINES for I/O
If you use the following subroutines, I/O encoding conversion is performed automatically. These subroutines give you CP932IBM octet semantics.
------------------------------------------------------------------------------------------------------------------------------------------
Acts as SBCS              Acts as MBCS
Octet in Script           Octet of I/O Encoding                         Note and Limitations
------------------------------------------------------------------------------------------------------------------------------------------
getc                      ioget(FILEHANDLE)                             gets UTF-8 code point octets from a CP932IBM file
------------------------------------------------------------------------------------------------------------------------------------------
length                    iolen($utf8str)                               octet count of the UTF-8 string as CP932IBM encoding
------------------------------------------------------------------------------------------------------------------------------------------
print                     ioput(FILEHANDLE, @utf8str)                   prints @utf8str as CP932IBM encoding
------------------------------------------------------------------------------------------------------------------------------------------
printf                    ioputf(FILEHANDLE, $utf8format, @utf8list)    printf of @utf8list as CP932IBM encoding
------------------------------------------------------------------------------------------------------------------------------------------
sort                      iosort(@utf8str)                              sorts @utf8str as CP932IBM encoding
------------------------------------------------------------------------------------------------------------------------------------------
substr                    iomid($utf8expr, $offset_as_cp932x, $length_as_cp932x, $utf8replacement)
                                                                        substr of $utf8expr as CP932IBM octets
------------------------------------------------------------------------------------------------------------------------------------------
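To make "as CP932IBM encoding" concrete, the sketch below uses the core Encode module. Two assumptions are made loudly here: Encode's generic "cp932" encoding is used only as a stand-in for CP932IBM, and octets_as_cp932 and sort_as_cp932 are hypothetical helpers, not subroutines of this module. They only illustrate the idea behind length-in-I/O-octets and sort-by-I/O-octets; they are not how iolen() or iosort() are implemented.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: number of octets a UTF-8 octet string would occupy
# after conversion to cp932 (stand-in for CP932IBM).
sub octets_as_cp932 {
    my ($utf8_octets) = @_;
    return length(encode('cp932', decode('UTF-8', $utf8_octets)));
}

# Hypothetical helper: sort UTF-8 octet strings by the byte order of their
# cp932 form rather than by the byte order of their UTF-8 form.
sub sort_as_cp932 {
    my @utf8 = @_;
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ encode('cp932', decode('UTF-8', $_)), $_ ] } @utf8;
}

# UTF-8 octets for U+3042 (HIRAGANA LETTER A), "A", U+3044 (HIRAGANA LETTER I)
my $str = "\xE3\x81\x82A\xE3\x81\x84";
printf "UTF-8: %d octets, cp932: %d octets\n", length($str), octets_as_cp932($str);
# prints "UTF-8: 7 octets, cp932: 5 octets"

# U+3042 (cp932 0x82A0) and U+FF21 FULLWIDTH LATIN CAPITAL LETTER A (cp932 0x8260)
my @plain  = sort("\xE3\x81\x82", "\xEF\xBC\xA1");           # UTF-8 byte order
my @sorted = sort_as_cp932("\xE3\x81\x82", "\xEF\xBC\xA1");  # cp932 byte order
```

In UTF-8 byte order U+3042 sorts first, while in cp932 byte order U+FF21 sorts first, which is why a sort that follows the I/O encoding can give a different result from Perl's built-in sort on the same UTF-8 strings.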
AUTHOR
INABA Hitoshi <ina@cpan.org>
This project was originated by INABA Hitoshi.
LICENSE AND COPYRIGHT
This software is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.