NAME

CP932NEC::R2 - provides minimal CP932NEC I/O subroutines by short name

SYNOPSIS

  use CP932NEC::R2;

  @result = mbeach($utf8str);
  $result = mbtr($utf8str, 'ABC', 'XYZ', 'cdsr');
  $result = iolen($utf8str);
  $result = iomid($utf8expr, $offset_as_cp932x, $length_as_cp932x, $utf8replacement);
  @result = ioget(FILEHANDLE);
  $result = ioput(FILEHANDLE, @utf8str);
  $result = ioputf(FILEHANDLE, $utf8format, @utf8list);
  @result = iosort(@utf8str);

  $result = $utf8str =~ $mb{qr/$utf8regex/imsxo};
  $result = $utf8str =~ s<$mb{qr/before/imsxo}><after>egr;

MBCS SUBROUTINES for SCRIPTING

It is useful to treat regular expressions in a Perl script as matching UTF-8 code points rather than single octets. The following subroutines and the tied hash variable %mb provide UTF-8 semantics.
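
The difference between octet semantics and code point semantics can be sketched without the module. The sketch assumes the script handles raw UTF-8 octets (no "use utf8"); $utf8_char is a hypothetical pattern that matches one whole UTF-8 character, whereas "." matches a single octet. It illustrates the idea only and is not how %mb is implemented.

```perl
use strict;
use warnings;

# Hypothetical pattern, for illustration only: matches exactly one
# well-formed UTF-8 character when the script works on raw octets
# (no "use utf8"). It is not the regex that %mb actually builds.
my $utf8_char = qr/(?:
      [\x00-\x7F]                  # 1 octet (ASCII)
    | [\xC2-\xDF][\x80-\xBF]       # 2 octets
    | [\xE0-\xEF][\x80-\xBF]{2}    # 3 octets
    | [\xF0-\xF4][\x80-\xBF]{3}    # 4 octets
)/x;

# HIRAGANA LETTER A and HIRAGANA LETTER I as UTF-8 octets:
# two characters, three octets each.
my $octets = "\xE3\x81\x82\xE3\x81\x84";

# Octet semantics: "." consumes a single octet.
my ($first_octet) = $octets =~ /\A(.)/s;

# Code point semantics: one match consumes one whole character.
my ($first_char) = $octets =~ /\A($utf8_char)/;

printf "octet match: %d octet(s)\n", length $first_octet;    # 1
printf "char match:  %d octet(s)\n", length $first_char;     # 3
```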

------------------------------------------------------------------------------------------------------------------------------------------
Acts as SBCS             Acts as MBCS
Octet in Script          Octet in Script                             Note and Limitations
------------------------------------------------------------------------------------------------------------------------------------------
// or m// or qr//        $mb{qr/$utf8regex/imsxo}                    does not support the metasymbol \X (grapheme cluster)
                                                                     does not support code point ranges (like [A-Z])
                                                                     does not support POSIX character classes (like [:alpha:])
                                                                     does not support named characters
                                                                     (such as \N{GREEK SMALL LETTER EPSILON}, \N{greek:epsilon}, or \N{epsilon})
                                                                     does not support character properties (like \p{PROP} and \P{PROP})

                         Special Escapes in Regex                    Minimum Perl Version
                         --------------------------------------------------------------------------------------------------
                         $mb{qr/ \x{Unicode} /}                      since perl 5.006
                         $mb{qr/ [^ ... ] /}                         since perl 5.008  ** CAUTION ** not available on perl 5.006
                         $mb{qr/ \h /}                               since perl 5.010
                         $mb{qr/ \v /}                               since perl 5.010
                         $mb{qr/ \H /}                               since perl 5.010
                         $mb{qr/ \V /}                               since perl 5.010
                         $mb{qr/ \R /}                               since perl 5.010
                         $mb{qr/ \N /}                               since perl 5.012

------------------------------------------------------------------------------------------------------------------------------------------
s/before/after/imsxoegr  s<$mb{qr/before/imsxo}><after>egr
------------------------------------------------------------------------------------------------------------------------------------------
split(//,$_)             mbeach($utf8str)                            splits $utf8str into individual characters, treating it as CP932NEC
------------------------------------------------------------------------------------------------------------------------------------------
tr/// or y///            mbtr($utf8str, 'ABC', 'XYZ', 'cdsr')        does not support code point ranges (like tr/A-Z/a-z/)
------------------------------------------------------------------------------------------------------------------------------------------
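
A rough, module-free sketch of what mbeach and mbtr do for UTF-8 octet strings follows. each_char and tr_chars are hypothetical names; tr_chars handles only a plain character list, without ranges or the c, d, s, r flags, so it should be read as an illustration rather than as the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical pattern: one well-formed UTF-8 character, matched on octets.
my $utf8_char = qr/(?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF4][\x80-\xBF]{3})/;

# Hypothetical mbeach()-like helper: split UTF-8 octets into characters,
# where split(//, $str) would split them into single octets.
sub each_char {
    my ($octets) = @_;
    return $octets =~ /$utf8_char/g;
}

# Hypothetical mbtr()-like helper with no flags: maps each character of
# $from to the character at the same position in $to. Ranges and the
# c, d, s, r flags are deliberately not handled.
sub tr_chars {
    my ($octets, $from, $to) = @_;
    my @from = each_char($from);
    my @to   = each_char($to);
    @to = @from unless @to;    # like tr///, an empty list means "same list"
    my %map;
    for my $i (0 .. $#from) {
        next if exists $map{$from[$i]};    # like tr///, first mapping wins
        # like tr///, a short replacement list repeats its last character
        $map{$from[$i]} = $i <= $#to ? $to[$i] : $to[-1];
    }
    return join '', map { exists $map{$_} ? $map{$_} : $_ } each_char($octets);
}

# "a" followed by HIRAGANA LETTER A: two characters, four octets.
my @chars = each_char("a\xE3\x81\x82");

# HIRAGANA LETTER A -> KATAKANA LETTER A, other characters unchanged.
my $kata = tr_chars("\xE3\x81\x82A", "\xE3\x81\x82", "\xE3\x82\xA2");
```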

MBCS SUBROUTINES for I/O

The following subroutines convert the I/O encoding automatically and provide CP932NEC octet semantics.
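
The conversion these subroutines perform can be approximated with the core Encode module. This sketch assumes Encode's cp932 table is an acceptable stand-in for CP932NEC (the two tables may differ for some characters); put_as_cp932 and get_as_utf8 are hypothetical helpers loosely modelled on ioput and ioget, not the module's API.

```perl
use strict;
use warnings;
use Encode ();

# Assumption: Encode's "cp932" table (Microsoft's CP932) stands in for
# CP932NEC here; the two tables may differ for some characters.

# Hypothetical ioput()-like writer: takes UTF-8 octet strings and writes
# them to $fh as CP932 octets.
sub put_as_cp932 {
    my ($fh, @utf8) = @_;
    my $chars = Encode::decode('UTF-8', join('', @utf8));
    return print {$fh} Encode::encode('cp932', $chars);
}

# Hypothetical getc()-like reader: reads one CP932 character from $fh
# and returns it as UTF-8 octets, or returns nothing at end of file.
sub get_as_utf8 {
    my ($fh) = @_;
    my $c = getc $fh;
    return unless defined $c;
    # CP932 lead octets (0x81-0x9F, 0xE0-0xFC) are followed by a trail octet
    if ($c =~ /\A[\x81-\x9F\xE0-\xFC]\z/) {
        my $trail = getc $fh;
        $c .= $trail if defined $trail;
    }
    return Encode::encode('UTF-8', Encode::decode('cp932', $c));
}

# In-memory files: HIRAGANA LETTER A plus "A" in UTF-8 is written as CP932.
open my $out, '>', \my $cp932 or die "open: $!";
put_as_cp932($out, "\xE3\x81\x82", "A");
close $out;

# Read it back one character at a time, as UTF-8 octets.
open my $in, '<', \$cp932 or die "open: $!";
my @chars;
while (defined(my $ch = get_as_utf8($in))) {
    push @chars, $ch;
}
close $in;
```

For real files, the handles would also need binmode so that no other layer touches the octets.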

------------------------------------------------------------------------------------------------------------------------------------------
Acts as SBCS             Acts as MBCS
Octet in Script          Octet of I/O Encoding                       Note and Limitations
------------------------------------------------------------------------------------------------------------------------------------------
getc                     ioget(FILEHANDLE)                           gets UTF-8 code point octets from a CP932NEC-encoded file
------------------------------------------------------------------------------------------------------------------------------------------
length                   iolen($utf8str)                             octet count of $utf8str when encoded as CP932NEC
------------------------------------------------------------------------------------------------------------------------------------------
print                    ioput(FILEHANDLE, @utf8str)                 prints @utf8str encoded as CP932NEC
------------------------------------------------------------------------------------------------------------------------------------------
printf                   ioputf(FILEHANDLE, $utf8format, @utf8list)  printf $utf8format with @utf8list, encoded as CP932NEC
------------------------------------------------------------------------------------------------------------------------------------------
sort                     iosort(@utf8str)                            sorts @utf8str as CP932NEC-encoded strings
------------------------------------------------------------------------------------------------------------------------------------------
substr                   iomid($utf8expr, $offset_as_cp932x, $length_as_cp932x, $utf8replacement)
                                                                     substr of $utf8expr, with offset and length counted in CP932NEC octets
------------------------------------------------------------------------------------------------------------------------------------------
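
Counting and slicing in CP932NEC octets can be sketched the same way, again assuming Encode's cp932 table approximates CP932NEC. cp932_length, cp932_substr and cp932_sort are hypothetical helpers loosely corresponding to iolen, iomid (without the replacement argument) and iosort.

```perl
use strict;
use warnings;
use Encode ();

# Assumption: Encode's "cp932" (Microsoft's table) stands in for CP932NEC.
sub to_cp932   { return Encode::encode('cp932', Encode::decode('UTF-8', $_[0])) }
sub from_cp932 { return Encode::encode('UTF-8', Encode::decode('cp932', $_[0])) }

# Hypothetical iolen()-like helper: length counted in CP932 octets.
sub cp932_length { return length(to_cp932($_[0])) }

# Hypothetical iomid()-like helper (no replacement argument): offset and
# length are counted in CP932 octets. A range that cuts a double-octet
# character in half is not repaired here.
sub cp932_substr {
    my ($utf8, $offset, $length) = @_;
    return from_cp932(substr(to_cp932($utf8), $offset, $length));
}

# Hypothetical iosort()-like helper: orders UTF-8 strings by their CP932
# octets rather than by their UTF-8 octets (Schwartzian transform).
sub cp932_sort {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [to_cp932($_), $_] } @_;
}

# HIRAGANA LETTER A and FULLWIDTH LATIN CAPITAL LETTER A, as UTF-8 octets.
my @sorted = cp932_sort("\xE3\x81\x82", "\xEF\xBC\xA1");
```

In CP932, FULLWIDTH LATIN CAPITAL LETTER A (0x8260) sorts before HIRAGANA LETTER A (0x82A0), while their UTF-8 forms sort the other way round, so a sort on CP932 octets can give a different order from a plain sort.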

AUTHOR

INABA Hitoshi <ina@cpan.org>

This project was originated by INABA Hitoshi.

LICENSE AND COPYRIGHT

This software is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.