NAME

Charset - write Perl code in any encoding you like

SYNOPSIS

use Charset "euc-jp"; # Jperl!
#...
sub tricky_part{
   no Charset;
   #...
}
use Charset "euc-jp"; # restore the state; Filter::Simple bug.
# Handy for EUC-JP => UTF-8 converter 
# when your text editor only supports Shift_JIS !
use Charset "shiftjis", IN => "euc-jp", OUT => "utf8";
# If your shell supports EUC-JP, you can even do this!
perl -MCharset=euc-jp -e 'print "Nihongo\n" x 4'

ABSTRACT

This module lets you write your Perl code not only in ASCII (or EBCDIC, where your environment allows) or UTF-8, but in any character encoding that the Encode module supports.

USAGE

The first argument on the use line must be the name of the encoding your script is written in. The module croaks if no encoding is specified or if the one specified is not supported by the Encode module.
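The check described above can be approximated with core modules alone. The following is a minimal sketch, not the module's actual code: resolve_encoding is a hypothetical helper name, and it assumes that "supported by the Encode module" means Encode::find_encoding returns an encoding object for the name.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Encode qw(find_encoding);

# Hypothetical helper: return the canonical name of an encoding,
# or croak if none was given or Encode does not know it.
sub resolve_encoding {
    my ($name) = @_;
    croak "no encoding specified"
        unless defined $name && length $name;
    my $enc = find_encoding($name)
        or croak "unsupported encoding: $name";
    return $enc->name;
}

print resolve_encoding("euc-jp"), "\n";

# An unknown name is reported through croak.
eval { resolve_encoding("no-such-encoding-xyz") };
print "rejected: $@" if $@;
```

Because find_encoding also resolves aliases, a lookup like this accepts spelling variants that Encode knows about.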

You can optionally pass further arguments as a hash. The following options are supported.

STDIN => enc_name: Sets the discipline of STDIN to :encoding(enc_name). By default, the same encoding as the caller script is used.
STDOUT => enc_name: Sets the discipline of STDOUT to :encoding(enc_name). By default, the same encoding as the caller script is used.
IN => enc_name: Internally does use open IN => ":encoding(enc_name)". No default is set. See the open pragma.
OUT => enc_name: Internally does use open OUT => ":encoding(enc_name)". No default is set. See the open pragma.
IO => enc_name: Internally does use open IO => ":encoding(enc_name)". No default is set. IN or OUT overrides this setting.
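To see what an :encoding(enc_name) layer does to output, the sketch below attaches one to an in-memory filehandle so the resulting bytes can be inspected. It uses only core Perl; write_encoded is a hypothetical helper for illustration and is not part of Charset.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: print $text through an :encoding layer
# and return the raw bytes that reached the (in-memory) file.
sub write_encoded {
    my ($enc_name, $text) = @_;
    my $buffer = '';
    open(my $out, ">:encoding($enc_name)", \$buffer)
        or die "cannot open in-memory handle: $!";
    print {$out} $text;
    close($out) or die "close failed: $!";
    return $buffer;
}

my $text  = "\x{4EBA}";                     # one character, U+4EBA
my $bytes = write_encoded("euc-jp", $text);

printf "%d character(s) written as %d byte(s)\n",
    length($text), length($bytes);
print "same as Encode::encode\n"
    if $bytes eq encode("euc-jp", $text);
```

Applying the same layer to STDOUT with binmode(STDOUT, ":encoding(euc-jp)") is roughly the effect the STDOUT option describes.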

DESCRIPTION

This is a technology demonstrator for Perl 5.8.0. It uses Encode and Filter::Util::Call, both of which will be included in the perl distribution.

Before perl 5.6.0, a character meant a byte. Although it was possible to include literals with multibyte characters in certain encodings (such as EUC-JP), you needed to handle them with care. Some encodings (such as Shift_JIS) didn't even allow this, and you needed things like Jperl to do that. If your multibyte encoding was not Japanese, you were out of luck.

As of Perl 5.6.0, you could use UTF-8 strings internally, so you could apply everything you wanted, including regexes, to multilingual strings. You could even use UTF-8 strings for identifiers, so you could write something like

my $Ren++; #   "Ren" is really a U+4EBA

to make a child :) But there was one precondition: your source file had to be in UTF-8. Because decent text editors and environments that could handle UTF-8 were rare (and still are to some extent), you still needed character encoding converters like Jcode.pm.
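For comparison, the kind of conversion such a converter performs can be written with the Encode module alone. This is a minimal sketch under the assumption that the input is valid EUC-JP; eucjp_to_utf8 is a hypothetical helper name, not something Charset or Jcode.pm provides.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn EUC-JP bytes into UTF-8 bytes by going
# through Perl's internal character representation.
sub eucjp_to_utf8 {
    my ($eucjp_bytes) = @_;
    my $chars = decode("euc-jp", $eucjp_bytes);   # bytes -> characters
    return encode("UTF-8", $chars);               # characters -> bytes
}

# Build a sample from a code point so this file itself stays ASCII.
my $eucjp = encode("euc-jp", "\x{4EBA}");
my $utf8  = eucjp_to_utf8($eucjp);

printf "EUC-JP: %d byte(s), UTF-8: %d byte(s)\n",
    length($eucjp), length($utf8);
```

The byte counts differ while the decoded string in between is a single character, which is the byte versus character distinction described above.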

With perl 5.8.0 and this module, this will all change. Your old script in your regional character encoding suddenly starts working just by adding

use Charset qw(your-encoding);

BUGS

This module uses Filter::Simple, so it is subject to the limitations of Filter::Simple. Filter::Simple and Text::Balanced, which Filter::Simple uses, do a pretty good job of block detection.

AUTHOR

Dan Kogai <dankogai@dan.co.jp>

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
