NAME
jacode - Perl program for Japanese character code conversion
SYNOPSIS
works like 'jcode.pl' in your script
require 'jacode.pl';
($subref, $got_INPUT_encoding) = jacode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding [, $option])
$got_INPUT_encoding = jacode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding [, $option])
($esc_DBCS, $esc_ASCII) = jacode::get_inout($line)
($esc_DBCS_fully, $esc_ASCII_fully) = jacode::jis_inout([$esc_DBCS [, $esc_ASCII]])
($matched_length, $encoding) = jacode::getcode(\$line)
$encoding = jacode::getcode(\$line)
jacode::init()
works as the 'pkf' command on the command line (shows help)
$ perl jacode.pl
INSTALL of "jacode.pl"
- 1. Open the metacpan page of "jacode.pl"
- 2. Click the "Source (raw)" link in the sidebar of that page
- 3. Select all text of the page
- 4. Save the text as "jacode.pl"
DESCRIPTION
This software converts among "JIS", "SJIS", "EUC-JP", and "UTF-8", the encodings most frequently used for Japanese strings.
The interface of "jacode.pl" is the same as that of the well-known "jcode.pl".
On the other hand, on perl 5.8.1 or later its conversion ability extends to that of the "Encode" module, which can convert between many character encodings.
Moreover, "jacode.pl" works like the pkf command on the command line. You can get a help message with the following command:
$ perl jacode.pl
Conversion from "sjis" to "utf8" uses the following table:
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
Conversion from "utf8" to "sjis" uses "CP932.TXT" and the following article:
PRB: Conversion Problem Between Shift-JIS and Unicode
http://support.microsoft.com/kb/170559/en-us
What is good about this software
upward compatible with jcode.pl
upward compatible with the pkf command
is both a Perl4 script and a Perl5 script
supports HALFWIDTH KATAKANA
supports UTF-8 by a cp932-to-Unicode table
powered by Encode::from_to (not only Japanese!)
So we believe this software will be useful for DX (Digital Transformation) and IT modernization of Japanese information processing.
DEPENDENCIES
Running this software requires perl 4.036 or later.
DEFINITIONS
To simplify documentation, this document uses the following words as encoding names.
"euc"
means "EUC-JP"
"jis"
means "ISO-2022-JP-1"
https://ja.wikipedia.org/wiki/ISO-2022-JP
"sjis"
means "Microsoft CP932", actually NOT "Shift_JIS".
https://en.wikipedia.org/wiki/Code_page_932_(Microsoft_Windows)
https://ja.wikipedia.org/wiki/Microsoft%E3%82%B3%E3%83%BC%E3%83%89%E3%83%9A%E3%83%BC%E3%82%B8932
"utf8"
means RFC 3629's "UTF-8"
https://en.wikipedia.org/wiki/UTF-8
https://www.ietf.org/rfc/rfc2279.txt
https://www.akanko.net/marimo/data/rfc/rfc2279-jp.txt
https://www.ietf.org/rfc/rfc3629.txt
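For readers who also use the core Encode module (perl 5.8.1 or later), the four names above correspond roughly to the Encode names in the hash below. The mapping is an assumption made for this sketch and is not taken from jacode.pl; jacode uses its own tables, which need not match Encode byte for byte. The second half shows why "sjis" means CP932 rather than plain Shift_JIS: CIRCLED DIGIT ONE exists in CP932 but not in Encode's "shiftjis".

```perl
use strict;
use warnings;
use Encode ();

# Illustrative mapping from this document's encoding names to Encode names.
# This is an assumption for the sketch, not a table taken from jacode.pl.
my %encode_name_for = (
    'euc'  => 'euc-jp',
    'jis'  => 'iso-2022-jp-1',
    'sjis' => 'cp932',
    'utf8' => 'UTF-8',
);

for my $name (sort keys %encode_name_for) {
    my $enc = Encode::find_encoding($encode_name_for{$name});
    printf "%-4s => %s\n", $name, $enc ? $enc->name : 'NOT FOUND';
}

# U+2460 CIRCLED DIGIT ONE is an NEC special character: CP932 has it,
# Encode's plain "shiftjis" (JIS X 0208 based) does not.
my $circled_one = "\x{2460}";
my $cp932_bytes = Encode::encode('cp932', $circled_one);
my $tmp         = $circled_one;
my $in_shiftjis = eval { Encode::encode('shiftjis', $tmp, Encode::FB_CROAK()); 1 } ? 1 : 0;
printf "cp932: %s, encodable in shiftjis: %s\n",
    unpack('H*', $cp932_bytes), $in_shiftjis ? 'yes' : 'no';
```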
SUBROUTINES
($subref, $got_INPUT_encoding) = jacode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding [, $option])
$got_INPUT_encoding = jacode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding [, $option])
Converts "$line" from "$INPUT_encoding" to "$OUTPUT_encoding", then overwrites "$line".
"$OUTPUT_encoding" can be any of "jis", "sjis", "euc", or "utf8", or "noconv" if you do not want any conversion.
"$INPUT_encoding" can be any of "jis", "sjis", "euc", or "utf8".
If "$INPUT_encoding" is omitted, "jacode::getcode(\$line)" is called internally and its return value is used as "$INPUT_encoding". However, utf8, sjis, and euc are all in common use today, so "jacode::getcode()" cannot guess as reliably as it could in the old days. So you should never omit it.
"$option" can be omitted, or can be "h" or "z".
"h" means converting ZENKAKU-KATAKANA to HANKAKU-KATAKANA
"z" means converting HANKAKU-KATAKANA to ZENKAKU-KATAKANA
On perl 5.8.1 or later, if "$OUTPUT_encoding" or "$INPUT_encoding" is none of "jis", "sjis", "euc", or "utf8", then "jacode::convert()" works as "Encode::from_to()".
jacode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding, $option)
works as
Encode::from_to( $line, $INPUT_encoding, $OUTPUT_encoding )
Returns ("$subref", "$got_INPUT_encoding") when called in list context.
Returns only "$got_INPUT_encoding" when called in scalar context.
"$subref" is a reference to the subroutine that performs the conversion.
"$got_INPUT_encoding" is "$INPUT_encoding", or the return value of "jacode::getcode(\$line)" if "$INPUT_encoding" was omitted.
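A minimal sketch of the calling contract described above, assuming perl 5.8.1 or later and the core Encode module. The names "convert_sketch" and "%encode_name_for" are hypothetical, and every conversion here goes through Encode::from_to(), whereas jacode::convert() uses its own tables for the four Japanese encodings and falls back to Encode only for other names.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical name mapping used only by this sketch.
my %encode_name_for = (
    'euc'  => 'euc-jp', 'jis'  => 'iso-2022-jp-1',
    'sjis' => 'cp932',  'utf8' => 'UTF-8',
);

# Same argument order as jacode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding).
sub convert_sketch {
    my ($ref, $out, $in) = @_;
    my $from = $encode_name_for{$in}  || $in;    # other names go straight to Encode
    my $to   = $encode_name_for{$out} || $out;
    # Note the reversed order: Encode::from_to(DATA, FROM, TO).
    my $subref = sub { Encode::from_to(${ $_[0] }, $from, $to) };
    $subref->($ref) unless $out eq 'noconv';     # "noconv" leaves the data untouched
    return wantarray ? ($subref, $in) : $in;     # list vs. scalar context
}

my $line = "\xe3\x81\x82";                         # HIRAGANA LETTER A in UTF-8
my $got  = convert_sketch(\$line, 'euc', 'utf8');  # scalar context

my $line2 = "abc";
my ($code, $got2) = convert_sketch(\$line2, 'sjis', 'utf8');  # list context
```

The sketch, like jacode::convert(), takes the output encoding before the input encoding, while Encode::from_to() takes the input encoding first; mixing the two orders up is an easy mistake when switching between them.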
($esc_DBCS, $esc_ASCII) = jacode::get_inout($line)
Looks for the DBCS start sequence and the ASCII start sequence in "$line".
"$esc_DBCS" is the escape sequence at the start of DBCS, or undef if not found.
"$esc_ASCII" is the escape sequence at the start of ASCII, or undef if not found.
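One way such a scan could be written with regular expressions is sketched below. It is not jacode's implementation: the helper name "find_inout" is hypothetical, and the set of recognized sequences (the DBCS sequences listed for jacode::jis_inout(), plus ESC-(-B and ESC-(-J for ASCII) is an assumption.

```perl
use strict;
use warnings;

# Assumed sets of escape sequences.
my $re_dbcs  = qr/\e\$[\@B]|\e&\@\e\$B|\e\$\([OQ]/;  # ESC $ @, ESC $ B, ESC & @ ESC $ B, ESC $ ( O/Q
my $re_ascii = qr/\e\([BJ]/;                         # ESC ( B, ESC ( J

# Hypothetical helper: first DBCS escape and first ASCII escape, or undef.
sub find_inout {
    my ($line) = @_;
    my ($dbcs)  = $line =~ /($re_dbcs)/;
    my ($ascii) = $line =~ /($re_ascii)/;
    return ($dbcs, $ascii);
}

my $jis = "abc\e\$B\x24\x22\e(Bdef";   # "abc", HIRAGANA A in JIS X 0208, "def"
my ($in, $out) = find_inout($jis);
print unpack('H*', $in), ' ', unpack('H*', $out), "\n";
```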
($esc_DBCS_fully, $esc_ASCII_fully) = jacode::jis_inout([$esc_DBCS [, $esc_ASCII]])
Sets the DBCS start sequence and the ASCII start sequence.
"$esc_ASCII" will be "ESC-(-B" if "$esc_ASCII" is omitted.
"$esc_DBCS" will be "ESC-$-B" if "$esc_DBCS" is omitted.
Returns "$esc_DBCS_fully", the full escape sequence of DBCS.
Returns "$esc_ASCII_fully", the full escape sequence of ASCII.
You can also use a short sequence for "$esc_DBCS".
--------------------------------------------------------
short-sequence   full-sequence     means
--------------------------------------------------------
@                ESC-$-@           JIS C 6226-1978
B                ESC-$-B           JIS X 0208-1983
&                ESC-&@-ESC-$-B    JIS X 0208-1990
O                ESC-$-(-O         JIS X 0213:2000 plane 1
Q                ESC-$-(-Q         JIS X 0213:2004 plane 1
--------------------------------------------------------
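The table can be expressed as a Perl hash. The sketch below uses the hypothetical names "%full_esc" and "jis_inout_sketch" and only mirrors the documented defaults (ESC-$-B and ESC-(-B); it does not reproduce jacode::jis_inout() itself.

```perl
use strict;
use warnings;

# Short sequence => full escape sequence, transcribed from the table.
my %full_esc = (
    '@' => "\e\$\@",        # JIS C 6226-1978
    'B' => "\e\$B",         # JIS X 0208-1983
    '&' => "\e&\@\e\$B",    # JIS X 0208-1990
    'O' => "\e\$(O",        # JIS X 0213:2000 plane 1
    'Q' => "\e\$(Q",        # JIS X 0213:2004 plane 1
);

# Hypothetical helper: omitted arguments fall back to the documented
# defaults; short sequences are expanded, full sequences pass through.
sub jis_inout_sketch {
    my ($dbcs, $ascii) = @_;
    $dbcs  = 'B'    unless defined $dbcs;
    $ascii = "\e(B" unless defined $ascii;
    $dbcs  = $full_esc{$dbcs} if exists $full_esc{$dbcs};
    return ($dbcs, $ascii);
}
```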
jacode::init()
Initializes the variables used in this package.
Call "jacode::init()" first if you embedded "jacode.pl" at the end of your script. You don't have to call this when using "jacode.pl" through the "do" or "require" interface.
Other SUBROUTINES and VARIABLES
jacode::xxx2yyy(\$line [, $option])
Converts encoding of "$line" from "xxx" to "yyy" then overwrites "$line".
"xxx" and "yyy" can be "jis", "euc", "sjis" or "utf8".
jacode::euc2euc(\$line [, $option])
jacode::euc2jis(\$line [, $option])
jacode::euc2sjis(\$line [, $option])
jacode::euc2utf8(\$line [, $option])
jacode::jis2euc(\$line [, $option])
jacode::jis2jis(\$line [, $option])
jacode::jis2sjis(\$line [, $option])
jacode::jis2utf8(\$line [, $option])
jacode::sjis2euc(\$line [, $option])
jacode::sjis2jis(\$line [, $option])
jacode::sjis2sjis(\$line [, $option])
jacode::sjis2utf8(\$line [, $option])
jacode::utf82euc(\$line [, $option])
jacode::utf82jis(\$line [, $option])
jacode::utf82sjis(\$line [, $option])
jacode::utf82utf8(\$line [, $option])
"$option" can be omitted, or can be "h" or "z".
"h" means converting ZENKAKU-KATAKANA to HANKAKU-KATAKANA
"z" means converting HANKAKU-KATAKANA to ZENKAKU-KATAKANA
Returns the count of converted characters.
$line_by_OUTPUT_encoding = jacode::to($OUTPUT_encoding, $line, $INPUT_encoding [, $option])
This subroutine works as follows:
local ( $OUTPUT_encoding, $s, $INPUT_encoding, $option ) = @_;
&convert( *s, $OUTPUT_encoding, $INPUT_encoding, $option );
$s;
Since it returns the result by value, this subroutine is easy to use in the "s///e" operator and in other places.
You should no longer omit "$INPUT_encoding".
"$option" can be omitted, or can be "h" or "z".
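The by-value style can be sketched as a thin wrapper around an in-place converter. Both "utf8_to_euc_inplace" and "to_euc" are hypothetical, and the conversion is done with Encode::from_to() (perl 5.8.1 or later) rather than jacode's tables.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical in-place converter standing in for a jacode::xxx2yyy() routine.
sub utf8_to_euc_inplace {
    my ($ref) = @_;
    Encode::from_to($$ref, 'UTF-8', 'euc-jp');
}

# By-value wrapper in the spirit of jacode::to(): copy, convert the copy, return it.
sub to_euc {
    my ($s) = @_;
    utf8_to_euc_inplace(\$s);
    return $s;
}

# Convert only the text inside <...>, leaving the rest of the line untouched.
my $text = "name=<\xe3\x81\x82> id=7";
(my $out = $text) =~ s/<([^>]*)>/'<' . to_euc($1) . '>'/ge;
print unpack('H*', $out), "\n";
```

The wrapper copies its argument before converting; this matters inside s///e, because "$1" is read-only and cannot be converted in place.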
$line_by_jis = jacode::jis($line, $INPUT_encoding [, $option])
This subroutine works as "jacode::to('jis', @_)".
$line_by_euc = jacode::euc($line, $INPUT_encoding [, $option])
This subroutine works as "jacode::to('euc', @_)".
$line_by_sjis = jacode::sjis($line, $INPUT_encoding [, $option])
This subroutine works as "jacode::to('sjis', @_)".
$line_by_utf8 = jacode::utf8($line, $INPUT_encoding [, $option])
This subroutine works as "jacode::to('utf8', @_)".
$transliterated_char_count = jacode::h2z_euc(\$line)
Converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in "$line" (encoded in euc), overwriting "$line".
Returns "$transliterated_char_count", the count of transliterated characters.
$transliterated_char_count = jacode::h2z_jis(\$line)
Converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in "$line" (encoded in jis), overwriting "$line".
Returns "$transliterated_char_count", the count of transliterated characters.
$transliterated_char_count = jacode::h2z_sjis(\$line)
Converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in "$line" (encoded in sjis), overwriting "$line".
Returns "$transliterated_char_count", the count of transliterated characters.
$transliterated_char_count = jacode::h2z_utf8(\$line)
Converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in "$line" (encoded in utf8), overwriting "$line".
Returns "$transliterated_char_count", the count of transliterated characters.
$transliterated_char_count = jacode::z2h_euc(\$line)
Converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in "$line" (encoded in euc), overwriting "$line".
Returns "$transliterated_char_count", the count of transliterated characters.
$transliterated_char_count = jacode::z2h_jis(\$line)
Converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in "$line" (encoded in jis), overwriting "$line".
Returns "$transliterated_char_count", the count of transliterated characters.
$transliterated_char_count = jacode::z2h_sjis(\$line)
Converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in "$line" (encoded in sjis), overwriting "$line".
Returns "$transliterated_char_count", the count of transliterated characters.
$transliterated_char_count = jacode::z2h_utf8(\$line)
Converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in "$line" (encoded in utf8), overwriting "$line".
Returns "$transliterated_char_count", the count of transliterated characters.
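For UTF-8 input, an h2z-style conversion can be approximated with Unicode NFKC normalization applied only to runs of halfwidth katakana (U+FF61 to U+FF9F). This is a swapped-in technique, not jacode's table-driven method, and the name "h2z_utf8_sketch" is hypothetical. NFKC also turns halfwidth punctuation such as U+FF64 into its ideographic form, which may differ from what jacode::h2z_utf8() does.

```perl
use strict;
use warnings;
use Encode ();
use Unicode::Normalize qw(NFKC);

# Sketch of an h2z-style routine for UTF-8 octets (hypothetical name).
# Returns the number of halfwidth characters that were replaced.
sub h2z_utf8_sketch {
    my ($ref) = @_;
    my $chars = Encode::decode('UTF-8', $$ref);
    my $n = 0;
    # NFKC only on runs of halfwidth katakana, so that KA followed by the
    # halfwidth voiced sound mark is composed into one fullwidth GA.
    $chars =~ s/([\x{FF61}-\x{FF9F}]+)/$n += length $1; NFKC($1)/ge;
    $$ref = Encode::encode('UTF-8', $chars);
    return $n;
}

# Halfwidth KA + voiced mark, halfwidth A, then ASCII "A".
my $line  = Encode::encode('UTF-8', "\x{FF76}\x{FF9E}\x{FF71}A");
my $count = h2z_utf8_sketch(\$line);
```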
($matched_length, $encoding) = jacode::getcode(\$line)
$encoding = jacode::getcode(\$line)
Use of this subroutine is deprecated.
Returns the guessed encoding of the string in "$line".
Returns ($matched_length, $encoding) in list context.
Returns only "$encoding" in scalar context.
"$encoding" is any of
jis
"$line" is guessed to be jis encoded.
sjis
"$line" is guessed to be sjis encoded.
euc
"$line" is guessed to be euc encoded.
utf8
"$line" is guessed to be utf8 encoded.
binary
"$line" contains binary data.
undef
the encoding of "$line" cannot be guessed.
"$matched_length" is the parsable length in octets if "$encoding" is "sjis", "euc", or "utf8". This value is probably never useful.
Do not expect this subroutine to be perfect: it is impossible to fully distinguish "sjis", "euc", and "utf8" from each other.
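The ambiguity is easy to demonstrate. The hypothetical helper "plausible_encodings" below, which assumes perl 5.8.1 or later with Encode, lists every encoding under which a byte string decodes without error; it is not jacode's guessing algorithm. The two octets 0xA4 0xA2 are HIRAGANA LETTER A in euc, but also two halfwidth characters in sjis, so a decoder check alone cannot choose between them.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: every encoding (by this document's names) under
# which the octets decode without error. Illustration only.
sub plausible_encodings {
    my ($octets) = @_;
    my @found;
    push @found, 'jis' if $octets =~ /\e[\$\(]/;    # any ISO-2022 escape
    for my $pair (['utf8', 'UTF-8'], ['euc', 'euc-jp'], ['sjis', 'cp932']) {
        my ($name, $enc) = @$pair;
        my $copy = $octets;
        my $ok = eval {
            # FB_CROAK dies on malformed input; LEAVE_SRC keeps $copy intact.
            Encode::decode($enc, $copy, Encode::FB_CROAK() | Encode::LEAVE_SRC());
            1;
        };
        push @found, $name if $ok;
    }
    return @found;
}

my @guesses = plausible_encodings("\xa4\xa2");
print "@guesses\n";
```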
($matched_length, $encoding) = jacode::getcode_utashiro_2000_09_29(\$line)
$encoding = jacode::getcode_utashiro_2000_09_29(\$line)
Keeps the original implementation of "&jcode'getcode()" from "jcode.pl" by Kazumasa Utashiro-san.
Does not support utf8.
$previous_caching_state = jacode::cache()
$previous_caching_state = jacode::nocache()
jacode::flushcache()
jacode::flush()
By default, converted characters are cached in memory so that the same calculations do not have to be repeated many times.
To disable this caching, call "jacode::nocache()". You can enable it again by calling "jacode::cache()", and you can clear the cache memory by calling "jacode::flushcache()".
"jacode::cache()" and "jacode::nocache()" return the previous caching state.
1 means that caching was previously enabled.
0 means that caching was previously disabled.
"jacode::flush()" works as "jacode::flushcache()" to cover a bug in old documentation.
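The documented caching behaviour can be modelled with a hash. Everything in this sketch ("convert_char", "cache_on", "cache_off", "flush_cache", and the stand-in "expensive_map") is hypothetical; it only illustrates the return values of jacode::cache() and jacode::nocache() and the effect of jacode::flushcache().

```perl
use strict;
use warnings;

my %cache;
my $caching  = 1;    # caching is on by default
my $computed = 0;    # how many times the expensive mapping actually ran

# Stand-in for a real per-character conversion.
sub expensive_map { my ($c) = @_; $computed++; return uc $c }

sub convert_char {
    my ($c) = @_;
    return expensive_map($c) unless $caching;
    $cache{$c} = expensive_map($c) unless exists $cache{$c};
    return $cache{$c};
}

# Each switch returns the previous state (1 = enabled, 0 = disabled).
sub cache_on    { my $prev = $caching; $caching = 1; return $prev }
sub cache_off   { my $prev = $caching; $caching = 0; return $prev }
sub flush_cache { %cache = () }

convert_char('a') for 1 .. 3;    # computed once, then served from %cache
```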
$transliterated_char_count = jacode::tr(\$line, $from, $to [, $option])
Use of this subroutine is deprecated.
We recommend using mb::tr() of mb.pm modulino or UTF8::R2::tr() of UTF8::R2 module.
Like "tr///" operator, this subroutine transliterates "$line" from "$from" to "$to" then overwrites "$line".
"$option" can only be "d", which means "tr///d".
Returns the count of transliterated characters.
This subroutine works only on jis or euc strings, and only on JIS X 0208 characters, not JIS X 0212.
"$from" and "$to" can use ranges like "A-Z". For a DBCS range, the start character and the end character around the "-" (hyphen) must have the same first byte. If you want a literal "-" (hyphen) in "$from" or "$to", put it in the last position.
$line = 'ABCDEF';
jacode::tr(\$line, 'ABC-E', '12345'); # works as $line =~ tr/ABC-E/12345/
# got '12345F' in $line
$line = 'ABCDEF';
jacode::tr(\$line, 'ABC-E', '123', 'd'); # with /d modifier, works as $line =~ tr/ABC-E/123/d
# got '123F' in $line
$line = 'A-B-C';
jacode::tr(\$line, 'ABC-', '123*'); # "-" must be last
# got '1*2*3' in $line
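For ASCII input, these examples are meant to behave like Perl's built-in tr/// operator, so the built-in can serve as a reference when checking results. The snippet below uses only the built-in operator, not jacode::tr(). Note that /d deletes only characters of the search list that have no partner in the replacement list; a character outside the search list, such as "F", is kept.

```perl
use strict;
use warnings;

my $line1 = 'ABCDEF';
(my $r1 = $line1) =~ tr/ABC-E/12345/;    # C-E is the range C, D, E
my $line2 = 'ABCDEF';
(my $r2 = $line2) =~ tr/ABC-E/123/d;     # D and E have no partner, so /d deletes them
my $line3 = 'A-B-C';
(my $r3 = $line3) =~ tr/ABC-/123*/;      # a trailing "-" is a literal hyphen
print "$r1 $r2 $r3\n";
```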
$transliterated_line = jacode::trans($line, $from, $to [, $option])
Use of this subroutine is deprecated.
We recommend using mb::tr() of mb.pm modulino or UTF8::R2::tr() of UTF8::R2 module.
Transliterates "$line" from "$from" to "$to", then returns the transliterated "$line" by value.
"jacode::trans()" calls "jacode::tr()" internally.
See also "jacode::tr()".
$jacode::convf{ join($;, 'jis', 'sjis') }
Provides a reference to the subroutine "jacode::jis2sjis()", which converts a string from jis encoding to sjis encoding.
$jacode::convf{ join($;, 'jis', 'euc') }
Provides a reference to the subroutine "jacode::jis2euc()", which converts a string from jis encoding to euc encoding.
$jacode::convf{ join($;, 'jis', 'utf8') }
Provides a reference to the subroutine "jacode::jis2utf8()", which converts a string from jis encoding to utf8 encoding.
$jacode::convf{ join($;, 'euc', 'jis') }
Provides a reference to the subroutine "jacode::euc2jis()", which converts a string from euc encoding to jis encoding.
$jacode::convf{ join($;, 'euc', 'sjis') }
Provides a reference to the subroutine "jacode::euc2sjis()", which converts a string from euc encoding to sjis encoding.
$jacode::convf{ join($;, 'euc', 'utf8') }
Provides a reference to the subroutine "jacode::euc2utf8()", which converts a string from euc encoding to utf8 encoding.
$jacode::convf{ join($;, 'sjis', 'jis') }
Provides a reference to the subroutine "jacode::sjis2jis()", which converts a string from sjis encoding to jis encoding.
$jacode::convf{ join($;, 'sjis', 'euc') }
Provides a reference to the subroutine "jacode::sjis2euc()", which converts a string from sjis encoding to euc encoding.
$jacode::convf{ join($;, 'sjis', 'utf8') }
Provides a reference to the subroutine "jacode::sjis2utf8()", which converts a string from sjis encoding to utf8 encoding.
$jacode::convf{ join($;, 'utf8', 'jis') }
Provides a reference to the subroutine "jacode::utf82jis()", which converts a string from utf8 encoding to jis encoding.
$jacode::convf{ join($;, 'utf8', 'sjis') }
Provides a reference to the subroutine "jacode::utf82sjis()", which converts a string from utf8 encoding to sjis encoding.
$jacode::convf{ join($;, 'utf8', 'euc') }
Provides a reference to the subroutine "jacode::utf82euc()", which converts a string from utf8 encoding to euc encoding.
$jacode::h2zf{'euc'}
Provides a reference to the subroutine "jacode::h2z_euc()", which converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in a string encoded in euc.
$jacode::h2zf{'jis'}
Provides a reference to the subroutine "jacode::h2z_jis()", which converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in a string encoded in jis.
$jacode::h2zf{'sjis'}
Provides a reference to the subroutine "jacode::h2z_sjis()", which converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in a string encoded in sjis.
$jacode::h2zf{'utf8'}
Provides a reference to the subroutine "jacode::h2z_utf8()", which converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in a string encoded in utf8.
$jacode::z2hf{'euc'}
Provides a reference to the subroutine "jacode::z2h_euc()", which converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in a string encoded in euc.
$jacode::z2hf{'jis'}
Provides a reference to the subroutine "jacode::z2h_jis()", which converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in a string encoded in jis.
$jacode::z2hf{'sjis'}
Provides a reference to the subroutine "jacode::z2h_sjis()", which converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in a string encoded in sjis.
$jacode::z2hf{'utf8'}
Provides a reference to the subroutine "jacode::z2h_utf8()", which converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in a string encoded in utf8.
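The "join($;, ...)" keys are Perl's emulation of multidimensional hashes: "$h{'a', 'b'}" is shorthand for "$h{ join($;, 'a', 'b') }". The sketch below builds a %convf-like table from Encode::from_to() (perl 5.8.1 or later); the name mapping and the table contents are hypothetical stand-ins for jacode's own converters.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical mapping; jacode's real converters use their own tables.
my %encode_name_for = (
    'euc'  => 'euc-jp', 'jis'  => 'iso-2022-jp-1',
    'sjis' => 'cp932',  'utf8' => 'UTF-8',
);

my %convf;
for my $in (keys %encode_name_for) {
    for my $out (keys %encode_name_for) {
        next if $in eq $out;
        my ($from, $to) = @encode_name_for{$in, $out};   # fresh lexicals per closure
        $convf{ join($;, $in, $out) } = sub { Encode::from_to(${ $_[0] }, $from, $to) };
    }
}

my $line = "\x82\xa0";                   # HIRAGANA A in sjis
&{ $convf{'sjis', 'utf8'} }(\$line);     # same key as join($;, 'sjis', 'utf8')
print unpack('H*', $line), "\n";
```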
Old Interface on Perl4
On Perl4, you need "&" to call subroutines.
This software also provides the package "jcode" for "jcode.pl" users. A ' (single quote) is required after the package name "jcode", as in "&jcode'convert".
Perl4 uses globs like "*line", not references.
&jcode'convert(*line, $OUTPUT_encoding, $INPUT_encoding [, $option])
&jcode'getcode_utashiro_2000_09_29(*line)
&jcode'getcode(*line)
&jcode'get_inout($line)
&jcode'jis_inout([$esc_DBCS [, $esc_ASCII]])
&jcode'init()
&jcode'xxx2yyy(*line [, $option])
$jcode'convf{join($;, 'xxx', 'yyy')}
&jcode'to($OUTPUT_encoding, $line, $INPUT_encoding [, $option])
&jcode'jis($line, $INPUT_encoding [, $option])
&jcode'euc($line, $INPUT_encoding [, $option])
&jcode'sjis($line, $INPUT_encoding [, $option])
&jcode'utf8($line, $INPUT_encoding [, $option])
&jcode'cache()
&jcode'nocache()
&jcode'flushcache()
&jcode'flush()
&jcode'h2z_xxx(*line)
&jcode'z2h_xxx(*line)
$jcode'z2hf{'xxx'}
$jcode'h2zf{'xxx'}
&jcode'tr(*line, $from, $to [, $option])
&jcode'trans($line, $from, $to [, $option])
Old Interface on Perl5
On Perl5, "&" is not required to call subroutines.
This software also provides the package "jcode" for "jcode.pl" users. "::" (double colons) are required after the package name "jcode", as in "jcode::convert".
You have to pass a reference for the parameter "$line". A lexical variable has no typeglob, so a "*string" style call does not work if the variable is declared with "my". The same thing happens to the special variable "$_" if perl is compiled with thread support.
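The difference between the two calling styles can be shown in a few lines. The subroutine names are hypothetical; "upcase_by_glob" uses the same "local(*s) = @_" aliasing idiom as the jcode.pl code quoted in this document, and "upcase_by_ref" takes a reference.

```perl
use strict;
use warnings;

our $pkg_line = 'abc';    # package variable: it has a typeglob *pkg_line

# Perl4 style: alias *s to the caller's glob, then modify $s.
sub upcase_by_glob {
    local (*s) = @_;
    our $s;               # $main::s, now aliased to the caller's scalar
    $s = uc(defined $s ? $s : '');
}

# Perl5 style: take a scalar reference.
sub upcase_by_ref {
    my ($ref) = @_;
    $$ref = uc $$ref;
}

upcase_by_glob(*pkg_line);    # works: $pkg_line becomes 'ABC'

my $lexical = 'xyz';
upcase_by_ref(\$lexical);     # works: $lexical becomes 'XYZ'

my $lex2 = 'xyz';
{
    no warnings 'once';
    upcase_by_glob(*lex2);    # *lex2 is the unrelated package glob *main::lex2
}
print "$pkg_line $lexical $lex2\n";   # the lexical $lex2 is unchanged
```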
jcode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding [, $option])
jcode::getcode_utashiro_2000_09_29(\$line)
jcode::getcode(\$line)
jcode::get_inout($line)
jcode::jis_inout([$esc_DBCS [, $esc_ASCII]])
jcode::init()
jcode::xxx2yyy(\$line [, $option])
&{$jcode::convf{join($;, 'xxx', 'yyy')}}(\$line)
jcode::to($OUTPUT_encoding, $line, $INPUT_encoding [, $option])
jcode::jis($line, $INPUT_encoding [, $option])
jcode::euc($line, $INPUT_encoding [, $option])
jcode::sjis($line, $INPUT_encoding [, $option])
jcode::utf8($line, $INPUT_encoding [, $option])
jcode::cache()
jcode::nocache()
jcode::flushcache()
jcode::flush()
jcode::h2z_xxx(\$line)
jcode::z2h_xxx(\$line)
&{$jcode::z2hf{'xxx'}}(\$line)
&{$jcode::h2zf{'xxx'}}(\$line)
jcode::tr(\$line, $from, $to [, $option])
jcode::trans($line, $from, $to [, $option])
SAMPLES
Convert SJIS to JIS and print each line with encoding name at head
#require 'jcode.pl';
require 'jacode.pl';
while (defined($s = <>)) {
$INPUT_encoding = &jcode'convert(*s, 'jis', 'sjis');
print $INPUT_encoding, "\t", $s;
}
The safest way of JIS conversion
#require 'jcode.pl';
require 'jacode.pl';
while (defined($s = <>)) {
($matched, $INPUT_encoding) = &jcode'getcode(*s);
if ((@buf == 0) && ($matched == 0)) {
print $s;
next;
}
push(@buf, $s);
next unless $INPUT_encoding;
while (defined($s = shift(@buf))) {
&jcode'convert(*s, 'jis', $INPUT_encoding);
print $s;
}
while (defined($s = <>)) {
&jcode'convert(*s, 'jis', $INPUT_encoding);
print $s;
}
last;
}
print @buf if @buf;
Convert SJIS to UTF-8 and print each line, using perl 4.036 or later
#retire 'jcode.pl';
require 'jacode.pl';
while (defined($s = <>)) {
&jacode'convert(*s, 'utf8', 'sjis');
print $s;
}
Convert SJIS to UTF-8.1 and print each line, using perl 4.036 or later
require 'jacode.pl';
while (defined($s = <>)) {
# STEP 1of2 converts SJIS to UTF-8.0
&jacode'convert(*s, 'utf8', 'sjis');
# STEP 2of2 converts UTF-8.0 to UTF-8.1 see also https://metacpan.org/pod/Jacode4e#UTF-8.0-vs.-UTF-8.1
$s =~ s#\xe2\x80\x94#\xe2\x80\x95#g;
$s =~ s#\xe2\x80\x96#\xe2\x88\xa5#g;
$s =~ s#\xe2\x88\x92#\xef\xbc\x8d#g;
# got UTF-8.1
print $s;
}
Convert SJIS to UTF16-BE and print each line, using perl 5.8.1 or later
require 'jacode.pl';
use 5.8.1;
while (defined($s = <>)) {
jacode::convert(\$s, 'UTF16-BE', 'sjis');
print $s;
}
Convert SJIS to MIME-Header-ISO_2022_JP and print each line, using perl 5.8.1 or later
require 'jacode.pl';
use 5.8.1;
while (defined($s = <>)) {
jacode::convert(\$s, 'MIME-Header-ISO_2022_JP', 'sjis');
print $s;
}
BUGS AND LIMITATIONS
You must use the -Llatin switch if you use this software on JPerl4.
You must use the -b switch if you use this software on JPerl5.
We have tested and verified this software to the best of our ability. However, software containing this much code is bound to contain some bugs. If you happen to find a bug that is in jacode.pl and not in your own program, please try to reduce it to a minimal test case and then report it to the author's address. If you have an idea that could make this a more useful tool, please share it with everyone.
SOFTWARE LIFE CYCLE
                                       Jacode.pm
                  jcode.pl  Encode.pm  jacode.pl  Jacode4e  Jacode4e::RoundTrip
--------------------------------------------------------------------------------
1993 Perl4.036       |                     |
   :      :          :                     :
1999 Perl5.00503     |                     |         |               |
2000 Perl5.6         |                     |         |               |
2002 Perl5.8         |        Born         |         |               |
2007 Perl5.10        V          |          |         |               |
2010 Perl5.12       EOL         |        Born        |               |
2011 Perl5.14                   |          |         |               |
2012 Perl5.16                   |          |         |               |
2013 Perl5.18                   |          |         |               |
2014 Perl5.20                   |          |         |               |
2015 Perl5.22                   |          |         |               |
2016 Perl5.24                   |          |         |               |
2017 Perl5.26                   |          |         |               |
2018 Perl5.28                   |          |        Born            Born
2019 Perl5.30                   |          |         |               |
2020 Perl5.32                   :          :         :               :
2030 Perl5.52                   :          :         :               :
2040 Perl5.72                   :          :         :               :
2050 Perl5.92                   :          :         :               :
2060 Perl5.112                  :          :         :               :
2070 Perl5.132                  :          :         :               :
2080 Perl5.152                  :          :         :               :
2090 Perl5.172                  :          :         :               :
2100 Perl5.192                  :          :         :               :
2110 Perl5.212                  :          :         :               :
2120 Perl5.232                  :          :         :               :
   :      :                     V          V         V               V
--------------------------------------------------------------------------------
Removed jcode.pl's Bugs
jacode.pl fixes the following 2 bugs that jcode.pl had.
Bad $n count in jcode'_jis2sjis()
Implementation of "jcode.pl 2.13"
sub _jis2sjis {
local($esc, $s) = @_;
if ($esc =~ /^$re_jis0212/o) {
$s =~ s/../$undef_sjis/g;
$n = length; # *** here ***
}
elsif ($esc !~ /^$re_asc/o) {
$n += $s =~ tr/\041-\176/\241-\376/;
if ($esc =~ /^$re_jp/o) {
$s =~ s/($re_euc_c)/$e2s{$1}||&e2s($1)/geo;
}
}
$s;
}
Implementation of "jacode.pl"
sub _jis2sjis {
local ( $esc, $s ) = @_;
if ( $esc =~ /^$re_esc_asc/o ) {
}
elsif ( $esc =~ /^$re_esc_kana/o ) {
$s =~ tr/\x21-\x7e/\xa1-\xfe/;
$n += length($s);
}
elsif ( $esc =~ /^$re_esc_jis0212/o ) {
$s =~ s/[\x00-\xff][\x00-\xff]/$n++, $undef_sjis/ge; # *** here ***
}
else {
$s =~ tr/\x21-\x7e/\xa1-\xfe/;
$s =~ s/($re_euc_c)/$n++, ($e2s{$1}||&e2s($1))/geo;
}
$s;
}
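The fixed code counts replacements by putting "$n++" before the replacement value in an s///e expression; the comma operator evaluates both and yields the value on the right. A minimal sketch of the idiom, where "replace_pairs" is a hypothetical name and the placeholder bytes are an assumed value rather than jacode's actual "$undef_sjis".

```perl
use strict;
use warnings;

my $undef_sjis = "\x81\xac";   # assumed placeholder for an unmappable character

# Replace every 2-byte unit with the placeholder and count the units,
# using the same "$n++, VALUE" idiom as the fixed _jis2sjis().
sub replace_pairs {
    my ($s) = @_;
    my $n = 0;
    $s =~ s/[\x00-\xff][\x00-\xff]/$n++, $undef_sjis/ge;
    return ($s, $n);
}

my ($out, $count) = replace_pairs("\x30\x21\x30\x22\x30\x23");   # three 2-byte units
```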
jcode'tr() was ignoring options
Implementation of "jcode.pl 2.13"
sub tr {
# $prev_from, $prev_to, %table are persistent variables
local(*s, $from, $to, $opt) = @_;
local(@from, @to);
local($jis, $n) = (0, 0);
$jis++, &jis2euc(*s) if $s =~ /$re_jp|$re_asc|$re_kana/o;
$jis++ if $to =~ /$re_jp|$re_asc|$re_kana/o;
if (!defined($prev_from) || $from ne $prev_from || $to ne $prev_to) { # *** here (1of2) ***
($prev_from, $prev_to) = ($from, $to); # *** here (2of2) ***
undef %table;
&_maketable;
}
$s =~ s/([\200-\377][\000-\377]|[\000-\377])/
defined($table{$1}) && ++$n ? $table{$1} : $1
/ge;
&euc2jis(*s) if $jis;
$n;
}
Implementation of "jacode.pl"
sub tr {
# $prev_from, $prev_to, %table are persistent variables
local ( *s, $from, $to, $option ) = @_;
local ( @from, @to );
local ( $jis, $n ) = ( 0, 0 );
$jis++, &jis2euc(*s) if $s =~ /$re_esc_jp|$re_esc_asc|$re_esc_kana/o;
$jis++ if $to =~ /$re_esc_jp|$re_esc_asc|$re_esc_kana/o;
if ( !defined($prev_from)
|| $from ne $prev_from
|| $to ne $prev_to
|| $option ne $prev_opt ) # *** here (1of2) ***
{
( $prev_from, $prev_to, $prev_opt ) = ( $from, $to, $option ); # *** here (2of2) ***
undef %table;
&_maketable;
}
$s =~ s/([\x80-\xff][\x00-\xff]|[\x00-\xff])/
defined($table{$1}) && ++$n ? $table{$1} : $1
/ge;
&euc2jis(*s) if $jis;
$n;
}
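The effect of leaving the option out of the cache key can be reproduced with a simplified tr() that caches its table. This sketch is hypothetical (no ranges, no DBCS, and "$to" is assumed non-empty); its key includes "$opt", mirroring the jacode.pl fix, so a second call that differs only in "d" rebuilds the table.

```perl
use strict;
use warnings;

my ($prev_key, %table);
my $builds = 0;

sub tr_sketch {
    my ($ref, $from, $to, $opt) = @_;
    $opt = '' unless defined $opt;
    my $key = join($;, $from, $to, $opt);    # the option is part of the key
    if (!defined $prev_key || $key ne $prev_key) {
        $prev_key = $key;
        %table    = ();
        $builds++;
        my @from = split //, $from;
        my @to   = split //, $to;
        # Pad the replacement list: with "d" a missing partner means deletion,
        # otherwise the last replacement character is repeated, as in tr///.
        push @to, ($opt =~ /d/ ? '' : $to[-1]) while @to < @from;
        @table{@from} = @to;
    }
    my $n = 0;
    $$ref =~ s/(.)/exists $table{$1} ? do { $n++; $table{$1} } : $1/gse;
    return $n;
}

my $s1 = 'ABC';
my $n1 = tr_sketch(\$s1, 'AB', 'x');         # A -> x, B -> x
my $s2 = 'ABC';
my $n2 = tr_sketch(\$s2, 'AB', 'x', 'd');    # A -> x, B deleted
```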
Fixed Issue: defined(%hash) is deprecated at ./jcode.pl line nnn
jcode.pl causes fatal errors on perl 5.22 or later. jacode.pl fixes this issue.
Stashes are now always defined
https://metacpan.org/release/JESSE/perl-5.14.0/view/pod/perldelta.pod#Stashes-are-now-always-defined
defined(@array) and defined(%hash) are now fatal errors
Implementation of "jcode.pl 2.13"
sub z2h_euc {
local(*s, $n) = @_;
&init_z2h_euc unless defined %z2h_euc; # *** here ***
$s =~ s/($re_euc_c|$re_euc_kana)/
$z2h_euc{$1} ? ($n++, $z2h_euc{$1}) : $1
/geo;
$n;
}
sub z2h_sjis {
local(*s, $n) = @_;
&init_z2h_sjis unless defined %z2h_sjis; # *** here ***
$s =~ s/($re_sjis_c)/$z2h_sjis{$1} ? ($n++, $z2h_sjis{$1}) : $1/geo;
$n;
}
Implementation of "jacode.pl"
sub z2h_euc {
local ( *s, $n ) = @_;
&init_z2h_euc unless %z2h_euc; # *** here ***
$s =~ s/($re_euc_c|$re_euc_kana)/
$z2h_euc{$1} ? ($n++, $z2h_euc{$1}) : $1
/geo;
$n;
}
sub z2h_sjis {
local ( *s, $n ) = @_;
&init_z2h_sjis unless %z2h_sjis; # *** here ***
$s =~ s/($re_sjis_c)/$z2h_sjis{$1} ? ($n++, $z2h_sjis{$1}) : $1/geo;
$n;
}
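A minimal sketch of the lazy-initialisation pattern used by the fixed code, with hypothetical names and a single sample table entry (fullwidth KA, A5 AB in euc, to halfwidth KA, 8E B6 in euc). An empty hash is false in boolean context, so "unless %z2h" runs the initialiser only once without calling defined() on a hash.

```perl
use strict;
use warnings;

my %z2h;
my $init_calls = 0;

# Hypothetical stand-in for &init_z2h_euc with one sample pair.
sub init_z2h {
    $init_calls++;
    %z2h = ("\xa5\xab" => "\x8e\xb6");
}

sub z2h_sketch {
    my ($ref) = @_;
    init_z2h() unless %z2h;    # empty hash is false: runs on the first call only
    my $n = 0;
    $$ref =~ s/(\xa5\xab)/$n++; $z2h{$1}/ge;
    return $n;
}

my $line1 = "\xa5\xab\xa5\xab";
my $c1    = z2h_sketch(\$line1);
my $line2 = "x";
z2h_sketch(\$line2);
```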
Let's ask your teachers
Why is this written in Perl4?
Why is the filename "jacode.pl" and not "jcode.pl"?
Why is the package "jcode" also supported?
Why is $line passed to jacode::convert() by reference, not by value?
Why is the argument order of jacode::convert() $OUTPUT_encoding, then $INPUT_encoding?
Why does jacode::getcode support halfwidth KATAKANA?
Why does conversion between UTF-8 and SJIS need a table?
Why is that table embedded in "jacode.pl"?
Why does 'sjis' mean CP932, not Shift_JIS?
AUTHOR
This project was originated by Kazumasa Utashiro-san.
https://metacpan.org/author/UTASHIRO
LICENSE AND COPYRIGHT
This software is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Copyright (c) 2010, 2011, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2022 INABA Hitoshi ina@cpan.org in a CPAN
The latest version of "jacode.pl" is available here:
http://search.cpan.org/dist/jacode/
ATTENTION
This software is not "jcode.pl". Thus, do not redistribute this software renamed as "jcode.pl".
Moreover, this software IS NOT "Jacode4e". If you want "Jacode4e", search it on CPAN again.
Original version 'jcode.pl' is ...
Copyright (c) 2002 Kazumasa Utashiro
http://web.archive.org/web/20090608090304/http://srekcah.org/jcode/
Copyright (c) 1995-2000 Kazumasa Utashiro utashiro@iij.ad.jp Internet Initiative Japan Inc. 3-13 Kanda Nishiki-cho, Chiyoda-ku, Tokyo 101-0054, Japan
Copyright (c) 1992,1993,1994 Kazumasa Utashiro Software Research Associates, Inc.
Use and redistribution for ANY PURPOSE are granted as long as all copyright notices are retained. Redistribution with modification is allowed provided that you make your modified version obviously distinguishable from the original one. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ''AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The original version was developed under the name srekcah@sra.co.jp in February 1992 and was called kconv.pl at the beginning. This address was a pen name for a group of individuals and is no longer valid.
The latest version of "jcode.pl" is available here:
http://ftp.iij.ad.jp/pub/IIJ/dist/utashiro/perl/
SEE ALSO
UNIX MAGAZINE
1992 Apr
Pages: 148
T1008901040810 ZASSHI 08901-4
http://ascii.asciimw.jp/books/books/detail/978-4-7561-5008-0.shtml
PERL PUROGURAMINGU
Larry Wall, Randal L. Schwartz, Yoshiyuki Kondo
December 1997
ISBN 4-89052-384-7
http://www.context.co.jp/~cond/books/old-books.html
Programming Perl, Second Edition
By Larry Wall, Tom Christiansen, Randal L. Schwartz
October 1996
Pages: 670
ISBN 10: 1-56592-149-6 | ISBN 13: 9781565921498
http://shop.oreilly.com/product/9781565921498.do
Programming Perl, Third Edition
By Larry Wall, Tom Christiansen, Jon Orwant
Third Edition July 2000
Pages: 1104
ISBN 10: 0-596-00027-8 | ISBN 13: 9780596000271
http://shop.oreilly.com/product/9780596000271.do
The Perl Language Reference Manual (for Perl version 5.12.1)
by Larry Wall and others
Paperback (6"x9"), 724 pages
Retail Price: $39.95 (pound 29.95 in UK)
ISBN-13: 978-1-906966-02-7
https://dl.acm.org/doi/book/10.5555/1893028
Perl Pocket Reference, 5th Edition
By Johan Vromans
Publisher: O'Reilly Media
Released: July 2011
Pages: 102
http://shop.oreilly.com/product/0636920018476.do
Programming Perl, 4th Edition
By: Tom Christiansen, brian d foy, Larry Wall, Jon Orwant
Publisher: O'Reilly Media
Formats: Print, Ebook, Safari Books Online
Print: January 2012
Ebook: March 2012
Pages: 1130
Print ISBN: 978-0-596-00492-7 | ISBN 10: 0-596-00492-3
Ebook ISBN: 978-1-4493-9890-3 | ISBN 10: 1-4493-9890-1
http://shop.oreilly.com/product/9780596004927.do
Perl Cookbook
By Tom Christiansen, Nathan Torkington
August 1998
Pages: 800
ISBN 10: 1-56592-243-3 | ISBN 13: 978-1-56592-243-3
http://shop.oreilly.com/product/9781565922433.do
Perl Cookbook, Second Edition
By Tom Christiansen, Nathan Torkington
Second Edition August 2003
Pages: 964
ISBN 10: 0-596-00313-7 | ISBN 13: 9780596003135
http://shop.oreilly.com/product/9780596003135.do
Perl in a Nutshell, Second Edition
By Stephen Spainhour, Ellen Siever, Nathan Patwardhan
Second Edition June 2002
Pages: 760
Series: In a Nutshell
ISBN 10: 0-596-00241-6 | ISBN 13: 9780596002411
http://shop.oreilly.com/product/9780596002411.do
Learning Perl on Win32 Systems
By Randal L. Schwartz, Erik Olson, Tom Christiansen
August 1997
Pages: 306
ISBN 10: 1-56592-324-3 | ISBN 13: 9781565923249
http://shop.oreilly.com/product/9781565923249.do
Learning Perl, Fifth Edition
By Randal L. Schwartz, Tom Phoenix, brian d foy
June 2008
Pages: 352
Print ISBN:978-0-596-52010-6 | ISBN 10: 0-596-52010-7
Ebook ISBN:978-0-596-10316-3 | ISBN 10: 0-596-10316-6
http://shop.oreilly.com/product/9780596520113.do
Learning Perl, 6th Edition
By Randal L. Schwartz, brian d foy, Tom Phoenix
June 2011
Pages: 390
ISBN-10: 1449303587 | ISBN-13: 978-1449303587
http://shop.oreilly.com/product/0636920018452.do
Advanced Perl Programming, 2nd Edition
By Simon Cozens
June 2005
Pages: 300
ISBN-10: 0-596-00456-7 | ISBN-13: 978-0-596-00456-9
http://shop.oreilly.com/product/9780596004569.do
Perl RESOURCE KIT UNIX EDITION
Futato, Irving, Jepson, Patwardhan, Siever
ISBN 10: 1-56592-370-7
http://shop.oreilly.com/product/9781565923706.do
Perl Resource Kit -- Win32 Edition
Erik Olson, Brian Jepson, David Futato, Dick Hardt
ISBN 10:1-56592-409-6
http://shop.oreilly.com/product/9781565924093.do
Announcing Perl 7
Jun 24, 2020 by brian d foy
https://www.perl.com/article/announcing-perl-7/
Understanding Japanese Information Processing
By Ken Lunde
O'Reilly Media
September 1993
Pages: 470
ISBN: 978-1-56592-043-9 | ISBN 10:1-56592-043-0
http://shop.oreilly.com/product/9781565920439.do
CJKV Information Processing Chinese, Japanese, Korean & Vietnamese Computing
By Ken Lunde
O'Reilly Media
Print: January 1999
Ebook: June 2009
Pages: 1128
Print ISBN:978-1-56592-224-2 | ISBN 10:1-56592-224-7
Ebook ISBN:978-0-596-55969-4 | ISBN 10:0-596-55969-0
http://shop.oreilly.com/product/9781565922242.do
CJKV Information Processing, 2nd Edition
By Ken Lunde
O'Reilly Media
Print: December 2008
Ebook: June 2009
Pages: 912
Print ISBN: 978-0-596-51447-1 | ISBN 10:0-596-51447-6
Ebook ISBN: 978-0-596-15782-1 | ISBN 10:0-596-15782-7
http://shop.oreilly.com/product/9780596514471.do
DB2 GIJUTSU ZENSHO
By IBM Japan Systems Engineering Co.,Ltd. and IBM Japan, Ltd.
2004/05
Pages: 887
ISBN-10: 4756144659 | ISBN-13: 978-4756144652
https://iss.ndl.go.jp/books/R100000002-I000007400836-00
Mastering Regular Expressions, Second Edition
By Jeffrey E. F. Friedl
Second Edition July 2002
Pages: 484
ISBN 10: 0-596-00289-0 | ISBN 13: 9780596002893
http://shop.oreilly.com/product/9780596002893.do
Mastering Regular Expressions, Third Edition
By Jeffrey E. F. Friedl
Third Edition August 2006
Pages: 542
ISBN 10: 0-596-52812-4 | ISBN 13:9780596528126
http://shop.oreilly.com/product/9780596528126.do
Regular Expressions Cookbook
By Jan Goyvaerts, Steven Levithan
May 2009
Pages: 512
ISBN 10:0-596-52068-9 | ISBN 13: 978-0-596-52068-7
http://shop.oreilly.com/product/9780596520694.do
Regular Expressions Cookbook, 2nd Edition
By Steven Levithan, Jan Goyvaerts
Released August 2012
Pages: 612
ISBN: 9781449327453
https://www.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/
JIS KANJI JITEN
Kouji Shibano
Pages: 1456
ISBN 4-542-20129-5
http://www.webstore.jsa.or.jp/lib/lib.asp?fn=/manual/mnl01_12.htm
UNIX MAGAZINE
1993 Aug
Pages: 172
T1008901080816 ZASSHI 08901-8
http://ascii.asciimw.jp/books/books/detail/978-4-7561-5008-0.shtml
Shell Script Magazine vol.41
2016 September
Pages: 64
https://shell-mag.com/
LINUX NIHONGO KANKYO
By YAMAGATA Hiroo, Stephen J. Turnbull, Craig Oda, Robert J. Bickel
June, 2000
Pages: 376
ISBN 4-87311-016-5
https://www.oreilly.co.jp/books/4873110165/
MacPerl Power and Ease
By Vicki Brown, Chris Nandor
April 1998
Pages: 350
ISBN 10: 1881957322 | ISBN 13: 978-1881957324
http://www.amazon.com/Macperl-Power-Ease-Vicki-Brown/dp/1881957322
Other Tools
https://metacpan.org/dist/Jacode
https://metacpan.org/dist/Jacode4e
https://metacpan.org/dist/Jacode4e-RoundTrip
https://metacpan.org/dist/Perl7-Handy
https://metacpan.org/dist/UTF8-R2
https://metacpan.org/dist/mb
BackPAN
http://backpan.perl.org/authors/id/I/IN/INA/
Recent Perl packages by "INABA Hitoshi"
http://code.activestate.com/ppm/author:INABA-Hitoshi/
ACKNOWLEDGEMENTS
This software was made with reference to the software and documents created by the following hackers and persons. I am thankful to all of them.
Larry Wall, Perl
http://www.perl.org/
Jesse Vincent, Compatibility is a virtue
https://www.nntp.perl.org/group/perl.perl5.porters/2010/05/msg159825.html
Kazumasa Utashiro, jcode.pl: Perl library for Japanese character code conversion, Kazumasa Utashiro
https://metacpan.org/author/UTASHIRO
ftp://ftp.iij.ad.jp/pub/IIJ/dist/utashiro/perl/
http://web.archive.org/web/20090608090304/http://srekcah.org/jcode/
ftp://ftp.oreilly.co.jp/pcjp98/utashiro/
http://mail.pm.org/pipermail/tokyo-pm/2002-March/001319.html
https://twitter.com/uta46/status/11578906320
mikeneko creator club, Private manual of jcode.pl
http://mikeneko.creator.club.ne.jp/~lab/kcode/jcode.html
gama, getcode.pl
http://www2d.biglobe.ne.jp/~gama/cgi/jcode/jcode.htm
Gappai, jcodeg.diff
http://www.vector.co.jp/soft/win95/prog/se347514.html
OHZAKI Hiroki, Perl memo
http://www.din.or.jp/~ohzaki/perl.htm#JP_Code
NAKATA Yoshinori, Ad hoc patch to reduce warnings on h2z_euc
http://white.niu.ne.jp/yapw/yapw.cgi/jcode.pl%A4%CE%A5%A8%A5%E9%A1%BC%CD%DE%C0%A9
Dan Kogai, Jcode module and Encode module
https://metacpan.org/release/Encode
https://metacpan.org/release/Jcode
http://blog.livedoor.jp/dankogai/archives/50116398.html
http://blog.livedoor.jp/dankogai/archives/51004472.html
Donzoko CGI+--, Jcode like Encode Wrapper
http://www.donzoko.net/cgi/jencode/
Yusuke Kawasaki, Encode561 module
http://www.kawa.net/works/perl/i18n-emoji/i18n-emoji.html#Encode561
Tokyo-pm archive
http://mail.pm.org/pipermail/tokyo-pm/
utf8_possible_story, Perl de Nihongo Aruaru
http://aizen.likk.jp/slide/utf8_possible_story/
Very old fj.kanji discussion
http://www.ie.u-ryukyu.ac.jp/~kono/fj/fj.kanji/index.html
TechLION vol.26
https://type.jp/et/feature/1569
Kaoru Maeda, Perl's history Perl 1,2,3,4
https://www.slideshare.net/KaoruMaeda/perl-perl-1234
nurse, What is "string"
https://naruse.hateblo.jp/entries/2014/11/07#1415355181
NISHIO Hirokazu, What's meant "string as a sequence of characters"?
https://nishiohirokazu.hatenadiary.org/entry/20141107/1415286729
Rick Yamashita, Shift_JIS
https://shino.tumblr.com/post/116166805/%E5%B1%B1%E4%B8%8B%E8%89%AF%E8%94%B5%E3%81%A8%E7%94%B3%E3%81%97%E3%81%BE%E3%81%99-%E7%A7%81%E3%81%AF1981%E5%B9%B4%E5%BD%93%E6%99%82us%E3%81%AE%E3%83%9E%E3%82%A4%E3%82%AF%E3%83%AD%E3%82%BD%E3%83%95%E3%83%88%E3%81%A7%E3%82%B7%E3%83%95%E3%83%88jis%E3%81%AE%E3%83%87%E3%82%B6%E3%82%A4%E3%83%B3%E3%82%92%E6%8B%85%E5%BD%93
http://www.wdic.org/w/WDIC/%E3%82%B7%E3%83%95%E3%83%88JIS
nurse, History of Japanese EUC 22:00
https://naruse.hateblo.jp/entries/2009/03/08
Ricardo Signes, Perl 5.14 for Pragmatists
https://www.slideshare.net/rjbs/perl-514-8809465
Ricardo Signes, What's New in Perl? v5.10 - v5.16
https://www.slideshare.net/rjbs/whats-new-in-perl-v510-v516
Causes and countermeasures for garbled Japanese characters in perl
https://prozorec.hatenablog.com/entry/2018/03/19/080000
Impressions of talking of Larry Wall at LL Future
https://hnw.hatenablog.com/entry/20080903
About Windows and Japanese text
https://blogs.windows.com/japan/2020/02/20/about-windows-and-japanese-text/
About Windows diagnostic data
https://blogs.windows.com/japan/2019/12/05/about-windows-diagnostic-data/