NAME

jacode - Perl program for Japanese character code conversion

SYNOPSIS

works like 'jcode.pl' in your script

require 'jacode.pl';

     ($subref, $got_INPUT_encoding) = jacode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding [, $option])
                $got_INPUT_encoding = jacode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding [, $option])
            ($esc_DBCS, $esc_ASCII) = jacode::get_inout($line)
($esc_DBCS_fully, $esc_ASCII_fully) = jacode::jis_inout([$esc_DBCS [, $esc_ASCII]])
       ($matched_length, $encoding) = jacode::getcode(\$line)
                          $encoding = jacode::getcode(\$line)
                                      jacode::init()

works as 'pkf' command on command line (shows help)

$ perl jacode.pl

INSTALL of "jacode.pl"

1. Open URL of "jacode.pl"

https://metacpan.org/dist/Jacode/view/lib/jacode.pl

2. Click "Source (raw)" in the page menu, as marked below
----------------------------------
Source (raw) <--- Click this (raw)
Browse (raw)
Changes
How to Contribute
Repository
Issues
Testers (NNN / NNN / NNN)
Kwalitee
Bus factor: NN
NN.NN% Coverage
License: perl_5
Perl: v5.5.30
----------------------------------
3. Select all text on the page
4. Save the text as "jacode.pl"

DESCRIPTION

This software converts between "JIS", "SJIS", "EUC-JP", and "UTF-8", the encodings most frequently used for Japanese strings.

The interface of "jacode.pl" is the same as that of the well-known "jcode.pl".

On the other hand, on perl 5.8.1 or later its conversion ability matches that of the "Encode" module, which can convert between many character encodings.

Moreover, "jacode.pl" works like the pkf command on the command line. You can get a help message with the following command:

$ perl jacode.pl

Conversion from "sjis" to "utf8" uses the following table.

http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT

Conversion from "utf8" to "sjis" uses "CP932.TXT" and the following table.

PRB: Conversion Problem Between Shift-JIS and Unicode

http://support.microsoft.com/kb/170559/en-us

What is good about this software

  • upward compatible with jcode.pl

  • upward compatible with the pkf command

  • runs as both a Perl4 script and a Perl5 script

  • supports HALFWIDTH KATAKANA

  • supports UTF-8 via the cp932-to-Unicode table

  • powered by Encode::from_to (not only Japanese!)

So we believe this software will be useful for DX (Digital Transformation) and IT modernization of Japanese information processing.

DEPENDENCIES

Running this software requires perl 4.036 or later.

DEFINITIONS

To simplify the documentation, this document uses the following words as encoding names.

SUBROUTINES

($subref, $got_INPUT_encoding) = jacode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding [, $option])

$got_INPUT_encoding = jacode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding [, $option])

Converts "$line" from "$INPUT_encoding" to "$OUTPUT_encoding" and overwrites "$line" with the result.

"$OUTPUT_encoding" can be any of "jis", "sjis", "euc", or "utf8", or "noconv" if you do not want any conversion.

"$INPUT_encoding" can be any of "jis", "sjis", "euc", or "utf8".

If "$INPUT_encoding" is omitted, "jacode::getcode(\$line)" is called internally and its return value is used as "$INPUT_encoding". However, now that utf8, sjis, and euc are all in common use, "jacode::getcode()" can no longer guess the encoding as reliably as it could in the old days. So never omit it.

"$option" can be omitted, or can be "h" or "z".

  • "h" means converting ZENKAKU-KATAKANA to HANKAKU-KATAKANA

  • "z" means converting HANKAKU-KATAKANA to ZENKAKU-KATAKANA

On perl 5.8.1 or later, if "$OUTPUT_encoding" or "$INPUT_encoding" is not one of "jis", "sjis", "euc", or "utf8", then "jacode::convert()" works as "Encode::from_to()".

jacode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding, $option)

works as

Encode::from_to( $line, $INPUT_encoding, $OUTPUT_encoding )

Returns ("$subref", "$got_INPUT_encoding") when called in list context.

Returns only "$got_INPUT_encoding" when called in scalar context.

"$subref" is a reference to the subroutine that performs the conversion.

"$got_INPUT_encoding" is either "$INPUT_encoding" or the return value of "jacode::getcode(\$line)".
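On perl 5.8.1 or later, the delegation to "Encode::from_to()" and the list/scalar return convention can be pictured with the following self-contained sketch. It is not jacode's implementation: "convert_sketch()" is a hypothetical name, it always delegates to Encode (jacode does so only for encodings other than its own four), and it uses Encode's name 'cp932' because jacode's 'sjis' means CP932. Note that "Encode::from_to()" takes the INPUT encoding first.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for jacode::convert(): converts the string
# referenced by $ref in place, from $in_enc to $out_enc, via Encode.
sub convert_sketch {
    my ($ref, $out_enc, $in_enc) = @_;

    # A closure playing the role of "$subref": it performs the conversion.
    my $subref = sub {
        my ($r) = @_;
        Encode::from_to($$r, $in_enc, $out_enc);   # FROM first, then TO
    };
    $subref->($ref);

    # List context: ($subref, $got_INPUT_encoding); scalar: encoding only.
    return wantarray ? ($subref, $in_enc) : $in_enc;
}

my $line = "\x82\xa0";                        # HIRAGANA A in CP932
my $got  = convert_sketch(\$line, 'utf8', 'cp932');
# $line now holds "\xe3\x81\x82" (HIRAGANA A in UTF-8); $got is 'cp932'

my $line2 = "\x82\xa0";
my ($subref, $got2) = convert_sketch(\$line2, 'utf8', 'cp932');
```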

($esc_DBCS, $esc_ASCII) = jacode::get_inout($line)

Looks in "$line" for a DBCS start sequence and an ASCII start sequence.

"$esc_DBCS" is the escape sequence that starts DBCS, or undef if none is found.

"$esc_ASCII" is the escape sequence that starts ASCII, or undef if none is found.
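A minimal sketch of such a search, assuming the DBCS start sequences listed under "jacode::jis_inout()" and ESC-(-B / ESC-(-J as ASCII start sequences. "get_inout_sketch()" and both patterns are hypothetical, not jacode's own regular expressions.

```perl
use strict;
use warnings;

# Assumed DBCS start sequences (JIS X 0208-1990 first, so the longer
# ESC & @ ESC $ B wins) and assumed ASCII start sequences.
my $re_dbcs  = qr/\e&\@\e\$B|\e\$[\@B]|\e\$\([OQ]/;
my $re_ascii = qr/\e\([BJ]/;

# Hypothetical sketch: returns the first DBCS and the first ASCII start
# sequence found in $line, each undef if not found.
sub get_inout_sketch {
    my ($line) = @_;
    my ($esc_dbcs)  = $line =~ /($re_dbcs)/;
    my ($esc_ascii) = $line =~ /($re_ascii)/;
    return ($esc_dbcs, $esc_ascii);
}

my ($in, $out) = get_inout_sketch("\e\$B\x24\x22\e(B");
# $in is ESC $ B, $out is ESC ( B
```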

($esc_DBCS_fully, $esc_ASCII_fully) = jacode::jis_inout([$esc_DBCS [, $esc_ASCII]])

Sets the DBCS start sequence and the ASCII start sequence.

"$esc_ASCII" defaults to "ESC-(-B" if omitted.

"$esc_DBCS" defaults to "ESC-$-B" if omitted.

Returns "$esc_DBCS_fully", the full escape sequence for DBCS.

Returns "$esc_ASCII_fully", the full escape sequence for ASCII.

You can also use a short sequence for "$esc_DBCS":

--------------------------------------------------------
short-sequence  full-sequence    means
--------------------------------------------------------
@               ESC-$-@          JIS C 6226-1978
B               ESC-$-B          JIS X 0208-1983
&               ESC-&@-ESC-$-B   JIS X 0208-1990
O               ESC-$-(-O        JIS X 0213:2000 plane1
Q               ESC-$-(-Q        JIS X 0213:2004 plane1
--------------------------------------------------------
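The table amounts to a simple lookup with two defaults. A hypothetical sketch ("jis_inout_sketch()" and "%short2full" are illustrative names); passing unknown values through unchanged is an assumption, not documented jacode behavior.

```perl
use strict;
use warnings;

# Short sequence => full escape sequence, transcribed from the table above.
my %short2full = (
    '@' => "\e\$\@",        # JIS C 6226-1978
    'B' => "\e\$B",         # JIS X 0208-1983
    '&' => "\e&\@\e\$B",    # JIS X 0208-1990
    'O' => "\e\$(O",        # JIS X 0213:2000 plane 1
    'Q' => "\e\$(Q",        # JIS X 0213:2004 plane 1
);

# Hypothetical sketch of the defaulting rules described for jis_inout().
sub jis_inout_sketch {
    my ($esc_dbcs, $esc_ascii) = @_;
    $esc_dbcs  = 'B'    unless defined $esc_dbcs;    # default ESC $ B
    $esc_ascii = "\e(B" unless defined $esc_ascii;   # default ESC ( B
    # Expand a short sequence; pass anything else through unchanged.
    $esc_dbcs = $short2full{$esc_dbcs} if exists $short2full{$esc_dbcs};
    return ($esc_dbcs, $esc_ascii);
}

my ($dbcs, $ascii) = jis_inout_sketch('&');
# $dbcs is ESC & @ ESC $ B, $ascii is ESC ( B
```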

jacode::init()

Initializes the variables used in this package.

If you embed "jacode.pl" at the end of your script, call "jacode::init()" first. You do not need to call it when you load "jacode.pl" with "do" or "require".

Other SUBROUTINES and VARIABLES

jacode::xxx2yyy(\$line [, $option])

Converts the encoding of "$line" from "xxx" to "yyy" and overwrites "$line" with the result.

"xxx" and "yyy" can be "jis", "euc", "sjis" or "utf8".

jacode::euc2euc(\$line [, $option])
jacode::euc2jis(\$line [, $option])
jacode::euc2sjis(\$line [, $option])
jacode::euc2utf8(\$line [, $option])

jacode::jis2euc(\$line [, $option])
jacode::jis2jis(\$line [, $option])
jacode::jis2sjis(\$line [, $option])
jacode::jis2utf8(\$line [, $option])

jacode::sjis2euc(\$line [, $option])
jacode::sjis2jis(\$line [, $option])
jacode::sjis2sjis(\$line [, $option])
jacode::sjis2utf8(\$line [, $option])

jacode::utf82euc(\$line [, $option])
jacode::utf82jis(\$line [, $option])
jacode::utf82sjis(\$line [, $option])
jacode::utf82utf8(\$line [, $option])

"$option" can be omitted, or can be "h" or "z".

  • "h" means converting ZENKAKU-KATAKANA to HANKAKU-KATAKANA

  • "z" means converting HANKAKU-KATAKANA to ZENKAKU-KATAKANA

Returns the count of converted characters.

$line_by_OUTPUT_encoding = jacode::to($OUTPUT_encoding, $line, $INPUT_encoding [, $option])

This subroutine works as follows:

local ( $OUTPUT_encoding, $s, $INPUT_encoding, $option ) = @_;
&convert( *s, $OUTPUT_encoding, $INPUT_encoding, $option );
$s;

Because it returns the converted string by value, this subroutine is easy to use in the "s///e" operator and similar places.

You should no longer omit "$INPUT_encoding".

"$option" can be omitted, or can be "h" or "z".
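As an illustration of why by-value return suits "s///e", here is a self-contained sketch using a hypothetical "to_sketch()" built on "Encode::from_to()" in place of "jacode::to()". It keeps jacode's OUTPUT-first argument order and uses Encode's 'cp932' for jacode's 'sjis'.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical by-value converter in the spirit of jacode::to():
# the OUTPUT encoding comes first, as in jacode::to().
sub to_sketch {
    my ($out_enc, $s, $in_enc) = @_;   # $s is a copy; the caller is untouched
    Encode::from_to($s, $in_enc, $out_enc);
    return $s;
}

# Convert only the bracketed CP932 segments of a line to UTF-8.
# Caution: a CP932 trail byte can be 0x5D (']'), so this byte-level
# pattern is only safe for sample data like this.
my $line = "[\x82\xa0] and [\x82\xa2]";
$line =~ s/\[([^\]]*)\]/'[' . to_sketch('utf8', $1, 'cp932') . ']'/ge;
# $line is now "[\xe3\x81\x82] and [\xe3\x81\x84]"
```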

$line_by_jis = jacode::jis($line, $INPUT_encoding [, $option])

This subroutine works as "jacode::to('jis', @_)".

$line_by_euc = jacode::euc($line, $INPUT_encoding [, $option])

This subroutine works as "jacode::to('euc', @_)".

$line_by_sjis = jacode::sjis($line, $INPUT_encoding [, $option])

This subroutine works as "jacode::to('sjis', @_)".

$line_by_utf8 = jacode::utf8($line, $INPUT_encoding [, $option])

This subroutine works as "jacode::to('utf8', @_)".

$transliterated_char_count = jacode::h2z_euc(\$line)

Converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in the euc-encoded "$line" and overwrites "$line" with the result.

Returns "$transliterated_char_count", the number of transliterated characters.

$transliterated_char_count = jacode::h2z_jis(\$line)

Converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in the jis-encoded "$line" and overwrites "$line" with the result.

Returns "$transliterated_char_count", the number of transliterated characters.

$transliterated_char_count = jacode::h2z_sjis(\$line)

Converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in the sjis-encoded "$line" and overwrites "$line" with the result.

Returns "$transliterated_char_count", the number of transliterated characters.

$transliterated_char_count = jacode::h2z_utf8(\$line)

Converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in the utf8-encoded "$line" and overwrites "$line" with the result.

Returns "$transliterated_char_count", the number of transliterated characters.

$transliterated_char_count = jacode::z2h_euc(\$line)

Converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in the euc-encoded "$line" and overwrites "$line" with the result.

Returns "$transliterated_char_count", the number of transliterated characters.

$transliterated_char_count = jacode::z2h_jis(\$line)

Converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in the jis-encoded "$line" and overwrites "$line" with the result.

Returns "$transliterated_char_count", the number of transliterated characters.

$transliterated_char_count = jacode::z2h_sjis(\$line)

Converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in the sjis-encoded "$line" and overwrites "$line" with the result.

Returns "$transliterated_char_count", the number of transliterated characters.

$transliterated_char_count = jacode::z2h_utf8(\$line)

Converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in the utf8-encoded "$line" and overwrites "$line" with the result.

Returns "$transliterated_char_count", the number of transliterated characters.
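For utf8 strings, an effect similar to "jacode::h2z_utf8()" can be sketched with the core Unicode::Normalize module: its NFKC form maps HALFWIDTH KATAKANA (U+FF61 to U+FF9F) to full-width forms and merges a following voiced sound mark into one character. This swaps in a different technique from jacode's table-driven conversion. "h2z_utf8_sketch()" is hypothetical, and counting the full-width characters produced is an assumption about what "transliterated characters" means.

```perl
use strict;
use warnings;
use Encode ();
use Unicode::Normalize ();

# Hypothetical sketch: HANKAKU-KATAKANA to ZENKAKU-KATAKANA on a UTF-8
# byte string, overwriting it in place. Only runs of halfwidth katakana
# are normalized, so other characters are left untouched.
sub h2z_utf8_sketch {
    my ($ref) = @_;
    my $text = Encode::decode('UTF-8', $$ref);
    my $n = 0;
    $text =~ s{([\x{FF61}-\x{FF9F}]+)}{
        my $z = Unicode::Normalize::NFKC($1);
        $n += length $z;        # count full-width characters produced
        $z;
    }ge;
    $$ref = Encode::encode('UTF-8', $text);
    return $n;
}

my $line  = "\xef\xbd\xb6\xef\xbe\x9e";   # halfwidth KA + halfwidth voiced mark
my $count = h2z_utf8_sketch(\$line);
# $line is "\xe3\x82\xac" (full-width GA), $count is 1
```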

($matched_length, $encoding) = jacode::getcode(\$line)

$encoding = jacode::getcode(\$line)

Use of this subroutine is deprecated.

Returns the guessed encoding of the string in "$line".

Returns ($matched_length, $encoding) in list context.

Returns only "$encoding" in scalar context.

"$encoding" is any of

  • jis

    "$line" is guessed to be jis encoded.

  • sjis

    "$line" is guessed to be sjis encoded.

  • euc

    "$line" is guessed to be euc encoded.

  • utf8

    "$line" is guessed to be utf8 encoded.

  • binary

    "$line" contains binary data.

  • undef

    the encoding of "$line" cannot be guessed.

"$matched_length" is the parsable length in octets when "$encoding" is "sjis", "euc", or "utf8". This value is probably never useful.

Do not expect this subroutine to be perfect: it is impossible to fully distinguish "sjis", "euc", and "utf8" from one another.
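The following hypothetical "guess_sketch()" shows the general idea and why guessing is unreliable. Its byte patterns, its treatment of NUL bytes as binary, its undef for pure 7-bit text, and the order of its checks are all assumptions for illustration, not jacode's algorithm.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical, simplified guesser; each check is an assumption.
sub guess_sketch {
    my ($s) = @_;
    return 'jis'    if $s =~ /\e\$[\@B]|\e\$\([OQ]/;   # JIS DBCS escape
    return 'binary' if $s =~ /\x00/;                   # NUL byte
    return undef    unless $s =~ /[\x80-\xff]/;        # pure 7-bit: no clue

    # Strict UTF-8 check; decode may consume its input, so use a copy.
    my $copy = $s;
    return 'utf8'
        if eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };

    # EUC-JP: ASCII, two bytes 0xA1-0xFE, or SS2 + halfwidth katakana.
    return 'euc'
        if $s =~ /\A(?:[\x00-\x7f]|[\xa1-\xfe][\xa1-\xfe]|\x8e[\xa1-\xdf])*\z/;

    # Shift_JIS/CP932: ASCII, halfwidth katakana, or a lead + trail byte.
    return 'sjis'
        if $s =~ /\A(?:[\x00-\x7f]|[\xa1-\xdf]|[\x81-\x9f\xe0-\xfc][\x40-\x7e\x80-\xfc])*\z/;

    return undef;
}
```

The EUC string "\xa4\xa2" (HIRAGANA A) also matches the sjis pattern as two HANKAKU-KATAKANA characters; only the order of the checks decides the answer.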

($matched_length, $encoding) = jacode::getcode_utashiro_2000_09_29(\$line)

$encoding = jacode::getcode_utashiro_2000_09_29(\$line)

Keeps the original implementation of "&jcode'getcode()" from "jcode.pl" by Kazumasa Utashiro-san.

It does not support utf8.

$previous_caching_state = jacode::cache()

$previous_caching_state = jacode::nocache()

jacode::flushcache()

jacode::flush()

By default, converted characters are cached in memory so that the same conversion does not have to be calculated again and again.

To disable this caching feature, call "jacode::nocache()". You can enable it again by calling "jacode::cache()", and you can clear the cache memory by calling "jacode::flushcache()".

"jacode::cache()" and "jacode::nocache()" return the previous caching state.

  • 1 means that caching was previously enabled.

  • 0 means that caching was previously disabled.

"jacode::flush()" works the same as "jacode::flushcache()", to cover a bug in old documentation.
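The caching behavior can be pictured as a memo table with an on/off switch. A minimal sketch under assumptions: all names are hypothetical, and "slow_convert_char()" stands in for a real per-character conversion.

```perl
use strict;
use warnings;

my $caching  = 1;   # caching is on by default
my %cache;          # source character => converted result
my $computed = 0;   # how many times the slow path ran

sub slow_convert_char { $computed++; return uc $_[0] }   # stand-in

sub convert_char {
    my ($c) = @_;
    return slow_convert_char($c) unless $caching;
    $cache{$c} = slow_convert_char($c) unless exists $cache{$c};
    return $cache{$c};
}

# Like cache()/nocache(): switch the state, return the previous one (1/0).
sub cache_on   { my $prev = $caching; $caching = 1; return $prev }
sub cache_off  { my $prev = $caching; $caching = 0; return $prev }
sub flushcache { %cache = () }

convert_char($_) for qw(a a a);   # the slow path runs only once
```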

$transliterated_char_count = jacode::tr(\$line, $from, $to [, $option])

Use of this subroutine is deprecated.

We recommend using mb::tr() from the mb.pm modulino or UTF8::R2::tr() from the UTF8::R2 module.

Like the "tr///" operator, this subroutine transliterates "$line" from "$from" to "$to" and overwrites "$line" with the result.

"$option" can only be "d", which means the same as "tr///d".

Returns the count of transliterated characters.

This subroutine works only on jis or euc strings, and only with JIS X 0208 characters, not JIS X 0212.

"$from" and "$to" can use a range such as "A-Z". In a DBCS range, the start and end characters joined by "-" (hyphen) must have the same first byte. To use a literal "-" (hyphen) in "$from" or "$to", put it in the last position.

$line = 'ABCDEF';
jacode::tr(\$line, 'ABC-E', '12345'); # works as $line =~ tr/ABC-E/12345/
# got '12345F' in $line

$line = 'ABCDEF';
jacode::tr(\$line, 'ABC-E', '123', 'd'); # with /d modifier, D and E are deleted
# got '123F' in $line

$line = 'A-B-C';
jacode::tr(\$line, 'ABC-', '123*'); # "-" must be last
# got '1*2*3' in $line
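Since jacode::tr() is meant to behave like Perl's built-in "tr///", the padding rules can be checked with the built-in operator on ASCII strings: when "$to" is shorter than "$from", its last character is repeated, and with /d the extra characters are deleted instead. Characters not in "$from" are never touched. This sketch uses only the built-in operator, not jacode::tr().

```perl
use strict;
use warnings;

my $plain = 'ABCDEF';
my $count = ($plain =~ tr/ABC-E/123/);   # D and E reuse the last char '3'
# $plain is '12333F', $count is 5

my $deleted = 'ABCDEF';
$deleted =~ tr/ABC-E/123/d;              # D and E are deleted
# $deleted is '123F'; F is not in the search list, so it is kept

my $hyphen = 'A-B-C';
$hyphen =~ tr/ABC-/123*/;                # a trailing '-' is literal
# $hyphen is '1*2*3'
```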

$transliterated_line = jacode::trans($line, $from, $to [, $option])

Use of this subroutine is deprecated.

We recommend using mb::tr() from the mb.pm modulino or UTF8::R2::tr() from the UTF8::R2 module.

Transliterates "$line" from "$from" to "$to" and returns the result by value.

"jacode::trans()" calls "jacode::tr()" internally.

See also "jacode::tr()".

$jacode::convf{ join($;, 'jis', 'sjis') }

Provides a reference to the subroutine "jacode::jis2sjis()", which converts a string from jis encoding to sjis encoding.

$jacode::convf{ join($;, 'jis', 'euc') }

Provides a reference to the subroutine "jacode::jis2euc()", which converts a string from jis encoding to euc encoding.

$jacode::convf{ join($;, 'jis', 'utf8') }

Provides a reference to the subroutine "jacode::jis2utf8()", which converts a string from jis encoding to utf8 encoding.

$jacode::convf{ join($;, 'euc', 'jis') }

Provides a reference to the subroutine "jacode::euc2jis()", which converts a string from euc encoding to jis encoding.

$jacode::convf{ join($;, 'euc', 'sjis') }

Provides a reference to the subroutine "jacode::euc2sjis()", which converts a string from euc encoding to sjis encoding.

$jacode::convf{ join($;, 'euc', 'utf8') }

Provides a reference to the subroutine "jacode::euc2utf8()", which converts a string from euc encoding to utf8 encoding.

$jacode::convf{ join($;, 'sjis', 'jis') }

Provides a reference to the subroutine "jacode::sjis2jis()", which converts a string from sjis encoding to jis encoding.

$jacode::convf{ join($;, 'sjis', 'euc') }

Provides a reference to the subroutine "jacode::sjis2euc()", which converts a string from sjis encoding to euc encoding.

$jacode::convf{ join($;, 'sjis', 'utf8') }

Provides a reference to the subroutine "jacode::sjis2utf8()", which converts a string from sjis encoding to utf8 encoding.

$jacode::convf{ join($;, 'utf8', 'jis') }

Provides a reference to the subroutine "jacode::utf82jis()", which converts a string from utf8 encoding to jis encoding.

$jacode::convf{ join($;, 'utf8', 'sjis') }

Provides a reference to the subroutine "jacode::utf82sjis()", which converts a string from utf8 encoding to sjis encoding.

$jacode::convf{ join($;, 'utf8', 'euc') }

Provides a reference to the subroutine "jacode::utf82euc()", which converts a string from utf8 encoding to euc encoding.
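These keys rely on Perl's emulation of multidimensional hashes: "join($;, 'jis', 'sjis')" produces the same key as writing "$h{'jis', 'sjis'}". A self-contained sketch of such a dispatch table; "%convf_sketch" and its two Encode-based entries are hypothetical stand-ins for jacode's converters, using Encode's 'cp932' for jacode's 'sjis'.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical dispatch table keyed by (INPUT, OUTPUT) joined with $;.
my %convf_sketch = (
    join($;, 'sjis', 'utf8') => sub { Encode::from_to(${ $_[0] }, 'cp932', 'utf8') },
    join($;, 'utf8', 'sjis') => sub { Encode::from_to(${ $_[0] }, 'utf8', 'cp932') },
);

my $line = "\x82\xa0";                                 # CP932 HIRAGANA A
$convf_sketch{ join($;, 'sjis', 'utf8') }->(\$line);   # explicit join
# $line is now "\xe3\x81\x82"

$convf_sketch{ 'utf8', 'sjis' }->(\$line);             # same key via $;
# $line is back to "\x82\xa0"
```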

$jacode::h2zf{'euc'}

Provides a reference to the subroutine "jacode::h2z_euc()", which converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in an euc-encoded string.

$jacode::h2zf{'jis'}

Provides a reference to the subroutine "jacode::h2z_jis()", which converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in a jis-encoded string.

$jacode::h2zf{'sjis'}

Provides a reference to the subroutine "jacode::h2z_sjis()", which converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in an sjis-encoded string.

$jacode::h2zf{'utf8'}

Provides a reference to the subroutine "jacode::h2z_utf8()", which converts HANKAKU-KATAKANA to ZENKAKU-KATAKANA in a utf8-encoded string.

$jacode::z2hf{'euc'}

Provides a reference to the subroutine "jacode::z2h_euc()", which converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in an euc-encoded string.

$jacode::z2hf{'jis'}

Provides a reference to the subroutine "jacode::z2h_jis()", which converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in a jis-encoded string.

$jacode::z2hf{'sjis'}

Provides a reference to the subroutine "jacode::z2h_sjis()", which converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in an sjis-encoded string.

$jacode::z2hf{'utf8'}

Provides a reference to the subroutine "jacode::z2h_utf8()", which converts ZENKAKU-KATAKANA to HANKAKU-KATAKANA in a utf8-encoded string.

Old Interface on Perl4

On Perl4, you need "&" to call subroutines.

This software also provides the package "jcode" for "jcode.pl" users. A ' (single quote) is required after the package name "jcode", as in "&jcode'convert".

On Perl4, pass globs such as "*line", not references.

&jcode'convert(*line, $OUTPUT_encoding, $INPUT_encoding [, $option])
&jcode'getcode_utashiro_2000_09_29(*line)
&jcode'getcode(*line)
&jcode'get_inout($line)
&jcode'jis_inout([$esc_DBCS [, $esc_ASCII]])
&jcode'init()
&jcode'xxx2yyy(*line [, $option])
$jcode'convf{join($;, 'xxx', 'yyy')}
&jcode'to($OUTPUT_encoding, $line, $INPUT_encoding [, $option])
&jcode'jis($line, $INPUT_encoding [, $option])
&jcode'euc($line, $INPUT_encoding [, $option])
&jcode'sjis($line, $INPUT_encoding [, $option])
&jcode'utf8($line, $INPUT_encoding [, $option])
&jcode'cache()
&jcode'nocache()
&jcode'flushcache()
&jcode'flush()
&jcode'h2z_xxx(*line)
&jcode'z2h_xxx(*line)
$jcode'z2hf{'xxx'}
$jcode'h2zf{'xxx'}
&jcode'tr(*line, $from, $to [, $option])
&jcode'trans($line, $from, $to [, $option])

Old Interface on Perl5

On Perl5, "&" is not required to call subroutines.

This software also provides the package "jcode" for "jcode.pl" users. "::" (double colons) are required after the package name "jcode", as in "jcode::convert".

You have to pass a reference for the parameter "$line". Because a lexical variable does not live in a typeglob, a "*string" style call does not work if the variable is declared with "my". The same thing happens to the special variable "$_" if perl is compiled with thread support.
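The difference between the Perl4 glob style and the Perl5 reference style can be seen in a small self-contained sketch; "upcase_by_glob()" and "upcase_by_ref()" are hypothetical helpers that imitate the two calling conventions.

```perl
use strict;
use warnings;

our $s;    # package variable that local(*s) will alias

# Perl4 style: the caller passes a typeglob; local(*s) makes $s an
# alias for the caller's package variable for the duration of the call.
sub upcase_by_glob {
    local (*s) = @_;
    $s = uc $s;
}

# Perl5 style: the caller passes a reference, which reaches lexical
# ("my") variables as well as package variables.
sub upcase_by_ref {
    my ($ref) = @_;
    $$ref = uc $$ref;
}

our $pkg_line = 'abc';
upcase_by_glob(*pkg_line);    # works: $pkg_line lives in a typeglob

my $lex_line = 'abc';
upcase_by_ref(\$lex_line);    # works: a reference reaches the lexical
# upcase_by_glob(*lex_line) would instead touch the unrelated package
# variable $main::lex_line and leave the lexical $lex_line unchanged.
```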

jcode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding [, $option])
jcode::getcode_utashiro_2000_09_29(\$line)
jcode::getcode(\$line)
jcode::get_inout($line)
jcode::jis_inout([$esc_DBCS [, $esc_ASCII]])
jcode::init()
jcode::xxx2yyy(\$line [, $option])
&{$jcode::convf{join($;, 'xxx', 'yyy')}}(\$line)
jcode::to($OUTPUT_encoding, $line, $INPUT_encoding [, $option])
jcode::jis($line, $INPUT_encoding [, $option])
jcode::euc($line, $INPUT_encoding [, $option])
jcode::sjis($line, $INPUT_encoding [, $option])
jcode::utf8($line, $INPUT_encoding [, $option])
jcode::cache()
jcode::nocache()
jcode::flushcache()
jcode::flush()
jcode::h2z_xxx(\$line)
jcode::z2h_xxx(\$line)
&{$jcode::z2hf{'xxx'}}(\$line)
&{$jcode::h2zf{'xxx'}}(\$line)
jcode::tr(\$line, $from, $to [, $option])
jcode::trans($line, $from, $to [, $option])

SAMPLES

Convert SJIS to JIS and print each line with encoding name at head

#require 'jcode.pl';
require 'jacode.pl';
while (defined($s = <>)) {
    $INPUT_encoding = &jcode'convert(*s, 'jis', 'sjis');
    print $INPUT_encoding, "\t", $s;
}

The safest way of JIS conversion

#require 'jcode.pl';
require 'jacode.pl';
while (defined($s = <>)) {
    ($matched, $INPUT_encoding) = &jcode'getcode(*s);
    if ((@buf == 0) && ($matched == 0)) {
        print $s;
        next;
    }
    push(@buf, $s);
    next unless $INPUT_encoding;
    while (defined($s = shift(@buf))) {
        &jcode'convert(*s, 'jis', $INPUT_encoding);
        print $s;
    }
    while (defined($s = <>)) {
        &jcode'convert(*s, 'jis', $INPUT_encoding);
        print $s;
    }
    last;
}
print @buf if @buf;

Convert SJIS to UTF-8 and print each line by perl 4.036 or later

#require 'jcode.pl';
require 'jacode.pl';
while (defined($s = <>)) {
    &jacode'convert(*s, 'utf8', 'sjis');
    print $s;
}

Convert SJIS to UTF-8.1 and print each line by perl 4.036 or later

require 'jacode.pl';
while (defined($s = <>)) {

    # STEP 1of2 converts SJIS to UTF-8.0
    &jacode'convert(*s, 'utf8', 'sjis');

    # STEP 2of2 converts UTF-8.0 to UTF-8.1 see also https://metacpan.org/pod/Jacode4e#UTF-8.0-vs.-UTF-8.1
    $s =~ s#\xe2\x80\x94#\xe2\x80\x95#g;
    $s =~ s#\xe2\x80\x96#\xe2\x88\xa5#g;
    $s =~ s#\xe2\x88\x92#\xef\xbc\x8d#g;

    # got UTF-8.1
    print $s;
}

Convert SJIS to UTF16-BE and print each line by perl 5.8.1 or later

require 'jacode.pl';
use 5.8.1;
while (defined($s = <>)) {
    jacode::convert(\$s, 'UTF16-BE', 'sjis');
    print $s;
}

Convert SJIS to MIME-Header-ISO_2022_JP and print each line by perl 5.8.1 or later

require 'jacode.pl';
use 5.8.1;
while (defined($s = <>)) {
    jacode::convert(\$s, 'MIME-Header-ISO_2022_JP', 'sjis');
    print $s;
}

BUGS AND LIMITATIONS

If you use this software on JPerl4, you must use the -Llatin switch.

If you use this software on JPerl5, you must use the -b switch.

We have tested and verified this software to the best of our ability. However, software containing this much code is bound to contain some bugs. If you find a bug that is in jacode.pl and not in your own program, try to reduce it to a minimal test case and then report it to the author's address. If you have an idea that could make this a more useful tool, please let everyone share it.

SOFTWARE LIFE CYCLE

                                       Jacode.pm
                  jcode.pl  Encode.pm  jacode.pl  Jacode4e  Jacode4e::RoundTrip
--------------------------------------------------------------------------------
1993 Perl4.036       |                     |                                    
  :     :            :                     :                                    
1999 Perl5.00503     |                     |         |               |          
2000 Perl5.6         |                     |         |               |          
2002 Perl5.8         |         Born        |         |               |          
2007 Perl5.10        V          |          |         |               |          
2010 Perl5.12       EOL         |         Born       |               |          
2011 Perl5.14                   |          |         |               |          
2012 Perl5.16                   |          |         |               |          
2013 Perl5.18                   |          |         |               |          
2014 Perl5.20                   |          |         |               |          
2015 Perl5.22                   |          |         |               |          
2016 Perl5.24                   |          |         |               |          
2017 Perl5.26                   |          |         |               |          
2018 Perl5.28                   |          |        Born            Born        
2019 Perl5.30                   |          |         |               |          
2020 Perl5.32                   :          :         :               :          
2030 Perl5.52                   :          :         :               :          
2040 Perl5.72                   :          :         :               :          
2050 Perl5.92                   :          :         :               :          
2060 Perl5.112                  :          :         :               :          
2070 Perl5.132                  :          :         :               :          
2080 Perl5.152                  :          :         :               :          
2090 Perl5.172                  :          :         :               :          
2100 Perl5.192                  :          :         :               :          
2110 Perl5.212                  :          :         :               :          
2120 Perl5.232                  :          :         :               :          
  :     :                       V          V         V               V          
--------------------------------------------------------------------------------

Removed jcode.pl's Bugs

jacode.pl fixes the following two bugs that jcode.pl had.

Bad $n count in jcode'_jis2sjis()

Implementation of "jcode.pl 2.13"

sub _jis2sjis {
    local($esc, $s) = @_;
    if ($esc =~ /^$re_jis0212/o) {
        $s =~ s/../$undef_sjis/g;
        $n = length; # *** here ***
    }
    elsif ($esc !~ /^$re_asc/o) {
        $n += $s =~ tr/\041-\176/\241-\376/;
        if ($esc =~ /^$re_jp/o) {
            $s =~ s/($re_euc_c)/$e2s{$1}||&e2s($1)/geo;
        }
    }
    $s;
}

Implementation of "jacode.pl"

sub _jis2sjis {
    local ( $esc, $s ) = @_;
    if ( $esc =~ /^$re_esc_asc/o ) {
    }
    elsif ( $esc =~ /^$re_esc_kana/o ) {
        $s =~ tr/\x21-\x7e/\xa1-\xfe/;
        $n += length($s);
    }
    elsif ( $esc =~ /^$re_esc_jis0212/o ) {
        $s =~ s/[\x00-\xff][\x00-\xff]/$n++, $undef_sjis/ge; # *** here ***
    }
    else {
        $s =~ tr/\x21-\x7e/\xa1-\xfe/;
        $s =~ s/($re_euc_c)/$n++, ($e2s{$1}||&e2s($1))/geo;
    }
    $s;
}
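Why the old line was wrong can be shown in isolation: a bare "length" measures "$_", not "$s", and "=" discards the running total already held in "$n". A self-contained sketch of both patterns; the function names are hypothetical and "$undef_char" is a stand-in for "$undef_sjis".

```perl
use strict;
use warnings;

# Hypothetical stand-in for $undef_sjis, the substitution character.
my $undef_char = "\x81\xac";

# The jcode.pl 2.13 pattern: a bare `length` measures $_, not $s,
# and `=` throws away the running total already held in $n.
sub replace_like_jcode {
    my ($s, $n) = @_;
    $s =~ s/../$undef_char/g;
    $n = length;
    return ($s, $n);
}

# The jacode.pl pattern: count each two-byte character as it is
# replaced and add it to the running total.
sub replace_like_jacode {
    my ($s, $n) = @_;
    $s =~ s/[\x00-\xff][\x00-\xff]/$n++, $undef_char/ge;
    return ($s, $n);
}

local $_ = 'xyz';    # an unrelated value in $_
my (undef, $bad)  = replace_like_jcode("\x30\x21\x30\x22", 5);
my (undef, $good) = replace_like_jacode("\x30\x21\x30\x22", 5);
# $bad is 3 (the length of 'xyz'); $good is 7 (5 + two characters)
```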

jcode'tr() was ignoring options

Implementation of "jcode.pl 2.13"

sub tr {
    # $prev_from, $prev_to, %table are persistent variables
    local(*s, $from, $to, $opt) = @_;
    local(@from, @to);
    local($jis, $n) = (0, 0);
    $jis++, &jis2euc(*s) if $s =~ /$re_jp|$re_asc|$re_kana/o;
    $jis++ if $to =~ /$re_jp|$re_asc|$re_kana/o;
    if (!defined($prev_from) || $from ne $prev_from || $to ne $prev_to) { # *** here (1of2) ***
        ($prev_from, $prev_to) = ($from, $to); # *** here (2of2) ***
        undef %table;
        &_maketable;
    }
    $s =~ s/([\200-\377][\000-\377]|[\000-\377])/
        defined($table{$1}) && ++$n ? $table{$1} : $1
    /ge;
    &euc2jis(*s) if $jis;
    $n;
}

Implementation of "jacode.pl"

sub tr {
    # $prev_from, $prev_to, %table are persistent variables
    local ( *s, $from, $to, $option ) = @_;
    local ( @from, @to );
    local ( $jis, $n ) = ( 0, 0 );
    $jis++, &jis2euc(*s) if $s =~ /$re_esc_jp|$re_esc_asc|$re_esc_kana/o;
    $jis++ if $to =~ /$re_esc_jp|$re_esc_asc|$re_esc_kana/o;
    if (   !defined($prev_from)
        || $from   ne $prev_from
        || $to     ne $prev_to
        || $option ne $prev_opt ) # *** here (1of2) ***
    {
        ( $prev_from, $prev_to, $prev_opt ) = ( $from, $to, $option ); # *** here (2of2) ***
        undef %table;
        &_maketable;
    }
    $s =~ s/([\x80-\xff][\x00-\xff]|[\x00-\xff])/
    defined($table{$1}) && ++$n ? $table{$1} : $1
    /ge;
    &euc2jis(*s) if $jis;
    $n;
}
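The effect of leaving "$option" out of the cache check can be reproduced with a simplified, single-byte sketch. All names are hypothetical, ranges such as "A-E" are not expanded, and the "$check_opt" flag switches between the jcode.pl 2.13 check and the jacode.pl check.

```perl
use strict;
use warnings;

# Persistent state, as in tr(): the last arguments and the cached table.
my ($prev_from, $prev_to, $prev_opt, %table);

sub make_table {
    my ($from, $to, $opt) = @_;
    my @from = split //, $from;
    my @to   = split //, $to;
    # As in tr///: pad "to" with its last character, or with '' under "d".
    push @to, ($opt =~ /d/ ? '' : $to[-1]) x (@from - @to) if @to < @from;
    %table = ();
    @table{@from} = @to;
}

# $check_opt = 0 mimics jcode.pl 2.13; $check_opt = 1 mimics jacode.pl.
sub tr_sketch {
    my ($ref, $from, $to, $opt, $check_opt) = @_;
    $opt = '' unless defined $opt;
    if (   !defined $prev_from
        || $from ne $prev_from
        || $to   ne $prev_to
        || ($check_opt && $opt ne $prev_opt))
    {
        ($prev_from, $prev_to, $prev_opt) = ($from, $to, $opt);
        make_table($from, $to, $opt);
    }
    my $n = 0;
    $$ref =~ s/(.)/exists $table{$1} ? ($n++, $table{$1}) : $1/gse;
    return $n;
}

my ($x, $y, $z) = ('ABCDE') x 3;
tr_sketch(\$x, 'ABCDE', '12', '',  0);   # builds the table without "d"
tr_sketch(\$y, 'ABCDE', '12', 'd', 0);   # stale table reused: "d" ignored
tr_sketch(\$z, 'ABCDE', '12', 'd', 1);   # option change rebuilds the table
# $x is '12222', $y is '12222' (wrong), $z is '12'
```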

Fixed Issue: defined(%hash) is deprecated at ./jcode.pl line nnn

jcode.pl causes fatal errors on perl 5.22 or later. jacode.pl fixes this issue.

Stashes are now always defined

https://metacpan.org/release/JESSE/perl-5.14.0/view/pod/perldelta.pod#Stashes-are-now-always-defined

defined(@array) and defined(%hash) are now fatal errors

https://metacpan.org/release/RJBS/perl-5.22.0/view/pod/perldelta.pod#defined(@array)-and-defined(%hash)-are-now-fatal-errors

Implementation of "jcode.pl 2.13"

sub z2h_euc {
    local(*s, $n) = @_;
    &init_z2h_euc unless defined %z2h_euc; # *** here ***
    $s =~ s/($re_euc_c|$re_euc_kana)/
        $z2h_euc{$1} ? ($n++, $z2h_euc{$1}) : $1
    /geo;
    $n;
}

sub z2h_sjis {
    local(*s, $n) = @_;
    &init_z2h_sjis unless defined %z2h_sjis; # *** here ***
    $s =~ s/($re_sjis_c)/$z2h_sjis{$1} ? ($n++, $z2h_sjis{$1}) : $1/geo;
    $n;
}

Implementation of "jacode.pl"

sub z2h_euc {
    local ( *s, $n ) = @_;
    &init_z2h_euc unless %z2h_euc; # *** here ***
    $s =~ s/($re_euc_c|$re_euc_kana)/
    $z2h_euc{$1} ? ($n++, $z2h_euc{$1}) : $1
    /geo;
    $n;
}

sub z2h_sjis {
    local ( *s, $n ) = @_;
    &init_z2h_sjis unless %z2h_sjis; # *** here ***
    $s =~ s/($re_sjis_c)/$z2h_sjis{$1} ? ($n++, $z2h_sjis{$1}) : $1/geo;
    $n;
}
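The fix works because a hash in boolean context is true once it has entries, so "unless %hash" runs the initializer only on first use, without the "defined(%hash)" that perl 5.22 and later reject. A minimal sketch of this lazy-initialization idiom with a hypothetical table:

```perl
use strict;
use warnings;

my %z2h;             # filled on first use
my $init_calls = 0;

sub init_z2h {
    $init_calls++;
    %z2h = ('ZA' => 'ha', 'ZB' => 'hb');   # hypothetical table
}

sub z2h_sketch {
    my ($ref) = @_;
    init_z2h() unless %z2h;   # not: unless defined %z2h (fatal on 5.22+)
    my $n = 0;
    $$ref =~ s/(Z[AB])/$n++, $z2h{$1}/ge;
    return $n;
}

my $line = 'ZA ZB ZA';
z2h_sketch(\$line);
z2h_sketch(\$line);   # table already filled: no second initialization
# $line is 'ha hb ha', $init_calls is 1
```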

Let's ask your teachers

  • Why is this written in Perl4?

  • Why is the filename "jacode.pl", not "jcode.pl"?

  • Why is the package "jcode" also supported?

  • Why is $line passed to jacode::convert() by reference, not by value?

  • Why is the argument order of jacode::convert() $OUTPUT_encoding first, then $INPUT_encoding?

  • Why does jacode::getcode support halfwidth KATAKANA?

  • Why does conversion between UTF-8 and SJIS need a table?

  • Why is that table embedded in "jacode.pl"?

  • Why does 'sjis' mean CP932, not Shift_JIS?

AUTHOR

This project was originated by Kazumasa Utashiro-san.

https://metacpan.org/author/UTASHIRO

LICENSE AND COPYRIGHT

This software is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Copyright (c) 2010, 2011, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2022 INABA Hitoshi ina@cpan.org in a CPAN

The latest version of "jacode.pl" is available here:

http://search.cpan.org/dist/jacode/

ATTENTION

This software is not "jcode.pl". Therefore, do not redistribute this software renamed as "jcode.pl".

Moreover, this software IS NOT "Jacode4e". If you want "Jacode4e", search for it on CPAN.

Original version 'jcode.pl' is ...

Copyright (c) 2002 Kazumasa Utashiro

http://web.archive.org/web/20090608090304/http://srekcah.org/jcode/

Copyright (c) 1995-2000 Kazumasa Utashiro utashiro@iij.ad.jp Internet Initiative Japan Inc. 3-13 Kanda Nishiki-cho, Chiyoda-ku, Tokyo 101-0054, Japan

Copyright (c) 1992,1993,1994 Kazumasa Utashiro Software Research Associates, Inc.

Use and redistribution for ANY PURPOSE are granted as long as all copyright notices are retained. Redistribution with modification is allowed provided that you make your modified version obviously distinguishable from the original one. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ''AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The original version was developed in February 1992 under the name srekcah@sra.co.jp and was called kconv.pl at the beginning. This address was a pen name for a group of individuals and is no longer valid.

The latest version of "jcode.pl" is available here:

http://ftp.iij.ad.jp/pub/IIJ/dist/utashiro/perl/

SEE ALSO

UNIX MAGAZINE
1992 Apr
Pages: 148
T1008901040810 ZASSHI 08901-4
http://ascii.asciimw.jp/books/books/detail/978-4-7561-5008-0.shtml

PERL PUROGURAMINGU
Larry Wall, Randal L.Schwartz, Yoshiyuki Kondo
December 1997
ISBN 4-89052-384-7
http://www.context.co.jp/~cond/books/old-books.html

Programming Perl, Second Edition
By Larry Wall, Tom Christiansen, Randal L. Schwartz
October 1996
Pages: 670
ISBN 10: 1-56592-149-6 | ISBN 13: 9781565921498
http://shop.oreilly.com/product/9781565921498.do

Programming Perl, Third Edition
By Larry Wall, Tom Christiansen, Jon Orwant
Third Edition  July 2000
Pages: 1104
ISBN 10: 0-596-00027-8 | ISBN 13: 9780596000271
http://shop.oreilly.com/product/9780596000271.do

The Perl Language Reference Manual (for Perl version 5.12.1)
by Larry Wall and others
Paperback (6"x9"), 724 pages
Retail Price: $39.95 (pound 29.95 in UK)
ISBN-13: 978-1-906966-02-7
https://dl.acm.org/doi/book/10.5555/1893028

Perl Pocket Reference, 5th Edition
By Johan Vromans
Publisher: O'Reilly Media
Released: July 2011
Pages: 102
http://shop.oreilly.com/product/0636920018476.do

Programming Perl, 4th Edition
By: Tom Christiansen, brian d foy, Larry Wall, Jon Orwant
Publisher: O'Reilly Media
Formats: Print, Ebook, Safari Books Online
Print: January 2012
Ebook: March 2012
Pages: 1130
Print ISBN: 978-0-596-00492-7 | ISBN 10: 0-596-00492-3
Ebook ISBN: 978-1-4493-9890-3 | ISBN 10: 1-4493-9890-1
http://shop.oreilly.com/product/9780596004927.do

Perl Cookbook
By Tom Christiansen, Nathan Torkington
August 1998
Pages: 800
ISBN 10: 1-56592-243-3 | ISBN 13: 978-1-56592-243-3
http://shop.oreilly.com/product/9781565922433.do

Perl Cookbook, Second Edition
By Tom Christiansen, Nathan Torkington
Second Edition  August 2003
Pages: 964
ISBN 10: 0-596-00313-7 | ISBN 13: 9780596003135
http://shop.oreilly.com/product/9780596003135.do

Perl in a Nutshell, Second Edition
By Stephen Spainhour, Ellen Siever, Nathan Patwardhan
Second Edition  June 2002
Pages: 760
Series: In a Nutshell
ISBN 10: 0-596-00241-6 | ISBN 13: 9780596002411
http://shop.oreilly.com/product/9780596002411.do

Learning Perl on Win32 Systems
By Randal L. Schwartz, Erik Olson, Tom Christiansen
August 1997
Pages: 306
ISBN 10: 1-56592-324-3 | ISBN 13: 9781565923249
http://shop.oreilly.com/product/9781565923249.do

Learning Perl, Fifth Edition
By Randal L. Schwartz, Tom Phoenix, brian d foy
June 2008
Pages: 352
Print ISBN:978-0-596-52010-6 | ISBN 10: 0-596-52010-7
Ebook ISBN:978-0-596-10316-3 | ISBN 10: 0-596-10316-6
http://shop.oreilly.com/product/9780596520113.do

Learning Perl, 6th Edition
By Randal L. Schwartz, brian d foy, Tom Phoenix
June 2011
Pages: 390
ISBN-10: 1449303587 | ISBN-13: 978-1449303587
http://shop.oreilly.com/product/0636920018452.do

Advanced Perl Programming, 2nd Edition
By Simon Cozens
June 2005
Pages: 300
ISBN-10: 0-596-00456-7 | ISBN-13: 978-0-596-00456-9
http://shop.oreilly.com/product/9780596004569.do

Perl Resource Kit -- UNIX Edition
Futato, Irving, Jepson, Patwardhan, Siever
ISBN 10: 1-56592-370-7
http://shop.oreilly.com/product/9781565923706.do

Perl Resource Kit -- Win32 Edition
Erik Olson, Brian Jepson, David Futato, Dick Hardt
ISBN 10: 1-56592-409-6
http://shop.oreilly.com/product/9781565924093.do

Announcing Perl 7
Jun 24, 2020 by brian d foy
https://www.perl.com/article/announcing-perl-7/

Understanding Japanese Information Processing
By Ken Lunde
O'Reilly Media
September 1993
Pages: 470
ISBN: 978-1-56592-043-9 | ISBN 10: 1-56592-043-0
http://shop.oreilly.com/product/9781565920439.do

CJKV Information Processing Chinese, Japanese, Korean & Vietnamese Computing
By Ken Lunde
O'Reilly Media
Print: January 1999
Ebook: June 2009
Pages: 1128
Print ISBN: 978-1-56592-224-2 | ISBN 10: 1-56592-224-7
Ebook ISBN: 978-0-596-55969-4 | ISBN 10: 0-596-55969-0
http://shop.oreilly.com/product/9781565922242.do

CJKV Information Processing, 2nd Edition
By Ken Lunde
O'Reilly Media
Print: December 2008
Ebook: June 2009
Pages: 912
Print ISBN: 978-0-596-51447-1 | ISBN 10: 0-596-51447-6
Ebook ISBN: 978-0-596-15782-1 | ISBN 10: 0-596-15782-7
http://shop.oreilly.com/product/9780596514471.do

DB2 GIJUTSU ZENSHO
By IBM Japan Systems Engineering Co., Ltd. and IBM Japan, Ltd.
2004/05
Pages: 887
ISBN-10: 4756144659 | ISBN-13: 978-4756144652
https://iss.ndl.go.jp/books/R100000002-I000007400836-00

Mastering Regular Expressions, Second Edition
By Jeffrey E. F. Friedl
Second Edition  July 2002
Pages: 484
ISBN 10: 0-596-00289-0 | ISBN 13: 9780596002893
http://shop.oreilly.com/product/9780596002893.do

Mastering Regular Expressions, Third Edition
By Jeffrey E. F. Friedl
Third Edition  August 2006
Pages: 542
ISBN 10: 0-596-52812-4 | ISBN 13: 9780596528126
http://shop.oreilly.com/product/9780596528126.do

Regular Expressions Cookbook
By Jan Goyvaerts, Steven Levithan
May 2009
Pages: 512
ISBN 10: 0-596-52068-9 | ISBN 13: 978-0-596-52068-7
http://shop.oreilly.com/product/9780596520694.do

Regular Expressions Cookbook, 2nd Edition
By Steven Levithan, Jan Goyvaerts
Released August 2012
Pages: 612
ISBN: 9781449327453
https://www.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/

JIS KANJI JITEN
Kouji Shibano
Pages: 1456
ISBN 4-542-20129-5
http://www.webstore.jsa.or.jp/lib/lib.asp?fn=/manual/mnl01_12.htm

UNIX MAGAZINE
1993 Aug
Pages: 172
T1008901080816 ZASSHI 08901-8
http://ascii.asciimw.jp/books/books/detail/978-4-7561-5008-0.shtml

Shell Script Magazine vol.41
2016 September
Pages: 64
https://shell-mag.com/

LINUX NIHONGO KANKYO
By YAMAGATA Hiroo, Stephen J. Turnbull, Craig Oda, Robert J. Bickel
June, 2000
Pages: 376
ISBN 4-87311-016-5
https://www.oreilly.co.jp/books/4873110165/

MacPerl: Power and Ease
By Vicki Brown, Chris Nandor
April 1998
Pages: 350
ISBN 10: 1881957322 | ISBN 13: 978-1881957324
http://www.amazon.com/Macperl-Power-Ease-Vicki-Brown/dp/1881957322

Other Tools
https://metacpan.org/dist/Jacode
https://metacpan.org/dist/Jacode4e
https://metacpan.org/dist/Jacode4e-RoundTrip
https://metacpan.org/dist/Perl7-Handy
https://metacpan.org/dist/UTF8-R2
https://metacpan.org/dist/mb

BackPAN
http://backpan.perl.org/authors/id/I/IN/INA/

Recent Perl packages by "INABA Hitoshi"
http://code.activestate.com/ppm/author:INABA-Hitoshi/

ACKNOWLEDGEMENTS

This software was made with reference to the software and documents created by the following hackers and others. I am grateful to all of them.

Larry Wall, Perl
http://www.perl.org/

Jesse Vincent, Compatibility is a virtue
https://www.nntp.perl.org/group/perl.perl5.porters/2010/05/msg159825.html

Kazumasa Utashiro, jcode.pl: Perl library for Japanese character code conversion
https://metacpan.org/author/UTASHIRO
ftp://ftp.iij.ad.jp/pub/IIJ/dist/utashiro/perl/
http://web.archive.org/web/20090608090304/http://srekcah.org/jcode/
ftp://ftp.oreilly.co.jp/pcjp98/utashiro/
http://mail.pm.org/pipermail/tokyo-pm/2002-March/001319.html
https://twitter.com/uta46/status/11578906320

mikeneko creator club, Private manual of jcode.pl
http://mikeneko.creator.club.ne.jp/~lab/kcode/jcode.html

gama, getcode.pl
http://www2d.biglobe.ne.jp/~gama/cgi/jcode/jcode.htm

Gappai, jcodeg.diff
http://www.vector.co.jp/soft/win95/prog/se347514.html

OHZAKI Hiroki, Perl memo
http://www.din.or.jp/~ohzaki/perl.htm#JP_Code

NAKATA Yoshinori, Ad hoc patch to reduce warnings in h2z_euc
http://white.niu.ne.jp/yapw/yapw.cgi/jcode.pl%A4%CE%A5%A8%A5%E9%A1%BC%CD%DE%C0%A9

Dan Kogai, Jcode module and Encode module
https://metacpan.org/release/Encode
https://metacpan.org/release/Jcode
http://blog.livedoor.jp/dankogai/archives/50116398.html
http://blog.livedoor.jp/dankogai/archives/51004472.html

Donzoko CGI+--, Jcode-like Encode wrapper
http://www.donzoko.net/cgi/jencode/

Yusuke Kawasaki, Encode561 module
http://www.kawa.net/works/perl/i18n-emoji/i18n-emoji.html#Encode561

Tokyo-pm archive
http://mail.pm.org/pipermail/tokyo-pm/

utf8_possible_story, Perl de Nihongo Aruaru
http://aizen.likk.jp/slide/utf8_possible_story/

Very old fj.kanji discussion
http://www.ie.u-ryukyu.ac.jp/~kono/fj/fj.kanji/index.html

TechLION vol.26
https://type.jp/et/feature/1569

Kaoru Maeda, Perl's history: Perl 1, 2, 3, 4
https://www.slideshare.net/KaoruMaeda/perl-perl-1234

nurse, What is "string"
https://naruse.hateblo.jp/entries/2014/11/07#1415355181

NISHIO Hirokazu, What is meant by "string as a sequence of characters"?
https://nishiohirokazu.hatenadiary.org/entry/20141107/1415286729

Rick Yamashita, Shift_JIS
https://shino.tumblr.com/post/116166805/%E5%B1%B1%E4%B8%8B%E8%89%AF%E8%94%B5%E3%81%A8%E7%94%B3%E3%81%97%E3%81%BE%E3%81%99-%E7%A7%81%E3%81%AF1981%E5%B9%B4%E5%BD%93%E6%99%82us%E3%81%AE%E3%83%9E%E3%82%A4%E3%82%AF%E3%83%AD%E3%82%BD%E3%83%95%E3%83%88%E3%81%A7%E3%82%B7%E3%83%95%E3%83%88jis%E3%81%AE%E3%83%87%E3%82%B6%E3%82%A4%E3%83%B3%E3%82%92%E6%8B%85%E5%BD%93
http://www.wdic.org/w/WDIC/%E3%82%B7%E3%83%95%E3%83%88JIS

nurse, History of Japanese EUC 22:00
https://naruse.hateblo.jp/entries/2009/03/08

Ricardo Signes, Perl 5.14 for Pragmatists
https://www.slideshare.net/rjbs/perl-514-8809465

Ricardo Signes, What's New in Perl? v5.10 - v5.16
https://www.slideshare.net/rjbs/whats-new-in-perl-v510-v516

Causes and countermeasures for garbled Japanese characters in Perl
https://prozorec.hatenablog.com/entry/2018/03/19/080000

Impressions of Larry Wall's talk at LL Future
https://hnw.hatenablog.com/entry/20080903

About Windows and Japanese text
https://blogs.windows.com/japan/2020/02/20/about-windows-and-japanese-text/

About Windows diagnostic data
https://blogs.windows.com/japan/2019/12/05/about-windows-diagnostic-data/