NAME
Geo::Postcodes::JP::Process - process Japan Post Office postcode data
read_ken_all
my $postcodes_ref = read_ken_all ('KEN_ALL.CSV');
Read the file KEN_ALL.CSV. The return value is an array reference containing the lines of the postcode file in the same order as the file itself. Each line is in turn an array reference, so the return value is a double indexed array. The routine issues a fatal error if a problem is encountered.
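One reading of "double indexed array" is an array of lines, each of which is an array of fields. The following is a minimal sketch of that layout. It uses a hand-written stand-in for the return value rather than calling read_ken_all, and the order of the two indexes (line first, field second) is an assumption.

```perl
use strict;
use warnings;
use utf8;

# Hand-written stand-in for the return value of read_ken_all, holding
# the single sample line quoted in the TERMINOLOGY section.
# Assumption: the first index is the line, the second is the field.
my $postcodes_ref = [
    ['08201', '310 ', '3100004',
     'イバラキケン', 'ミトシ', 'アオヤギチョウ',
     '茨城県', '水戸市', '青柳町',
     0, 0, 0, 0, 0, 0],
];

my $n_lines = scalar @$postcodes_ref;

# Field 2 (counting from zero) of line 0 is the seven-digit postcode.
my $new_postcode = $postcodes_ref->[0][2];

print "$n_lines line(s); first postcode is $new_postcode\n";
```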
process_line
my %values = process_line ($line);
Turn a line of the postcode file into a hash of its values.
The keys of the hash are
- number
The JIS code number for the region. The JIS standards for regions of Japan are JIS X 0401 (1973) for the prefecture identification codes, and JIS X 0402 (2003) for the identification codes of cities, towns and villages.
- old_postcode
The old three or five digit postcode.
- new_postcode
The new seven digit postcode.
- ken_kana
The kana version of the prefecture.
- city_kana
The kana version of the city.
- address_kana
The kana version of the address.
- ken_kanji
The kanji version of the prefecture.
- city_kanji
The kanji version of the city.
- address_kanji
The kanji version of the address.
- one-region-multiple-postcodes
This is 1 if the same address has more than one postcode, zero otherwise.
- numbering-start
This is 1 if numbering starts, zero otherwise.
- has-choume
This is 1 if there is a division into "choume", zero otherwise.
- one-postcode-multiple-regions
This is 1 if the same postcode covers more than one region, zero otherwise.
- koushin-no-hyouji
Update indicator: 0 = no change, 1 = change, 2 = delete.
- henkou-riyuu
Reason for change.
See also the Japan Post explanation of the KEN_ALL.CSV file in Japanese.
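The mapping from a line to a hash can be sketched with a hash slice. This is an illustration only, not the module's own process_line: the helper name line_to_hash is hypothetical, the line is assumed to be an array reference of fields, and the key names listed above are assumed to follow the column order of KEN_ALL.CSV.

```perl
use strict;
use warnings;
use utf8;

# Key names from the list above. Assumption: they are in the same
# order as the columns of KEN_ALL.CSV.
my @keys = qw(
    number old_postcode new_postcode
    ken_kana city_kana address_kana
    ken_kanji city_kanji address_kanji
    one-region-multiple-postcodes numbering-start has-choume
    one-postcode-multiple-regions koushin-no-hyouji henkou-riyuu
);

# Hypothetical helper: pair each key with the field in the same
# position using a hash slice.
sub line_to_hash
{
    my ($line) = @_;
    my %values;
    @values{@keys} = @$line;
    return %values;
}

# The sample line quoted in the TERMINOLOGY section, already split
# into fields.
my @line = ('08201', '310 ', '3100004',
            'イバラキケン', 'ミトシ', 'アオヤギチョウ',
            '茨城県', '水戸市', '青柳町',
            0, 0, 0, 0, 0, 0);

my %values = line_to_hash (\@line);
print 'new_postcode=' . $values{new_postcode}
    . ' has-choume=' . $values{'has-choume'} . "\n";
```

Keys containing hyphens, such as has-choume, need to be quoted when used as hash subscripts.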
concatenate_multi_line
$postcodes = concatenate_multi_line ($postcodes, $duplicates);
Concatenate a single entry which is spread over multiple lines. $duplicates is the return value of find_duplicates.
Some of the entries in the CSV file are single entries which have been broken into two or more lines because the number of characters in one of the fields exceeds a maximum. This routine attempts to put this broken data back together again.
At the moment there is no comprehensive check of correctness of the result.
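As a rough illustration of the idea, the sketch below joins the address fields of lines which share a postcode. It is not the module's own code. The helper names, the field positions (2 for the postcode, 8 for the address), the made-up data, and the test for a broken entry (an opening bracket with no closing bracket) are all assumptions, and plain ASCII brackets are used for simplicity.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's own code. Assumptions: each
# line is an array reference, field 2 is the postcode, field 8 is the
# address, and a broken entry is one whose address opens a bracket
# without closing it.
sub join_broken_entries
{
    my ($postcodes, $duplicates) = @_;
    my %drop;
    for my $postcode (keys %$duplicates) {
        my ($first, @rest) = @{$duplicates->{$postcode}};
        my $address = $postcodes->[$first][8];
        # Not a broken entry unless a bracket is left open.
        next unless $address =~ /\(/ && $address !~ /\)/;
        for my $offset (@rest) {
            $address .= $postcodes->[$offset][8];
            $drop{$offset} = 1;
            last if $address =~ /\)/;
        }
        $postcodes->[$first][8] = $address;
    }
    # Keep only the lines which were not merged into an earlier one.
    return [map { $postcodes->[$_] } grep { !$drop{$_} } 0..$#$postcodes];
}

# Build a made-up line with only the postcode and address filled in.
sub row
{
    my ($postcode, $address) = @_;
    my @fields = ('') x 9;
    @fields[2, 8] = ($postcode, $address);
    return \@fields;
}

my $postcodes = [
    row ('0000001', 'A(B,'),
    row ('0000001', 'C)'),
    row ('0000002', 'D'),
];
# Stand-in for the return value of find_duplicates.
my $duplicates = {'0000001' => [0, 1]};

my $joined = join_broken_entries ($postcodes, $duplicates);
print scalar (@$joined), " entries; first address is $joined->[0][8]\n";
```

Lines which share a postcode but leave no bracket open are left alone, since a repeated postcode may also mean that one postcode covers several regions.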
find_duplicates
my $duplicates = find_duplicates ($postcodes);
Make a hash whose keys are the postcodes which appear on more than one line of the postcode file, and whose values are array references containing the offsets of those lines in the file. The return value is a reference to the hash.
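The shape of this return value can be sketched as follows. The routine find_duplicates_sketch is a hypothetical reimplementation for illustration and the module's own find_duplicates may differ. It assumes that each line is an array reference whose field 2 is the seven-digit postcode, and the three postcodes in the example are made up.

```perl
use strict;
use warnings;

# Hypothetical reimplementation for illustration only.
# Assumption: field 2 of each line is the seven-digit postcode.
sub find_duplicates_sketch
{
    my ($postcodes) = @_;
    my %offsets;
    # Record the offset of every line under its postcode.
    for my $i (0..$#$postcodes) {
        push @{$offsets{$postcodes->[$i][2]}}, $i;
    }
    # Keep only the postcodes seen on more than one line.
    for my $postcode (keys %offsets) {
        delete $offsets{$postcode} if @{$offsets{$postcode}} < 2;
    }
    return \%offsets;
}

# Made-up three-line file; only the postcode field matters here.
my $postcodes = [
    [undef, undef, '0000001'],
    [undef, undef, '0000002'],
    [undef, undef, '0000001'],
];

my $duplicates = find_duplicates_sketch ($postcodes);
for my $postcode (sort keys %$duplicates) {
    print "$postcode: @{$duplicates->{$postcode}}\n";
}
```

With the made-up data above, this prints "0000001: 0 2".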
read_jigyosyo
my $jigyosyo_data = read_jigyosyo ('/path/to/jigyosyo/csv/file');
process_jigyosyo_line
my %values = process_jigyosyo_line ($line);
Turn the array reference $line into a hash of its values, keyed by field name.
The keys of the hash are
- number
As for the main postcode file.
- kana
The name of the place of business in kana.
- kanji
The name of the place of business in kanji.
- ken_kanji
The kanji version of the prefecture name.
- city_kanji
The kanji version of the city name.
- address_kanji
The kanji version of the address.
- street_number
The exact street number of the place of business.
- new_postcode
As for the "ken_all" fields.
- old_postcode
As for the "ken_all" fields.
- post-office
The post office which handles mail for this postcode.
- type
0 = large company, 1 = private post office box.
- multiple-postcode
0 = not multiple; the values 1, 2 and 3 also occur.
- Alteration code
0 = no change, 1 = new addition, 2 = deleted.
See also the Japan Post explanation of the JIGYOSYO.CSV file in Japanese.
remove_bad_addresses
$postcodes = remove_bad_addresses ($postcodes);
improve_postcodes
$postcodes = improve_postcodes ($postcodes);
Improve the postcodes as much as possible by unifying lines etc.
TERMINOLOGY
- Postcode
In this module, "postcode" is the translation used for the Japanese term "yuubin bangou" (郵便番号). These might also be called "postal codes" or even "zip codes" by some.
This module only deals with the seven-digit modern postcodes introduced in 1998. It does not handle the three and five digit postcodes which were used until 1998.
- Ken
In this module, "ken" in a variable name means the Japanese system of prefectures, which includes the "ken" divisions as well as the "do/fu/to" divisions, with "do" used for Hokkaido, "fu" for Osaka and Kyoto, and "to" for the Tokyo metropolis. The module refers to all of these using the word "ken".
- City
In this module, "city" is the term used to point to the second field in the postcode data file. Some of these are actually cities, like "Mito-shi" (水戸市), the city of Mito in Ibaraki prefecture. However, some of them are not really cities but other geographical subdivisions, such as gun/machi or shi/ku combinations.
- Address
In this module, "address" is the term used to point to the third field in the postcode data file. This is called 町域 (chouiki) by the Post Office.
For example, in the following data file entry, "3100004" is the postcode, "茨城県" (Ibaraki-ken) is the "ken", "水戸市" (Mito-shi) is the "city", and "青柳町" (Aoyagicho) is the "address".
08201,"310 ","3100004","イバラキケン","ミトシ","アオヤギチョウ","茨城県","水戸市","青柳町",0,0,0,0,0,0
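The sample entry can be taken apart as in the following sketch, which picks out the postcode, "ken", "city" and "address" by position. The naive split on commas is an assumption which holds for this particular line; it would fail for a line with a comma inside a quoted field.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

my $line = '08201,"310 ","3100004","イバラキケン","ミトシ","アオヤギチョウ","茨城県","水戸市","青柳町",0,0,0,0,0,0';

# Naive split: good enough for this line, which has no commas inside
# its quoted fields.
my @fields = split /,/, $line;

# Strip the surrounding double quotes from each field.
for my $field (@fields) {
    $field =~ s/^"//;
    $field =~ s/"$//;
}

# Positions count from zero: 2 is the postcode, and 6, 7 and 8 are the
# kanji "ken", "city" and "address".
my ($postcode, $ken, $city, $address) = @fields[2, 6, 7, 8];
print "postcode=$postcode ken=$ken city=$city address=$address\n";
```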
- Jigyosyo
In this module, "jigyosyo" is the term used to point to places of business. Some places of business have their own postcodes.
The term "jigyosyo" is used because it is the post office's own romanization. It is not a correct romanization, however: the word should be either jigyōsho or zigyôsyo in standard romanizations of Japanese, or jigyosho in simplified Hepburn. See the Sci.Lang.Japan FAQ page on Japanese romanization.
- Street number
In this module, "street number" is an arbitrary way of describing the final part of the address, which may actually specify a variety of things, such as the ban-chi, or even which floor of a building the postcode refers to.
The street number field is mostly relevant for the jigyosyo postcodes, but also crops up in some of the addresses, especially for rural areas.
AUTHOR
Ben Bullock, <bkb@cpan.org>
COPYRIGHT AND LICENSE
Geo::Postcodes::JP and associated files are copyright (c) 2012 Ben Bullock.
You may use, copy, modify and distribute Geo::Postcodes::JP under the same terms as the Perl programming language itself.