NAME
Geo::Address::Parser::Country - Resolve a place string component to a canonical country name
VERSION
Version 0.04
SYNOPSIS
use Geo::Address::Parser::Country;
use Locale::US;
use Locale::CA;
use Locale::AU;
my $resolver = Geo::Address::Parser::Country->new({
us => Locale::US->new(),
ca_en => Locale::CA->new(lang => 'en'),
ca_fr => Locale::CA->new(lang => 'fr'),
au => Locale::AU->new(),
});
# Simple form: component extracted automatically from place
my $result = $resolver->resolve(
place => 'Ramsgate, Kent, England',
);
# Explicit form: caller supplies the component directly
$result = $resolver->resolve(
component => 'England',
place => 'Ramsgate, Kent, England',
);
# $result->{country} eq 'United Kingdom'
# $result->{place} eq 'Ramsgate, Kent, England'
# $result->{warnings} is []
# $result->{unknown} is 0
DESCRIPTION
Resolves the last comma-separated component of a place string into a canonical country name. Handles common variants, abbreviations, and historical names found in genealogy data and other poorly normalised address sources.
Designed specifically to tolerate poor-quality data from software imports where place strings may be inconsistent, abbreviated, or use historical country names no longer in common use.
Resolution proceeds through the following steps in order:
- Direct lookup table (covers historical names, abbreviations, and common variants)
- US state code or name via Locale::US
- Canadian province code or name via Locale::CA (English and French)
- Australian state code or name via Locale::AU
- Country name via Locale::Object::Country
- Geo::GeoNames search (optional; used only if a Geo::GeoNames object was provided at construction)
- Unknown: returns with unknown => 1
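The step sequence above amounts to a first-match-wins chain. The following is a minimal sketch in plain Perl, not the module's own code: the tables (a cut-down %DIRECT named after the module's table, plus hypothetical %US_STATE, %CA_PROV and %AU_STATE) hold only a few illustrative entries, resolve_component() is a hypothetical helper, and the Locale::Object::Country and Geo::GeoNames steps are left out. It normalises the component with uc() once, before any lookup, so every step is case-insensitive; this is the consistent behaviour that BUGS describes as still pending for the Australian lookup.

```perl
use strict;
use warnings;

# Hypothetical, cut-down tables for illustration only. The real module
# consults its own %DIRECT table and the Locale::US, Locale::CA and
# Locale::AU objects; the later fallbacks are omitted here.
my %DIRECT   = (ENGLAND => 'United Kingdom', USA => 'United States');
my %US_STATE = map { $_ => 1 } ('TX', 'TEXAS');
my %CA_PROV  = map { $_ => 1 } ('ON', 'ONTARIO', 'QC', 'QUEBEC');
my %AU_STATE = map { $_ => 1 } ('NSW', 'NEW SOUTH WALES');

# Each step takes the normalised key and returns a country name, or
# undef to fall through to the next step. Array order = resolution order.
my @steps = (
	sub { $DIRECT{ $_[0] } },
	sub { $US_STATE{ $_[0] } ? 'United States' : undef },
	sub { $CA_PROV{ $_[0] }  ? 'Canada'        : undef },
	sub { $AU_STATE{ $_[0] } ? 'Australia'     : undef },
);

sub resolve_component {
	my ($component) = @_;

	# Normalise once so that every step sees the same upper-case key
	(my $key = uc $component) =~ s/^\s+|\s+$//g;

	for my $step (@steps) {
		my $country = $step->($key);
		return { country => $country, unknown => 0 } if defined $country;
	}
	return { country => undef, unknown => 1 };
}
```

Adding a further fallback, such as a GeoNames query, would mean appending one more code reference to @steps; the order of that array is the resolution order.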
TODO
- Complete normalise_place() to handle missing commas before country and state names in raw, uncleaned input strings. Poor data imports mean that strings like "Houston TX USA" or "Some Place England" need commas inserted before component extraction can work correctly. normalise_place() should be called before resolve() for raw, uncleaned input.
METHODS
new
Purpose
Constructs a new resolver object. The locale objects are used for state and province lookups and are retained for the lifetime of the object.
API Specification
Input
{
us => { type => 'object' }, # Locale::US instance
ca_en => { type => 'object' }, # Locale::CA English instance
ca_fr => { type => 'object' }, # Locale::CA French instance
au => { type => 'object' }, # Locale::AU instance
geonames => { # Optional Geo::GeoNames instance
type => 'object',
optional => 1,
},
}
Output
{ type => 'object', isa => 'Geo::Address::Parser::Country' }
Arguments
- us - A Locale::US instance. Required.
- ca_en - A Locale::CA instance with lang => 'en'. Required.
- ca_fr - A Locale::CA instance with lang => 'fr'. Required.
- au - A Locale::AU instance. Required.
- geonames - An optional Geo::GeoNames instance, used as a last-resort fallback when all other resolution methods fail.
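A plain-Perl sketch of the checks that the input specification describes: the four locale objects are required and geonames is optional. This is an illustration only; the module itself uses Params::Get, Params::Validate::Strict and Object::Configure. The class name My::Resolver is hypothetical, and the sketch accepts only a hashref and checks only that each value is a blessed object, not which class it belongs to.

```perl
use strict;
use warnings;

package My::Resolver;    # hypothetical class name, for illustration only

use Carp qw(croak);
use Scalar::Util qw(blessed);

sub new {
	my ($class, $args) = @_;
	croak 'new() expects a hashref' unless ref($args) eq 'HASH';

	# The four locale objects are required
	for my $key (qw(us ca_en ca_fr au)) {
		croak "$key is required and must be an object"
			unless blessed($args->{$key});
	}

	# geonames is optional, but must be an object when supplied
	croak 'geonames must be an object'
		if exists $args->{geonames} && !blessed($args->{geonames});

	# Shallow copy: the values are the caller's own object references,
	# so every later call shares the same locale objects
	my $self = { %{$args} };
	return bless $self, $class;
}

package main;
```

Because only references are stored, the locale objects are built once by the caller and reused for every lookup.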
Returns
A blessed Geo::Address::Parser::Country object.
Side Effects
None.
Notes
The locale objects are stored by reference and shared across all calls to
resolve(). Constructing them once and reusing the resolver object
is more efficient than constructing a new resolver for each lookup.
Object::Configure is used after validation to allow locale objects
to be supplied via environment variables or a config file rather than
always being passed explicitly.
Example
my $resolver = Geo::Address::Parser::Country->new({
us => Locale::US->new(),
ca_en => Locale::CA->new(lang => 'en'),
ca_fr => Locale::CA->new(lang => 'fr'),
au => Locale::AU->new(),
});
resolve
Purpose
Resolves the last comma-separated component of a place string to a canonical country name, and returns the (possibly modified) place string alongside any warnings generated during resolution.
API Specification
Input
{
place => { type => 'string', min => 1 }, # required
component => { type => 'string', min => 1, optional => 1 },
}
Output
{
type => 'hashref',
schema => {
country => { type => 'string', optional => 1 },
place => { type => 'string', min => 1 },
warnings => { type => 'arrayref' },
unknown => { type => 'boolean' },
},
}
Arguments
- place - The full place string, e.g. "Ramsgate, Kent, England". Required. The caller's string is not modified; the copy returned in the result's place key may have a country suffix appended where needed.
- component - The last comma-separated component of the place string, e.g. "England", "TX", "NSW". Optional. When absent, resolve() extracts it automatically as the last comma-separated token of place. When place contains no comma, the entire place string is used as the component. Supplying component explicitly is useful when the caller already has it from a structured data source.
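The automatic extraction described above can be sketched as follows. This is an illustration, not the module's code: extract_component() is a hypothetical helper, and trimming surrounding whitespace and ignoring a trailing comma (Perl's split drops trailing empty fields) are assumptions of the sketch rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last comma-separated token of $place,
# trimmed of surrounding whitespace. With no comma, split returns the
# whole string as a single field, so the entire place is used.
sub extract_component {
	my ($place) = @_;
	my @parts = split /\s*,\s*/, $place;
	my $component = @parts ? $parts[-1] : $place;
	$component =~ s/^\s+|\s+$//g;
	return $component;
}
```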
Returns
A hashref containing:
- country - The canonical country name as a string, e.g. "United Kingdom"; undef if resolution failed.
- place - The full place string, possibly with a country suffix appended (e.g. ", USA"). Always returned, even if unmodified.
- warnings - An arrayref of warning strings generated during resolution. May be empty. The caller is responsible for acting on these, e.g. by passing them to a complain() function.
- unknown - A boolean. True if the country could not be resolved by any method.
Side Effects
None. All warnings are returned to the caller rather than emitted directly.
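Because warnings are returned rather than emitted, the caller decides what to do with them. A minimal sketch, assuming a caller-supplied complain() function (hypothetical here; it simply collects messages) and a hypothetical report() helper. The result hashref is written by hand to match the documented output shape; it is not produced by running the module.

```perl
use strict;
use warnings;

# Hypothetical caller-side complain(): here it just collects messages
my @complaints;
sub complain { push @complaints, join(': ', @_) }

# Forward each warning from a resolve()-style result to complain(),
# and flag results whose country could not be resolved
sub report {
	my ($result) = @_;
	complain($result->{place}, $_) for @{ $result->{warnings} };
	complain($result->{place}, 'country not recognised') if $result->{unknown};
	return;
}

# Hand-written example shaped like resolve()'s documented return value
report({
	country  => 'United States',
	place    => 'Houston, TX, USA',
	warnings => ['TX: assuming country is United States'],
	unknown  => 0,
});
```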
Notes
Resolution order is: direct lookup, US state, Canadian province, Australian state, Locale::Object::Country, GeoNames (if available). The first successful match wins.
When a US state, Canadian province, or Australian state is recognised,
the appropriate country string (", USA", ", Canada",
", Australia") is appended to place if not already present.
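A sketch of the suffix rule just described. The %SUFFIX table and append_country_suffix() are hypothetical, and the sketch assumes "already present" means that the last comma-separated component already equals the suffix (compared case-insensitively); the module may check differently.

```perl
use strict;
use warnings;

# Hypothetical mapping from resolved country to the suffix appended
my %SUFFIX = (
	'United States' => 'USA',
	'Canada'        => 'Canada',
	'Australia'     => 'Australia',
);

# Append ", <suffix>" to $place unless its last comma-separated
# component already matches the suffix (case-insensitively)
sub append_country_suffix {
	my ($place, $country) = @_;
	my $suffix = $SUFFIX{$country} or return $place;

	# Text after the final comma, or the whole string if there is none
	my ($last) = $place =~ /([^,]*)\z/;
	$last =~ s/^\s+|\s+$//g;
	return $place if lc($last) eq lc($suffix);

	return "$place, $suffix";
}
```

Countries without an entry in %SUFFIX, such as "United Kingdom", leave the place string untouched.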
Example
# Simple form - component extracted automatically
my $result = $resolver->resolve(
place => 'Houston, TX',
);
# Explicit form - component supplied by caller
$result = $resolver->resolve(
component => 'TX',
place => 'Houston, TX',
);
# $result->{country} eq 'United States'
# $result->{place} eq 'Houston, TX, USA'
# $result->{warnings}[0] eq 'TX: assuming country is United States'
# $result->{unknown} is 0
normalise_place
Purpose
Inserts missing commas into a raw, uncleaned place string so that
resolve() can reliably extract the last component. Raw input from
poor-quality data imports frequently omits the commas that separate
city, state, and country tokens.
API Specification
Input
{
place => { type => 'string', min => 1 },
}
Output
{
type => 'hashref',
schema => {
place => { type => 'string', min => 1 },
warnings => { type => 'arrayref' },
},
}
Arguments
- place - The raw place string to normalise, e.g. "Houston TX USA" or "Some Place England". Required.
Returns
A hashref containing:
- place - The normalised place string with commas inserted where they were missing, e.g. "Houston, TX, USA". Always returned, even if no changes were made.
- warnings - An arrayref of warning strings generated during normalisation, e.g. noting where commas were inserted. May be empty.
Side Effects
None.
Notes
This method is not yet fully implemented. It currently returns the place string unchanged. Implementation requires scanning the token sequence against the locale tables (US states, Canadian provinces, Australian states, and the %DIRECT country table) to identify where comma boundaries belong.
Call this method before resolve() when working with raw input that
may lack commas:
my $norm = $resolver->normalise_place(place => 'Houston TX USA');
my $result = $resolver->resolve(place => $norm->{place});
Example
my $norm = $resolver->normalise_place(place => 'Some Place England');
# $norm->{place} eq 'Some Place, England' (once implemented)
# $norm->{warnings} contains a note about comma insertion
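One way the planned token scan described in the Notes could work, sketched under clearly hypothetical assumptions: tiny hard-coded %COUNTRY and %REGION tables stand in for the locale objects and %DIRECT, strings that already contain a comma are returned untouched, at most one country and one state or province are split off the end, and the warning text is invented for illustration. This is not the module's implementation, which does not yet exist.

```perl
use strict;
use warnings;

# Hypothetical, cut-down token tables (upper-case keys)
my %COUNTRY = map { $_ => 1 } ('USA', 'ENGLAND', 'CANADA', 'AUSTRALIA', 'UNITED KINGDOM');
my %REGION  = map { $_ => 1 } ('TX', 'TEXAS', 'ON', 'NSW', 'NEW SOUTH WALES');

# Remove the longest run of trailing tokens (up to 3) found in $table,
# always leaving at least one token for the locality itself
sub peel {
	my ($tokens, $table) = @_;
	for my $n (reverse 1 .. 3) {
		next if $n >= @$tokens;
		my $phrase = join ' ', @{$tokens}[-$n .. -1];
		if ($table->{ uc $phrase }) {
			splice @$tokens, -$n;
			return $phrase;
		}
	}
	return;
}

sub normalise_place {
	my ($place) = @_;
	my @warnings;
	return { place => $place, warnings => \@warnings } if $place =~ /,/;

	my @tokens = split ' ', $place;
	my @tail;
	if (defined(my $country = peel(\@tokens, \%COUNTRY))) {
		unshift @tail, $country;
	}
	if (defined(my $region = peel(\@tokens, \%REGION))) {
		unshift @tail, $region;
	}
	return { place => $place, warnings => \@warnings } unless @tail;

	my $normalised = join ', ', join(' ', @tokens), @tail;
	push @warnings, "$place: inserted commas, giving '$normalised'";
	return { place => $normalised, warnings => \@warnings };
}
```

A real implementation would also have to cope with ambiguity, for example a locality whose final word happens to match a state code.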
AUTHOR
Nigel Horne <njh@nigelhorne.com>
REPOSITORY
https://github.com/nigelhorne/Geo-Address-Parser-Country
SUPPORT
This module is provided as-is without any warranty.
Please report any bugs or feature requests to bug-geo-address-parser-country at rt.cpan.org,
or through the web interface at
http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Geo-Address-Parser-Country.
I will be notified, and then you'll
automatically be notified of progress on your bug as I make changes.
BUGS
- normalise_place() is not yet implemented. It currently returns the place string unchanged. See "normalise_place" for details of the planned behaviour.
- The Australian state code lookup (step 6 in the implementation) uses the raw, un-normalised component as the hash key, making it case-sensitive, unlike the preceding lookup steps (2-5). Lowercase codes such as nsw will not match. A fix to apply uc($component) consistently is pending.
- Geo::GeoNames generates its query methods via AUTOLOAD, so can('search') returns false at the Perl level even though $geonames->search(...) works correctly at runtime. The can => 'search' schema check has been commented out as a temporary workaround pending a fix to Geo::GeoNames itself.
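The AUTOLOAD problem can be reproduced with a self-contained class. AutoClient below is hypothetical and only mimics the relevant behaviour: its methods exist solely through AUTOLOAD, so can() cannot see them. looks_searchable() is one possible relaxed check, an assumption rather than anything the module or Geo::GeoNames does.

```perl
use strict;
use warnings;

# Hypothetical class whose methods, like Geo::GeoNames' query methods,
# exist only via AUTOLOAD and are never installed as real subs
package AutoClient;

our $AUTOLOAD;

sub new { return bless {}, shift }

sub AUTOLOAD {
	my $self = shift;
	(my $name = $AUTOLOAD) =~ s/.*:://;
	return "called $name";
}

sub DESTROY { }    # keep AUTOLOAD from being called on destruction

package main;

use Scalar::Util qw(blessed);

my $client = AutoClient->new;
my $can    = $client->can('search');        # undef: no 'search' sub exists
my $works  = $client->search(name => 'x');  # succeeds through AUTOLOAD

# Relaxed check: accept a blessed object that has the method itself
# or at least an AUTOLOAD that might provide it
sub looks_searchable {
	my ($obj) = @_;
	return 0 unless blessed($obj);
	return ($obj->can('search') || $obj->can('AUTOLOAD')) ? 1 : 0;
}
```

The relaxed check is weaker than can('search'): it would also accept an object whose AUTOLOAD does not actually handle search.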
Please report additional bugs via the GitHub issue tracker: https://github.com/nigelhorne/Geo-Address-Parser-Country/issues
SEE ALSO
- Test Dashboard
- Geo::Address::Parser
- Locale::US
- Locale::CA
- Locale::AU
- Locale::Object::Country
- Geo::GeoNames
- Object::Configure
- Params::Get
- Params::Validate::Strict
- Return::Set
LICENCE AND COPYRIGHT
Copyright 2026 Nigel Horne.
Usage is subject to GPL2 licence terms. If you use it, please let me know.