NAME
Geo::Address::Parser::Country - Resolve a place string component to a canonical country name
VERSION
Version 0.02
SYNOPSIS
use Geo::Address::Parser::Country;
use Locale::US;
use Locale::CA;
use Locale::AU;
my $resolver = Geo::Address::Parser::Country->new({
us => Locale::US->new(),
ca_en => Locale::CA->new(lang => 'en'),
ca_fr => Locale::CA->new(lang => 'fr'),
au => Locale::AU->new(),
});
my $result = $resolver->resolve(
component => 'England',
place => 'Ramsgate, Kent, England',
);
# $result->{country} eq 'United Kingdom'
# $result->{place} eq 'Ramsgate, Kent, England'
# $result->{warnings} is []
# $result->{unknown} is 0
DESCRIPTION
Resolves the last comma-separated component of a place string into a canonical country name. Handles common variants, abbreviations, and historical names found in genealogy data and other poorly-normalised address sources.
The module is designed to tolerate poor-quality data from software imports, where place strings may be inconsistent, abbreviated, or use historical country names no longer in common use.
Resolution proceeds through the following steps in order:
- 1. Direct lookup table (covers historical names, abbreviations, common variants)
- 2. US state code or name via Locale::US
- 3. Canadian province code or name via Locale::CA (English and French)
- 4. Australian state code or name via Locale::AU
- 5. Locale::Object::Country by name
- 6. Geo::GeoNames search (optional, only if object provided at construction)
- 7. Unknown - returns with unknown => 1
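The ordered fallback above can be pictured as a list of steps tried in turn, where the first one to return a defined value wins. The following is a minimal sketch only: resolve_sketch, the tiny hash tables, and the case-folding of the component are illustrative assumptions, not the module's internals, and the Locale::Object::Country and GeoNames steps are omitted.

```perl
use strict;
use warnings;

# Hypothetical, heavily abbreviated stand-ins for the real lookup sources.
my %DIRECT   = (england => 'United Kingdom', usa => 'United States', holland => 'Netherlands');
my %US_STATE = (tx => 1, texas => 1);
my %CA_PROV  = (on => 1, ontario => 1, quebec => 1);
my %AU_STATE = (nsw => 1, 'new south wales' => 1);

# Each step returns a country name, or undef to hand over to the next step.
my @STEPS = (
    sub { $DIRECT{ $_[0] } },
    sub { $US_STATE{ $_[0] } ? 'United States' : undef },
    sub { $CA_PROV{ $_[0] }  ? 'Canada'        : undef },
    sub { $AU_STATE{ $_[0] } ? 'Australia'     : undef },
);

sub resolve_sketch {
    my ($component) = @_;

    # Assumption: lookups are case-insensitive and ignore surrounding whitespace.
    my $key = lc $component;
    $key =~ s/^\s+|\s+$//g;

    for my $step (@STEPS) {
        my $country = $step->($key);
        return { country => $country, unknown => 0 } if defined $country;
    }

    # No step matched: report the component as unknown.
    return { country => undef, unknown => 1 };
}
```

Keeping the steps in an array makes the precedence explicit: reordering the array changes which source wins when two of them recognise the same component.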
TODO
Add normalise_place() to handle missing commas before country and state names in raw uncleaned input strings. Poor data import means strings like "Houston TX USA" or "Some Place England" need comma insertion before component extraction can work correctly. This should be implemented before relying on resolve() for raw uncleaned input.
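As a rough illustration of what such a routine might do, the sketch below peels a trailing country and then a trailing region code off a comma-free string. The function normalise_place_sketch does not exist in the module, and the token lists are placeholder assumptions; multi-word names such as "New South Wales" and partially comma-separated input are deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical token lists; a real implementation would need far more entries.
my %COUNTRY = map { $_ => 1 } qw(USA England Canada Australia);
my %REGION  = map { $_ => 1 } qw(TX NSW ON);

sub normalise_place_sketch {
    my ($place) = @_;

    # Strings that already contain a comma are left untouched.
    return $place if $place =~ /,/;

    my @words = split ' ', $place;
    my @tail;

    # Peel a trailing country, then a trailing region code, off the end,
    # always leaving at least one word as the locality.
    if (@words > 1 && $COUNTRY{ $words[-1] }) { unshift @tail, pop @words }
    if (@words > 1 && $REGION{ $words[-1] })  { unshift @tail, pop @words }

    return join ', ', join(' ', @words), @tail;
}
```

With these assumed tables, "Houston TX USA" becomes "Houston, TX, USA" and "Some Place England" becomes "Some Place, England", after which the last comma-separated component can be extracted as usual.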
METHODS
new
Purpose
Constructs a new resolver object. The locale objects are used for state and province lookups and are retained for the lifetime of the object.
API Specification
Input
{
us => { type => 'object', can => 'new' }, # Locale::US instance
ca_en => { type => 'object', can => 'new' }, # Locale::CA English instance
ca_fr => { type => 'object', can => 'new' }, # Locale::CA French instance
au => { type => 'object', can => 'new' }, # Locale::AU instance
geonames => { # Optional Geo::GeoNames instance
type => 'object',
can => 'search',
optional => 1,
},
}
Output
{ type => 'object', isa => 'Geo::Address::Parser::Country' }
Arguments
- us - A Locale::US instance. Required.
- ca_en - A Locale::CA instance with lang => 'en'. Required.
- ca_fr - A Locale::CA instance with lang => 'fr'. Required.
- au - A Locale::AU instance. Required.
- geonames - An optional Geo::GeoNames instance used as a last-resort fallback when all other resolution methods fail.
Returns
A blessed Geo::Address::Parser::Country object.
Side Effects
None.
Notes
The locale objects are stored by reference and shared for all calls to resolve(). Constructing them once and reusing the resolver object is more efficient than constructing a new resolver for each lookup.
Example
my $resolver = Geo::Address::Parser::Country->new({
us => Locale::US->new(),
ca_en => Locale::CA->new(lang => 'en'),
ca_fr => Locale::CA->new(lang => 'fr'),
au => Locale::AU->new(),
});
resolve
Purpose
Resolves the last comma-separated component of a place string to a canonical country name, and returns the (possibly modified) place string alongside any warnings generated during resolution.
API Specification
Input
{
component => { type => 'string', min => 1 },
place => { type => 'string', min => 1 },
}
Output
{
type => 'hashref',
schema => {
country => { type => 'string', optional => 1 },
place => { type => 'string', min => 1 },
warnings => { type => 'arrayref' },
unknown => { type => 'boolean' },
},
}
Arguments
- component - The last comma-separated component of the place string, e.g. "England", "TX", "NSW". Required.
- place - The full place string, e.g. "Ramsgate, Kent, England". May be modified by appending a country suffix where needed. Required.
Returns
A hashref containing:
- country - The canonical country name as a string, e.g. "United Kingdom". undef if resolution failed.
- place - The full place string, possibly with a country suffix appended (e.g. ", USA"). Always returned even if unmodified.
- warnings - An arrayref of warning strings generated during resolution. May be empty. The caller is responsible for acting on these, e.g. by passing them to a complain() function.
- unknown - A boolean. True if the country could not be resolved by any method.
Side Effects
None. All warnings are returned to the caller rather than emitted directly.
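Because nothing is emitted directly, the caller decides what to do with each warning. The sketch below shows one possible pattern; complain() is a hypothetical reporting function, and $result is written by hand to match the documented return shape rather than produced by the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for whatever reporting function the caller uses.
my @log;
sub complain { push @log, "WARNING: $_[0]" }

# A result shaped like the documented return value (hand-written for illustration).
my $result = {
    country  => 'United States',
    place    => 'Houston, TX, USA',
    warnings => ['TX: assuming country is United States'],
    unknown  => 0,
};

# Forward every warning, then use the country only if resolution succeeded.
complain($_) for @{ $result->{warnings} };
my $country = $result->{unknown} ? undef : $result->{country};
```

Checking unknown before reading country keeps the failure case explicit instead of relying on an undefined value further down the line.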
Notes
Resolution order is: direct lookup, US state, Canadian province, Australian state, Locale::Object::Country, GeoNames (if available). The first successful match wins.
When a US state, Canadian province, or Australian state is recognised, the appropriate country string (", USA", ", Canada", ", Australia") is appended to place if not already present.
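The "append if not already present" rule could be implemented along the following lines. This is a sketch under assumptions: append_country_sketch is a hypothetical helper, and the case-insensitive match against the final component is a guess at reasonable behaviour rather than the module's actual check.

```perl
use strict;
use warnings;

# Hypothetical helper: append $suffix (e.g. 'USA') unless the place
# already ends with that suffix as its final comma-separated component.
sub append_country_sketch {
    my ($place, $suffix) = @_;

    # \Q...\E quotes the suffix so it is matched literally.
    return $place if $place =~ /,\s*\Q$suffix\E\s*$/i;

    return "$place, $suffix";
}
```

For example, append_country_sketch('Houston, TX', 'USA') yields 'Houston, TX, USA', while a string that already ends in ', USA' is returned unchanged.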
Example
my $result = $resolver->resolve(
component => 'TX',
place => 'Houston, TX',
);
# $result->{country} eq 'United States'
# $result->{place} eq 'Houston, TX, USA'
# $result->{warnings}[0] eq 'TX: assuming country is United States'
# $result->{unknown} is 0
AUTHOR
Nigel Horne <njh@nigelhorne.com>
REPOSITORY
https://github.com/nigelhorne/Geo-Address-Parser-Country
SUPPORT
This module is provided as-is without any warranty.
Please report any bugs or feature requests to bug-geo-address-parser at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Geo-Address-Parser-Country. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
BUGS
- The direct lookup table contains nl as an abbreviation for the Netherlands. This conflicts with NL, the postal abbreviation for the Canadian province of Newfoundland and Labrador (ISO 3166-2 code CA-NL). Because the direct table is consulted before the Locale::CA province-code path, passing component => 'NL' currently resolves to Netherlands rather than Canada. The workaround is to pass the full province name (Newfoundland and Labrador) or to ensure the place string includes an explicit Canada suffix before calling resolve().
- Geo::GeoNames generates its query methods via AUTOLOAD, so can('search') returns false at the Perl level even though $geonames->search(...) works correctly at runtime. The constructor schema currently validates the optional geonames argument with can => 'search', which rejects a real Geo::GeoNames object. Until this is resolved, pass a wrapper object that defines search as a named method, or subclass Geo::GeoNames and add a stub:
package My::GeoNames;
use parent 'Geo::GeoNames';
sub search { my $self = shift; $self->SUPER::search(@_) }
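The AUTOLOAD behaviour described above can be reproduced without Geo::GeoNames installed. In the sketch below, Demo::Auto and Demo::Wrapped are hypothetical classes that mimic the situation: the first answers search only through AUTOLOAD, the second adds a named stub in the same way as the suggested workaround.

```perl
use strict;
use warnings;

# Hypothetical class whose methods exist only through AUTOLOAD.
package Demo::Auto {
    our $AUTOLOAD;
    sub new { bless {}, shift }
    sub AUTOLOAD {
        my $self = shift;
        (my $name = $AUTOLOAD) =~ s/.*:://;   # strip the package qualifier
        return if $name eq 'DESTROY';         # ignore object destruction
        return "autoloaded $name";
    }
}

# Hypothetical subclass that adds a named stub, as in the workaround.
package Demo::Wrapped {
    our @ISA = ('Demo::Auto');
    sub search { my $self = shift; $self->SUPER::search(@_) }
}

# can() inspects the symbol table only, so it does not see AUTOLOAD methods.
my $plain_can   = Demo::Auto->can('search')    ? 1 : 0;
my $wrapped_can = Demo::Wrapped->can('search') ? 1 : 0;

# The call itself still succeeds on the plain object.
my $plain_call = Demo::Auto->new->search;

printf "plain can: %d, wrapped can: %d\n", $plain_can, $wrapped_can;
```

The plain class reports that it cannot search even though the call works, which is why a can => 'search' check rejects it; the named stub makes the method visible to can() while still delegating to the parent.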
Please report additional bugs via the GitHub issue tracker: https://github.com/nigelhorne/Geo-Address-Parser-Country/issues
SEE ALSO
LICENCE AND COPYRIGHT
Copyright 2026 Nigel Horne.
Usage is subject to licence terms.
The licence terms of this software are as follows:
Personal single user, single computer use: GPL2
All other users (including Commercial, Charity, Educational, Government) must apply in writing for a licence for use from Nigel Horne at the above e-mail.