NAME
Lingua::EN::TitleParse - Parse titles in people's names
SYNOPSIS
use Lingua::EN::TitleParse;
# Functional interface
my ($title, $name) = Lingua::EN::TitleParse->parse("Mr Joe Bloggs");
# $title = "Mr", $name = "Joe Bloggs"
# OO interface
$title_obj = Lingua::EN::TitleParse->new();
($title, $name) = $title_obj->parse("Mr Joe Bloggs");
# $title = "Mr", $name = "Joe Bloggs"
# Use your own titles with the OO interface
#
@titles = ('Master', 'International Master', 'Grandmaster');
$title_obj = Lingua::EN::TitleParse->new( titles => \@titles );
($title, $name) = $title_obj->parse("Grandmaster Joe Bloggs");
# $title = "Grandmaster", $name = "Joe Bloggs"
# Retrieve the list of titles
@titles = $title_obj->titles;
# Optionally get cleaned titles on output
$title_obj = Lingua::EN::TitleParse->new( clean => 1 );
($title, $name) = $title_obj->parse("mR. Joe Bloggs");
# $title = "Mr", $name = "Joe Bloggs"
# Without 'clean' turned on
$title_obj = Lingua::EN::TitleParse->new();
($title, $name) = $title_obj->parse("mR. Joe Bloggs");
# $title = "mR.", $name = "Joe Bloggs"
DESCRIPTION
This module parses strings containing people's names to identify titles such as "Mr" and "Mrs", so that the title can be separated from the rest of the name.
For example, "Mr Joe Bloggs" is parsed into "Mr" and "Joe Bloggs".
The module handles "fuzziness" such as variations in case and punctuation: "Mr", "MR", "Mr.", and "mr" are all recognised as the same title.
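One way to picture this tolerance is a normalisation step that maps every variant of a title onto a single key. The sketch below is only an illustration under that assumption: normalise_title is a hypothetical helper, not part of this module's API, and the module's real normalisation rules may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::EN::TitleParse):
# reduce a title to a case- and punctuation-insensitive key.
sub normalise_title {
    my ($text) = @_;
    my $key = lc $text;                 # ignore case
    $key =~ s/[^a-z0-9\s]//g;           # drop punctuation such as "."
    return join ' ', split ' ', $key;   # squeeze and trim white-space
}

# All four spellings collapse to the same key.
my %seen;
$seen{ normalise_title($_) }++ for ('Mr', 'MR', 'Mr.', 'mr');
print join(', ', sort keys %seen), "\n";   # prints: mr
```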
It differs from another CPAN module, Lingua::EN::NameParse, in two key respects:
Firstly, Lingua::EN::TitleParse performs well irrespective of the number of titles being matched against. Lingua::EN::NameParse loops through a series of regular expressions, so it slows down as the set of titles grows. Lingua::EN::TitleParse instead "normalises" each name string and then uses hash lookups, providing consistently good performance.
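The hash-lookup approach can be sketched roughly as follows. This is not the module's actual source: normalise, split_title, the sample title list, the longest-prefix-first strategy, and the empty-string title returned when nothing matches are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical normaliser: lower-case, drop punctuation, squeeze spaces.
sub normalise {
    my ($text) = @_;
    my $key = lc $text;
    $key =~ s/[^a-z0-9\s]//g;
    return join ' ', split ' ', $key;
}

# Build the lookup table once: normalised key => title as supplied.
my %title_for;
$title_for{ normalise($_) } = $_ for ('Mr', 'Mrs', 'Dr', 'Rt Hon');

# Try the longest leading run of words first, always leaving at least
# one word for the name. The number of lookups depends on the words in
# the input, not on how many titles are in the table.
sub split_title {
    my ($input) = @_;
    my @words = split ' ', $input;
    for my $n (reverse 1 .. $#words) {
        my $candidate = join ' ', @words[0 .. $n - 1];
        if (exists $title_for{ normalise($candidate) }) {
            return ($candidate, join ' ', @words[$n .. $#words]);
        }
    }
    return ('', join ' ', @words);   # assumption: empty title when no match
}

my ($title, $name) = split_title('mR. Joe Bloggs');
print "title=[$title] name=[$name]\n";   # prints: title=[mR.] name=[Joe Bloggs]
```

Because each candidate prefix costs one hash lookup, adding more titles to the table does not add more work per name in this sketch.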
Secondly, it focuses only on parsing titles in names, whereas Lingua::EN::NameParse attempts much more. However, the extra intelligence of Lingua::EN::NameParse can come at the cost of predictability. Lingua::EN::TitleParse is more conservative: by default it makes no changes to the case or content of the input (apart from compressing extra white-space), effectively only splitting the input string in two. That said, there is an option to output cleaned titles.
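The white-space compression mentioned above can be pictured with a small sketch. compress_whitespace is a hypothetical helper, and whether the module also trims leading and trailing white-space is an assumption here; the point is only that case and punctuation pass through untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse runs of white-space to single spaces
# (and, as an assumption, trim both ends) without altering anything else.
sub compress_whitespace {
    my ($text) = @_;
    return join ' ', split ' ', $text;
}

print compress_whitespace("  mR.   Joe \t Bloggs "), "\n";   # prints: mR. Joe Bloggs
```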
The default titles are the same ones Lingua::EN::NameParse uses (its "extended set") with minor additions, but you can supply your own set of titles instead.
METHODS
- parse
This method identifies a title in a name and splits the name out into the title and the rest of the name.
# e.g. via the functional interface
my ($title, $name) = Lingua::EN::TitleParse->parse("Mr Joe Bloggs");
# e.g. via the Object-Oriented interface
$title_obj = Lingua::EN::TitleParse->new();
($title, $name) = $title_obj->parse("Mr Joe Bloggs");
- titles
This method returns a list of the titles in use: either the default titles, or the custom titles supplied to the constructor.
# e.g. via the functional interface
@titles = Lingua::EN::TitleParse->titles;
# e.g. via the Object-Oriented interface
$title_obj = Lingua::EN::TitleParse->new( titles => \@custom_titles );
@titles = $title_obj->titles;
EXPORT
None.
SEE ALSO
Lingua::EN::NameParse
AUTHOR
Philip Abrahamson, <PhilipAbrahamson@venda.com>
COPYRIGHT AND LICENSE
Copyright (C) 2013 by Venda Ltd
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.4 or, at your option, any later version of Perl 5 you may have available.