NAME

Locale::ID::GuessGender::FromFirstName - Guess gender of an Indonesian first name

VERSION

version 0.02

SYNOPSIS

use Locale::ID::GuessGender::FromFirstName qw/guess_gender/;

my @res = guess_gender("Budi"); # ({ name=>"budi", result=>"M",
                                #   guess_confidence=>1,
                                #   gender_ratio => 1, algo=>"common" })

# specify more detailed options, guess several names at once
my @res = guess_gender({min_guess_confidence => 0.75,
                        algos => [qw/common v1_rules google/]},
                        "amita", "mega");

DESCRIPTION

This module provides a function to guess the gender of commonly encountered people's names in Indonesia, using several algorithms.

FUNCTIONS

guess_gender([OPTS, ]FIRSTNAME...) => (RES, ...)

Guess the gender of given first name(s). An optional hashref OPTS can be given as the first argument. Valid pair for OPTS:

algos => [ALGO, ...]

Set the algorithms to use, in that order. Default is [qw/common v1_rules/]. Known algorithms: common (try to match from the list of common names), v1_rules (use some simple heuristics), google (compare the number of Google search results for "bapak FIRSTNAME" vs "ibu FIRSTNAME").

The choice of algorithms can severely impact the result. For example, "Mega" is actually pretty ambivalent, used by both females and males. But Google search for "ibu mega" will return much more results than "bapak mega", thus the google algorithm will decide that "Mega" is predominantly female.

min_guess_confidence => FRACTION

Minimum guess confidence level to accept an algorithm's guess as the final answer, a number between 0 and 1. Default is 0.51 (51%).

try_all => BOOL

Whether to try all algorithms specified in algorithms. Default is 0, which means stop trying after an algorithm succeeds to generate guess with specified minimum guess confidence. If set to 1, all algorithms will be tried and the best result used.

algo_opts => {ALGO => OPTS, ...}

Specify per-algorithm options. See the algorithm's documentation for known options.

Will return a result hashref RES for each given input. Known pair of RES:

result => "M" or "F" or "both" OR "neither" OR undef

The final guess result. undef if no algorithm succeeded. "M" if name is predominantly male. "F" if name is predominantly female. "both" if name is ambivalent. "neither" if sufficiently confident that the name is not a person's name.

guess_confidence => FRACTION

The final guess confidence level.

gender_ratio => FRACTION

Estimation of gender ratio. 1 (100%) means the name is always used for male or female. 0.9 (90%) means sometimes (about 10% of the time) the name is also used for the opposite sex. If the gender ratio is close to 0.5 it means the name is ambivalent and often used equally by males and females.

algo => FRACTION

The algo that is used to get the final result.

algo_res => [RES, ...]

Per-algorithm result. Usually only useful for debugging.

BUGS/TODOS

This is a preliminary release. List of common names is not very complete. Heuristic rules are still too simplistic. Expect the accuracy of this module to improve in subsequent releases.

SEE ALSO

Locale::ID::ParseName::Person

AUTHOR

Steven Haryanto <stevenharyanto@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2010 by Steven Haryanto.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.