NAME

Bio::Taxonomy::GlobalNames - Perlish OO bindings to the Global Names Resolver API

VERSION

Version 0.03

SYNOPSIS

    use Bio::Taxonomy::GlobalNames;

    # Provide the input data and parameters.
    my $query = Bio::Taxonomy::GlobalNames->new(
        names           => $names,
        data_source_ids => $data_source_ids,
        resolve_once    => $resolve_once,
    );

    my $output = $query->post();    # Perform a POST request and return the output.

    # Go through the Output object.
    my @data = @{ $output->data };

    foreach my $datum (@data)
    {
	
        # Check if a non-empty Results arrayref was returned.
        if ( my @results = @{ $datum->results } )
        {

            # Parse the Results objects.
            foreach my $result (@results)
            {

                # Retrieve the canonical name and score for each result.
                my $canonical_name = $result->canonical_form;
                my $score          = $result->score;
            }
        }
    }

DESCRIPTION

Bio::Taxonomy::GlobalNames provides Perl objects and methods that interface with the Global Names Resolver web service. Input is sent to the service through a REST client; the JSON response is converted internally into nested objects and returned to the user.

This module can be used to automate the standardisation of species names against a variety of data sources, which can be selected manually if needed. See also the example script provided with this module.

Attributes for Bio::Taxonomy::GlobalNames objects

data

A string with a list of names delimited by new lines. You may optionally supply your local id for each name as:

    123|Parus major
    125|Parus thruppi
    126|Parus carpi

Names in the response will contain your supplied ids, facilitating integration.
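
As a minimal sketch of the format above, the helper below joins id/name pairs into a newline-delimited string. The name build_data_string is hypothetical and not part of this module, and the numeric sort assumes that your local ids are numbers.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Bio::Taxonomy::GlobalNames):
# turn id => name pairs into the newline-delimited "id|name" format.
sub build_data_string {
    my (%name_for) = @_;

    # Assumption: ids are numeric, so they are sorted numerically.
    return join "\n",
      map { "$_|$name_for{$_}" } sort { $a <=> $b } keys %name_for;
}

my $data = build_data_string(
    123 => 'Parus major',
    125 => 'Parus thruppi',
    126 => 'Parus carpi',
);

print "$data\n";
```

The resulting string could then be passed to the constructor as the 'data' attribute.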

The attributes 'data', 'file' and 'names' are mutually exclusive.

data_source_ids

A string with a pipe-delimited list of data sources. See the list of data sources.

file

A file in Unicode encoding with a list of names delimited by new lines, similar to the 'data' attribute. This attribute is valid only when the post method is used.

The attributes 'data', 'file' and 'names' are mutually exclusive.

names

A string with a list of names delimited by either pipe "|" or tab "\t". Use a pipe with the get method.

The attributes 'data', 'file' and 'names' are mutually exclusive.
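
A short sketch of building the 'names' string from a Perl list follows. The helper names_string is hypothetical; it uses a pipe for the get method, as stated above, and falls back to a tab for anything else, which is only one possible choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Bio::Taxonomy::GlobalNames):
# join names with the delimiter suited to the request method.
sub names_string {
    my ( $method, @names ) = @_;

    # A pipe is required with get; the tab for other methods is an assumption.
    my $delimiter = $method eq 'get' ? '|' : "\t";
    return join $delimiter, @names;
}

my @species = ( 'Parus major', 'Parus thruppi', 'Parus carpi' );

my $names_for_get  = names_string( 'get',  @species );
my $names_for_post = names_string( 'post', @species );
```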

resolve_once

A string with a boolean (true/false) value. Default: 'false'. When 'true', the resolver returns the first available match instead of matches across all data sources with all possible renderings of a name.

When 'true', the response is rapid but incomplete.

with_context

A string with a boolean (true/false) value. Default: 'true'. Reduce the likelihood of matches to taxonomic homonyms.

When 'true', a common taxonomic context is calculated for all supplied names from matches in data sources that have classification tree paths. Names out of determined context are penalized during score calculation.

Methods for Bio::Taxonomy::GlobalNames objects

get

Performs a GET request and returns an Output object.

post

Performs a POST request and returns an Output object. If you are supplying an input file, you have to use the 'post' method.

Attributes for Output objects

$output->context

A Context object, if the 'with_context' parameter is set to true.

$output->data

An array reference of Data objects, containing query input(s) and results.

    my @data = @{ $output->data };

$output->data_sources

An array reference of DataSources objects, whose ids you used for name resolution. If no data sources were given, the array reference is empty.

    my @data_sources = @{ $output->data_sources };

$output->id

The resolver request id. Your request is stored temporarily in the remote database and is assigned an id.

$output->parameters

A Parameters object, containing the parameters of the query.

$output->status or $output->message

The final status of the request -- 'success' or 'failure'.

$output->status_message

The message associated with the status.

$output->url

The URL at which you can access your results for 7 days.

Attributes for Data objects

$datum->results

An array reference of Results objects.

    my @results = @{ $datum->results };

$datum->supplied_id

The id of the name string in the query (if provided).

$datum->supplied_name_string

The name string in the query.

Attributes for Results objects

$result->canonical_form

A "canonical" version of the name generated by the Global Names parser.

$result->classification_path

Tree path to the root if a name string was found within a data source classification.

$result->classification_path_ids

Tree path to the root using taxon_ids, if a name string was found within a data source classification.

$result->classification_path_ranks

$result->data_source_id

The id of the data source where a name was found.

$result->data_source_title

The title of the data source where a name was found.

$result->gni_uuid

An identifier for the found name string used in Global Names.

$result->local_id

Shows id local to the data source (if provided by the data source manager).

$result->match_type

Explains how the resolver found the name. If the resolver cannot find names corresponding to the entire queried name string, it sequentially removes terminal portions of the name string until a match is found.

1 - Exact match

2 - Exact match by canonical form of a name

3 - Fuzzy match by canonical form

4 - Partial exact match by species part of canonical form

5 - Partial fuzzy match by species part of canonical form

6 - Exact match by genus part of a canonical form
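
For reporting, the numeric codes above can be mapped back to their descriptions. The sketch below is illustrative only: the hash %MATCH_TYPE and the function describe_match_type are hypothetical names, not part of this module.

```perl
use strict;
use warnings;

# Descriptions copied from the list of match types above.
my %MATCH_TYPE = (
    1 => 'Exact match',
    2 => 'Exact match by canonical form of a name',
    3 => 'Fuzzy match by canonical form',
    4 => 'Partial exact match by species part of canonical form',
    5 => 'Partial fuzzy match by species part of canonical form',
    6 => 'Exact match by genus part of a canonical form',
);

# Hypothetical helper: translate a match_type code into readable text.
sub describe_match_type {
    my ($code) = @_;
    return $MATCH_TYPE{$code} // "Unknown match type ($code)";
}

print describe_match_type(3), "\n";
```

In practice the code would come from $result->match_type.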

$result->name_string

The name string found in this data source.

$result->prescore

Displays the points used to calculate the score, delimited by '|': "Match points|Author match points|Context points". Negative points decrease the final score.
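
The three components of a prescore string can be separated with split. This is a sketch: parse_prescore and its hash keys are hypothetical names, and the sample value '3|0|-1' is made up for illustration rather than taken from a real response.

```perl
use strict;
use warnings;

# Hypothetical helper: split "Match|Author match|Context" points
# into a hash reference with one key per component.
sub parse_prescore {
    my ($prescore) = @_;

    # The pipe is a regex metacharacter, so it must be escaped.
    my ( $match, $author, $context ) = split /\|/, $prescore, -1;

    return { match => $match, author => $author, context => $context };
}

my $points = parse_prescore('3|0|-1');    # illustrative value only
printf "match=%s author=%s context=%s\n",
  $points->{match}, $points->{author}, $points->{context};
```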

$result->score

A confidence score calculated for the match. 0.5 means an uncertain result that will require investigation. Results higher than 0.9 correspond to 'good' matches. Results between 0.5 and 0.9 should be taken with caution. Results less than 0.5 are likely poor matches. The scoring is described in more detail at http://resolver.globalnames.org/about.
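
The thresholds above can be turned into a simple triage function. This is a sketch under stated assumptions: score_band and its labels are hypothetical, and boundary values of exactly 0.5 and 0.9 are placed in the 'caution' band, which is one reading of the ranges given.

```perl
use strict;
use warnings;

# Hypothetical helper: bucket a score according to the ranges
# described above. Assumption: exactly 0.5 and 0.9 count as 'caution'.
sub score_band {
    my ($score) = @_;

    return 'good'    if $score > 0.9;
    return 'caution' if $score >= 0.5;
    return 'poor';
}

print score_band(0.75), "\n";
```

A script could, for example, accept 'good' matches automatically and write the rest to a file for manual review.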

$result->taxon_id

An identifier supplied in the source Darwin Core Archive for the name string record.
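
Combining the Data and Results attributes, the highest-scoring result for each queried name can be picked as sketched below. The function best_result is hypothetical and relies only on the results and score accessors documented above; it accepts any object that provides them.

```perl
use strict;
use warnings;

# Hypothetical helper: return the Results object with the highest
# score for a Data object, or undef if there are no results.
sub best_result {
    my ($datum) = @_;

    my @results = @{ $datum->results || [] };

    # Sort in descending order of score and keep the first element.
    my ($best) = sort { $b->score <=> $a->score } @results;
    return $best;
}

# Usage with an Output object (requires a completed request):
#
#     foreach my $datum ( @{ $output->data } )
#     {
#         my $best = best_result($datum) or next;
#         print $datum->supplied_name_string, ' => ',
#           $best->canonical_form, "\n";
#     }
```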

Attributes for DataSources objects

$data_source->id

The ID of the data source.

$data_source->title

The name of the data source.

Attributes for Parameters objects

$parameters->best_match_only

$parameters->data_sources

An array reference of data source ids you used for name resolution. If no data sources were given, the arrayref is empty.

    my @data_sources = @{ $parameters->data_sources };

$parameters->header_only

$parameters->preferred_data_sources

$parameters->resolve_once

True if the 'resolve_once' parameter is set to true, false otherwise.

$parameters->with_context

True if the 'with_context' parameter is set to true, false otherwise.

Attributes for Context objects

$context->context_clade

The lowest taxonomic level in the data source that contains 90% or more of all names found. If there are too few names to determine it, this element remains empty.

$context->context_data_source_id

The id of a data source used to create the context.

AUTHOR

Dimitrios - Georgios Kontopoulos, <d.kontopoulos13 at imperial.ac.uk>

BUGS

Please report any bugs or feature requests to bug-bio-taxonomy-globalnames at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Bio-Taxonomy-GlobalNames.

I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

More details about Global Names Resolver's algorithm can be obtained from its website.

You can find documentation for this module with the perldoc command.

    perldoc Bio::Taxonomy::GlobalNames

LICENSE AND COPYRIGHT

Copyright 2013 Dimitrios - Georgios Kontopoulos.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.