NAME
Bio::Taxonomy::GlobalNames - Perlish OO bindings to the Global Names Resolver API
VERSION
Version 0.03
SYNOPSIS
use strict;
use warnings;
use Bio::Taxonomy::GlobalNames;
# Provide the input data and parameters (illustrative values).
my $names           = 'Parus major|Parus thruppi|Parus carpi';
my $data_source_ids = '1';        # example data source id(s), pipe-delimited
my $resolve_once    = 'false';
my $query = Bio::Taxonomy::GlobalNames->new(
    names           => $names,
    data_source_ids => $data_source_ids,
    resolve_once    => $resolve_once,
);
my $output = $query->post();    # Perform a POST request and return the Output object.
# Go through the Output object.
my @data = @{ $output->data };
foreach my $datum (@data)
{
    # Check if a non-empty Results arrayref was returned.
    if ( my @results = @{ $datum->results } )
    {
        # Parse the Results objects.
        foreach my $result (@results)
        {
            # Retrieve and print the canonical name and score for each result.
            my $canonical_name = $result->canonical_form;
            my $score          = $result->score;
            print "$canonical_name\t$score\n";
        }
    }
}
DESCRIPTION
Bio::Taxonomy::GlobalNames provides Perl objects and functions that interface with the Global Names Resolver web service. Input is sent to the service through a REST client, and the JSON results are converted internally into nested objects that are returned to the user.
This module can be used for the automated standardisation of species names against a variety of sources, which can be selected manually if needed. See also the example script provided with this module.
Attributes for Bio::Taxonomy::GlobalNames objects
- data
-
A string with a list of names delimited by new lines. You may optionally supply your local id for each name as:
123|Parus major
125|Parus thruppi
126|Parus carpi
Names in the response will contain your supplied ids, facilitating integration.
The attributes 'data', 'file' and 'names' are mutually exclusive.
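As a sketch of how such a 'data' string might be assembled from id/name pairs, the hypothetical helper below (build_data_string is not part of Bio::Taxonomy::GlobalNames) joins each pair with a pipe and separates the entries with new lines. The ids and names are the examples given above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Taxonomy::GlobalNames): builds a
# newline-delimited 'data' string, prefixing each name with its local id.
sub build_data_string {
    my (@pairs) = @_;    # each element: [ $local_id, $name ]
    return join "\n", map { "$_->[0]|$_->[1]" } @pairs;
}

my $data = build_data_string(
    [ 123, 'Parus major' ],
    [ 125, 'Parus thruppi' ],
    [ 126, 'Parus carpi' ],
);
print "$data\n";
```

The resulting string would then be passed to the constructor as data => $data, in place of 'names' or 'file'.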
- data_source_ids
-
A string with a pipe-delimited list of data source ids. See the list of data sources on the Global Names Resolver website.
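A minimal sketch of building the 'data_source_ids' string follows. The helper build_data_source_ids is hypothetical, it assumes that data source ids are positive integers, and the ids 1 and 12 are placeholders; look up the real ids in the list of data sources.

```perl
use strict;
use warnings;

# Hypothetical helper: joins data source ids into the pipe-delimited string
# expected by 'data_source_ids'. Assumes ids are positive integers.
sub build_data_source_ids {
    my (@ids) = @_;
    for my $id (@ids) {
        die "Not a positive integer data source id: $id\n"
            unless $id =~ /\A[1-9][0-9]*\z/;
    }
    return join '|', @ids;
}

my $data_source_ids = build_data_source_ids( 1, 12 );    # placeholder ids
print "$data_source_ids\n";
```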
- file
-
A file in Unicode encoding with a list of names delimited by new lines, similar to the 'data' attribute. This attribute is valid only when the post method is used.
The attributes 'data', 'file' and 'names' are mutually exclusive.
- names
-
A string with a list of names delimited by either a pipe ("|") or a tab ("\t"). Use a pipe with the get method.
The attributes 'data', 'file' and 'names' are mutually exclusive.
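The delimiter rule above can be captured in a small helper. The sketch below is hypothetical (build_names_string is not part of the module): it uses a pipe for the get method and, as a choice of this sketch, a tab for post, and it rejects names that already contain the delimiter, since such a name would be split in two.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the 'names' string. The get method needs a pipe
# delimiter; for post this sketch uses a tab (a pipe would also be accepted).
sub build_names_string {
    my ( $method, @names ) = @_;
    my $delimiter = $method eq 'get' ? '|' : "\t";
    for my $name (@names) {
        # A name containing the delimiter would be split into two names.
        die "Name contains the delimiter: $name\n"
            if index( $name, $delimiter ) >= 0;
    }
    return join $delimiter, @names;
}

my $for_get  = build_names_string( 'get',  'Parus major', 'Parus carpi' );
my $for_post = build_names_string( 'post', 'Parus major', 'Parus carpi' );
```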
- resolve_once
-
A string with a boolean (true/false) value. Default: 'false'. If 'true', the resolver returns the first available match instead of matches across all data sources with all possible renderings of a name.
When 'true', the response is faster but incomplete.
- with_context
-
A string with a boolean (true/false) value. Default: 'true'. If 'true', the likelihood of matches to taxonomic homonyms is reduced.
When 'true', a common taxonomic context is calculated for all supplied names from matches in data sources that have classification tree paths. Names outside the determined context are penalized during score calculation.
Methods for Bio::Taxonomy::GlobalNames objects
- get
-
Performs a GET request and returns an Output object.
- post
-
Performs a POST request and returns an Output object. If you are supplying an input file, you must use the 'post' method.
Attributes for Output objects
- $output->context
-
A Context object, if the 'with_context' parameter is set to true.
- $output->data
-
An array reference of Data objects, containing the query input(s) and results.
my @data = @{ $output->data };
- $output->data_sources
-
An array reference of DataSources objects for the data sources whose ids you used for name resolution. If no data sources were given, the array reference is empty.
my @data_sources = @{ $output->data_sources };
- $output->id
-
The resolver request id. Your request is stored temporarily in the remote database and is assigned an id.
- $output->parameters
-
A Parameters object, containing the parameters of the query.
- $output->status or $output->message
-
The final status of the request -- 'success' or 'failure'.
- $output->status_message
-
The message associated with the status.
- $output->url
-
The URL at which you can access your results for 7 days.
Attributes for Data objects
- $datum->results
-
An array reference of Results objects.
my @results = @{ $datum->results };
- $datum->supplied_id
-
The id of the name string in the query (if provided).
- $datum->supplied_name_string
-
The name string in the query.
Attributes for Results objects
- $result->canonical_form
-
A "canonical" version of the name generated by the Global Names parser.
- $result->classification_path
-
Tree path to the root if a name string was found within a data source classification.
- $result->classification_path_ids
-
Tree path to the root using taxon_ids, if a name string was found within a data source classification.
- $result->classification_path_ranks
- $result->data_source_id
-
The id of the data source where a name was found.
- $result->data_source_title
-
The title of the data source where a name was found.
- $result->gni_uuid
-
An identifier for the found name string used in Global Names.
- $result->local_id
-
The id local to the data source (if provided by the data source manager).
- $result->match_type
-
Explains how the resolver found the name. If the resolver cannot find names corresponding to the entire queried name string, it sequentially removes terminal portions of the name string until a match is found. The possible values are:
1 - Exact match
2 - Exact match by canonical form of a name
3 - Fuzzy match by canonical form
4 - Partial exact match by species part of canonical form
5 - Partial fuzzy match by species part of canonical form
6 - Exact match by genus part of a canonical form
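When reporting results it can help to turn the numeric codes back into text. The sketch below simply restates the list above as a hash; describe_match_type and is_fuzzy are hypothetical helpers, not module methods, and is_fuzzy flags types 3 and 5 only because their descriptions say "fuzzy".

```perl
use strict;
use warnings;

# The match_type codes listed above, restated as a lookup table.
my %MATCH_TYPE = (
    1 => 'Exact match',
    2 => 'Exact match by canonical form of a name',
    3 => 'Fuzzy match by canonical form',
    4 => 'Partial exact match by species part of canonical form',
    5 => 'Partial fuzzy match by species part of canonical form',
    6 => 'Exact match by genus part of a canonical form',
);

# Returns the description of a match_type code, or a fallback for unknown codes.
sub describe_match_type {
    my ($code) = @_;
    return $MATCH_TYPE{$code} // "Unknown match type ($code)";
}

# True for the two fuzzy match types (3 and 5).
sub is_fuzzy {
    my ($code) = @_;
    return $code == 3 || $code == 5;
}
```

With a real Results object, the call would look like describe_match_type( $result->match_type ).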
- $result->name_string
-
The name string found in this data source.
- $result->prescore
-
The points used to calculate the score, delimited by '|': "Match points|Author match points|Context points". Negative points decrease the final score.
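A minimal sketch of splitting a prescore string into its three components follows. It assumes each component is a plain number; parse_prescore and has_penalty are hypothetical helpers, and '3|0|0' is an illustrative value, not real resolver output.

```perl
use strict;
use warnings;

# Hypothetical helper: splits "Match points|Author match points|Context points"
# into a hash reference. Assumes each component is a plain number.
sub parse_prescore {
    my ($prescore) = @_;
    my @parts = split /\|/, $prescore, -1;    # -1 keeps empty trailing fields
    die "Unexpected prescore format: $prescore\n" unless @parts == 3;
    for my $part (@parts) {
        die "Non-numeric prescore component: $part\n"
            unless $part =~ /\A-?\d+(?:\.\d+)?\z/;
    }
    my %points;
    @points{qw(match author context)} = @parts;
    return \%points;
}

# True if any component is negative, i.e. it lowered the final score.
sub has_penalty {
    my ($points) = @_;
    return scalar grep { $_ < 0 } values %$points;
}

my $points = parse_prescore('3|0|0');    # illustrative value only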
- $result->score
-
A confidence score calculated for the match. A score of 0.5 means an uncertain result that will require investigation. Scores higher than 0.9 correspond to 'good' matches, scores between 0.5 and 0.9 should be treated with caution, and scores below 0.5 are likely poor matches. The scoring is described in more detail at http://resolver.globalnames.org/about.
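The thresholds above can be applied mechanically when triaging many results. In the sketch below, score_band is a hypothetical helper; it places exactly 0.9 and exactly 0.5 in the 'caution' band, since the text describes 0.5 as uncertain and only scores higher than 0.9 as good.

```perl
use strict;
use warnings;

# Hypothetical helper: maps a score to the bands described above.
#   > 0.9        -> 'good'
#   0.5 to 0.9   -> 'caution' (0.5 itself is an uncertain result)
#   < 0.5        -> 'poor'
sub score_band {
    my ($score) = @_;
    return 'good'    if $score > 0.9;
    return 'caution' if $score >= 0.5;
    return 'poor';
}
```

For example, score_band( $result->score ) could be used to set aside results that need manual checking.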
- $result->taxon_id
-
An identifier supplied in the source Darwin Core Archive for the name string record.
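A common task is keeping only the highest-scoring result for each queried name. So that the sketch below runs without network access, it defines minimal stand-in classes (Stub::Datum and Stub::Result, both hypothetical) that expose only the documented accessors it uses; best_result is likewise a hypothetical helper, and the scores are made-up example values.

```perl
use strict;
use warnings;

# Minimal stand-ins (hypothetical) for Data and Results objects, exposing only
# the accessors used below.
package Stub::Result;
sub new { my ( $class, %args ) = @_; return bless { %args }, $class }
sub score          { $_[0]{score} }
sub canonical_form { $_[0]{canonical_form} }

package Stub::Datum;
sub new { my ( $class, %args ) = @_; return bless { %args }, $class }
sub supplied_name_string { $_[0]{supplied_name_string} }
sub results              { $_[0]{results} }

package main;

# Returns the highest-scoring result of a datum, or undef if there is none.
sub best_result {
    my ($datum) = @_;
    my @results = @{ $datum->results // [] };
    return unless @results;
    my ($best) = sort { $b->score <=> $a->score } @results;
    return $best;
}

my $datum = Stub::Datum->new(
    supplied_name_string => 'Parus major',
    results              => [
        Stub::Result->new( score => 0.75,  canonical_form => 'Parus' ),
        Stub::Result->new( score => 0.988, canonical_form => 'Parus major' ),
    ],
);
my $best = best_result($datum);
printf "%s -> %s (%.3f)\n", $datum->supplied_name_string,
    $best->canonical_form, $best->score;
```

With a real Output object, best_result would be called on each element of @{ $output->data } instead of the stub.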
Attributes for DataSources objects
Attributes for Parameters objects
- $parameters->best_match_only
- $parameters->data_sources
-
An array reference of data source ids you used for name resolution. If no data sources were given, the array reference is empty.
my @data_sources = @{ $parameters->data_sources };
- $parameters->header_only
- $parameters->preferred_data_sources
- $parameters->resolve_once
-
True if the 'resolve_once' parameter is set to true, false otherwise.
- $parameters->with_context
-
True if the 'with_context' parameter is set to true, false otherwise.
Attributes for Context objects
- $context->context_clade
-
The lowest taxonomic level in the data source that contains 90% or more of all names found. If there are too few names to determine a context, this element remains empty.
- $context->context_data_source_id
-
The id of a data source used to create the context.
AUTHOR
Dimitrios - Georgios Kontopoulos, <d.kontopoulos13 at imperial.ac.uk>
BUGS
Please report any bugs or feature requests to bug-bio-taxonomy-globalnames at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Bio-Taxonomy-GlobalNames.
I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
More details about the Global Names Resolver's algorithm can be obtained from its website.
You can find documentation for this module with the perldoc command.
perldoc Bio::Taxonomy::GlobalNames
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Bio-Taxonomy-GlobalNames
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search MetaCPAN
GitHub
LICENSE AND COPYRIGHT
Copyright 2013 Dimitrios - Georgios Kontopoulos.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.