NAME
Lingua::NATools::Client - Simple API to query NAT Objects
SYNOPSIS
use Lingua::NATools::Client;
$client = Lingua::NATools::Client->new();
DESCRIPTION
Lingua::NATools::Client is a simple query API to talk with NAT corpora Objects. It can use a client-server approach (see nat-server) or access the corpora directly on the local filesystem.
Methods
This module includes functions to query NATools Objects. To query you must first create a client object with the new method.
new
The new method receives a hash with configuration parameters and creates a client object. For instance,
$client = Lingua::NATools::Client->new( Local => "/opt/corpora/foo" );
Known options are:
- PeerAddr
The IP address where the server is running. Defaults to 127.0.0.1.
- PeerPort
The port to be used in the connection. Defaults to 4000.
- Local
A local directory with a NATools object. Note that not all methods support local corpora.
- LocalDumper
A local Data::Dumper object with a NATools PTD. Note that not all methods support local NATools PTDs.
If the LocalDumper value is a reference to an array, it is expected to contain two elements, one filename per dictionary. If its value is a string, it is expected to be the name of a single file that includes both dictionaries.
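The options above can be sketched as plain hashes, one per mode. The file names and the guard around the constructor call are assumptions made for illustration; only the option names, the defaults and the example directory come from this documentation.

```perl
use strict;
use warnings;

# Server mode: the two documented defaults, spelled out explicitly.
my %server_opts = ( PeerAddr => '127.0.0.1', PeerPort => 4000 );

# Local mode: the directory is the placeholder used in the example above.
my %local_opts = ( Local => '/opt/corpora/foo' );

# LocalDumper as one file holding both dictionaries (hypothetical file name).
my %dumper_one = ( LocalDumper => 'both.dmp' );

# LocalDumper as a two-element array, one file per dictionary (hypothetical names).
my %dumper_two = ( LocalDumper => [ 'source-target.dmp', 'target-source.dmp' ] );

# Build a local client only if the placeholder directory exists and the
# module is installed; otherwise just report that nothing was created.
my $client;
if ( -d $local_opts{Local} && eval { require Lingua::NATools::Client; 1 } ) {
    $client = eval { Lingua::NATools::Client->new(%local_opts) };
}
print defined $client
    ? "client created\n"
    : "no client (corpus or module not available)\n";
```

Keeping the options in a named hash makes it easy to switch between server and local access without touching the rest of the script.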
iterate
This method is used to iterate through a probabilistic translation dictionary. Pass a function reference to handle each dictionary entry. This function will be called with a flattened hash with the keys word, trans and count.
Pass a hash reference as the first argument to configure the method's behaviour. For instance:
$client -> iterate( {Language => 'source'},
sub {
my %param = @_;
print "$param{word}\n";
});
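As a further sketch, the callback can collect data instead of printing it. The entries below are made up and the handler is invoked by hand so that the snippet runs without a corpus; the structure of the trans value is not described here, so it is left undefined.

```perl
use strict;
use warnings;

my %count_of;

# Callback in the shape iterate expects: it receives a flattened hash.
my $handler = sub {
    my %param = @_;
    $count_of{ $param{word} } = $param{count};
};

# Simulated invocations with made-up entries. With a real client this would be:
#   $client->iterate( { Language => 'source' }, $handler );
$handler->( word => 'casa', trans => undef, count => 12 );
$handler->( word => 'gato', trans => undef, count => 3 );

# Print the collected words, most frequent first.
for my $word ( sort { $count_of{$b} <=> $count_of{$a} } keys %count_of ) {
    print "$word\t$count_of{$word}\n";
}
```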
meta_information
list
This method is only available in server mode. It returns a hash table whose keys are corpora names (identifiers). The values are hash tables with the keys "id", "source" and "target", holding the corpus identifier and the language names.
$corpora = $client->list;
# $corpora={ Crp1=> { id=> 1, source=> 'PT', target=> 'EN' } }
set_corpus
This method is also available only in server mode. It selects the corpus that will be used by all subsequent queries.
$client->set_corpus(3);
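The two methods can be combined: look up a corpus in the table returned by list, then select it by identifier. The sketch below works on sample data in the documented shape, and find_corpus_id is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Sample data in the documented shape of $client->list (values are made up).
my $corpora = {
    Crp1 => { id => 1, source => 'PT', target => 'EN' },
    Crp2 => { id => 3, source => 'PT', target => 'FR' },
};

# Hypothetical helper: id of the first corpus (by name) for a language pair.
sub find_corpus_id {
    my ( $table, $source, $target ) = @_;
    for my $name ( sort keys %$table ) {
        my $info = $table->{$name};
        return $info->{id}
            if $info->{source} eq $source && $info->{target} eq $target;
    }
    return;    # no corpus matches
}

my $id = find_corpus_id( $corpora, 'PT', 'FR' );
print "selected corpus $id\n" if defined $id;

# With a live server connection the identifier would then be used as:
#   $client->set_corpus($id);
```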
ptd
This method is used to query Probabilistic Translation Dictionaries. As first argument you may pass a hash reference with configuration options. The only mandatory argument is the word being searched.
Known options are:
- crp
A corpus identifier to use. If not set, the first corpus is used, or the one previously selected with set_corpus.
- direction
This option chooses the direction of the query. By default, the query is done on the source language. If direction is <~ the target language is used. In both local corpus mode and server mode, you can query by identifier instead of by word. To do so, use ~#> or <#~ as the direction.
Returns an array reference. The first element is the occurrence count of the word, the second is a hash with the translation probabilities, and the third is the word searched.
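A minimal sketch of unpacking that return value follows. The words and numbers are made up; only the three-element layout is taken from the description above.

```perl
use strict;
use warnings;

# Sample value in the documented shape of a ptd result (numbers are made up):
# [ occurrence count, { translation => probability }, searched word ]
my $result = [ 42, { house => 0.81, home => 0.12, building => 0.03 }, 'casa' ];

my ( $count, $translations, $word ) = @$result;
print "$word occurs $count times\n";

# Rank the translations from most to least probable.
my @ranked =
    sort { $translations->{$b} <=> $translations->{$a} } keys %$translations;
printf "  %-10s %.2f\n", $_, $translations->{$_} for @ranked;
```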
attribute
To query meta-information use this method. At the moment it only works for server corpora. Pass it a reference to a configuration hash if you need to choose the corpus (see the ptd documentation, for instance). The mandatory parameter is the name of the attribute being queried. Returns the value if found, undef otherwise.
conc
This method is used to query for concordances on the corpus. This method is not available with LocalDumper.
Mandatory arguments are one or two strings to search. The first argument may be a hash reference with configuration details:
- crp
The corpus identifier to be queried. Only used in server mode. If not given, the identifier 1 is used, or the one previously selected with the set_corpus method.
- direction
The direction in which the query will be done. At the moment, it defaults to querying the source side (thus ignoring the second argument). You may use <- to query the target language (which also ignores the second argument) or <-> to query both languages. If you want to do pattern matching, use one of =>, <= or <=>. TODO: make this interface cleaner.
- count
Number of results to be presented. Defaults to 20. This value is always limited by the server.
ngrams
This method is used to query the ngram databases. Not all corpora have ngram indexes, so some answers might be just a reference to an empty list.
At the moment it takes the same configuration parameters as other methods (direction and crp), and a string with the query. For instance:
foo * --> all bigram with "foo" as first word
foo * bar --> all trigrams with foo as first word
and bar as the last word
foo bar --> the bigram "foo bar"
It returns a list of ngrams. Each ngram is a list of the words, with the occurrence count as the last element.
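The following sketch separates the words of each ngram from its count. It assumes the result is a reference to a list of array references, and the sample data is made up.

```perl
use strict;
use warnings;

# Sample data in the described shape of an ngrams result (made up):
# each inner list holds the words followed by the occurrence count.
my $ngrams = [ [ 'foo', 'bar', 17 ], [ 'foo', 'baz', 4 ] ];

my %count_of;
for my $ngram (@$ngrams) {
    my @words = @$ngram;       # copy, so the original list is left untouched
    my $count = pop @words;    # the last element is the occurrence count
    $count_of{"@words"} = $count;
}

# Print the ngrams, most frequent first.
print "$_: $count_of{$_}\n"
    for sort { $count_of{$b} <=> $count_of{$a} } keys %count_of;
```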
SEE ALSO
See perl(1) and NATools documentation.
AUTHOR
Alberto Manuel Brandao Simoes, <albie@alfarrabio.di.uminho.pt>
COPYRIGHT AND LICENSE
Copyright 2002-2012 by Natura Project http://natura.di.uminho.pt
This library is free software; you can redistribute it and/or modify it under the GNU General Public License 2, which you should find in the parent directory. Distribution of this module should include the whole NATools package, with the respective copyright notice.