NAME
nat-create - Command line tool to create NATools Corpora Objects
SYNOPSIS
nat-create <file1.nat> <file2.nat>
nat-create -tmx <file.tmx>
DESCRIPTION
This is the basic command used to create a NATools Corpora Object from the command line.
A NATools Corpora Object is a directory with:
the configuration file ("nat.cnf" - metadata information)
the corpus
the corpus indexes
the probabilistic translation dictionaries ("source-target.dmp", "target-source.dmp")
the (bi,tri,tetra)grams databases ("source.ngrams", "target.ngrams")
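As a rough illustration of the layout listed above, the following shell sketch checks whether a directory looks like a NATools Corpora Object. The function name `looks_like_nat_object` is hypothetical and not part of NATools. The sketch assumes that the quoted names ("nat.cnf", "source-target.dmp", "target-source.dmp") are literal file names sitting at the top level of the directory. The corpus and index files are not checked because their names are not given here, and the ngrams databases are skipped because they are only created on request.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with NATools.
# Assumption: the files named in this page live directly inside the
# object directory under exactly these names.
looks_like_nat_object() {
    dir=$1
    # The object itself must be a directory.
    [ -d "$dir" ] || return 1
    # Configuration file and both translation dictionaries must exist.
    for f in nat.cnf source-target.dmp target-source.dmp; do
        [ -f "$dir/$f" ] || return 1
    done
    return 0
}
```

A call such as `looks_like_nat_object ./mycorpus && echo "looks fine"` would then report on a directory produced by nat-create ("./mycorpus" is only a placeholder name).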
Known Switches
- tokenize
  The -tokenize flag can be used to force NATools to tokenize the texts. Note that at the moment a Portuguese tokenizer is used for all languages. This might change in the future.
- id
  The -id=name flag can be used to force the NATools Corpora Object name. By default the name is read interactively.
- q
  The -q flag can be used to force quiet mode. In this case, the name is extracted from the file names.
- lang
  The -lang=PT..EN flag can be used to force the languages.
- ngrams
  The -ngrams flag can be set to force NATools to create ngrams indexes.
- noEM
  The -noEM flag is used to bypass the EM-Algorithm (useful mainly for debugging purposes).
- ipfp
  The -ipfp flag is mutually exclusive with -noEM, -samplea and -sampleb. It selects IPFP as the EM-Algorithm to be used. An optional numeric argument gives the number of iterations. Defaults to 5.
- samplea
  The -samplea flag is mutually exclusive with -noEM, -ipfp and -sampleb. It selects Sample A as the EM-Algorithm to be used. An optional numeric argument gives the number of iterations. Defaults to 10.
- sampleb
  The -sampleb flag is mutually exclusive with -noEM, -ipfp and -samplea. It selects Sample B as the EM-Algorithm to be used. An optional numeric argument gives the number of iterations. Defaults to 10.
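Since -noEM, -ipfp, -samplea and -sampleb exclude one another, a wrapper script may want to reject conflicting combinations before calling nat-create. The sketch below is one possible way to do that. The function `check_em_flags` is hypothetical and not part of NATools, and the `-flag=N` spelling for the optional iteration count is an assumption made by analogy with -id=name and -lang=PT..EN.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with NATools.
# Succeeds only if at most one of the mutually exclusive switches
# (-noEM, -ipfp, -samplea, -sampleb) appears in the argument list.
# Assumption: an iteration count, if any, is written as -flag=N.
check_em_flags() {
    count=0
    for arg in "$@"; do
        case "$arg" in
            -noEM|-ipfp|-ipfp=*|-samplea|-samplea=*|-sampleb|-sampleb=*)
                count=$((count + 1))
                ;;
        esac
    done
    [ "$count" -le 1 ]
}
```

A wrapper could then run `check_em_flags "$@" && nat-create "$@"`. With this check, an argument list such as `-q -ipfp=5 file1.nat file2.nat` passes, while one containing both `-noEM` and `-samplea` is refused.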
SEE ALSO
NATools documentation, perl(1)
AUTHOR
Alberto Manuel Brandão Simões, <ambs@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2006-2011 by Alberto Manuel Brandão Simões