NAME
nat-ptd - gathers a set of PTD commands under a common interface
SYNOPSIS
nat-ptd [-v] <command> [command-args]
DESCRIPTION
nat-ptd supports the following commands. In most places where a PTD needs to be specified, you can use a bzip2-compressed PTD, as long as the filename ends in bz2.
help
When invoked without arguments, this command prints a list of available commands.
If the name of a command is supplied as an optional argument, it prints detailed help for that command (taken from this man page).
nat-ptd help [command-name]
intersect
Intersects the domains of the supplied PTDs, keeping the lower counts and translation probabilities.
As of recent NATools versions, you can supply an option -type to specify the type of output file (dmp or sqlite are supported, and dmp is the default).
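The intersection described above can be sketched in Python. This is only an illustration, not the NATools implementation: it assumes a PTD is held in memory as a mapping from each word to a record with a "count" and a "trans" table of translation probabilities, and it assumes that translations are intersected as well as entries. The function name intersect is hypothetical.

```python
def intersect(ptd_a, ptd_b):
    """Keep only entries present in both PTDs, with the lower count and probabilities.

    Assumed layout: {word: {"count": int, "trans": {translation: probability}}}.
    """
    result = {}
    for word in ptd_a.keys() & ptd_b.keys():
        a, b = ptd_a[word], ptd_b[word]
        # Assumption: only translations found in both entries are kept.
        shared = a["trans"].keys() & b["trans"].keys()
        result[word] = {
            "count": min(a["count"], b["count"]),
            "trans": {t: min(a["trans"][t], b["trans"][t]) for t in shared},
        }
    return result
```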
toSQLite
This command converts a PTD to the SQLite format. The first argument is the PTD filename. An optional second argument specifies the output filename.
toDmp
This command converts a PTD to the Dumper format. The first argument is the PTD filename. An optional second argument specifies the output filename.
toDmpBz
This command converts a PTD to a bzip2-compressed Dumper format. The first argument is the PTD filename. An optional second argument specifies the output filename.
stats
Prints some basic statistics about a PTD.
compare
Given two PTDs, prints some basic statistics comparing their size, domains, etc.
query
This command lets you query a PTD interactively.
grep
Greps entries matching a specific pattern from a PTD. Supply a pattern and a PTD file. By default it dumps a subset PTD with the entries that match. With the -compact option it prints a small table with each matching entry's best translation.
nat-ptd grep [-compact] [-o=outfile] <pattern> <ptd-file>
compose
This command receives two or more dictionaries.
When given a pair of dictionaries (the target language of the first dictionary should be the same as the source language of the second), it composes them, resulting in a PTD from the source language of the first dictionary to the target language of the second.
This command can be used with more than two dictionaries for a full transitive dictionary computation.
You can specify the output filename with the -o switch.
As of recent NATools versions, you can supply an option -type to specify the type of output file (dmp or sqlite are supported, and dmp is the default).
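One plausible reading of the composition can be sketched in Python. This man page does not state the formula, so the rule used here is an assumption: the probability of a final translation is the sum, over every pivot word, of the first-step probability times the second-step probability. The in-memory layout (word mapped to "count" and "trans") and the names compose and compose_all are also hypothetical.

```python
from functools import reduce


def compose(first, second):
    """Compose an A->B PTD with a B->C PTD into an A->C PTD.

    Assumed layout: {word: {"count": int, "trans": {translation: probability}}}.
    """
    result = {}
    for word, entry in first.items():
        trans = {}
        for pivot, p1 in entry["trans"].items():
            if pivot not in second:
                continue  # the pivot word has no entry in the second dictionary
            for target, p2 in second[pivot]["trans"].items():
                # Assumption: probabilities multiply along a path and add across paths.
                trans[target] = trans.get(target, 0.0) + p1 * p2
        if trans:
            # Assumption: the occurrence count of the source entry is carried over.
            result[word] = {"count": entry["count"], "trans": trans}
    return result


def compose_all(dictionaries):
    """Transitive composition of two or more dictionaries, applied left to right."""
    return reduce(compose, dictionaries)
```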
filter
This command filters a dictionary (or dictionary pair) according to some default values (that can be adjusted).
If the supplied name is a directory, it is assumed to be a NATools object (a NATools alignment folder). In this case, the files source-target.dmp and target-source.dmp are searched for inside it.
If the supplied name is not a directory, it is assumed to be the name of a PTD dump file. This command checks whether it was given alone (a single direction) or whether a second filename was supplied. If two were supplied, they are treated as a bidirectional pair (source-target and target-source).
Therefore, three possible usages:
nat-ptd filter <natools-obj-dir>
nat-ptd filter <file.dmp>
nat-ptd filter <file-s-t.dmp> <file-t-s.dmp>
The following switches can be used:
-numbers
By default the filtering removes terms (entries and translations) that consist only of numbers, with possible digit separators (space, comma, point, colon). Use this switch to force them to be preserved.
-symbols
Any term that is neither a standard word (possibly with a dash or apostrophe) nor a number (as described above) is considered to contain strange symbols and is removed. Use this switch to force such terms to be preserved.
-none
By default, the 'no translation', also known as 'none', is removed. You can force it to be preserved with this switch.
-occs=n
Defines the minimum occurrence count for entries to be preserved. The default value is 2 (that is, entries with 1 occurrence are discarded). Use 0 to keep all entries regardless of occurrence count.
-prob=p
Defines the minimum probability for translations to be preserved. The default value is 1% (0.01). Set the value to 0 to preserve all translations.
-bidir
Defines whether the filtering should check for bidirectional translations, that is, preserve only terms whose translations' translations include that term. Mathematically, preserve t if
t in Translations ( Translations ( t ) )
Note that this is only available for NATools objects or dictionary pairs. By default this switch is ON. Turn it OFF by assigning 0 to the switch:
-bidir=0
Also, the -o switch can be used to define an output filename. When using a pair of dictionaries, specify the output filenames separated by a comma: -o=outputfile1,outputfile2.
As of recent NATools versions, you can supply an option -type to specify the type of output file (dmp or sqlite are supported, and dmp is the default).
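The occurrence, probability and bidirectional checks can be sketched in Python. This is a simplified illustration, not the NATools code: it leaves out the number, symbol and 'none' filters, assumes the same in-memory layout (word mapped to "count" and "trans"), and assumes the bidirectional check runs on the translations that survive the probability threshold. The names filter_pair and translates_back are hypothetical.

```python
def translates_back(word, trans, ts):
    """True if word is in Translations(Translations(word)) for the given translations."""
    return any(word in ts.get(t, {}).get("trans", {}) for t in trans)


def filter_pair(st, ts, min_occs=2, min_prob=0.01, bidir=True):
    """Filter the source-target PTD `st`, using the target-source PTD `ts` for -bidir.

    Assumed layout: {word: {"count": int, "trans": {translation: probability}}}.
    """
    result = {}
    for word, entry in st.items():
        if entry["count"] < min_occs:
            continue  # -occs: entry seen too few times
        # -prob: drop translations below the minimum probability
        trans = {t: p for t, p in entry["trans"].items() if p >= min_prob}
        if bidir and not translates_back(word, trans, ts):
            continue  # -bidir: no translation leads back to this word
        result[word] = {"count": entry["count"], "trans": trans}
    return result
```

With the defaults, an entry survives only if it occurs at least twice and at least one of its remaining translations has the entry itself among its own translations.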
lowercase
This command lowercases all terms in a dictionary, sums up the occurrences of entries that become identical, and recomputes the probabilities.
nat-ptd lowercase [-o=outputfile] <ptd-filename>
As of recent NATools versions, you can supply an option -type to specify the type of output file (dmp or sqlite are supported, and dmp is the default).
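The merging step can be sketched in Python. How the probabilities are recomputed is not spelled out here, so the sketch assumes a count-weighted average: each probability is turned into an estimated number of occurrences, the estimates are summed, and the sum is divided by the merged count. The layout (word mapped to "count" and "trans") and the function name lowercase are assumptions as well.

```python
from collections import defaultdict


def lowercase(ptd):
    """Lowercase all terms, summing occurrences and recomputing probabilities.

    Assumed layout: {word: {"count": int, "trans": {translation: probability}}}.
    """
    counts = defaultdict(int)
    weights = defaultdict(lambda: defaultdict(float))
    for word, entry in ptd.items():
        key = word.lower()
        counts[key] += entry["count"]
        for t, p in entry["trans"].items():
            # Assumption: count * probability estimates occurrences of this translation.
            weights[key][t.lower()] += entry["count"] * p
    result = {}
    for key, n in counts.items():
        trans = {t: w / n for t, w in weights[key].items()} if n else {}
        result[key] = {"count": n, "trans": trans}
    return result
```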
reprob
This command recomputes the probabilities of a dictionary. For each entry, it sums up the probabilities of all its translations, considers that total to be 100% (1), and rescales each probability accordingly.
It takes a required argument, the name of the PTD dump file. Optionally, you can supply an output file with the -o switch.
nat-ptd reprob [-o=outputfile] <ptd-filename>
As of recent NATools versions, you can supply an option -type to specify the type of output file (dmp or sqlite are supported, and dmp is the default).
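The renormalization can be sketched in Python. The layout (word mapped to "count" and "trans") and the function name reprob are assumptions made for illustration; leaving an entry untouched when its probabilities sum to zero is also a choice of this sketch.

```python
def reprob(ptd):
    """Rescale each entry's translation probabilities so that they sum to 1.

    Assumed layout: {word: {"count": int, "trans": {translation: probability}}}.
    """
    result = {}
    for word, entry in ptd.items():
        total = sum(entry["trans"].values())
        if total > 0:
            trans = {t: p / total for t, p in entry["trans"].items()}
        else:
            trans = dict(entry["trans"])  # nothing to rescale
        result[word] = {"count": entry["count"], "trans": trans}
    return result
```

For instance, an entry whose translations have probabilities 0.3 and 0.1 (for example after filtering removed the rest) ends up with 0.75 and 0.25.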
add
Adds two or more PTD files into a single PTD file. They should have the same source and target language. You can use the -o switch to specify an output filename.
As of recent NATools versions, you can supply an option -type to specify the type of output file (dmp or sqlite are supported, and dmp is the default).
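One way to picture the addition is the Python sketch below. The combination rule is not documented here, so it is an assumption: occurrence counts are summed, and each translation probability becomes the count-weighted average of the probabilities in the input dictionaries. The layout (word mapped to "count" and "trans") and the name add_ptds are hypothetical.

```python
def add_ptds(*ptds):
    """Merge PTDs that share source and target languages into a single PTD.

    Assumed layout: {word: {"count": int, "trans": {translation: probability}}}.
    """
    result = {}
    for ptd in ptds:
        for word, entry in ptd.items():
            merged = result.setdefault(word, {"count": 0, "trans": {}})
            old_n, new_n = merged["count"], entry["count"]
            total = old_n + new_n
            if total == 0:
                continue  # no occurrences to weight by
            combined = {}
            for t in merged["trans"].keys() | entry["trans"].keys():
                # Assumption: a translation missing on one side counts as probability 0.
                old_p = merged["trans"].get(t, 0.0)
                new_p = entry["trans"].get(t, 0.0)
                combined[t] = (old_n * old_p + new_n * new_p) / total
            merged["count"] = total
            merged["trans"] = combined
    return result
```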
SEE ALSO
NATools, perl(1)
AUTHOR
Alberto Manuel Brandão Simões, <ambs@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2010-2011 by Alberto Manuel Brandão Simões