NAME
cwb-align-import - Import existing sentence alignment into a CWB corpus
SYNOPSIS
cwb-align-import [options] <alignment_beads.txt>
Options:
  -r <dir>, --registry=<dir>     use registry directory <dir>
  -i, --inverse                  encode inverse alignment (target -> source)
  -p, --prune                    ignore alignment beads with ID errors
  -e, --empty                    allow 1:0 and 0:1 alignments (not encoded)
  -v, --verbose                  show progress messages during processing
  -h, --help                     display short help page
  -nh, --no-header               alignment file without header; must specify:
    -l1 <name>, --source=<name>  CWB name of source corpus
    -l2 <name>, --target=<name>  CWB name of target corpus
    -s <att>, --grid=<att>       alignment grid (s-attribute, usually sentences)
    -k <spec>, --key=<spec>      pattern for constructing unique sentence IDs
DESCRIPTION
cwb-align-import imports an existing sentence alignment, supplied as a file of alignment beads, into a CWB corpus.
OPTIONS
- --help, -h
  Show usage and options summary.
- --verbose, -v
  Verbose mode (shows progress messages during processing).
- --registry=dir, -r dir
  Locate corpora in CWB registry directory dir, overriding the default directory and the environment variable CORPUS_REGISTRY.
- --inverse, -i
  Encode inverse alignment (from target language to source language).
- --prune, -p
  Automatically ignore alignment beads if sentence IDs are not found, either in the source or the target corpus. Without -p, cwb-align-import will abort with an error message in this case. Note that the -p option implies -e (see below).
- --empty, -e
  Allow 1:0 and 0:1 alignment beads, which will be silently ignored (without -e, they cause a fatal error).
- --no-header, -nh
  Alignment file does not contain a header line. In this case, the header information must be provided on the command line with the -l1, -l2, -s and -k flags (documented below).
- --source=ID, -l1 ID
  CWB corpus ID of the source language corpus. Overrides information in alignment file header, if present.
- --target=ID, -l2 ID
  CWB corpus ID of the target language corpus. Overrides information in alignment file header, if present.
- --grid=attribute, -s attribute
  CWB attribute used as alignment grid (i.e., each alignment bead links n grid regions in the source language to m grid regions in the target language). For the most common case of sentence alignment, the grid attribute will usually be s. Note that the same attribute is used for both the source and the target language corpus.
- --key=pattern, -k pattern
  Pattern for constructing unique sentence IDs.
DETAILS
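The interaction of the -p and -e options documented under OPTIONS can be sketched as follows. This is a minimal illustration in Python, not the actual implementation of cwb-align-import. The function name filter_beads, the AlignmentError class, the representation of a bead as a pair of ID lists, and the order in which the two checks run are all assumptions made for this example.

```python
class AlignmentError(Exception):
    """Stands in for the point where the tool would abort (hypothetical)."""


def filter_beads(beads, source_ids, target_ids, prune=False, empty=False):
    """Return the beads that would be kept for encoding.

    beads      -- list of (source_id_list, target_id_list) pairs (assumed format)
    source_ids -- set of sentence IDs present in the source corpus
    target_ids -- set of sentence IDs present in the target corpus
    prune      -- corresponds to -p
    empty      -- corresponds to -e
    """
    allow_empty = empty or prune  # -p implies -e
    kept = []
    for src, tgt in beads:
        # 1:0 and 0:1 beads: fatal without -e, silently ignored with it
        if not src or not tgt:
            if allow_empty:
                continue
            raise AlignmentError("empty alignment bead: %r" % ((src, tgt),))
        # Sentence IDs not found in either corpus: fatal without -p,
        # otherwise the whole bead is ignored
        missing = [i for i in src if i not in source_ids]
        missing += [i for i in tgt if i not in target_ids]
        if missing:
            if prune:
                continue
            raise AlignmentError(
                "sentence IDs not found: " + ", ".join(map(str, missing))
            )
        kept.append((src, tgt))
    return kept
```

With the hypothetical beads [(["s1"], ["t1"]), (["s2"], []), (["s3"], ["t9"])] and a target corpus that lacks t9, calling filter_beads with prune=True keeps only the first bead. With no flags set, the same call raises AlignmentError at the second bead.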
AUTHOR
Stefan Evert <stefan.evert@uos.de>
COPYRIGHT
Copyright (C) 2007-2010 Stefan Evert [http://purl.org/stefan.evert]
This software is provided AS IS and the author makes no warranty as to its use and performance. You may use the software, redistribute and modify it under the same terms as Perl itself.