NAME

dtatw-txml2uxml.perl - DTA::TokWrap: convert .t.xml to enriched .u.xml

SYNOPSIS

dtatw-txml2uxml.perl [OPTIONS] [TXMLFILE]

General Options:
 -help                  # this help message

Auxiliary Files Options:
 -textfile TXTFILE      # .txt file for TXMLFILE://w/@b locations
 -cpxfile  CPXFILE      # .cpx file for output //w/@pb locations
 -wpxfile  WPXFILE      # .wpx file for output //w/@pb locations (overrides -cpxfile)
 -cxfile   CXFILE       # .cx file for output char-spans //w/@cs (default: none (use heuristics))

Attribute Insertion Options:
 -pb     , -nopb        # do/don't parse and output page break indices as //w/@pb (default=only if -wpxfile or -cpxfile is given)
 -t0     , -not0        # do/don't output original text from TXTFILE as //w/@t0 (default=do)
 -cruft  , -nocruft     # do/don't output unicruft approximations as //w/@u resp. //w/@u0 (default=do)
 -chars  , -nochars     # do/don't output inter-token chars as //c (default=don't)
 -spans  , -nospans     # do/don't compute //w/@cs from //w/@c (default=do)
 -keep-c , -nokeep-c    # do/don't keep //w/@c if computing //w/@cs (default=don't)
 -guess  , -noguess     # do/don't use heuristics for computing //w/@cs (default=only if CXFILE not given)

Attribute Trimming Options:
 -trim-t , -notrim-t    # do/don't trim redundant //w/(@t0,@u,@u0) attributes (default=don't)
 -trim-x , -notrim-x    # do/don't compress //w/@xp using sentence-wide prefixes (default=do)
 -trim   , -notrim      # set both -trim-t and -trim-x at the same time

I/O Options:
 -ent    , -noent       # don't/do expand entities (default=don't (-ent))
 -blanks , -noblanks    # do/don't keep "ignorable" input blanks (default=don't (-noblanks))
 -ws     , -nows        # do/don't keep token-internal whitespace (default=don't (-nows))
 -format , -noformat    # do/don't pretty-print output (default=do (-format))
 -output OUTFILE        # specify output file (default='-' (STDOUT))
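
Example:

The following is a minimal sketch of a typical invocation, using only options listed above. The file names (doc.t.xml, doc.txt, doc.wpx, doc.cx, doc.u.xml) are hypothetical placeholders, not names the script requires. Since -wpxfile is given, page break indices should be written as //w/@pb by default; since -cxfile is given, the //w/@cs heuristics should be off by default; -trim sets both -trim-t and -trim-x. If the script is not found on the PATH, the sketch only prints the command it would have run.

```shell
#!/bin/sh
# Script name as given in the NAME section.
script=dtatw-txml2uxml.perl

# Hypothetical basename shared by all input and output files.
base=doc

# Options first, then the input TXMLFILE, as in the SYNOPSIS.
set -- \
  -textfile "$base.txt" \
  -wpxfile  "$base.wpx" \
  -cxfile   "$base.cx" \
  -trim \
  -output   "$base.u.xml" \
  "$base.t.xml"

cmdline="$script $*"

if command -v "$script" >/dev/null 2>&1; then
  # Script is installed: run the conversion.
  "$script" "$@"
else
  # Script not available: show the command line only.
  echo "would run: $cmdline"
fi
```

Omitting -output would send the result to STDOUT instead, per the default documented above.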

OPTIONS AND ARGUMENTS

Not yet written.

DESCRIPTION

Not yet written.

SEE ALSO

...

AUTHOR

Bryan Jurish <moocow@cpan.org>