NAME
Treex::Core::Run + treex - apply Treex blocks and/or scenarios to data
VERSION
version 0.08633_1
SYNOPSIS
In bash:
> treex myscenario.scen -- data/*.treex
> treex My::Block1 My::Block2 -- data/*.treex
In Perl:
use Treex::Core::Run q(treex);
treex([qw(myscenario.scen -- data/*.treex)]);
treex([qw(My::Block1 My::Block2 -- data/*.treex)]);
DESCRIPTION
Treex::Core::Run applies a block, a scenario, or a mixture of both to a set of data files. It is designed to be used primarily from the bash command line, through a thin front-end script called treex. However, the same list of arguments can also be passed as an array reference to the function treex() imported from Treex::Core::Run.
This module also supports distributed processing: just add the -p switch. There are then two ways to process the data in parallel. By default, an SGE cluster's qsub is expected to be available. If you have no cluster but still want to parallelize the computation on a multicore machine, add the --local switch.
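The two parallel modes can be sketched as command lines. This is only an illustration: myscenario.scen and data/*.treex are the placeholder names from the SYNOPSIS, the job counts are arbitrary, and show is a hypothetical dry-run helper that prints each command instead of running it, so the sketch can be tried without treex or a cluster. Drop the show prefix (and the quotes around the file mask) to actually run the commands.

```sh
# Hypothetical dry-run helper: print the command line instead of executing it.
show() { printf '%s\n' "$*"; }

# Placeholder scenario file and file mask, as in the SYNOPSIS.
scen=myscenario.scen
data='data/*.treex'

# Cluster run through qsub with the default number of jobs (10).
show treex -p "$scen" -- "$data"

# Cluster run split into 20 jobs.
show treex -p -j 20 "$scen" -- "$data"

# No cluster: 4 jobs on the local multicore machine.
show treex -p --local -j 4 "$scen" -- "$data"
```

Both -j and --local require -p, so the -p switch appears in every variant.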
SUBROUTINES
- treex
Creates a new runner and runs the scenario given in the parameters.
USAGE
usage: treex [-?dEegjLmpqSsv] [long options...] scenario [-- treex_files]
scenario is a sequence of blocks or *.scen files
options:
  -? --usage --help         Prints this usage information.
  -s --save                 save all documents
  -q --quiet                Warning, info and debug messages are
                            suppressed. Only fatal errors are
                            reported.
  --cleanup                 Delete all temporary files.
  -e --error_level          Possible values: ALL, DEBUG, INFO, WARN,
                            FATAL
  -E --forward_error_level  messages with this level or higher will
                            be forwarded from the distributed jobs
                            to the main STDERR
  -L --language --lang      shortcut for adding "Util::SetGlobal
                            language=xy" at the beginning of the
                            scenario
  -S --selector             shortcut for adding "Util::SetGlobal
                            selector=xy" at the beginning of the
                            scenario
  -g --glob                 Input file mask whose expansion is left
                            to Perl, e.g. --glob '*.treex'
  -p --parallel             Parallelize the task on SGE cluster
                            (using qsub).
  -j --jobs                 Number of jobs for parallelization,
                            default 10. Requires -p.
  --jobindex                Not to be used manually. If the number
                            of jobs is set to J and the modulo to M,
                            only the I-th files fulfilling
                            I mod J == M are processed.
  --outdir                  Not to be used manually. Directory for
                            collecting standard and error outputs in
                            parallelized processing.
  --local                   Run jobs locally (might help with
                            multi-core machines). Requires -p.
  --priority                Priority for qsub, an integer in the
                            range -1023 to 0 (or 1024 for admins),
                            default=-100. Requires -p.
  --memory -m --mem         How much memory should be allocated for
                            cluster jobs, default=2G. Requires -p.
                            Translates to "qsub -hard -l
                            mem_free=$mem -l act_mem_free=$mem -l
                            h_vmem=$mem".
  --qsub                    Additional parameters passed to qsub.
                            Requires -p. See --priority and --mem.
  --watch                   re-run when the given file is changed
                            TODO better doc
  --workdir                 working directory for temporary files in
                            parallelized processing (if not
                            specified, directories such as
                            001-cluster-run, 002-cluster-run etc.
                            are created)
  -d --dump_scenario        Just dump (print to STDOUT) the given
                            scenario and exit.
  --dump_required_files     Just dump (print to STDOUT) files
                            required by the given scenario and exit.
  --survive                 Continue collecting jobs' outputs even
                            if some of them crashed (risky, use with
                            care!).
  -v --version              Print treex and perl version
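The file selection described under --jobindex can be illustrated with a small shell sketch. It is not the module's code: select_for_job is a hypothetical helper, the file names are made up, and the sketch assumes that files are counted from 1, which the usage text does not state.

```sh
# select_for_job J M file...
# Print the files whose position I in the argument list fulfils I mod J == M.
# Assumption: I starts at 1 (the usage text leaves the starting index open).
select_for_job() {
    sfj_jobs=$1
    sfj_modulo=$2
    shift 2
    sfj_i=0
    for sfj_file in "$@"; do
        sfj_i=$((sfj_i + 1))
        if [ $((sfj_i % sfj_jobs)) -eq "$sfj_modulo" ]; then
            printf '%s\n' "$sfj_file"
        fi
    done
}

# Five hypothetical files split across J=3 jobs; the job with M=1
# gets the 1st and 4th file, so this prints a.treex and d.treex.
select_for_job 3 1 a.treex b.treex c.treex d.treex e.treex
```

Under this assumption every file lands in exactly one of the J jobs, because each position I has exactly one remainder modulo J.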
AUTHOR
Zdeněk Žabokrtský <zabokrtsky@ufal.mff.cuni.cz>
Martin Popel <popel@ufal.mff.cuni.cz>
COPYRIGHT AND LICENSE
Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.