NAME

Treex::Core::Run + treex - applying Treex blocks and/or scenarios to data

VERSION

version 0.08399

SYNOPSIS

In bash:

> treex myscenario.scen -- data/*.treex
> treex My::Block1 My::Block2 -- data/*.treex

In Perl:

use Treex::Core::Run q(treex);
treex([qw(myscenario.scen -- data/*.treex)]);
treex([qw(My::Block1 My::Block2 -- data/*.treex)]);

DESCRIPTION

Treex::Core::Run applies a block, a scenario, or a mixture of both to a set of data files. It is designed to be used primarily from the bash command line, through a thin front-end script called treex. However, the same list of arguments can be passed as an array reference to the function treex() imported from Treex::Core::Run.
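
The argument list for treex() can also be assembled programmatically. The following is a minimal sketch, not part of the module: build_treex_args is a hypothetical helper, My::Block1 and My::Block2 are the placeholder block names from the synopsis, and the file list is expanded with Perl's built-in glob on the assumption that the caller wants to do the expansion itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Treex::Core::Run): returns an array
# reference holding the scenario items, then '--', then the data files.
sub build_treex_args {
    my ( $scenario_ref, $files_ref ) = @_;
    my @args = @{$scenario_ref};

    # Append the '--' separator only when there are files to process.
    push @args, '--', @{$files_ref} if @{$files_ref};
    return \@args;
}

# Expand the file mask in Perl rather than in the shell.
my @files = glob 'data/*.treex';
my $args  = build_treex_args( [qw(My::Block1 My::Block2)], \@files );

# Show the equivalent command line.
print join( ' ', 'treex', @{$args} ), "\n";

# Run the scenario only if Treex is installed and some files matched.
if ( @files && eval { require Treex::Core::Run; 1 } ) {
    Treex::Core::Run::treex($args);
}
```

Building the list in one place keeps the scenario and the file list separate until the last moment, which makes it easier to reuse the same scenario on different data sets.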

This module also supports distributed processing: add the -p switch. The data can then be processed in parallel in two ways. By default, the qsub command of an SGE cluster is expected to be available. If you have no cluster but still want to parallelize the computation on a multicore machine, add the --local switch as well.
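
As an illustration of the two parallel modes, the sketch below prepends the relevant switches to an argument list. Only the switches -p, --jobs and --local are taken from the usage text; parallel_args is a hypothetical helper and the job counts are arbitrary example values. The script only prints the resulting command lines and does not run treex.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Treex::Core::Run): prepends the
# parallelization switches to an existing treex argument list.
sub parallel_args {
    my ( $args_ref, %opt ) = @_;
    my @switches = ('-p');    # -p turns parallel processing on
    push @switches, '--jobs', $opt{jobs} if defined $opt{jobs};
    push @switches, '--local' if $opt{local};    # --local requires -p
    return [ @switches, @{$args_ref} ];
}

my @base = qw(myscenario.scen -- data/*.treex);

# Cluster run: jobs are submitted through qsub.
my $cluster = parallel_args( \@base, jobs => 20 );

# Local run: jobs are started on the current machine.
my $local = parallel_args( \@base, jobs => 4, local => 1 );

print join( ' ', 'treex', @{$_} ), "\n" for $cluster, $local;
```

Either array reference could then be passed to treex() in the same way as the lists shown in the synopsis.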

SUBROUTINES

treex

Creates a new runner and runs the scenario given in the parameters.

USAGE

usage: treex [-?dEegjLmpqSsv] [long options...] scenario [-- treex_files]
scenario is a sequence of blocks or *.scen files
options:
	-? --usage --help            Prints this usage information.
	-s --save                    save all documents
	-q --quiet                   Warning, info and debug messages are
	                             suppressed. Only fatal errors are
	                             reported.
	--cleanup                    Delete all temporary files.
	-e --error_level             Possible values: ALL, DEBUG, INFO, WARN,
	                             FATAL
	-E --forward_error_level     messages with this level or higher will
	                             be forwarded from the distributed jobs
	                             to the main STDERR
	-L --language --lang         shortcut for adding "Util::SetGlobal
	                             language=xy" at the beginning of the
	                             scenario
	-S --selector                shortcut for adding "Util::SetGlobal
	                             selector=xy" at the beginning of the
	                             scenario
	-g --glob                    Input file mask whose expansion is left
	                             to Perl, e.g. --glob '*.treex'
	-p --parallel                Parallelize the task on SGE cluster
	                             (using qsub).
	-j --jobs                    Number of jobs for parallelization,
	                             default 10. Requires -p.
	--jobindex                   Not to be used manually. If the number
	                             of jobs is J and the modulo is M, only
	                             files whose index I fulfils
	                             I mod J == M are processed.
	--outdir                     Not to be used manually. Directory for
	                             collecting standard and error outputs in
	                             parallelized processing.
	--local                      Run jobs locally (might help with
	                             multi-core machines). Requires -p.
	--priority                   Priority for qsub, an integer in the
	                             range -1023 to 0 (or 1024 for admins),
	                             default=-100. Requires -p.
	--memory -m --mem            How much memory should be allocated for
	                             cluster jobs, default=2G. Requires -p.
	                             Translates to "qsub -hard -l
	                             mem_free=$mem -l act_mem_free=$mem -l
	                             h_vmem=$mem".
	--qsub                       Additional parameters passed to qsub.
	                             Requires -p. See --priority and --mem.
	--watch                      re-run when the given file is changed
	                             TODO better doc
	--workdir                    working directory for temporary files in
	                             parallelized processing (if not
	                             specified, directories such as
	                             001-cluster-run, 002-cluster-run etc.
	                             are created)
	-d --dump_scenario           Just dump (print to STDOUT) the given
	                             scenario and exit.
	--survive                    Continue collecting jobs' outputs even
	                             if some of them crashed (risky, use with
	                             care!).
	-v --version                 Print treex and perl version
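
The file selection rule given for --jobindex can be sketched as follows. This is only an illustration of the stated condition I mod J == M, not the module's actual code: files_for_job is a hypothetical helper, the file names are made up, and counting files from 1 is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the files one job would process when the
# I-th file is kept if I mod J == M. Assumption: I counts files from 1.
sub files_for_job {
    my ( $jobs, $modulo, @files ) = @_;
    die "jobs must be a positive integer\n" unless $jobs && $jobs > 0;

    my @selected;
    for my $i ( 1 .. scalar @files ) {
        push @selected, $files[ $i - 1 ] if $i % $jobs == $modulo;
    }
    return @selected;
}

# Seven made-up file names split among J = 3 jobs.
my @files = map { sprintf 'doc%02d.treex', $_ } 1 .. 7;
for my $m ( 0 .. 2 ) {
    print "M=$m: ", join( ' ', files_for_job( 3, $m, @files ) ), "\n";
}
```

Under this rule every file is assigned to exactly one job, so the jobs can run independently without coordinating over which files they process.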

AUTHOR

Zdeněk Žabokrtský <zabokrtsky@ufal.mff.cuni.cz>

Martin Popel <popel@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.