Name
Schedule::Depend
Synopsis
Single argument is assumed to be a schedule, either newline delimited text or array referent:
my $q = Scheduler->prepare( "newline delimited schedule" );
my $q = Scheduler->prepare( [ qw(array ref of schedule lines) ] );
Multiple items are assumed to be a hash, which must include the "sched" argument.
my $q = Scheduler->prepare( sched => "foo:bar", verbose => 1 );
Object can be saved and used to execute the schedule or the schedule can be executed (or debugged) directly:
$q->debug;
$q->execute;
Scheduler->prepare( sched => $depend)->debug;
Scheduler->prepare( sched => $depend, verbose => 1 )->execute;
Since the debugger returns undef on a bogus queue:
Scheduler->prepare( sched => $depend)->debug->execute;
The "unalias" method can be safely overloaded for specialized command construction at runtime; precheck can be overloaded in cases where the status of a job can be determined easily (e.g., via /proc). A "cleanup" method may be provided, and will be called after the job is complete.
See notes under "unalias" and "runjob" for how jobs are dispatched. The default methods will handle shell code and sub names automatically.
Arguments
- sched
The dependencies are described much like a Makefile, with targets waiting for other jobs to complete on the left and the dependencies on the right. Schedule lines can have single dependencies like:
waits_for : depends_on
or multiple dependencies:
wait1 wait2 : dep1 dep2 dep3
or no dependencies:
runs_immediately :
These are unnecessary but can help document the code.
Dependencies without a wait_for argument are an error (e.g., ": foo" will croak during prepare).
The schedule can be passed as a single argument (string or referent) or with the "sched" key as a hash value:
sched => [ schedule as separate lines in an array ]
or
sched => "newline delimited schedule, one item per line";
It is also possible to alias job strings:
foo = /usr/bin/find -type f -name 'core' | xargs rm -f
...
foo : bar
...
will wait until bar has finished, unalias foo to the command string and pass the expanded version wholesale to the system command.
See the "Schedules" section for more details.
- verbose
Turns on verbose execution for preparation and execution.
All output controlled by verbosity is output to STDOUT; errors, roadkill, etc, are written to STDERR.
verbose == 0 only displays a few fixed preparation and execution messages. This is mainly intended for production systems with large numbers of jobs where searching a large output would be troublesome.
verbose == 1 displays the input schedule contents during preparation and fork/reap messages as jobs are started.
verbose == 2 is intended for monitoring automatically generated queues and debugging new schedules. It displays the input lines as they are processed, forks/reaps, exit status and results of unalias calls before the jobs are exec-ed.
verbose can also be specified in the schedule, with schedule settings overriding the args. If no verbose setting is made then debug runs w/ verbose == 1, execution with 0.
- debug
Runs the full prepare but does not fork any jobs, pidfiles get a "Debugging $job" entry in them and an exit of 1. This can be used to test the schedule or debug side-effects of overloaded methods. See also: verbose, above.
- rundir & logdir
These are where the pidfiles and stdout/stderr of forked jobs are placed, along with stdout (i.e., verbose) messages from the que object itself.
These can be supplied via the schedule using aliases "rundir" and "logdir" (see the example below). Lacking any input from the schedule or arguments, all output goes into the #! file's directory (i.e., dirname $0).
Note: The last option is handy for running code via soft link w/o having to provide the arguments each time. The RBTMU.pm module in examples can be used in a single #! file, soft-linked into any number of directories with various .tmu files and then run to load the various groups of files.
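For example, both directories (and, assuming it can be set the same way, the verbose level described above) can be given at the top of the schedule itself; the paths here are only placeholders:

rundir = /var/tmp/queue/run
logdir = /var/tmp/queue/log
verbose = 1

load : extract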
Description
Parallel scheduler with simplified make syntax for job dependencies and substitutions. Like make, targets have dependencies that must be completed before they can be run. Unlike make, there are no statements for the targets; the targets are themselves executables.
The use of pidfiles with status information allows running the queue in "restart" mode. This skips any jobs with zero exit status in their pidfiles, stops and re-runs or waits for any running jobs and launches anything that wasn't started. This should allow a schedule to be re-run with a minimum of overhead.
The pidfile serves three purposes:
- Restarts
On restart any leftover pidfiles with a zero exit status in them can be skipped.
- Waiting
Any process used to monitor the result of a job can simply perform a blocking read for the exit status to know when the job has completed (see the sketch below). This avoids the monitoring system having to poll the status.
- Tracking
Tracking the empty pidfiles gives a list of the pending jobs. This is mainly useful with large queues where running in verbose mode would generate excessive output.
Each job is executed via fork/exec (or sub call, see notes for unalias and runjob). The parent writes out a pidfile with initially two lines: pid and command line. It then closes the pidfile. The child keeps the file open and writes its exit status to the file if the job completes; the parent writes the returned status to the file also. This makes it rather hard to "lose" the completion and force an abort on restart.
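For instance, a restart or monitoring script might inspect a job's pidfile along these lines (a minimal sketch, assuming the layout just described: pid, command line, then one or more status lines; the path is a placeholder):

use strict;
use warnings;

my $pidfile = '/some/rundir/loader.pid';

open my $fh, '<', $pidfile or die "open $pidfile: $!";
chomp( my @lines = <$fh> );
close $fh;

my( $pid, $cmd, @status ) = @lines;

if( ! @status )
{
    # only pid and command line: job still running
    # (or it died before writing a status).
    print "pid $pid still running: $cmd\n";
}
elsif( $status[-1] eq '0' )
{
    print "clean exit, restart can skip this job\n";
}
else
{
    print "needs to be re-run, last status: $status[-1]\n";
}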
Schedules
The configuration syntax is make-like. The two sections give aliases and the schedule itself. Aliases and targets look like make rules:
target = expands_to
target : dependency
example:
a = /somedir/abjob.ksh
b = /somedir/another.ksh
c = /somedir/loader
a : /somedir/startup.ksh
b : /somedir/startup.ksh
c : a b
/somedir/validate : a b c
Will use the various path expansions for "a", "b" and "c" in the targets and rules, running /somedir/abjob.ksh only after /somedir/startup.ksh has exited zero, the same for /somedir/another.ksh. The file /somedir/loader gets run only after both abjob.ksh and another.ksh are done, and the validate program gets run only after all of the other three are done.
The main uses of aliases would be to simplify re-use of scripts. One example is the case where the same code gets run multiple times with different arguments:
# comments are introduced by '#', as usual.
# blank lines are also ignored.
a = /somedir/process 1
b = /somedir/process 2
c = /somedir/process 3
d = /somedir/process 4
e = /somedir/process 5
f = /somedir/process 6
a : /otherdir/startup # startup.ksh isn't aliased
b : /otherdir/startup
c : /otherdir/startup
d : a b
e : b c
f : d e
cleanup : a b c d e f
Would allow any variety of arguments to be run for the a-f code simply by changing the aliases; the dependencies remain the same.
Another example is a case of loading fact tables after the dimensions complete:
fact1 fact2 fact3 : dim1 dim2 dim3
Would load all of the dimensions at once and the facts afterward. Note that stub entries are not required for the dimensions; they are added as runnable jobs when the rule is read.
If the jobs unalias to the names of the que object's methods then the code will be called instead of sending the string through system. For example:
job = /path/to/runscript
foo = cleanup
bar = cleanup
xyz = cleanup
job : ./startup
foo bar xyz : job
Will run ./startup via system in the local directory, run the job via system as well, then call $que->cleanup('foo'), $que->cleanup('bar'), and $que->cleanup('xyz') in parallel before finishing (assuming they all exist, of course).
This allows the schedule to easily mix subroutine and shell code as necessary or convenient.
The final useful alias is an empty one, or the string "PHONY". This is used for placeholders, mainly for breaking up long lines or assembling schedules automatically:
waitfor =
waitfor : job1
waitfor : job2
waitfor : job3
waitfor : job4
job5 job6 job7 : waitfor
will generate a stub that immediately returns zero for the "waitfor" job. This allows the remaining jobs to be hard coded -- or the job1-4 strings to be long file paths -- without having to generate huge lines or dynamically build the job5-7 line.
Overloading unalias for special job expansion.
Up to this point all of the schedule processing has been handled automatically. There may be cases where specialized processing of the jobs may be simpler. One example is where the "jobs" are known to be data files being loaded into a database; another is where the subroutine calls must come from an object other than the que itself.
In this case the unalias or runjob methods can be overloaded. Because runjob will automatically handle calling subroutines within perl vs. passing strings to the shell, most of the overloading can be done in unalias.
If unalias returns a code referent then it will be used to execute the code. One way to handle file processing for, say, rb_tmu loading dimension files before facts would be a schedule like:
dim1 = tmu_loader
dim2 = tmu_loader
dim3 = tmu_loader
fact1 = tmu_loader
fact2 = tmu_loader
fact2 fact1 : dim1 dim2 dim3
This would call $que->tmu_loader( 'dim1' ), etc, allowing the jobs to be paths to files that need to be loaded.
The problem with this approach is that the file names can change for each run, requiring more complicated code.
In this case it may be easier to overload the unalias method to process file names for itself. This might lead to the schedule:
fact2 fact1 : dim1 dim2 dim3
and nothing more with unalias deciding what to do with the files at runtime:
use Carp;
use File::Basename;

sub unalias
{
    my $que = shift;
    my $datapath = shift;

    my $tmudir   = dirname $0;
    my $filename = basename $datapath;
    my $tmufile  = "$tmudir/$filename.tmu";

    -e $tmufile or croak "$$: Missing: $tmufile";

    # unzip zipped files, otherwise just redirect them

    if( $datapath =~ /\.gz$/ )
    {
        "gzip -dc $datapath | rb_ptmu $tmufile \$RB_USER"
    }
    else
    {
        "rb_tmu $tmufile \$RB_USER < $datapath"
    }

    # caller gets back one of the two command strings
}
In this case all the schedule needs to contain are paths to the data files being loaded. The unalias method deals with all of the rest at runtime.
Adding a method to the derived class for more complicated processing of the files (say moving the completed files to an archive area and zipping them if necessary) could be handled by passing a closure:
sub unalias
{
    my $que = shift;
    my $datapath = shift;

    -e $datapath or croak "$$: Nonexistent data file: $datapath";

    # process the files; all further logic
    # is dealt with in the loader method.

    sub { $que->tmuload_method( $datapath ) }
}
Since code references are processed within perl this will not be passed to the shell. It will be run in the forked process, with the return value of tmuload_method being passed back to the parent process.
Using an if-ladder, various subroutines can be chosen when the job is unaliased (in the parent) or in the subroutine called (in the child).
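For instance, an overloaded unalias might choose a handler from the job name (a sketch only; tmuload_dim and tmuload_fact are hypothetical methods of the derived class, not part of the module):

sub unalias
{
    my $que = shift;
    my $job = shift;

    # if-ladder: pick a handler from the job name.
    # anything unrecognized falls back to a shell command string.

    if( $job =~ /^dim/ )
    {
        sub { $que->tmuload_dim( $job ) }
    }
    elsif( $job =~ /^fact/ )
    {
        sub { $que->tmuload_fact( $job ) }
    }
    else
    {
        "/somedir/loader $job"
    }
}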
Aliases can pass shell variables.
Since the executed code is fork-execed it can contain any useful environment variables also:
a = process --seq 1 --foo=$BAR
will interpolate $BAR at fork-time in the child process (i.e., by the shell handling the exec portion).
The scheduling module provides methods for managing the preparation, validation and execution of schedule objects. Since these are separated they can be manipulated by the caller as necessary.
One example would be to read in a set of schedules, run the first one to completion, then modify the second one based on the output of the first. This might happen when jobs are used to load data that is not always present. The first schedule would run the data extract/import/tally graphs. Code could then check if the tally shows any work for the intermittent data and stub out the processing of it by aliasing the job to "/bin/true":
/somedir/somejob.ksh = /bin/true
prepare = /somedir/extract.ksh
load = /somedir/batchload.ksh
/somedir/somejob.ksh : prepare
/somedir/ajob.ksh : prepare
/somedir/bjob.ksh : prepare
load : /somedir/somejob.ksh /somedir/ajob.ksh /somedir/bjob.ksh
In this case /somedir/somejob.ksh will be stubbed to exit zero immediately. This will not interfere with any of the scheduling patterns, just reduce any delays in the schedule.
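The calling code for that arrangement might look something like this (a sketch only; tally_shows_work() and the two schedule strings are placeholders for whatever the application provides):

use Schedule::Depend;

# run the extract schedule first.
my $extract = Schedule::Depend->prepare( sched => $extract_sched );
$extract->debug or die "bogus extract schedule";
$extract->execute;

# stub the intermittent job out of the load schedule
# unless the tally shows work for it.
my $load_text = $load_sched;
$load_text = "/somedir/somejob.ksh = /bin/true\n" . $load_text
    unless tally_shows_work();

Schedule::Depend->prepare( sched => $load_text, verbose => 1 )->execute;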
Note on calling convention for closures from unalias.
The closures generated in unalias vary on their parameter passing:
$run = sub { $sub->( $job ) };              # $package->can( $subname )
$run = sub { $que->$sub( $job ) };          # $que->can( $run )
$run = sub { __PACKAGE__->$sub( $job ) };   # __PACKAGE__->can( $run )
$run = eval "sub $block";                   # allows perl block code
The first case comes up because Foo::bar in a schedule is unlikely to successfully process any package arguments. The __PACKAGE__ situation is only going to show up in cases where execute has been overloaded, and the subroutines may need to know which package context they were unaliased in.
The first case can be configured to pass the package in by changing it to:
$run = sub { $package->$sub( $job ) };
This will pass the package as $_[0].
The first test is necessary because:
$object->can( 'Foo::bar' )
always returns \&Foo::bar, which, called as $que->$sub, puts a stringified version of the object into $_[0], and getting something like "2/8" is unlikely to be useful as an argument.
The last is mainly designed to handle subroutines that have multiple arguments which need to be computed at runtime:
foo = { do_this( $dir, $blah); do_that }
or when scheduling legacy code that might not exit zero on its own:
foo = { some_old_sub(@argz); 0 }
The exit from the block will be used for the non-zero exit status test in the parent when the job is run.
Notes on methods
Summary by subroutine call, with notes on overloading and general use.
boolean overload
Simplifies the test for remaining jobs in execute's while loop; also helps hide the guts of the queue object from execute since the test reduces to while( $que ).
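One way such an overload could be written (a sketch of the idea rather than the module's actual code; it assumes the pending jobs live in the object's "queued" hash, as described under "ready" below):

use overload
    'bool' => sub { scalar %{ $_[0]->{queued} } },
    fallback => 1;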
ready
Return a list of what is runnable in the queue. These will be any queued jobs which have no keys in their queued subhash. For example, the schedule entry
"foo : bar"
leaves
$queued->{foo}{bar} = 1.
foo will not be ready to execute until keys %{$queued->{foo}} is false (i.e., $queued->{foo}{bar} is deleted in the complete method).
This is used in two places: as a sanity check of the schedule after the input is complete and in the main scheduling loop.
If this is not true when we are done reading the configuration then the schedule is bogus.
Overloading this might allow some extra control over priority where maxjobs is set by modifying the sort to include a priority (e.g., number of waiting jobs).
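In terms of that structure, ready amounts to something like the following (a sketch of the idea, not necessarily the module's exact code):

sub ready
{
    my $que = shift;
    my $queued = $que->{queued};

    # runnable jobs have no remaining dependencies
    # left in their subhash.

    sort grep { ! keys %{ $queued->{$_} } } keys %$queued;
}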
queued, depend
queued hands back the keys of the que's "queued" hash. This is the list of jobs which are waiting to run. The keys are sorted lexically to give a consistent return value.
depend hands back the keys of que's "depend" hash for a particular job. This is a list of the jobs that depend on the job.
Only reason to overload these would be in a multi-stage system where one queue depends on another. It may be useful to prune the second queue if something abnormal happens in the first (sort of like make -k continuing to compile).
Trick would be for the caller to use something like:
$q1->dequeue( $_ ) for $q0->depend( $job_that_failed );
croak "Nothing left to run" unless $q1;
Note that the sort allows for priority among tags when the number of jobs is limited via maxjob. Jobs can be given tags like "00_", "01_" or "aa_", with hotter jobs getting lexically lower tag values.
dequeue
Once a job has been started it needs to be removed from the queue immediately. This is necessary because the queue may be checked any number of times while the job is still running.
For the golf-inclined, this reduces to
delete $_[0]->{queued}{$_[1]}
For now this looks prettier.
Compare this to the complete method which is run after the job completes and deals with pidfile and cleanup issues.
complete
Deal with job completion. Internal tasks are to update the dependencies, external cleanups (e.g., zipping files) can be handled by adding a "cleanup" method to the queue.
Thing here is to find all the jobs that depend on whatever just got done and remove their dependency on this job.
$depend->{$job} was built in the constructor via:
push @{ $depend->{$_} }, $job for @dependz;
Which assembles an array of what depends on this job. Here we just delete from the queued entries anything that depends on this job. After this is done the runnable jobs will have no dependencies (i.e., keys %{ $que->{queued}{$job} } will be an empty list).
A "cleanup" can be added for post-processing (e.g., gzip-ing processed data files or unlinking scratch files). It will be called with the que and job string being cleaned up after.
unalias, runjob
Expand an alias used in a rule, execute the unaliased job. Default case is to look the tag up in $que->{alias} and return either an alias or the original tag and exec the expanded string via the current shell.
One useful alternative is to use dynamic expansion of the tag being unaliased (e.g., the TMU example in the main notes, above). Another is to expand the tag into a code reference via:
sub unalias
{
    my ($que,$job) = (shift,shift);

    no strict 'refs';

    my $sub = \&$job;
}
or
my $sub = sub { handler $job };
to use a closure instead of various subroutine references.
This allows queueing subroutines rather than shell code.
runjob accepts a scalar to be executed, either via exec in the shell or a subroutine call. The default is to exit with the return status of a subroutine call or exec the shell code or die.
precheck
Isolate the steps of managing the pidfiles and checking for a running job.
This varies enough between operating systems that it'll make for less hacking if this is in one place or can be overridden.
This returns true if the pidfile contains the pid for a running job. Depending on the operating system, this can also check whether the pid is a copy of this job running.
If the pids have simply wrapped then someone will have to clean this up by hand. Problem is that on Solaris (at least through 2.7) there isn't any good way to check the command line in /proc.
On HP it's worse, since there isn't any /proc/pid. There we need to use a process module or parse ps.
On Solaris the /proc directory helps:
croak "$$: job $job is already running: /proc/$pid"
    if -e "/proc/$pid";
but all we can really check is that the pid is running, not that it is our job.
On Linux we can also check the command line to be sure the pid hasn't wrapped and been re-used (not all that far-fetched on a system with 30K blast searches a day, for example).
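A Linux-only check along those lines might look like the following (a sketch; the comparison against the command line saved in the pidfile is deliberately simplified):

sub pid_is_ours
{
    my( $pid, $saved_cmd ) = @_;

    # /proc/$pid/cmdline holds the command with NUL-separated
    # arguments; a missing file means the pid isn't running.

    open my $fh, '<', "/proc/$pid/cmdline" or return 0;
    local $/;
    ( my $cmdline = <$fh> ) =~ tr/\0/ /;

    index( $cmdline, $saved_cmd ) >= 0;
}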
Catch: If we zero the pidfile here then $q->debug->execute fails because the file is open for append during the execution and we get two sets of pid entries. The empty pidfiles are useful however, and are a good check for writability.
Fix: deal with it via if block in execute.
prepare
Read the schedule and generate a queue from it.
Lines arrive as:
job = alias expansion of job
or
job : depend on other jobs
Any '#' and all text after it on a line are stripped, regardless of quotes or backslashes, and blank lines are ignored.
Basic sanity checks are that none of the jobs is currently running, no job depends on itself to start and there is at least one job which is initially runnable (i.e., has no dependencies).
Caller gets back a blessed object w/ sufficient info to actually run the scheduled jobs.
The only reason for overloading this would be to add some boilerplate to the parser. The one here is sufficient for the default grammar, with only aliases and dependencies of single-word tags.
Note: the "ref $item || $item" trick allows this to be used as a method in some derived class. in that case the caller will get back an object blessed into the same class as the calling object. This simplifies daisy-chaining the construction and saves the deriving class from having to duplicate all of this code in most cases.
debug
Stub out the execution, used to check if the queue will complete. Basic trick is to make a copy of the object and then run the que with "norun" set.
This uses Dumper to get a deep copy of the object so that the original queue isn't consumed by the debug process, which saves having to prepare the schedule twice to debug then execute it.
The two simplest uses are:
if( my $que = S::D->prepare( @blah )->debug ) {...}
or
eval { S::D->prepare( @blah )->debug->execute }
depending on your taste in error handling.
execute
Actually do the deed. There is no reason to overload this that I can think of.
Known Bugs
The block-eval of code can yield all sorts of oddities if the block has side effects (e.g., exit()). This probably needs to be better wrapped. In any case, caveat scriptor...
The eval also needs to be better tested in test.pl.
Author
Steven Lembark, Knightsbridge Solutions slembark@knightsbridge.com
Copyright
(C) 2001-2002 Steven Lembark, Knightsbridge Solutions
This code is released under the same terms as Perl itself. Please see the Perl-5.6.1 distribution (or later) for a full description.
In any case, this code is released as-is, with no implied warranty of fitness for a particular purpose or warranty of merchantability.
See Also
perl(1)
perlobj(1) perlfork(1) perlreftut(1)
Other scheduling modules:
Schedule::Parallel(1) Schedule::Cron(1)