NAME
Forks::Super - extensions and convenience methods for managing background processes.
VERSION
Version 0.32
SYNOPSIS
use Forks::Super;
use Forks::Super MAX_PROC => 5, DEBUG => 1;
# familiar use - parent returns PID>0, child returns zero
$pid = fork();
die "fork failed" unless defined $pid;
if ($pid > 0) {
# parent code
} else {
# child code
}
# wait for a child process to finish
$w = wait; # blocking wait on any child, $? holds child exit status
$w = waitpid $pid, 0; # blocking wait on specific child
$w = waitpid $pid, WNOHANG; # non-blocking wait, use with POSIX ':sys_wait_h'
$w = waitpid 0, $flag; # wait on any process in current process group
waitall; # block until all children are finished
# -------------- helpful extensions ---------------------
# fork directly to a shell command. Child doesn't return.
$pid = fork { cmd => "./myScript 17 24 $n" };
$pid = fork { exec => [ "/bin/prog" , $file, "-x", 13 ] };
# fork directly to a Perl subroutine. Child doesn't return.
$pid = fork { sub => $methodNameOrRef , args => [ @methodArguments ] };
$pid = fork { sub => \&subroutine, args => [ @args ] };
$pid = fork { sub => sub { "anonymous sub" }, args => [ @args ] };
# put a time limit on the child process
$pid = fork { cmd => $command, timeout => 30 }; # kill child if not done in 30s
$pid = fork { sub => $subRef , expiration => 1260000000 }; # complete by 8AM Dec 5, 2009 UTC
# obtain standard filehandles for the child process
$pid = fork { child_fh => "in,out,err" };
if ($pid == 0) { # child process
sleep 1;
$x = <STDIN>; # read from parent's $Forks::Super::CHILD_STDIN{$pid}
print rand() > 0.5 ? "Yes\n" : "No\n" if $x eq "Clean your room\n";
sleep 2;
$i_can_haz_ice_cream = <STDIN>;
if ($i_can_haz_ice_cream !~ /you can have ice cream/ && rand() < 0.5) {
print STDERR '@#$&#$*&#$*&',"\n";
}
exit 0;
} # else parent process
$child_stdin = $Forks::Super::CHILD_STDIN{$pid};
print $child_stdin "Clean your room\n";
sleep 2;
$child_stdout = $Forks::Super::CHILD_STDOUT{$pid};
$child_response = <$child_stdout>; # -or- = Forks::Super::read_stdout($pid);
if ($child_response eq "Yes\n") {
print $child_stdin "Good boy. You can have ice cream.\n";
} else {
print $child_stdin "Bad boy. No ice cream for you.\n";
sleep 2;
$child_err = Forks::Super::read_stderr($pid);
# -or- $child_err = readline($Forks::Super::CHILD_STDERR{$pid});
print $child_stdin "And no back talking!\n" if $child_err;
}
# ---------- manage jobs and system resources ---------------
# runs 100 tasks but the fork call blocks when there are already 5 jobs running
$Forks::Super::MAX_PROC = 5;
$Forks::Super::ON_BUSY = 'block';
for ($i=0; $i<100; $i++) {
$pid = fork { cmd => $task[$i] };
}
# jobs fail (without blocking) if the system is too busy
$Forks::Super::MAX_LOAD = 2.0;
$Forks::Super::ON_BUSY = 'fail';
$pid = fork { cmd => $task };
if ($pid > 0) { print "'$task' is running\n" }
elsif ($pid < 0) { print "current CPU load > 2.0 -- didn't start '$task'\n"; }
# $Forks::Super::MAX_PROC setting can be overridden. Start job immediately if < 3 jobs running
$pid = fork { sub => 'MyModule::MyMethod', args => [ @b ], max_proc => 3 };
# try to fork no matter how busy the system is
$pid = fork { force => 1, sub => \&MyMethod, args => [ @my_args ] };
# when system is busy, queue jobs. When system is not busy, some jobs on the queue will start.
# if job is queued, return value from fork() is a very negative number
$Forks::Super::ON_BUSY = 'queue';
$pid = fork { cmd => $command };
$pid = fork { cmd => $useless_command, queue_priority => -5 };
$pid = fork { cmd => $important_command, queue_priority => 5 };
$pid = fork { cmd => $future_job, delay => 20 }; # keep job on queue for at least 20s
# assign descriptive names to tasks
$pid1 = fork { cmd => $command, name => "my task" };
$pid2 = waitpid "my task", 0;
# run callbacks at various points of job life-cycle
$pid = fork { cmd => $command, callback => \&on_complete };
$pid = fork { sub => $sub, callback => { start => 'on_start', finish => \&on_complete,
queue => sub { print "Job $_[1] queued.\n" } } };
# set up dependency relationships
$pid1 = fork { cmd => $job1 };
$pid2 = fork { cmd => $job2, depend_on => $pid1 }; # put on queue until job 1 is complete
$pid4 = fork { cmd => $job4, depend_start => [$pid2,$pid3] }; # put on queue until jobs 2,3 have started
$pid5 = fork { cmd => $job5, name => "group C" };
$pid6 = fork { cmd => $job6, name => "group C" };
$pid7 = fork { cmd => $job7, depend_on => "group C" }; # wait for jobs 5 & 6 to complete
# manage OS settings on jobs -- not available on all systems
$pid1 = fork { os_priority => 10 }; # like nice(1) on Un*x
$pid2 = fork { cpu_affinity => 0x5 }; # background task will prefer CPUs #0 and #2
# job information
$state = Forks::Super::state($pid); # 'ACTIVE', 'DEFERRED', 'COMPLETE', 'REAPED'
$status = Forks::Super::status($pid); # exit status for completed jobs
# --- evaluate long running expressions in the background
$result = bg_eval { a_long_running_calculation() };
# sometime later ...
print "Result was $$result\n";
@result = bg_qx( "./long_running_command" );
# ... do something else for a while and when you need the output ...
print "output of long running command was: @result\n";
DESCRIPTION
This package provides new definitions for the Perl functions fork, wait, and waitpid with richer functionality. The new features are designed to make it more convenient to spawn background processes, to manage them, and to get the most out of your system's resources.
$pid = fork( \%options )
The new fork call attempts to spawn a new process. With no arguments, it behaves the same as the Perl fork() system call:
creating a new process running the same program at the same point
returning the process id (PID) of the child process to the parent (on Windows, this is a pseudo-process ID)
returning 0 to the child process
returning undef if the fork call was unsuccessful
Options for instructing the child process
The fork call supports three options, cmd, exec, and sub (or sub/args), that will instruct the child process to carry out a specific task. Using any of these options causes the child process not to return from the fork call.
$child_pid = fork { cmd => $shell_command }
$child_pid = fork { cmd => \@shell_command }
-
On successful launch of the child process, runs the specified shell command in the child process with the Perl system() function. When the system call is complete, the child process exits with the same exit status that was returned by the system call.
Returns the PID of the child process to the parent process. Does not return from the child process, so you do not need to check the fork() return value to determine whether code is executing in the parent or child process.
$child_pid = fork { exec => $shell_command }
$child_pid = fork { exec => \@shell_command }
-
Like the cmd option, but the background process launches the shell command with exec instead of with system.
Using exec instead of cmd will spawn one fewer process, but note that the timeout and expiration options cannot be used with the exec option (see "Options for simple job management").
$child_pid = fork { sub => $subroutineName [, args => \@args ] }
$child_pid = fork { sub => \&subroutineReference [, args => \@args ] }
$child_pid = fork { sub => sub { ... code ... } [, args => \@args ] }
-
On successful launch of the child process, fork invokes the specified Perl subroutine with the specified set of method arguments (if provided). If the subroutine completes normally, the child process exits with a status of zero. If the subroutine exits abnormally (i.e., if it dies, or if the subroutine invokes exit with a non-zero argument), the child process exits with non-zero status.
Returns the PID of the child process to the parent process. Does not return from the child process, so you do not need to check the fork() return value to determine whether code is running in the parent or child process.
If none of the cmd, exec, or sub options is provided to the fork call, then the fork() call behaves like a Perl fork() call, returning the child PID to the parent and also returning zero to the child.
Options for simple job management
fork { timeout => $delay_in_seconds }
fork { expiration => $timestamp_in_seconds_since_epoch_time }
-
Puts a deadline on the child process and causes the child to die if it has not completed by the deadline. With the timeout option, you specify that the child process should not survive longer than the specified number of seconds. With expiration, you are specifying an epoch time (like the one returned by the time function) as the child process's deadline.
If the setpgrp() system call is implemented on your system, then this module will try to reset the process group ID of the child process. On timeout, the module will attempt to kill off all subprocesses of the expiring child process.
If the deadline is some time in the past (if the timeout is not positive, or the expiration is earlier than the current time), then the child process will die immediately after it is created.
Note that this feature uses the Perl alarm call with a handler for SIGALRM. If you use this feature and also specify a sub to invoke, and that subroutine also tries to use the alarm feature or set a handler for SIGALRM, the results will be undefined.
The timeout and expiration options cannot be used with the exec option, since the child process will not be able to generate a SIGALRM after an exec call.
fork { delay => $delay_in_seconds }
fork { start_after => $timestamp_in_epoch_time }
-
Causes the child process to be spawned at some time in the future. The return value from a fork call that uses these features will not be a process id, but it will be a very negative number called a job ID. See the section on "Deferred processes" for information on what to do with a job ID.
A deferred job will start no earlier than its appointed time in the future. Depending on the circumstances under which the queued jobs are examined, the actual start time of the job could be significantly later than the appointed time.
A job may have both a minimum start time (through delay or start_after options) and a maximum end time (through timeout and expiration). Jobs with inconsistent times (end time is not later than start time) will be killed off as soon as they are created.
fork { child_fh => $fh_spec }
fork { child_fh => [ @fh_spec ] }
-
Note: API change since v0.10.
Launches a child process and makes the child process's STDIN, STDOUT, and/or STDERR filehandles available to the parent process in the scalar variables $Forks::Super::CHILD_STDIN{$pid}, $Forks::Super::CHILD_STDOUT{$pid}, and/or $Forks::Super::CHILD_STDERR{$pid}, where $pid is the PID return value from the fork call. This feature makes it possible, even convenient, for a parent process to communicate with a child, as this contrived example shows.
    $pid = fork { sub => \&pig_latinize, timeout => 10, child_fh => "all" };

    # in the parent, $Forks::Super::CHILD_STDIN{$pid} is an *output* filehandle
    print {$Forks::Super::CHILD_STDIN{$pid}} "The blue jay flew away in May\n";

    sleep 2; # give child time to start up and get ready for input

    # and $Forks::Super::CHILD_STDOUT{$pid} is an *input* handle
    $result = readline($Forks::Super::CHILD_STDOUT{$pid});
    print "Pig Latin translator says: $result\n"; # ==> eThay ueblay ayjay ewflay awayay inay ayMay\n
    @errors = readline($Forks::Super::CHILD_STDERR{$pid});
    print "Pig Latin translator complains: @errors\n" if @errors > 0;

    sub pig_latinize {
      for (;;) {
        while (<STDIN>) {
          foreach my $word (split /\s+/) {
            if ($word =~ /^qu/i) {
              print substr($word,2) . substr($word,0,2) . "ay"; # STDOUT
            } elsif ($word =~ /^([b-df-hj-np-tv-z][b-df-hj-np-tv-xz]*)/i) {
              my $prefix = $1; # the leading consonant cluster
              $word =~ s/[b-df-hj-np-tv-z][b-df-hj-np-tv-xz]*//i;
              print $word . $prefix . "ay";
            } elsif ($word =~ /^[aeiou]/i) {
              print $word . "ay";
            } else {
              print STDERR "Didn't recognize this word: $word\n";
            }
            print " ";
          }
          print "\n";
        }
      }
    }
The set of filehandles to make available is specified either as a non-alphanumeric delimited string or as a list reference. This spec may contain one or more of the words in, out, err, join, all, or socket.
in, out, and err mean that the child's STDIN, STDOUT, and STDERR, respectively, will be available in the parent process through the filehandles in $Forks::Super::CHILD_STDIN{$pid}, $Forks::Super::CHILD_STDOUT{$pid}, and $Forks::Super::CHILD_STDERR{$pid}, where $pid is the child's process ID.
all is a convenient way to specify in, out, and err.
join specifies that the child's STDOUT and STDERR will be returned through the same filehandle, specified as both $Forks::Super::CHILD_STDOUT{$pid} and $Forks::Super::CHILD_STDERR{$pid}.
If socket is specified, then local sockets will be used to pass data between parent and child instead of temporary files.
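To make the spec format concrete, here is a minimal sketch of how such a spec could be normalized into a set of flags. The helper name parse_fh_spec is hypothetical and is not part of Forks::Super, and treating join as implying both out and err is an assumption made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Forks::Super: normalizes a child_fh spec
# (a delimited string or an array reference) into a hash of flags.
sub parse_fh_spec {
    my ($spec) = @_;
    # Any run of non-alphanumeric characters acts as a delimiter.
    my @words = ref $spec eq 'ARRAY' ? @$spec : split /[^A-Za-z0-9]+/, $spec;
    my %want;
    for my $word (grep { length } @words) {
        if    ($word eq 'all')  { @want{qw(in out err)}   = (1, 1, 1) }
        elsif ($word eq 'join') { @want{qw(out err join)} = (1, 1, 1) }  # assumption
        elsif ($word =~ /^(?:in|out|err|socket)$/) { $want{$word} = 1 }
        else  { warn "unrecognized child_fh word: $word\n" }
    }
    return \%want;
}

my $flags = parse_fh_spec("in,out|socket");
print join(",", sort keys %$flags), "\n";    # prints in,out,socket
```

With this reading, "in,out,err", "in/out/err", and [ "in", "out", "err" ] all describe the same set of handles as "all".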
Socket handles vs. file handles vs. pipes
Here are some things to keep in mind when deciding whether to use sockets, pipes, or regular files for parent-child IPC:
Using regular files is implemented everywhere and is the most portable and robust scheme for IPC. Sockets and pipes are best suited for Unix-like systems, and may have limitations on non-Unix systems.
Sockets and pipes have a performance advantage, especially at child process start-up.
Temporary files use disk space; sockets and pipes use memory. One of these might be a relatively scarce resource on your system.
Socket input buffers have limited capacity. Write operations can block if the socket reader is not vigilant. Pipe input buffers are often even smaller (as small as 512 bytes on some modern systems).
The system-limits file that was created in your build directory will have information about the socket and pipe capacity of your system, if you are interested.
On Windows, sockets and pipes are blocking, and care must be taken to prevent your script from reading on an empty socket.
Socket and file handle gotchas
Some things to keep in mind when using socket or file handles to communicate with a child process.
Care should be taken before closing a socket handle. The same socket handle can be used for both reading and writing. Don't close a handle when you are only done with one half of the socket operations.
The test defined getsockname($handle) can determine whether $handle is a socket handle or a regular filehandle. The test -p $handle can determine whether $handle is reading from or writing to a pipe.
The following idiom is safe to use on socket handles, pipes, and regular filehandles alike:
    shutdown($handle,2) || close $handle;
IPC in this module is asynchronous. In general, you cannot tell whether the parent/child has written anything to be read in the child/parent. So getting undef when reading from the $Forks::Super::CHILD_STDOUT{$pid} handle does not necessarily mean that the child has finished (or even started!) writing to its STDOUT. Check out the seek HANDLE,0,1 trick in the perlfunc documentation for seek about reading from a handle after you have already read past the end. You may find it useful for your parent and child processes to follow some convention (for example, a special word like "__END__") to denote the end of input.
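The seek HANDLE,0,1 trick and the end-of-input convention can be sketched with core Perl alone. In this illustration a temporary file stands in for the file behind $Forks::Super::CHILD_STDOUT{$pid}, and the two handles $writer and $reader play the roles of child and parent; those names, and the choice of "__END__" as the marker, are assumptions of the sketch rather than anything the module prescribes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;

# One handle appends to the file (the "child"), another reads it (the "parent").
my ($writer, $path) = tempfile(UNLINK => 1);
$writer->autoflush(1);
open my $reader, '<', $path or die "open $path: $!";

print {$writer} "first line\n";
my $line1 = <$reader>;    # "first line\n"
my $eof   = <$reader>;    # undef: nothing more has been written *yet*

print {$writer} "second line\n__END__\n";
seek $reader, 0, 1;       # move by zero bytes; this clears the handle's EOF state

my @rest;
while (defined(my $line = <$reader>)) {
    last if $line eq "__END__\n";    # agreed end-of-input marker
    push @rest, $line;
}
print "late output: @rest";
```

The undef in $eof only means the reader caught up with the writer. Without the marker line, the parent would have no way to tell "no more output yet" from "no more output ever".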
fork { stdin => $input }
-
Provides the data in $input as the child process's standard input. Equivalent to, but a little more efficient than:
    $pid = fork { child_fh => "in", sub => sub { ... } };
    print {$Forks::Super::CHILD_STDIN{$pid}} $input;
$input may either be a scalar, a reference to a scalar, or a reference to an array.
fork { stdout => \$output }
fork { stderr => \$errput }
-
On completion of the background process, loads the standard output and standard error of the child process into the given scalar references. If you do not need to use the child's output while the child is running, it could be more convenient to use this construction than calling Forks::Super::read_stdout($pid) (or readline($Forks::Super::CHILD_STDOUT{$pid})) to obtain the child's output.
fork { retries => $max_retries }
-
If the underlying system fork call fails (i.e., returns undef), pauses for a short time and retries up to $max_retries times.
This feature is probably not that useful, as a failed fork call usually indicates some bad system condition (too many processes, system out of memory or swap space, impending kernel panic, etc.). In such a case, your expectations of recovery should not be too high.
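The retry behavior described for the retries option amounts to a loop like the one below. This is a sketch of the idea, not the module's code: fork_with_retries, its $fork_impl parameter (injectable so the loop can be tried out without exhausting the system), and the pause length are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: try once, then retry up to $max_retries more times,
# pausing briefly between attempts.
sub fork_with_retries {
    my ($max_retries, $fork_impl, $pause_seconds) = @_;
    $fork_impl     ||= sub { fork() };    # default: the builtin fork
    $pause_seconds //= 1;
    for my $attempt (0 .. $max_retries) {
        my $pid = $fork_impl->();
        return $pid if defined $pid;      # success: PID in parent, 0 in child
        last if $attempt == $max_retries;
        select undef, undef, undef, $pause_seconds;    # sub-second sleep
    }
    return;                               # every attempt failed
}

# Simulated fork that fails twice, then "succeeds" with a made-up PID.
my @outcomes = (undef, undef, 4242);
my $pid = fork_with_retries(3, sub { shift @outcomes }, 0.01);
print defined $pid ? "got pid $pid\n" : "gave up\n";    # prints got pid 4242
```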
Options for complicated job management
The fork() call from this module supports options that help you manage child processes, or groups of child processes, in ways that make better use of your system's resources. For example, you may have a lot of tasks to perform in the background, but you don't want to overwhelm your (possibly shared) system by running them all at once. There are features to control how many, how, and when your jobs will run.
fork { name => $name }
-
Attaches a string identifier to the job. The identifier can be used for several purposes:
to obtain a Forks::Super::Job object representing the background task through the Forks::Super::Job::get or Forks::Super::Job::getByName methods.
as the first argument to waitpid to wait on a job or jobs with specific names.
to identify and establish dependencies between background tasks. See the depend_on and depend_start parameters below.
if supported by your system, the name attribute will change the argument area used by the ps(1) program and change the way the background process is displayed in your process viewer. (See $PROGRAM_NAME in perlvar about overriding the special $0 variable.)
$Forks::Super::MAX_PROC = $max_simultaneous_jobs
fork { max_proc => $max_simultaneous_jobs }
-
Specifies the maximum number of background processes that you want to run. If a fork call is attempted while there are already the maximum number of child processes running, then the fork() call will either block (until some child processes complete), fail (return a negative value without spawning the child process), or queue the job (returning a very negative value called a job ID), according to the specified "on_busy" behavior (see $Forks::Super::ON_BUSY below). See the "Deferred processes" section for information about how queued jobs are handled.
On any individual fork call, the maximum number of processes may be overridden by also specifying max_proc or force options.
    $Forks::Super::MAX_PROC = 8;
    # launch 2nd job only when system is not busy at all
    $pid1 = fork { sub => 'method1' };
    $pid2 = fork { sub => 'method2', max_proc => 1 };
    $pid3 = fork { sub => 'method3' };
Setting $Forks::Super::MAX_PROC to zero or a negative number will disable the check for too many simultaneous processes.
$Forks::Super::MAX_LOAD = $max_cpu_load
fork { max_load => $max_cpu_load }
-
Specifies a maximum CPU load threshold. The fork command will not spawn any new jobs while the current system CPU load is larger than this threshold. CPU load checks are disabled if this value is set to zero or to a negative number.
Note that the metric of "CPU load" is different on different operating systems. On Windows (including Cygwin), the metric is CPU utilization, which is always a value between 0 and 1. On Unix-ish systems, the metric is the 1-minute system load average, which could be a value larger than 1. Also note that the 1-minute average load measurement has a lot of inertia -- after you start or stop a CPU intensive task, it will take at least several seconds for that change to have a large impact on the 1-minute load average.
If your system does not have a well-behaved uptime(1) command, then you may need to install the Sys::CpuLoadX module to use this feature. For now, the Sys::CpuLoadX module is only available bundled with Forks::Super and otherwise cannot be downloaded from CPAN.
$Forks::Super::ON_BUSY = "block" | "fail" | "queue"
fork { on_busy => "block" | "fail" | "queue" }
-
Dictates the behavior of fork in the event that the module is not allowed to launch the specified job for whatever reason.
block
-
If the system cannot create a new child process for the specified job, it will wait and periodically retry to create the child process until it is successful. Unless a system fork call is attempted and fails, fork calls that use this behavior will return a positive PID.
fail
-
If the system cannot create a new child process for the specified job, the fork call will immediately return with a small negative value.
queue
-
If the system cannot create a new child process for the specified job, the job will be deferred, and an attempt will be made to launch the job at a later time. See "Deferred processes" below. The return value will be a very negative number (job ID).
On any individual fork call, the default launch failure behavior specified by $Forks::Super::ON_BUSY can be overridden by specifying an on_busy option:
    $Forks::Super::ON_BUSY = "fail";
    $pid1 = fork { sub => 'myMethod' };
    $pid2 = fork { sub => 'yourMethod', on_busy => "queue" };
fork { force => $bool }
-
If the force option is set, the fork call will disregard the usual criteria for deciding whether a job can spawn a child process, and will always attempt to create the child process.
fork { queue_priority => $priority }
-
In the event that a job cannot immediately create a child process and is put on the job queue (see "Deferred processes"), the queue_priority option specifies the relative priority of the job on the job queue. In general, eligible jobs with high priority values will be started before jobs with lower priority values.
fork { depend_on => $id }
fork { depend_on => [ $id_1, $id_2, ... ] }
fork { depend_start => $id }
fork { depend_start => [ $id_1, $id_2, ... ] }
-
Indicates a dependency relationship between the job in this fork call and one or more other jobs. The identifiers may be process/job IDs or name attributes (see above) from earlier fork calls.
If a fork call specifies a depend_on option, then that job will be deferred until all of the child processes specified by the process or job IDs have completed. If a fork call specifies a depend_start option, then that job will be deferred until all of the child processes specified by the process or job IDs have started.
Invalid process and job IDs in a depend_on or depend_start setting will produce a warning message but will not prevent a job from starting.
Dependencies are established at the time of the fork call and can only apply to jobs that are already known at that time. So for example, in this code,
    $job1 = fork { cmd => $cmd, name => "job1", depend_on => "job2" };
    $job2 = fork { cmd => $cmd, name => "job2", depend_on => "job1" };
at the time the first job is created, the job named "job2" has not been created yet, so the first job will not have a dependency (and a warning will be issued when the job is created). This may be a limitation but it also guarantees that there will be no circular dependencies.
When a dependency identifier is a name attribute that applies to multiple jobs, the job will be dependent on all existing jobs with that name:
    # Job 3 will not start until BOTH job 1 and job 2 are done
    $job1 = fork { name => "Sally", ... };
    $job2 = fork { name => "Sally", ... };
    $job3 = fork { depend_on => "Sally", ... };
    # all of these jobs have the same name and depend on ALL previous jobs
    $job4 = fork { name => "Ralph", depend_start => "Ralph", ... }; # no dependencies
    $job5 = fork { name => "Ralph", depend_start => "Ralph", ... }; # depends on Job 4
    $job6 = fork { name => "Ralph", depend_start => "Ralph", ... }; # depends on #4 and #5
fork { can_launch => \&methodName }
fork { can_launch => sub { ... anonymous sub ... } }
-
Supply a user-specified function to determine when a job is eligible to be started. The function supplied should return 0 if a job is not eligible to start and non-zero if it is eligible to start.
During a fork call or when the job queue is being examined, the user's can_launch method will be invoked with a single Forks::Super::Job argument containing information about the job to be launched. User code may make use of the default launch determination method by invoking the _can_launch method of the job object:
    # Running on a BSD system with the uptime(1) call.
    # Want to block jobs when the current CPU load
    # (1 minute) is greater than 4 and respect all other criteria:
    fork { cmd => $my_command,
           can_launch => sub {
             $job = shift;                           # a Forks::Super::Job object
             return 0 if !$job->_can_launch;         # default
             $cpu_load = (split /\s+/,`uptime`)[-3]; # get 1 minute avg CPU load
             return 0 if $cpu_load > 4.0;            # system too busy. let's wait
             return 1;
           } };
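The [-3] index in the example above depends on the exact layout of the uptime(1) output. Where the three averages are comma-separated, the extracted field carries a trailing comma (for example "0.52,"), which Perl still numifies but warns about under use warnings. A slightly more tolerant way to pull out the 1-minute figure is sketched below. The helper name one_minute_load and the two sample strings are made up for illustration, and real output formats vary by system and locale.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the 1-minute load average from uptime(1) text.
# Returns undef when the text is not in a recognized format.
sub one_minute_load {
    my ($uptime_output) = @_;
    return $1 + 0
        if $uptime_output =~ /load averages?:\s*([0-9]+(?:\.[0-9]+)?)/;
    return;
}

# Sample strings in two common layouts (invented values).
my $comma_style = " 10:15:01 up 5 days,  2:03,  3 users,  load average: 0.52, 0.60, 0.55";
my $space_style = "10:15AM  up 5 days,  2:03, 3 users, load averages: 4.25 3.10 2.75";
printf "%.2f %.2f\n", one_minute_load($comma_style), one_minute_load($space_style);
# prints 0.52 4.25
```

Inside a can_launch callback, the value returned by one_minute_load(`uptime`) could then be compared against the threshold, after checking that it is defined.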
fork { callback => $subroutineName }
fork { callback => sub { BLOCK } }
fork { callback => { start => ..., finish => ..., queue => ..., fail => ... } }
-
Install callbacks to be run when and if certain events in the life cycle of a background process occur. The first two forms of this option are equivalent to
    fork { callback => { finish => ... } }
and specify code that will be executed when a background process is complete and the module has received its SIGCHLD event. A start callback is executed just after a new process is spawned. A queue callback is run if the job is deferred for any reason (see "Deferred processes") and the job is placed onto the job queue for the first time. And the fail callback is run if the job is not going to be launched (that is, a case where the fork call would return -1).
Callbacks are invoked with two arguments: the Forks::Super::Job object that was created with the original fork call, and the job's ID (the return value from fork).
You should keep your callback functions short and sweet, like you do for your signal handlers. Sometimes callbacks are invoked from the signal handler, and the processing of other signals could be delayed if the callback functions take too long to run.
fork { os_priority => $priority }
-
On supported operating systems, and after the successful creation of the child process, attempt to set the operating system priority of the child process, using your operating system's notion of what priority is.
On unsupported systems, this option is ignored.
fork { cpu_affinity => $bitmask }
-
On supported operating systems with multiple cores, and after the successful creation of the child process, attempt to set the child process's CPU affinity. Each bit of the bitmask represents one processor. Set a bit to 1 to allow the process to use the corresponding processor, and set it to 0 to disallow the corresponding processor. There may be additional restrictions on the valid range of values imposed by the operating system.
This feature requires the Sys::CpuAffinity module. The Sys::CpuAffinity module is bundled with Forks::Super, or it may be obtained from CPAN.
fork { debug => $bool }
fork { undebug => $bool }
-
Overrides the value in $Forks::Super::DEBUG (see "MODULE VARIABLES") for this specific job. If specified, the debug parameter controls only whether the module will output debugging information related to the job created by this fork call.
Normally, the debugging settings of the parent, including the job-specific settings, are inherited by child processes. If the undebug option is specified with a non-zero parameter value, then debugging will be disabled in the child process.
Deferred processes
Whenever some condition exists that prevents a fork() call from immediately starting a new child process, an option is to defer the job. Deferred jobs are placed on a queue. At periodic intervals, in response to periodic events, or whenever you invoke the Forks::Super::run_queue method in your code, the queue will be examined to see if any deferred jobs are eligible to be launched.
Job ID
When a fork() call fails to spawn a child process but instead defers the job by adding it to the queue, the fork() call will return a unique, large negative number called the job ID. The number will be negative and large enough in magnitude (<= -100000) so that it can be distinguished from any possible PID, Windows pseudo-process ID, process group ID, or fork() failure code.
Although the job ID is not the actual ID of a system process, it may be used like a PID as an argument to waitpid, as a dependency specification in another fork call's depend_on or depend_start option, or in the other module methods used to retrieve job information (see "Obtaining job information" below). Once a deferred job has been started, it will be possible to obtain the actual PID (or on Windows, the actual pseudo-process ID) of the process running that job.
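Because fork can now return several kinds of values, calling code may want to tell them apart explicitly. The sketch below does so for POSIX-like systems using only the thresholds stated in this document. The helper classify_fork_result and its labels are hypothetical. On Windows, where a valid pseudo-process ID is itself negative, Forks::Super::isValidPid($pid) is the appropriate test.

```perl
use strict;
use warnings;

# Hypothetical helper for POSIX-like systems only. On Windows a valid
# pseudo-process ID is negative, so use Forks::Super::isValidPid($pid) there.
sub classify_fork_result {
    my ($result) = @_;
    return 'error'    if !defined $result;      # the system fork call failed
    return 'deferred' if $result <= -100000;    # job ID: the job is on the queue
    return 'child'    if $result == 0;          # running in the child process
    return 'parent'   if $result > 0;           # real PID of the new child
    return 'refused';                           # small negative: job not launched
}

print classify_fork_result($_), "\n" for (undef, -100042, 0, 31337, -1);
# prints error, deferred, child, parent, refused (one per line)
```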
Job priority
Every job on the queue will have a priority value. A job's priority may be set explicitly by including the queue_priority option in the fork() call, or it will be assigned a default priority near zero. Every time the queue is examined, the queue will be sorted by this priority value and an attempt will be made to launch each job in this order. Note that different jobs may have different criteria for being launched, and it is possible that an eligible low priority job may be started before an ineligible higher priority job.
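The interaction between priority and eligibility can be seen in a toy model of a single pass over the queue. Everything here (examine_queue, the hash fields, the notion of free slots) is invented for illustration and says nothing about how the module stores or launches jobs internally.

```perl
use strict;
use warnings;

# Toy model of one queue examination: visit jobs from highest to lowest
# priority and start each one that is eligible while slots remain.
sub examine_queue {
    my ($queue, $free_slots) = @_;
    my (@started, @still_waiting);
    for my $job (sort { $b->{priority} <=> $a->{priority} } @$queue) {
        if ($free_slots > 0 && $job->{eligible}->()) {
            push @started, $job->{name};
            $free_slots--;
        } else {
            push @still_waiting, $job;
        }
    }
    @$queue = @still_waiting;    # jobs that were not started stay queued
    return @started;
}

my @queue = (
    { name => 'low',  priority => -5, eligible => sub { 1 } },
    { name => 'high', priority =>  5, eligible => sub { 0 } },  # e.g. unmet dependency
    { name => 'mid',  priority =>  0, eligible => sub { 1 } },
);
my @started = examine_queue(\@queue, 2);
print "started: @started\n";    # prints started: mid low
```

The high priority job is considered first but cannot launch, so the two lower priority jobs take the available slots and the high priority job remains on the queue for the next examination.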
Queue examination
Certain events in the SIGCHLD handler or in the wait, waitpid, and/or waitall methods will cause the list of deferred jobs to be evaluated and eligible jobs to be started. But this configuration does not guarantee that the queue will be examined promptly or frequently enough. The user may invoke the Forks::Super::Queue::run_queue() method at any time to force the queue to be examined.
Special tips for Windows systems
On POSIX systems (including Cygwin), programs using the Forks::Super module are interrupted when a child process completes. A callback function performs some housekeeping and may perform other duties like trying to dispatch things from the list of deferred jobs.
Windows systems do not have the signal handling capabilities of other systems, and so other things equal, a script running on Windows will not perform the housekeeping tasks as frequently as a script on other systems.
The method Forks::Super::pause can be used as a drop in replacement for the Perl sleep call. In a pause function call, the program will check on active child processes, reap the ones that have completed, and attempt to dispatch jobs on the queue.
Calling pause with an argument of 0 is also a valid way of invoking the child handler function on Windows. When used this way, pause returns immediately after running the child handler.
Child processes are implemented differently in Windows than in POSIX systems. The CORE::fork and Forks::Super::fork calls will usually return a pseudo-process ID to the parent process, and this will be a negative value. The Unix idiom of testing whether a fork call returns a positive number needs to be modified on Windows systems by testing whether Forks::Super::isValidPid($pid) returns true, where $pid is the return value from a Forks::Super::fork call.
OTHER FUNCTIONS
$reaped_pid = wait [$timeout]
-
Like the Perl wait system call, blocks until a child process terminates and returns the PID of the deceased process, or -1 if there are no child processes remaining to reap. The exit status of the child is returned in $?.
This version of the wait call can take an optional $timeout argument, which specifies the maximum length of time in seconds to wait for a process to complete. If a timeout is supplied and no process completes before the timeout expires, then the wait function returns the value -1.5 (you can also test if the return value of the function is the same as Forks::Super::TIMEOUT, which is a constant to indicate that a wait call timed out).
$reaped_pid = waitpid $pid, $flags [, $timeout]
-
Waits for a child with a particular PID or a child from a particular process group to terminate and returns the PID of the deceased process, or -1 if there is no suitable child process to reap. If the return value contains a PID, then $? is set to the exit status of that process.
A valid job ID (see "Deferred processes") may be used as the $pid argument to this method. If the waitpid call reaps the process associated with the job ID, the return value will be the actual PID of the deceased child.
Note that the waitpid function can wait on a job ID even when the job associated with that ID is still in the job queue, waiting to be started.
A $pid value of -1 waits for the first available child process to terminate and returns its PID.
A $pid value of 0 waits for the first available child from the same process group of the calling process.
A negative $pid that is not recognized as a valid job ID will be interpreted as a process group ID, and the waitpid function will return the PID of the first available child from that process group.
On some^H^H^H^H every modern system that I know about, a $flags value of POSIX::WNOHANG is supported to perform a non-blocking wait. See the Perl waitpid documentation.
If the optional $timeout argument is provided, the waitpid function will block for at most $timeout seconds, and return -1.5 (or Forks::Super::TIMEOUT) if a suitable process is not reaped in that time.
$count = waitall [$timeout]
-
Blocking wait for all child processes, including deferred jobs that have not started at the time of the waitall call. Return value is the number of processes that were waited on.
If the optional $timeout argument is supplied, the function will block for at most $timeout seconds before returning.
$num_signalled = Forks::Super::kill $signal, @jobsOrPids
-
Send a signal to the background processes specified either by process IDs, job names, or Forks::Super::Job objects. Returns the number of jobs that were successfully signalled.
This method "does what you mean" with respect to terminating, suspending, or resuming processes. In this way, jobs in the job queue (that don't even have a proper PID) may still be "signalled".
On Windows, which does not have a Unix-like signals framework, this method will sometimes "do what you mean" with respect to suspending, resuming, and terminating processes through other Windows API calls. It is highly recommended that you install the Win32::API module for this purpose.
See also the Forks::Super::Job::suspend and resume methods. It is preferable (out of portability concerns) to use these methods
$job->suspend;
$job->resume;
rather than Forks::Super::kill:
Forks::Super::kill 'STOP', $job;
Forks::Super::kill 'CONT', $job;
$num_signalled = Forks::Super::kill_all $signal
-
Sends a "signal" (see expanded meaning of "signal" in "kill", above) to all relevant processes spawned from the Forks::Super module.
Forks::Super::isValidPid( $pid )
-
Tests whether the return value of a fork call indicates that a background process was successfully created or not. On POSIX systems it is sufficient to check whether $pid is a positive integer, but isValidPid is a more portable test: on Windows, fork usually returns a negative pseudo-process ID to the parent.
Forks::Super::pause($delay)
-
A productive drop-in replacement for the Perl sleep system call (or Time::HiRes::sleep, if available). On systems like Windows that lack a proper method for handling SIGCHLD events, the Forks::Super::pause method will occasionally reap child processes that have completed and attempt to dispatch jobs on the queue.
On other systems, using Forks::Super::pause is less vulnerable than sleep to interruptions from this module (see "BUGS AND LIMITATIONS" below).
$status = Forks::Super::status($pid)
-
Returns the exit status of a completed child process represented by process ID, job ID, or name attribute. Aside from being a permanent store of the exit status of a job, using this method might be a more reliable indicator of a job's status than checking $? after a wait or waitpid call, because it is possible for this module's SIGCHLD handler to temporarily corrupt the $? value while it is checking for deceased processes.
$line = Forks::Super::read_stdout($pid)
@lines = Forks::Super::read_stdout($pid)
$line = Forks::Super::read_stderr($pid)
@lines = Forks::Super::read_stderr($pid)
-
For jobs that were started with the child_fh => "out" and child_fh => "err" options enabled, read data from the STDOUT and STDERR filehandles of child processes.
Aside from the more readable syntax, these functions may be preferable to
@lines = readline($Forks::Super::CHILD_STDOUT{$pid});
$line = readline($Forks::Super::CHILD_STDERR{$pid});
because they will automatically handle clearing the EOF condition on the filehandles if the parent is reading on the filehandles faster than the child is writing on them. (Note that Perl treats angle brackets around anything other than a bareword or a simple scalar variable as a file glob, so use readline or copy the filehandle into a scalar first.)
Functions work in both scalar and list context. If there is no data to read on the filehandle, but the child process is still active and could put more data on the filehandle, these functions return "" in scalar and list context. If there is no more data on the filehandle and the child process is finished, the functions return undef.
Forks::Super::close_fh($pid)
-
Closes all open file handles and socket handles for interprocess communication with the specified child process. Most operating systems impose a hard limit on the number of filehandles that can be opened in a process simultaneously, so you should use this function when you are finished communicating with a child process so that you don't run into that limit.
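A minimal sketch of how read_stdout and close_fh might be used together, based on the return-value rules described above. It assumes Forks::Super is installed and that the child_fh option can be combined with the sub option in one fork call; the three-line child task is illustrative only.

```perl
use strict;
use warnings;
use Forks::Super;

# Child writes three lines to its STDOUT, roughly one per second.
my $pid = fork {
    child_fh => "out",
    sub      => sub {
        $| = 1;                                  # unbuffered output
        for my $n (1 .. 3) { print "line $n\n"; sleep 1 }
    },
};

my @seen;
while (1) {
    my $line = Forks::Super::read_stdout($pid);
    last if !defined $line;          # no more data and the child has finished
    if ($line eq '') {               # nothing yet, but the child is still active
        Forks::Super::pause(1);
        next;
    }
    push @seen, $line;
    print "child said: $line";
}

waitpid $pid, 0;                     # reap the child
Forks::Super::close_fh($pid);        # release the interprocess filehandles
```

Calling close_fh once the loop ends keeps a long-running parent that launches many children from accumulating open filehandles.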
Obtaining job information
$job = Forks::Super::Job::get($pid)
-
Returns a Forks::Super::Job object associated with process ID or job ID $pid. See Forks::Super::Job for information about the methods and attributes of these objects.
@jobs = Forks::Super::Job::getByName($name)
-
Returns zero or more Forks::Super::Job objects with the specified job name. A job receives a name if a name parameter was provided in the Forks::Super::fork call.
$state = Forks::Super::state($pid)
-
Returns the state of the job specified by the given process ID, job ID, or job name. See "state" in Forks::Super::Job.
$status = Forks::Super::status($pid)
-
Returns the exit status of the job specified by the given process ID, job ID, or job name. See "status" in Forks::Super::Job. This value will be undefined until the job is complete.
$reference = bg_eval { BLOCK }
$reference = bg_eval { BLOCK } { option => value, ... }
-
Evaluates the specified block of code in a background process. When the parent process dereferences the result, it uses interprocess communication to retrieve the result from the child process, waiting until the child finishes if necessary.
# Example 1: must wait until job finishes before $$result is available
$result = bg_eval { sleep 3 ; return 42 };
print "Result is $$result\n";
# Example 2: $$result is probably available immediately
$result = bg_eval { sleep 3 ; return 42 };
&do_something_that_takes_about_5_seconds();
print "Result is $$result\n";
The code block is always evaluated in scalar context, though it is acceptable to return a reference:
$result = bg_eval {
    @files = File::Find::find(\&criteria, @lots_of_dirs);
    return \@files;
};
# ... do something else while that job runs ...
foreach my $matching_file (@$$result) { # note double dereference
    # ... do something with $matching_file
}
The background job will be spawned with the Forks::Super::fork call, and the command will block, fail, or defer a background job in accordance with all of the other rules of this module. Additional options may be passed to bg_eval that will be provided to the fork call. For example:
$result = bg_eval {
    return get_from_teh_Internet($something, $where);
} { timeout => 60, priority => 3 };
will return a reference to undef if the operation takes longer than 60 seconds. Most valid options for the fork call are also valid options for bg_eval, including timeouts, delays, job dependencies, names, and callbacks. The only invalid options for bg_eval are cmd, sub, exec, and child_fh.
A call to bg_eval will set the variables $Forks::Super::LAST_JOB and $Forks::Super::LAST_JOB_ID. See "MODULE VARIABLES" below.
@result = bg_eval { BLOCK }
@result = bg_eval { BLOCK } { option => value, ... }
-
Evaluates the specified block of code in a background process and in list context. The parent process retrieves the result from the child through interprocess communication the first time that an element of the array is referenced; the parent will wait for the child to finish if necessary.
The background job will be spawned with the Forks::Super::fork call, and the command will block, fail, or defer a background job in accordance with all of the rules of this module. Additional options may be passed to the bg_eval function that will be provided to the Forks::Super::fork call. For example:
@result = bg_eval { count_words($a_huge_file) } { timeout => 60 };
will return an empty list if the operation takes longer than 60 seconds. Any valid options for the fork call are also valid options for bg_eval, except for exec, cmd, sub, and child_fh.
A call to bg_eval will set the variables $Forks::Super::LAST_JOB and $Forks::Super::LAST_JOB_ID. See "MODULE VARIABLES" below.
$reference = bg_qx $command
$reference = bg_qx $command, { option => value , ... }
-
Executes the specified shell command in a background process. When the parent process dereferences the result, it uses interprocess communication to retrieve the output from the child process, waiting until the child finishes if necessary. The dereferenced value will contain the output from the command.
Think of this command as a background version of Perl's backticks or qx() function.
The background job will be spawned with the Forks::Super::fork call, and the command will block, fail, or defer a background job in accordance with all of the other rules of this module. Additional options may be passed to bg_qx that will be provided to the fork call. For example, this command
$result = bg_qx "nslookup joe.schmoe.com", { timeout => 15 };
will run nslookup in a background process for up to 15 seconds. The expression $$result will then contain all of the output produced by the process up until the time it was terminated. Most valid options for the fork call are also valid options for bg_qx, including timeouts, delays, job dependencies, names, and callbacks. The only invalid options for bg_qx are cmd, sub, exec, and child_fh.
A call to bg_qx will set the variables $Forks::Super::LAST_JOB and $Forks::Super::LAST_JOB_ID. See "MODULE VARIABLES" below.
@result = bg_qx $command
@result = bg_qx $command, { option => value , ... }
-
Like the scalar context form of the bg_qx command, but loads output of the specified command into an array, one element per line (as defined by the current record separator $/). The command will run in a background process. The first time that an element of the array is accessed, the parent will retrieve the output of the command, waiting until the child finishes if necessary.
Think of this command as a background version of Perl's backticks or qx() function.
The background job will be spawned with the Forks::Super::fork call, and the command will block, fail, or defer a background job in accordance with all of the other rules of this module. Additional options may be passed to bg_qx that will be provided to the fork call. For example, this command
@result = bg_qx "ssh $remotehost who", { timeout => 15 };
will run in a background process for up to 15 seconds. @result will then contain all of the output produced by the process up until the time it was terminated. Most valid options for the fork call are also valid options for bg_qx, including timeouts, delays, job dependencies, names, and callbacks. The only invalid options for bg_qx are cmd, sub, exec, and child_fh.
A call to bg_qx will set the variables $Forks::Super::LAST_JOB and $Forks::Super::LAST_JOB_ID. See "MODULE VARIABLES" below.
MODULE VARIABLES
Module variables may be initialized on the use Forks::Super line
# set max simultaneous procs to 5, allow children to call CORE::fork()
use Forks::Super MAX_PROC => 5, CHILD_FORK_OK => -1;
or they may be set explicitly in the code:
$Forks::Super::ON_BUSY = 'queue';
$Forks::Super::FH_DIR = "/home/joe/temp-ipc-files";
Module variables that may be of interest include:
$Forks::Super::MAX_PROC
-
The maximum number of simultaneous background processes that can be spawned by Forks::Super. If a fork call is attempted while there are already at least this many active background processes, the behavior of the fork call will be determined by the value in $Forks::Super::ON_BUSY or by the on_busy option passed to the fork call.
This value will be ignored during a fork call if the force option is passed to fork with a non-zero value. The value might also not be respected if the user supplies a code reference in the can_launch option and the user-supplied code does not test whether there are already too many active processes.
$Forks::Super::ON_BUSY = 'block' | 'fail' | 'queue'
-
Determines behavior of a fork call when the system is too busy to create another background process.
If this value is set to block, then fork will wait until the system is no longer too busy and then launch the background process. The return value will be a normal process ID value (assuming there was no system error in creating a new process).
If the value is set to fail, the fork call will return immediately without launching the background process. The return value will be -1. A Forks::Super::Job object will not be created.
If the value is set to queue, then the fork call will create a "deferred" job that will be queued and run at a later time. Also see the queue_priority option to fork to set the urgency level of a job in case it is deferred. The return value will be a large and negative job ID.
This value will be ignored in favor of an on_busy option supplied to the fork call.
$Forks::Super::CHILD_FORK_OK = -1 | 0 | +1
-
Spawning a child process from another child process with this module has its pitfalls, and this capability is disabled by default: you will get a warning message and the fork() call will fail if you try it.
To override this behavior, set $Forks::Super::CHILD_FORK_OK to a non-zero value. Setting it to a positive value will allow you to use all the functionality of this module from a child process (with the obvious caveat that you cannot wait on the child process of a child process from the main process).
Setting $Forks::Super::CHILD_FORK_OK to a negative value will disable the functionality of this module but will reenable the classic Perl fork() system call from child processes.
$Forks::Super::DEBUG, Forks::Super::DEBUG
-
To see the internal workings of the Forks::Super module, set $Forks::Super::DEBUG to a non-zero value. Information messages will be written to the Forks::Super::Debug::DEBUG_fh filehandle. By default Forks::Super::Debug::DEBUG_fh is aliased to STDERR, but it may be reset by the module user at any time.
Debugging behavior may be overridden for specific jobs if the debug or undebug option is provided to fork.
%Forks::Super::CHILD_STDIN
%Forks::Super::CHILD_STDOUT
%Forks::Super::CHILD_STDERR
-
In jobs that request access to the child process filehandles, these hash arrays contain filehandles to the standard input and output streams of the child. The filehandles for particular jobs may be looked up in these tables by process ID or job ID for jobs that were deferred.
Remember that from the perspective of the parent process, $Forks::Super::CHILD_STDIN{$pid} is an output filehandle (what you print to this filehandle can be read in the child's STDIN), and $Forks::Super::CHILD_STDOUT{$pid} and $Forks::Super::CHILD_STDERR{$pid} are input filehandles (for reading what the child wrote to STDOUT and STDERR).
As with any asynchronous communication scheme, you should be aware of how to clear the EOF condition on filehandles that are being simultaneously written to and read from by different processes. A scheme like this works on most systems:
# in parent, reading STDOUT of a child
# (copy the handle to a simple scalar: <...> around a hash element is a glob)
my $child_stdout = $Forks::Super::CHILD_STDOUT{$pid};
for (;;) {
    while (<$child_stdout>) {
        print "Child $pid said: $_";
    }
    # EOF reached, but child may write more to filehandle later.
    sleep 1;
    seek $child_stdout, 0, 1;
}
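The effect of the zero-byte relative seek can be seen in isolation with core Perl alone. The sketch below is not how the module sets up its filehandles; it assumes a temporary file as a stand-in for a child's output file, with a hypothetical $writer handle playing the child and $reader playing the parent.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;

# Stand-in for a child's output file: one writer handle, one reader handle.
my ($writer, $path) = tempfile(UNLINK => 1);
$writer->autoflush(1);
open(my $reader, '<', $path) or die "cannot open $path: $!";

print {$writer} "first\n";
my $line1  = <$reader>;         # reads the first line
my $at_eof = <$reader>;         # undef: the reader has reached end of file

print {$writer} "second\n";     # the "child" writes more later
seek $reader, 0, 1;             # move zero bytes from the current position,
                                # which clears the EOF condition
my $line2  = <$reader>;         # reads the newly written line

print "got: $line1", "got: $line2";
```

The seek does not change the read position, so no data is skipped or reread; it only resets the handle's end-of-file state.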
@Forks::Super::ALL_JOBS
%Forks::Super::ALL_JOBS
-
List of all Forks::Super::Job objects that were created from fork() calls, including deferred and failed jobs. Both process IDs and job IDs (for jobs that were deferred at one time) can be used to look up Job objects in the %Forks::Super::ALL_JOBS table.
$Forks::Super::QUEUE_INTERRUPT
-
On systems with mostly-working signal frameworks, this module installs a signal handler the first time that a task is deferred. The signal that is trapped is defined in the variable $Forks::Super::QUEUE_INTERRUPT. The default value is USR1, and it may be overridden directly or set on module import
use Forks::Super QUEUE_INTERRUPT => 'TERM';
$Forks::Super::QUEUE_INTERRUPT = 'USR2';
You would only worry about resetting this variable if you (including other modules that you import) are making use of an existing SIGUSR1 handler.
Forks::Super::TIMEOUT
-
A possible return value from wait and waitpid functions when a timeout argument is supplied. The value indicating a timeout should not collide with any other possible value from those functions, and should be recognizable as not an actual process ID.
$Forks::Super::LAST_JOB_ID
$Forks::Super::LAST_JOB
-
Calls to the bg_eval and bg_qx functions launch a background process and set the variables $Forks::Super::LAST_JOB_ID to the job's process ID and $Forks::Super::LAST_JOB to the job's Forks::Super::Job object. These functions do not explicitly return the job id, so these variables provide a convenient way to query the state of the jobs launched by these functions.
Some bash users will immediately recognize the parallels between these variables and the bash $! variable, which captures the process id of the last job to be run in the background.
EXPORTS
This module always exports the fork, wait, waitpid, and waitall functions, overloading the Perl system calls with the same names. Mixing Forks::Super calls with the similarly-named Perl calls is strongly discouraged, but you can access the original system calls at CORE::fork, CORE::wait, etc.
Functions that can be exported to the caller's package include
Forks::Super::bg_eval
Forks::Super::bg_qx
Forks::Super::isValidPid
Forks::Super::pause
Forks::Super::read_stderr
Forks::Super::read_stdout
Module variables that can be exported are:
%Forks::Super::CHILD_STDIN
%Forks::Super::CHILD_STDOUT
%Forks::Super::CHILD_STDERR
The special tag :var will export all three of these hash tables to the calling namespace.
The tag :all will export all the functions and variables listed above.
The Forks::Super::kill function cannot be exported for now, while I think through the implications of overloading yet another Perl system call.
DIAGNOSTICS
fork() not allowed in child process ...
Forks::Super::fork() call not allowed in child process ...
-
When the package variable $Forks::Super::CHILD_FORK_OK is zero, this package does not allow the fork() method to be called from a child process. Set $Forks::Super::CHILD_FORK_OK to change this behavior.
quick timeout
-
A job was configured with a timeout/expiration time such that the deadline for the job occurred before the job was even launched. The job was killed immediately after it was spawned.
Job start/Job dependency <nnn> for job <nnn> is invalid. Ignoring.
-
A process id or job id that was specified as a depend_on or depend_start option did not correspond to a known job.
Job <nnn> reaped before parent initialization.
-
A child process finished quickly and was reaped by the parent process SIGCHLD handler before the parent process could even finish initializing the job state. The state of the job in the parent process might be unavailable or corrupt for a short time, but eventually it should be all right.
interprocess filehandles not available
could not open filehandle to provide child STDIN/STDOUT/STDERR
child was not able to detect STDIN file ... Child may not have any input to read.
could not open filehandle to write child STDIN
could not open filehandle to read child STDOUT/STDERR
-
Initialization of filehandles for a child process failed. The child process will continue, but it will be unable to receive input from the parent through the $Forks::Super::CHILD_STDIN{$pid} filehandle, or pass output to the parent through the filehandles $Forks::Super::CHILD_STDOUT{$pid} and $Forks::Super::CHILD_STDERR{$pid}.
exec option used, timeout option ignored
-
A fork call was made using the incompatible options exec and timeout.
INCOMPATIBILITIES
This module requires its own SIGCHLD handler, and is incompatible with any module that tries to install another SIGCHLD handler. In particular, if you are used to setting $SIG{CHLD} = 'IGNORE' in your code, cut it out.
Some features use the alarm function and custom SIGALRM handlers in the child processes. Using other modules that employ this functionality may cause undefined behavior. Systems and versions that do not implement the alarm function (like MSWin32 prior to Perl v5.7) will not be able to use these features.
The first time that a task is deferred, by default this module will try to install a SIGUSR1 handler. See the description of $Forks::Super::QUEUE_INTERRUPT under "MODULE VARIABLES" for changing this behavior if you intended to use a SIGUSR1 handler for something else.
DEPENDENCIES
The bg_eval function requires either YAML or JSON. If neither module is available, then using bg_eval will result in the script croak'ing.
Otherwise, there are no hard dependencies on non-core modules. Some features, especially operating-system specific functions, depend on some modules (Win32::API and Win32::Process for Wintel systems, for example), but the module will compile without those modules. Attempts to use these features without the required modules will be silently ignored.
BUGS AND LIMITATIONS
Leftover temporary files and directories
In programs that use the interprocess communication features, the module does not always do a good job of cleaning up after itself. You may find directories called .fhfork<nnn>, which may or may not be empty, scattered around your filesystem.
Interrupted system calls
A typical script using this module will have a lot of behind-the-scenes signal handling as child processes finish and are reaped. These frequent interruptions can affect the execution of your program. For example, in this script:
1: use Forks::Super;
2: fork(sub => sub { sleep 2 });
3: sleep 5;
4: # ... program continues ...
the sleep call in line 3 is probably going to get interrupted before 5 seconds have elapsed, as the end of the child process spawned in line 2 will interrupt execution and invoke the SIGCHLD handler. In some cases there are tedious workarounds:
3a: $stop_sleeping_at = time + 5;
3b: sleep 1 while time < $stop_sleeping_at;
It should be noted that signal handling in Perl is much improved with version 5.7.3, and the problems caused by such interruptions are much more tractable than they used to be.
The pause call itself has the limitation that it may sleep for longer than the desired time. This is because the "productive" code executed in a pause function call can take an arbitrarily long time to run.
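The deadline-based workaround for interrupted sleep calls shown above can be wrapped in a small helper. This is a sketch in core Perl only, not part of the module: sleep_at_least is a hypothetical name, and the alarm is used here merely to simulate the kind of interruption that a SIGCHLD would cause.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: sleep for at least $seconds, going back to sleep
# whenever a signal (such as SIGCHLD) cuts an individual sleep call short.
sub sleep_at_least {
    my ($seconds) = @_;
    my $deadline = time + $seconds;
    while ((my $remaining = $deadline - time) > 0) {
        sleep $remaining;
    }
    return;
}

# Demonstration: an alarm interrupts the first sleep call after one second.
local $SIG{ALRM} = sub { };      # empty handler, it only interrupts sleep
alarm 1;
my $start = time;
sleep_at_least(2);
my $elapsed = time - $start;
printf "slept %.2f seconds\n", $elapsed;
```

Unlike the one-second polling loop, this variant sleeps for the whole remaining interval each time, so it wakes only when a signal arrives or the deadline passes.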
Idiosyncratic behavior on some systems
The system implementation of fork'ing and wait'ing varies from platform to platform. This module has been extensively tested on Cygwin, Windows, and Linux, but less so on other systems. It is possible that some features will not work as advertised. Please report any problems you encounter to <mob@cpan.org> and I'll see what I can do about it.
Segfaults during cleanup
On some systems, it has been observed that an application using the Forks::Super module may run normally, but might produce a segmentation fault or other error during cleanup. This will cause the application to exit with a non-zero exit code, even when the code accomplished everything it was supposed to. The cause and solution of these errors is an area of ongoing research.
SEE ALSO
There are reams of other modules on CPAN for managing background processes. See Parallel::*, Proc::Parallel, Proc::Fork, Proc::Launcher. Also Win32::Job.
Inspiration for bg_eval function from Acme::Fork::Lazy.
AUTHOR
Marty O'Brien, <mob@cpan.org>
LICENSE AND COPYRIGHT
Copyright (c) 2009-2010, Marty O'Brien.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.