NAME

Forks::Super - extensions and convenience methods for managing background processes.

VERSION

Version 0.29

SYNOPSIS

use Forks::Super;
use Forks::Super MAX_PROC => 5, DEBUG => 1;

# familiar use - parent returns PID>0, child returns zero
$pid = fork();
die "fork failed" unless defined $pid;
if ($pid > 0) {
    # parent code
} else {
    # child code
}

# wait for a child process to finish
$w = wait;                  # blocking wait on any child, $? holds child exit status
$w = waitpid $pid, 0;       # blocking wait on specific child
$w = waitpid $pid, WNOHANG; # non-blocking wait, use with POSIX ':sys_wait_h'
$w = waitpid 0, $flag;      # wait on any process in current process group
waitall;                    # block until all children are finished

# -------------- helpful extensions ---------------------
# fork directly to a shell command. Child doesn't return.
$pid = fork { cmd => "./myScript 17 24 $n" };
$pid = fork { exec => [ "/bin/prog" , $file, "-x", 13 ] };

# fork directly to a Perl subroutine. Child doesn't return.
$pid = fork { sub => $methodNameOrRef , args => [ @methodArguments ] };
$pid = fork { sub => \&subroutine, args => [ @args ] };
$pid = fork { sub => sub { "anonymous sub" }, args => [ @args ] };

# put a time limit on the child process
$pid = fork { cmd => $command, timeout => 30 };            # kill child if not done in 30s
$pid = fork { sub => $subRef , expiration => 1260000000 }; # complete by 8AM Dec 5, 2009 UTC

# obtain standard filehandles for the child process
$pid = fork { child_fh => "in,out,err" };
if ($pid == 0) {      # child process
  sleep 1;
  $x = <STDIN>; # read from parent's $Forks::Super::CHILD_STDIN{$pid}
  print rand() > 0.5 ? "Yes\n" : "No\n" if $x eq "Clean your room\n";
  sleep 2;
  $i_can_haz_ice_cream = <STDIN>;
  if ($i_can_haz_ice_cream !~ /you can have ice cream/ && rand() < 0.5) {
      print STDERR '@#$&#$*&#$*&',"\n";
  }
  exit 0;
} # else parent process
$child_stdin = $Forks::Super::CHILD_STDIN{$pid};
print $child_stdin "Clean your room\n";
sleep 2;
$child_stdout = $Forks::Super::CHILD_STDOUT{$pid};
$child_response = <$child_stdout>; # -or-  = Forks::Super::read_stdout($pid);
if ($child_response eq "Yes\n") {
    print $child_stdin "Good boy. You can have ice cream.\n";
} else {
    print $child_stdin "Bad boy. No ice cream for you.\n";
    sleep 2;
    $child_err = Forks::Super::read_stderr($pid);
    # -or-  $child_err = readline($Forks::Super::CHILD_STDERR{$pid});
    print $child_stdin "And no back talking!\n" if $child_err;
}

# ---------- manage jobs and system resources ---------------
# runs 100 tasks but the fork call blocks when there are already 5 jobs running
$Forks::Super::MAX_PROC = 5;
$Forks::Super::ON_BUSY = 'block';
for ($i=0; $i<100; $i++) {
  $pid = fork { cmd => $task[$i] };
}

# jobs fail (without blocking) if the system is too busy
$Forks::Super::MAX_PROC = 5;
$Forks::Super::ON_BUSY = 'fail';
$pid = fork { cmd => $task };
if    ($pid > 0) { print "'$task' is running\n" }
elsif ($pid < 0) { print "5 or more jobs running -- didn't start '$task'\n"; }

# $Forks::Super::MAX_PROC setting can be overridden. Start job immediately if < 3 jobs running
$pid = fork { sub => 'MyModule::MyMethod', args => [ @b ], max_proc => 3 };

# try to fork no matter how busy the system is
$pid = fork { force => 1, sub => \&MyMethod, args => [ @my_args ] };

# when system is busy, queue jobs. When system is not busy, some jobs on the queue will start.
# if job is queued, return value from fork() is a very negative number
$Forks::Super::ON_BUSY = 'queue';
$pid = fork { cmd => $command };
$pid = fork { cmd => $useless_command, queue_priority => -5 };
$pid = fork { cmd => $important_command, queue_priority => 5 };
$pid = fork { cmd => $future_job, delay => 20 };  # keep job on queue for at least 20s

# assign descriptive names to tasks
$pid1 = fork { cmd => $command, name => "my task" };
$pid2 = waitpid "my task", 0;

# run callbacks at various points of job life-cycle
$pid = fork { cmd => $command, callback => \&on_complete };
$pid = fork { sub => $sub, callback => { start => 'on_start', finish => \&on_complete,
                                         queue => sub { print "Job $_[1] queued.\n" } } };

# set up dependency relationships
$pid1 = fork { cmd => $job1 };
$pid2 = fork { cmd => $job2, depend_on => $pid1 };            # put on queue until job 1 is complete
$pid4 = fork { cmd => $job4, depend_start => [$pid2,$pid3] }; # put on queue until jobs 2,3 have started

$pid5 = fork { cmd => $job5, name => "group C" };
$pid6 = fork { cmd => $job6, name => "group C" };
$pid7 = fork { cmd => $job7, depend_on => "group C" }; # wait for jobs 5 & 6 to complete

# manage OS settings on jobs -- not available on all systems
$pid1 = fork { os_priority => 10 };    # like nice(1) on Un*x
$pid2 = fork { cpu_affinity => 0x5 };  # background task will prefer CPUs #0 and #2

# job information
$state = Forks::Super::state($pid);    # 'ACTIVE', 'DEFERRED', 'COMPLETE', 'REAPED'
$status = Forks::Super::status($pid);  # exit status for completed jobs

# --- evaluate long running expressions in the background
$result = bg_eval { a_long_running_calculation() };
# sometime later ...
print "Result was $$result\n";

@result = bg_qx( "./long_running_command" );
# ... do something else for a while and when you need the output ...
print "output of long running command was: @result\n";

DESCRIPTION

This package provides new definitions for the Perl functions fork, wait, and waitpid with richer functionality. The new features are designed to make it more convenient to spawn and manage background processes and to get the most out of your system's resources.

$pid = fork( \%options )

The new fork call attempts to spawn a new process. With no arguments, it behaves the same as the Perl system call fork():

  • creating a new process running the same program at the same point

  • returning the process id (PID) of the child process to the parent.

    On Windows, this is a pseudo-process ID

  • returning 0 to the child process

  • returning undef if the fork call was unsuccessful

Options for instructing the child process

The fork call supports three options, cmd, exec, and sub (or sub/args), that will instruct the child process to carry out a specific task. Using any of these options causes the child process not to return from the fork call.

$child_pid = fork { cmd => $shell_command }
$child_pid = fork { cmd => \@shell_command }

On successful launch of the child process, runs the specified shell command in the child process with the Perl system() function. When the system call is complete, the child process exits with the same exit status that was returned by the system call.

Returns the PID of the child process to the parent process. Does not return from the child process, so you do not need to check the fork() return value to determine whether code is executing in the parent or child process.

$child_pid = fork { exec => $shell_command }
$child_pid = fork { exec => \@shell_command }

Like the cmd option, but the background process launches the shell command with exec instead of with system.

Using exec instead of cmd will spawn one fewer process, but note that the timeout and expiration options cannot be used with the exec option (see "Options for simple job management").

$child_pid = fork { sub => $subroutineName [, args => \@args ] }
$child_pid = fork { sub => \&subroutineReference [, args => \@args ] }
$child_pid = fork { sub => sub { ... code ... } [, args => \@args ] }

On successful launch of the child process, fork invokes the specified Perl subroutine with the specified set of method arguments (if provided). If the subroutine completes normally, the child process exits with a status of zero. If the subroutine exits abnormally (i.e., if it die's, or if the subroutine invokes exit with a non-zero argument), the child process exits with non-zero status.

Returns the PID of the child process to the parent process. Does not return from the child process, so you do not need to check the fork() return value to determine whether code is running in the parent or child process.

If none of the cmd, exec, or sub options is provided to the fork call, then the fork() call behaves like a Perl fork() call, returning the child PID to the parent and also returning zero to the child.

Options for simple job management

fork { timeout => $delay_in_seconds }
fork { expiration => $timestamp_in_seconds_since_epoch_time }

Puts a deadline on the child process and causes the child to die if it has not completed by the deadline. With the timeout option, you specify that the child process should not survive longer than the specified number of seconds. With expiration, you are specifying an epoch time (like the one returned by the time function) as the child process's deadline.

If the setpgrp() system call is implemented on your system, then this module will try to reset the process group ID of the child process. On timeout, the module will attempt to kill off all subprocesses of the expiring child process.

If the deadline is some time in the past (if the timeout is not positive, or the expiration is earlier than the current time), then the child process will die immediately after it is created.

Note that this feature uses the Perl alarm call with a handler for SIGALRM. If you use this feature and also specify a sub to invoke, and that subroutine also tries to use the alarm feature or set a handler for SIGALRM, the results will be undefined.

The timeout and expiration options cannot be used with the exec option, since the child process will not be able to generate a SIGALRM after an exec call.
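The relationship between the two deadline options can be sketched in plain Perl. The helper below is a hypothetical illustration (seconds_until_deadline is not part of Forks::Super), and it assumes that when both options are given the earlier deadline wins, which the text above does not state.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Forks::Super.
# $now is an epoch time such as the value returned by time().
# Returns the number of seconds left before the deadline, or nothing
# (undef in scalar context) when neither option was supplied.
sub seconds_until_deadline {
    my ($now, %opt) = @_;
    my @remaining;
    push @remaining, $opt{timeout}           if defined $opt{timeout};
    push @remaining, $opt{expiration} - $now if defined $opt{expiration};
    return unless @remaining;
    # Assumption: with both options present, the earlier deadline applies.
    my ($soonest) = sort { $a <=> $b } @remaining;
    return $soonest;
}
```

A result that is zero or negative corresponds to the case described above, where the child process dies immediately after it is created.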

fork { delay => $delay_in_seconds }
fork { start_after => $timestamp_in_epoch_time }

Causes the child process to be spawned at some time in the future. The return value from a fork call that uses these features will not be a process id, but it will be a very negative number called a job ID. See the section on "Deferred processes" for information on what to do with a job ID.

A deferred job will start no earlier than its appointed time in the future. Depending on when the queue of deferred jobs is examined, the actual start time of the job could be significantly later than the appointed time.

A job may have both a minimum start time (through delay or start_after options) and a maximum end time (through timeout and expiration). Jobs with inconsistent times (end time is not later than start time) will be killed off as soon as they are created.

fork { child_fh => $fh_spec }
fork { child_fh => [ @fh_spec ] }

Note: API change since v0.10.

Launches a child process and makes the child process's STDIN, STDOUT, and/or STDERR filehandles available to the parent process in the scalar variables $Forks::Super::CHILD_STDIN{$pid}, $Forks::Super::CHILD_STDOUT{$pid}, and/or $Forks::Super::CHILD_STDERR{$pid}, where $pid is the PID return value from the fork call. This feature makes it possible, even convenient, for a parent process to communicate with a child, as this contrived example shows.

    $pid = fork { sub => \&pig_latinize, timeout => 10,
                  child_fh => "all" };

    # in the parent, $Forks::Super::CHILD_STDIN{$pid} is an *output* filehandle
    print {$Forks::Super::CHILD_STDIN{$pid}} "The blue jay flew away in May\n";

    sleep 2; # give child time to start up and get ready for input

    # and $Forks::Super::CHILD_STDOUT{$pid} is an *input* handle
    $result = readline($Forks::Super::CHILD_STDOUT{$pid});
    print "Pig Latin translator says: $result\n"; # ==> eThay ueblay ayjay ewflay awayay inay ayMay\n
    @errors = readline($Forks::Super::CHILD_STDERR{$pid});
    print "Pig Latin translator complains: @errors\n" if @errors > 0;

    sub pig_latinize {
      for (;;) {
        while (<STDIN>) {
          foreach my $word (split /\s+/) {
            if ($word =~ /^qu/i) {
              print substr($word,2) . substr($word,0,2) . "ay";  # STDOUT
            } elsif ($word =~ /^([b-df-hj-np-tv-z][b-df-hj-np-tv-xz]*)/i) {
              my $prefix = $1;  # leading consonant cluster captured above
              $word =~ s/[b-df-hj-np-tv-z][b-df-hj-np-tv-xz]*//i;
              print $word . $prefix . "ay";
            } elsif ($word =~ /^[aeiou]/i) {
              print $word . "ay";
            } else {
              print STDERR "Didn't recognize this word: $word\n";
            }
            print " ";
          }
          print "\n";
        }
      }
    }

The set of filehandles to make available are specified either as a non-alphanumeric delimited string, or list reference. This spec may contain one or more of the words in, out, err, join, all, or socket.

in, out, and err mean that the child's STDIN, STDOUT, and STDERR, respectively, will be available in the parent process through the filehandles in $Forks::Super::CHILD_STDIN{$pid}, $Forks::Super::CHILD_STDOUT{$pid}, and $Forks::Super::CHILD_STDERR{$pid}, where $pid is the child's process ID. all is a convenient way to specify in, out, and err. join specifies that the child's STDOUT and STDERR will be returned through the same filehandle, specified as both $Forks::Super::CHILD_STDOUT{$pid} and $Forks::Super::CHILD_STDERR{$pid}.

If socket is specified, then local sockets will be used to pass data between parent and child instead of temporary files.
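As a rough illustration of the spec format described above, the following sketch normalizes a child_fh value into a set of flags. The function name parse_fh_spec and the returned hash layout are hypothetical and are not part of Forks::Super; the module's own parsing may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Forks::Super: normalize a child_fh
# spec (delimited string or array reference) into a hash of flags.
sub parse_fh_spec {
    my ($spec) = @_;
    my @words = ref($spec) eq 'ARRAY'
        ? @$spec
        : grep { length $_ } split /[^A-Za-z0-9]+/, $spec;
    my %want;
    for my $word (@words) {
        if ($word eq 'all') {
            # "all" is shorthand for in, out, and err
            $want{$_} = 1 for qw(in out err);
        } elsif ($word =~ /^(?:in|out|err|join|socket)$/) {
            $want{$word} = 1;
        } else {
            warn "unrecognized child_fh word: $word\n";
        }
    }
    return \%want;
}
```

With this sketch, the strings "in,out,err" and "all" and the list [ "in", "out", "err" ] all produce the same three flags.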

Socket handles vs. file handles vs. pipes

Here are some things to keep in mind when deciding whether to use sockets, pipes, or regular files for parent-child IPC:

  • Using regular files is implemented everywhere and is the most portable and robust scheme for IPC. Sockets and pipes are best suited for Unix-like systems, and may have limitations on non-Unix systems.

  • Sockets and pipes have a performance advantage, especially at child process start-up.

  • Temporary files use disk space; sockets and pipes use memory. One of these might be a relatively scarce resource on your system.

  • Socket input buffers have limited capacity. Write operations can block if the socket reader is not vigilant. Pipe input buffers are often even smaller (as small as 512 bytes on some modern systems).

    The system-limits file that was created in your build directory will have information about the socket and pipe capacity of your system, if you are interested.

  • On Windows, sockets and pipes are blocking, and care must be taken to prevent your script from reading on an empty socket.

Socket and file handle gotchas

Some things to keep in mind when using socket or file handles to communicate with a child process.

  • Care should be taken before close'ing a socket handle. The same socket handle can be used for both reading and writing. Don't close a handle when you are only done with one half of the socket operations.

  • The test defined getsockname($handle) can determine whether $handle is a socket handle or a regular filehandle. The test -p $handle can determine whether $handle is reading from or writing to a pipe.

  • The following idiom is safe to use on socket handles, pipes, and regular filehandles alike:

    shutdown($handle,2) || close $handle;

  • IPC in this module is asynchronous. In general, you cannot tell whether the parent/child has written anything to be read in the child/parent. So getting undef when reading from the $Forks::Super::CHILD_STDOUT{$pid} handle does not necessarily mean that the child has finished (or even started!) writing to its STDOUT. Check out the seek HANDLE,0,1 trick in the perlfunc documentation for seek about reading from a handle after you have already read past the end. You may find it useful for your parent and child processes to follow some convention (for example, a special word like "__END__") to denote the end of input.
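For file-based handles, the seek HANDLE,0,1 trick mentioned in the last item can be sketched as follows. The helper name read_new_lines is hypothetical and not part of Forks::Super; the sketch only relies on the core seek and readline functions.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Forks::Super.
# Returns every line currently available on $fh, including lines that
# were appended after an earlier read already reached end-of-file.
sub read_new_lines {
    my ($fh) = @_;
    seek($fh, 0, 1);    # move zero bytes from the current position; clears EOF
    my @lines;
    while (defined(my $line = readline($fh))) {
        push @lines, $line;
    }
    return @lines;
}
```

A parent could call such a helper repeatedly on a file-backed output handle and stop once it sees an agreed end marker such as "__END__".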

fork { stdin => $input }

Provides the data in $input as the child process's standard input. Equivalent to, but a little more efficient than:

$pid = fork { child_fh => "in", sub => sub { ... } };
print {$Forks::Super::CHILD_STDIN{$pid}} $input;

$input may either be a scalar, a reference to a scalar, or a reference to an array.

fork { stdout => \$output }
fork { stderr => \$errput }

On completion of the background process, loads the standard output and standard error of the child process into the given scalar references. If you do not need to use the child's output while the child is running, it could be more convenient to use this construction than calling Forks::Super::read_stdout($pid) (or readline($Forks::Super::CHILD_STDOUT{$pid})) to obtain the child's output.

fork { retries => $max_retries }

If the underlying system fork call fails (i.e., returns undef), pauses for a short time and retries up to $max_retries times.

This feature is probably not that useful, as a failed fork call usually indicates some bad system condition (too many processes, system out of memory or swap space, impending kernel panic, etc.). In such a case, your expectations of recovery should not be too high.
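The retry behavior can be pictured with a small loop. This is a hypothetical sketch and not the module's implementation: fork_with_retries, its argument order, and the injectable $try_fork code reference are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sketch, not part of Forks::Super.
# $try_fork is a code reference that returns a PID, or undef on failure;
# it defaults to Perl's built-in fork. $pause is the delay in whole
# seconds between attempts.
sub fork_with_retries {
    my ($max_retries, $pause, $try_fork) = @_;
    $try_fork ||= sub { CORE::fork() };
    for my $attempt (0 .. $max_retries) {
        my $pid = $try_fork->();
        return $pid if defined $pid;    # 0 in the child, PID in the parent
        sleep $pause if $pause && $attempt < $max_retries;
    }
    return;                             # every attempt failed
}
```

Passing the fork attempt as a code reference makes the loop easy to exercise without actually exhausting system resources.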

Options for complicated job management

The fork() call from this module supports options that help to manage child processes or groups of child processes in ways to better manage your system's resources. For example, you may have a lot of tasks to perform in the background, but you don't want to overwhelm your (possibly shared) system by running them all at once. There are features to control how many, how, and when your jobs will run.

fork { name => $name }

Attaches a string identifier to the job. The identifier can be used for several purposes:

  • to obtain a Forks::Super::Job object representing the background task through the Forks::Super::Job::get or Forks::Super::Job::getByName methods.

  • as the first argument to waitpid to wait on a job or jobs with specific names

  • to identify and establish dependencies between background tasks. See the depend_on and depend_start parameters below.

  • if supported by your system, the name attribute will change the argument area used by the ps(1) program and change the way the background process is displayed in your process viewer. (See $PROGRAM_NAME in perlvar about overriding the special $0 variable.)

$Forks::Super::MAX_PROC = $max_simultaneous_jobs
fork { max_proc => $max_simultaneous_jobs }

Specifies the maximum number of background processes that you want to run. If a fork call is attempted while there are already the maximum number of child processes running, then the fork() call will either block (until some child processes complete), fail (return a negative value without spawning the child process), or queue the job (returning a very negative value called a job ID), according to the specified "on_busy" behavior (see the next item). See the "Deferred processes" section for information about how queued jobs are handled.

On any individual fork call, the maximum number of processes may be overridden by also specifying max_proc or force options.

$Forks::Super::MAX_PROC = 8;
# launch 2nd job only when no other jobs are running
$pid1 = fork { sub => 'method1' };
$pid2 = fork { sub => 'method2', max_proc => 1 };
$pid3 = fork { sub => 'method3' };

Setting $Forks::Super::MAX_PROC to zero or a negative number will disable the check for too many simultaneous processes.

$Forks::Super::ON_BUSY = "block" | "fail" | "queue"
fork { on_busy => "block" | "fail" | "queue" }

Dictates the behavior of fork in the event that the module is not allowed to launch the specified job for whatever reason.

block

If the system cannot create a new child process for the specified job, it will wait and periodically retry to create the child process until it is successful. Unless a system fork call is attempted and fails, fork calls that use this behavior will return a positive PID.

fail

If the system cannot create a new child process for the specified job, the fork call will immediately return with a small negative value.

queue

If the system cannot create a new child process for the specified job, the job will be deferred, and an attempt will be made to launch the job at a later time. See "Deferred processes" below. The return value will be a very negative number (job ID).

On any individual fork call, the default launch failure behavior specified by $Forks::Super::ON_BUSY can be overridden by specifying an on_busy option:

$Forks::Super::ON_BUSY = "fail";
$pid1 = fork { sub => 'myMethod' };
$pid2 = fork { sub => 'yourMethod', on_busy => "queue" };

fork { force => $bool }

If the force option is set, the fork call will disregard the usual criteria for deciding whether a job can spawn a child process, and will always attempt to create the child process.

fork { queue_priority => $priority }

In the event that a job cannot immediately create a child process and is put on the job queue (see "Deferred processes"), the queue_priority option specifies the relative priority of the job on the job queue. In general, eligible jobs with high priority values will be started before jobs with lower priority values.

fork { depend_on => $id }
fork { depend_on => [ $id_1, $id_2, ... ] }
fork { depend_start => $id }
fork { depend_start => [ $id_1, $id_2, ... ] }

Indicates a dependency relationship between the job in this fork call and one or more other jobs. The identifiers may be process/job IDs or name attributes (see above) from earlier fork calls.

If a fork call specifies a depend_on option, then that job will be deferred until all of the child processes specified by the process or job IDs have completed. If a fork call specifies a depend_start option, then that job will be deferred until all of the child processes specified by the process or job IDs have started.

Invalid process and job IDs in a depend_on or depend_start setting will produce a warning message but will not prevent a job from starting.

Dependencies are established at the time of the fork call and can only apply to jobs that are known at run time. So for example, in this code,

$job1 = fork { cmd => $cmd, name => "job1", depend_on => "job2" };
$job2 = fork { cmd => $cmd, name => "job2", depend_on => "job1" };

at the time the first job is created, the job named "job2" has not been created yet, so the first job will not have a dependency (and a warning will be issued when the job is created). This may be a limitation but it also guarantees that there will be no circular dependencies.

When a dependency identifier is a name attribute that applies to multiple jobs, the job will be dependent on all existing jobs with that name:

# Job 3 will not start until BOTH job 1 and job 2 are done
$job1 = fork { name => "Sally", ... };
$job2 = fork { name => "Sally", ... };
$job3 = fork { depend_on => "Sally", ... };

# all of these jobs have the same name and depend on ALL previous jobs
$job4 = fork { name => "Ralph", depend_start => "Ralph", ... }; # no dependencies
$job5 = fork { name => "Ralph", depend_start => "Ralph", ... }; # depends on Job 4
$job6 = fork { name => "Ralph", depend_start => "Ralph", ... }; # depends on #4 and #5

fork { can_launch => \&methodName }
fork { can_launch => sub { ... anonymous sub ... } }

Supply a user-specified function to determine when a job is eligible to be started. The function supplied should return 0 if a job is not eligible to start and non-zero if it is eligible to start.

During a fork call or when the job queue is being examined, the user's can_launch method will be invoked with a single Forks::Super::Job argument containing information about the job to be launched. User code may make use of the default launch determination method by invoking the _can_launch method of the job object:

# Running on a BSD system with the uptime(1) call.
# Want to block jobs when the current CPU load
# (1 minute) is greater than 4 and respect all other criteria:
fork { cmd => $my_command,
       can_launch => sub {
         $job = shift;                    # a Forks::Super::Job object
         return 0 if !$job->_can_launch;  # default
         $cpu_load = (split /\s+/,`uptime`)[-3]; # get 1 minute avg CPU load
         return 0 if $cpu_load > 4.0;     # system too busy. let's wait
         return 1;
       } };

fork { callback => $subroutineName }
fork { callback => sub { BLOCK } }
fork { callback => { start => ..., finish => ..., queue => ..., fail => ... } }

Install callbacks to be run when and if certain events in the life cycle of a background process occur. The first two forms of this option are equivalent to

fork { callback => { finish => ... } }

and specify code that will be executed when a background process is complete and the module has received its SIGCHLD event. A start callback is executed just after a new process is spawned. A queue callback is run if the job is deferred for any reason (see "Deferred processes") and the job is placed onto the job queue for the first time. And the fail callback is run if the job is not going to be launched (that is, a case where the fork call would return -1).

Callbacks are invoked with two arguments: the Forks::Super::Job object that was created with the original fork call, and the job's ID (the return value from fork).

You should keep your callback functions short and sweet, like you do for your signal handlers. Sometimes callbacks are invoked from the signal handler, and the processing of other signals could be delayed if the callback functions take too long to run.

fork { os_priority => $priority }

On supported operating systems, and after the successful creation of the child process, attempt to set the operating system priority of the child process.

On unsupported systems, this option is ignored.

fork { cpu_affinity => $bitmask }

On supported operating systems with multiple cores, and after the successful creation of the child process, attempt to set the child process's CPU affinity. Each bit of the bitmask represents one processor. Set a bit to 1 to allow the process to use the corresponding processor, and set it to 0 to disallow the corresponding processor. There may be additional restrictions on the valid range of values imposed by the operating system.

As of version 0.07, supported systems are Cygwin, Win32, Linux (on systems with taskset(1)), and possibly BSD.
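The bitmask arithmetic can be sketched with two small helpers. The names cpu_mask and cpus_in_mask are hypothetical and not part of Forks::Super; they only show how zero-based CPU indices map to bits.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Forks::Super.

# Build a bitmask with one bit set for each zero-based CPU index.
sub cpu_mask {
    my @cpus = @_;
    my $mask = 0;
    $mask |= 1 << $_ for @cpus;
    return $mask;
}

# List the zero-based CPU indices that a bitmask allows.
sub cpus_in_mask {
    my ($mask) = @_;
    my @cpus;
    for (my $cpu = 0; $mask >> $cpu; $cpu++) {
        push @cpus, $cpu if ($mask >> $cpu) & 1;
    }
    return @cpus;
}
```

For example, cpu_mask(0, 2) yields 0x5, the value used in the SYNOPSIS to prefer CPUs #0 and #2.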

fork { debug => $bool }
fork { undebug => $bool }

Overrides the value in $Forks::Super::DEBUG (see "MODULE VARIABLES") for this specific job. If specified, the debug parameter controls only whether the module will output debugging information related to the job created by this fork call.

Normally, the debugging settings of the parent, including the job-specific settings, are inherited by child processes. If the undebug option is specified with a non-zero parameter value, then debugging will be disabled in the child process.

Deferred processes

Whenever some condition exists that prevents a fork() call from immediately starting a new child process, an option is to defer the job. Deferred jobs are placed on a queue. At periodic intervals, in response to periodic events, or whenever you invoke the Forks::Super::run_queue method in your code, the queue will be examined to see if any deferred jobs are eligible to be launched.

Job ID

When a fork() call fails to spawn a child process but instead defers the job by adding it to the queue, the fork() call will return a unique, large negative number called the job ID. The number will be negative and large enough (<= -100000) so that it can be distinguished from any possible PID, Windows pseudo-process ID, process group ID, or fork() failure code.

Although the job ID is not the actual ID of a system process, it may be used like a PID as an argument to waitpid, as a dependency specification in another fork call's depend_on or depend_start option, or the other module methods used to retrieve job information (see "Obtaining job information" below). Once a deferred job has been started, it will be possible to obtain the actual PID (or on Windows, the actual pseudo-process ID) of the process running that job.
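The ranges of return values described in this section can be told apart with a few comparisons. The helper below is a hypothetical sketch, not part of Forks::Super, and it assumes a POSIX-style system; on Windows a successful fork usually returns a negative pseudo-process ID, so Forks::Super::isValidPid($pid) is the better test there.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Forks::Super. POSIX-style systems only.
sub classify_fork_result {
    my ($result) = @_;
    return 'system fork failed' if !defined $result;
    return 'running in child'   if $result == 0;
    return 'child started'      if $result > 0;
    return 'job queued'         if $result <= -100000;   # job ID range
    return 'job not started';                            # small negative value
}
```

A caller that receives 'job queued' can keep the value and later pass it to waitpid or to a depend_on option, as described above.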

Job priority

Every job on the queue will have a priority value. A job's priority may be set explicitly by including the queue_priority option in the fork() call, or it will be assigned a default priority near zero. Every time the queue is examined, the queue will be sorted by this priority value and an attempt will be made to launch each job in this order. Note that different jobs may have different criteria for being launched, and it is possible that an eligible low priority job may be started before an ineligible higher priority job.
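One pass over the queue can be modeled in a few lines. This is a hypothetical model and not the module's code: each job is represented as a plain hash, and the eligible flag stands in for whatever launch criteria apply to that job.

```perl
use strict;
use warnings;

# Hypothetical model of one queue examination; not part of Forks::Super.
# Returns the names of the jobs that would start, in launch order.
sub launch_order {
    my @queue = @_;
    # highest queue_priority first
    my @by_priority = sort { $b->{queue_priority} <=> $a->{queue_priority} } @queue;
    # ineligible jobs are skipped and stay on the queue
    return map { $_->{name} } grep { $_->{eligible} } @by_priority;
}
```

With a high priority job that is not yet eligible and two lower priority jobs that are, only the two lower priority jobs are returned, which is the situation the last sentence above describes.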

Queue examination

Certain events in the SIGCHLD handler or in the wait, waitpid, and/or waitall methods will cause the list of deferred jobs to be evaluated and to start eligible jobs. But this configuration does not guarantee that the queue will be examined on a timely or frequent enough basis. The user may invoke the

Forks::Super::Queue::run_queue()

method at any time to force the queue to be examined.

Special tips for Windows systems

On POSIX systems (including Cygwin), programs using the Forks::Super module are interrupted when a child process completes. A callback function performs some housekeeping and may perform other duties like trying to dispatch things from the list of deferred jobs.

Windows systems do not have the signal handling capabilities of other systems, and so other things equal, a script running on Windows will not perform the housekeeping tasks as frequently as a script on other systems.

The method Forks::Super::pause can be used as a drop-in replacement for the Perl sleep call. In a pause function call, the program will check on active child processes, reap the ones that have completed, and attempt to dispatch jobs on the queue.

Calling pause with an argument of 0 is also a valid way of invoking the child handler function on Windows. When used this way, pause returns immediately after running the child handler.

Child processes are implemented differently in Windows than in POSIX systems. The CORE::fork and Forks::Super::fork calls will usually return a pseudo-process ID to the parent process, and this will be a negative value. The Unix idiom of testing whether a fork call returns a positive number needs to be modified on Windows systems by testing whether Forks::Super::isValidPid($pid) returns true, where $pid is the return value from a Forks::Super::fork call.

OTHER FUNCTIONS

$reaped_pid = wait [$timeout]

Like the Perl wait system call, blocks until a child process terminates and returns the PID of the deceased process, or -1 if there are no child processes remaining to reap. The exit status of the child is returned in $?.

This version of the wait call can take an optional $timeout argument, which specifies the maximum length of time in seconds to wait for a process to complete. If a timeout is supplied and no process completes before the timeout expires, then the wait function returns the value -1.5 (you can also test if the return value of the function is the same as Forks::Super::TIMEOUT, which is a constant to indicate that a wait call timed out).
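The value that wait leaves in $? packs several pieces of information into one integer, following the usual Perl convention documented for $? in perlvar. The helper below is a hypothetical sketch (decode_wait_status is not part of Forks::Super) that assumes this module sets $? the same way the built-in wait does.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Forks::Super.
# Split a wait status such as $? into its conventional parts.
sub decode_wait_status {
    my ($status) = @_;
    return (
        exit_code   => $status >> 8,              # value passed to exit()
        signal      => $status & 127,             # signal that ended the child, or 0
        core_dumped => ($status & 128) ? 1 : 0,   # true if a core dump was produced
    );
}
```

Copy $? into a variable of your own right after wait returns and decode the copy, since later calls can overwrite $?.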

$reaped_pid = waitpid $pid, $flags [, $timeout]

Waits for a child with a particular PID or a child from a particular process group to terminate and returns the PID of the deceased process, or -1 if there is no suitable child process to reap. If the return value contains a PID, then $? is set to the exit status of that process.

A valid job ID (see "Deferred processes") may be used as the $pid argument to this method. If the waitpid call reaps the process associated with the job ID, the return value will be the actual PID of the deceased child.

Note that the waitpid function can wait on a job ID even when the job associated with that ID is still in the job queue, waiting to be started.

A $pid value of -1 waits for the first available child process to terminate and returns its PID.

A $pid value of 0 waits for the first available child from the same process group of the calling process.

A negative $pid that is not recognized as a valid job ID will be interpreted as a process group ID, and the waitpid function will return the PID of the first available child from that process group.

On every modern system that I know about, a $flags value of POSIX::WNOHANG is supported to perform a non-blocking wait. See the Perl waitpid documentation.

If the optional $timeout argument is provided, the waitpid function will block for at most $timeout seconds, and return -1.5 (or Forks::Super::TIMEOUT) if a suitable process is not reaped in that time.

$count = waitall [$timeout]

Blocking wait for all child processes, including deferred jobs that have not started at the time of the waitall call. Return value is the number of processes that were waited on.

If the optional $timeout argument is supplied, the function will block for at most $timeout seconds before returning.

Forks::Super::isValidPid( $pid )

Tests whether the return value of a fork call indicates that a background process was successfully created or not. On POSIX systems it is sufficient to check whether $pid is a positive integer, but isValidPid is a more portable test, because on Windows a successful fork call will usually return a negative pseudo-process ID.
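As a rough sketch, a launch check that does not depend on the sign of the return value might look like this. The trivial child is only a placeholder:

```perl
use strict;
use warnings;
use Forks::Super;

# Placeholder child that exits immediately.
my $pid = fork { sub => sub { exit 0 } };

# Avoid testing $pid > 0, which is not reliable on Windows.
my $launched = Forks::Super::isValidPid($pid) ? 1 : 0;

if ($launched) {
    waitpid $pid, 0;
    print "Child finished with status $?\n";
} else {
    warn "fork did not create a background process\n";
}
```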

Forks::Super::pause($delay)

A productive drop-in replacement for the Perl sleep system call (or Time::HiRes::sleep, if available). On systems like Windows that lack a proper method for handling SIGCHLD events, the Forks::Super::pause method will occasionally reap child processes that have completed and attempt to dispatch jobs on the queue.

On other systems, using Forks::Super::pause is less vulnerable than sleep to interruptions from this module (see "BUGS AND LIMITATIONS" below).

$status = Forks::Super::status($pid)

Returns the exit status of a completed child process represented by process ID, job ID, or name attribute. Aside from being a permanent store of the exit status of a job, using this method might be a more reliable indicator of a job's status than checking $? after a wait or waitpid call, because it is possible for this module's SIGCHLD handler to temporarily corrupt the $? value while it is checking for deceased processes.

$line = Forks::Super::read_stdout($pid)
@lines = Forks::Super::read_stdout($pid)
$line = Forks::Super::read_stderr($pid)
@lines = Forks::Super::read_stderr($pid)

For jobs that were started with the child_fh => "out" and child_fh => "err" options enabled, read data from the STDOUT and STDERR filehandles of child processes.

Aside from the more readable syntax, these functions may be preferable to

@lines = readline($Forks::Super::CHILD_STDOUT{$pid});
$line = readline($Forks::Super::CHILD_STDERR{$pid});

because they will automatically handle clearing the EOF condition on the filehandles if the parent is reading on the filehandles faster than the child is writing on them.

Functions work in both scalar and list context. If there is no data to read on the filehandle, but the child process is still active and could put more data on the filehandle, these functions return "" in scalar and list context. If there is no more data on the filehandle and the child process is finished, the functions return undef.
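The three return cases described above (a line of data, an empty string, undef) suggest a polling loop like the following. This is a sketch under assumptions: it assumes child_fh may be combined with the sub option, and the child workload and the 30-second safety limit are invented for illustration:

```perl
use strict;
use warnings;
use Forks::Super;

# Illustrative child: writes three lines to its STDOUT.
my $pid = fork {
    child_fh => "out",
    sub      => sub { print "line $_\n" for 1 .. 3 },
};

my @lines;
my $give_up_at = time + 30;            # safety limit for this sketch only
while (time < $give_up_at) {
    my $line = Forks::Super::read_stdout($pid);
    last unless defined $line;         # undef: child finished, nothing left to read
    if ($line eq '') {                 # no data yet, but the child is still active
        Forks::Super::pause(1);
        next;
    }
    push @lines, $line;
}

Forks::Super::close_fh($pid);          # release the IPC filehandles
print "Child $pid said: $_" for @lines;
```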

Forks::Super::close_fh($pid)

Closes all open file handles and socket handles for interprocess communication with the specified child process. Most operating systems impose a hard limit on the number of filehandles that can be opened in a process simultaneously, so you should use this function when you are finished communicating with a child process so that you don't run into that limit.

Obtaining job information

$job = Forks::Super::Job::get($pid)

Returns a Forks::Super::Job object associated with process ID or job ID $pid. See Forks::Super::Job for information about the methods and attributes of these objects.

@jobs = Forks::Super::Job::getByName($name)

Returns zero or more Forks::Super::Job objects with the specified job name. A job receives a name if a name parameter was provided in the Forks::Super::fork call.

$state = Forks::Super::state($pid)

Returns the state of the job specified by the given process ID, job ID, or job name. See "state" in Forks::Super::Job.

$status = Forks::Super::status($pid)

Returns the exit status of the job specified by the given process ID, job ID, or job name. See "status" in Forks::Super::Job. This value will be undefined until the job is complete.

$reference = bg_eval { BLOCK }
$reference = bg_eval { BLOCK } { option => value, ... }

Evaluates the specified block of code in a background process. When the parent process dereferences the result, it uses interprocess communication to retrieve the result from the child process, waiting until the child finishes if necessary.

# Example 1: must wait until job finishes before $$result is available
$result = bg_eval { sleep 3 ; return 42 };
print "Result is $$result\n";

# Example 2: $$result is probably available immediately
$result = bg_eval { sleep 3 ; return 42 };
&do_something_that_takes_about_5_seconds();
print "Result is $$result\n";

The code block is always evaluated in scalar context, though it is acceptable to return a reference:

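$result = bg_eval {
        my @files;
        # find() has no useful return value, so collect matches in the callback.
        # criteria() is assumed to return true for files of interest.
        File::Find::find(sub { push @files, $File::Find::name if criteria() }, @lots_of_dirs);
        return \@files;
    };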
# ... do something else while that job runs ...
foreach my $matching_file (@$$result) { # note double dereference
    # ... do something with $matching_file
}

The background job will be spawned with the Forks::Super::fork call, and the command will block, fail, or defer a background job in accordance with all of the other rules of this module. Additional options may be passed to bg_eval that will be provided to the fork call. For example:

$result = bg_eval {
        return get_from_teh_Internet($something, $where);
    } { timeout => 60, priority => 3 };

will return a reference to undef if the operation takes longer than 60 seconds. Most valid options for the fork call are also valid options for bg_eval, including timeouts, delays, job dependencies, names, and callback. The only invalid options for bg_eval are cmd, sub, exec, and child_fh.

A call to bg_eval will set the variables $Forks::Super::LAST_JOB and $Forks::Super::LAST_JOB_ID. See "MODULE VARIABLES" below.

@result = bg_eval { BLOCK }
@result = bg_eval { BLOCK } { option => value, ... }

Evaluates the specified block of code in a background process and in list context. The parent process retrieves the result from the child through interprocess communication the first time that an element of the array is referenced; the parent will wait for the child to finish if necessary.

The background job will be spawned with the Forks::Super::fork call, and the command will block, fail, or defer a background job in accordance with all of the rules of this module. Additional options may be passed to the bg_eval function that will be provided to the Forks::Super::fork call. For example:

@result = bg_eval {
        count_words($a_huge_file)
    } { timeout => 60 };

will return an empty list if the operation takes longer than 60 seconds. Any valid options for the fork call are also valid options for bg_eval, except for exec, cmd, sub, and child_fh.

A call to bg_eval will set the variables $Forks::Super::LAST_JOB and $Forks::Super::LAST_JOB_ID. See "MODULE VARIABLES" below.

$reference = bg_qx $command
$reference = bg_qx $command, { option => value , ... }

Executes the specified shell command in a background process. When the parent process dereferences the result, it uses interprocess communication to retrieve the output from the child process, waiting until the child finishes if necessary. The dereferenced value will contain the output from the command.

Think of this command as a background version of Perl's backticks or qx() function.

The background job will be spawned with the Forks::Super::fork call, and the command will block, fail, or defer a background job in accordance with all of the other rules of this module. Additional options may be passed to bg_qx that will be provided to the fork call. For example, this command

$result = bg_qx "nslookup joe.schmoe.com", { timeout => 15 };

will run nslookup in a background process for up to 15 seconds. The expression $$result will then contain all of the output produced by the process up until the time it was terminated. Most valid options for the fork call are also valid options for bg_qx, including timeouts, delays, job dependencies, names, and callback. The only invalid options for bg_qx are cmd, sub, exec, and child_fh.

A call to bg_qx will set the variables $Forks::Super::LAST_JOB and $Forks::Super::LAST_JOB_ID. See "MODULE VARIABLES" below.

@result = bg_qx $command
@result = bg_qx $command, { option => value , ... }

Like the scalar context form of the bg_qx command, but loads output of the specified command into an array, one element per line (as defined by the current record separator $/). The command will run in a background process. The first time that an element of the array is accessed, the parent will retrieve the output of the command, waiting until the child finishes if necessary.

Think of this command as a background version of Perl's backticks or qx() function.

The background job will be spawned with the Forks::Super::fork call, and the command will block, fail, or defer a background job in accordance with all of the other rules of this module. Additional options may be passed to bg_qx that will be provided to the fork call. For example, this command

@result = bg_qx "ssh $remotehost who", { timeout => 15 };

will run in a background process for up to 15 seconds. @result will then contain all of the output produced by the process up until the time it was terminated. Most valid options for the fork call are also valid options for bg_qx, including timeouts, delays, job dependencies, names, and callback. The only invalid options for bg_qx are cmd, sub, exec, and child_fh.

A call to bg_qx will set the variables $Forks::Super::LAST_JOB and $Forks::Super::LAST_JOB_ID. See "MODULE VARIABLES" below.

MODULE VARIABLES

Module variables may be initialized on the use Forks::Super line

# set max simultaneous procs to 5, allow children to call CORE::fork()
use Forks::Super MAX_PROC => 5, CHILD_FORK_OK => -1;

or they may be set explicitly in the code:

$Forks::Super::ON_BUSY = 'queue';
$Forks::Super::FH_DIR = "/home/joe/temp-ipc-files";

Module variables that may be of interest include:

$Forks::Super::MAX_PROC

The maximum number of simultaneous background processes that can be spawned by Forks::Super. If a fork call is attempted while there are already at least this many active background processes, the behavior of the fork call will be determined by the value in $Forks::Super::ON_BUSY or by the on_busy option passed to the fork call.

This value will be ignored during a fork call if the force option is passed to fork with a non-zero value. The value might also not be respected if the user supplies a code reference in the can_launch option and the user-supplied code does not test whether there are already too many active processes.

$Forks::Super::ON_BUSY = 'block' | 'fail' | 'queue'

Determines behavior of a fork call when the system is too busy to create another background process.

If this value is set to block, then fork will wait until the system is no longer too busy and then launch the background process. The return value will be a normal process ID value (assuming there was no system error in creating a new process).

If the value is set to fail, the fork call will return immediately without launching the background process. The return value will be -1. A Forks::Super::Job object will not be created.

If the value is set to queue, then the fork call will create a "deferred" job that will be queued and run at a later time. Also see the queue_priority option to fork to set the urgency level of a job in case it is deferred. The return value will be a large and negative job ID.

This value will be ignored in favor of an on_busy option supplied to the fork call.
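A small sketch of the queue behavior, assuming a POSIX-like system. MAX_PROC is set to 1 here only to force the second job to be deferred, and the sleep durations are arbitrary:

```perl
use strict;
use warnings;
use Forks::Super MAX_PROC => 1, ON_BUSY => 'queue';

my $first  = fork { sub => sub { sleep 2 } };   # starts right away
my $second = fork { sub => sub { sleep 1 } };   # limit reached, so this job is deferred

print "First fork returned $first, second fork returned $second\n";

# A job ID is a valid argument to waitpid, even while the job is still queued.
my $reaped = waitpid $second, 0;
print "Deferred job ran as process $reaped\n";

waitall;    # reap the first job as well
```

With ON_BUSY set to 'fail' the second call would instead return -1, and with 'block' it would not return until the first job had finished.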

$Forks::Super::CHILD_FORK_OK = -1 | 0 | +1

Spawning a child process from another child process with this module has its pitfalls, and this capability is disabled by default: you will get a warning message and the fork() call will fail if you try it.

To override this behavior, set $Forks::Super::CHILD_FORK_OK to a non-zero value. Setting it to a positive value will allow you to use all the functionality of this module from a child process (with the obvious caveat that you cannot wait on the child process of a child process from the main process).

Setting $Forks::Super::CHILD_FORK_OK to a negative value will disable the functionality of this module but will reenable the classic Perl fork() system call from child processes.

$Forks::Super::DEBUG, Forks::Super::DEBUG

To see the internal workings of the Forks::Super module, set $Forks::Super::DEBUG to a non-zero value. Information messages will be written to the Forks::Super::Debug::DEBUG_fh filehandle. By default Forks::Super::Debug::DEBUG_fh is aliased to STDERR, but it may be reset by the module user at any time.

Debugging behavior may be overridden for specific jobs if the debug or undebug option is provided to fork.

%Forks::Super::CHILD_STDIN
%Forks::Super::CHILD_STDOUT
%Forks::Super::CHILD_STDERR

In jobs that request access to the child process filehandles, these hashes contain filehandles to the standard input and output streams of the child. The filehandles for particular jobs may be looked up in these tables by process ID, or by job ID for jobs that were deferred.

Remember that from the perspective of the parent process, $Forks::Super::CHILD_STDIN{$pid} is an output filehandle (what you print to this filehandle can be read in the child's STDIN), and $Forks::Super::CHILD_STDOUT{$pid} and $Forks::Super::CHILD_STDERR{$pid} are input filehandles (for reading what the child wrote to STDOUT and STDERR).

As with any asynchronous communication scheme, you should be aware of how to clear the EOF condition on filehandles that are being simultaneously written to and read from by different processes. A scheme like this works on most systems:

# in parent, reading STDOUT of a child
for (;;) {
    while (defined($_ = readline($Forks::Super::CHILD_STDOUT{$pid}))) {
      print "Child $pid said: $_";
    }

    # EOF reached, but child may write more to filehandle later.
    sleep 1;
    seek $Forks::Super::CHILD_STDOUT{$pid}, 0, 1;
}
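
The seek call is what clears the EOF condition. The following self-contained sketch shows the same technique with core Perl only; a temporary file and a second handle in the same process stand in for the child's output, which is an assumption made purely for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;

# One handle writes to a temporary file, another reads from it.
my ($writer, $path) = tempfile(UNLINK => 1);
$writer->autoflush(1);
open my $reader, '<', $path or die "cannot read $path: $!";

my @seen;
print {$writer} "first\n";
while (my $line = <$reader>) { push @seen, $line }   # reads up to EOF

print {$writer} "second\n";      # more data arrives after the reader hit EOF
seek $reader, 0, 1;              # move 0 bytes from the current position: clears EOF
while (my $line = <$reader>) { push @seen, $line }

print "Read: $_" for @seen;
```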

@Forks::Super::ALL_JOBS
%Forks::Super::ALL_JOBS

List of all Forks::Super::Job objects that were created from fork() calls, including deferred and failed jobs. Both process IDs and job IDs (for jobs that were deferred at one time) can be used to look up Job objects in the %Forks::Super::ALL_JOBS table.

$Forks::Super::QUEUE_INTERRUPT

On systems with mostly-working signal frameworks, this module installs a signal handler the first time that a task is deferred. The signal that is trapped is defined in the variable $Forks::Super::QUEUE_INTERRUPT. The default value is USR1, and it may be overridden directly or set on module import:

use Forks::Super QUEUE_INTERRUPT => 'TERM';
$Forks::Super::QUEUE_INTERRUPT = 'USR2';

You would only worry about resetting this variable if you (including other modules that you import) are making use of an existing SIGUSR1 handler.

Forks::Super::TIMEOUT

A possible return value from wait and waitpid functions when a timeout argument is supplied. The value indicating a timeout should not collide with any other possible value from those functions, and should be recognizable as not an actual process ID.

$Forks::Super::LAST_JOB_ID
$Forks::Super::LAST_JOB

Calls to the bg_eval and bg_qx functions launch a background process and set the variables $Forks::Super::LAST_JOB_ID to the job's process ID and $Forks::Super::LAST_JOB to the job's Forks::Super::Job object. These functions do not explicitly return the job id, so these variables provide a convenient way to query the state of the jobs launched by these functions.

Some bash users will immediately recognize the parallels between these variables and the bash $! variable, which captures the process id of the last job to be run in the background.

DIAGNOSTICS

fork() not allowed in child process ...
Forks::Super::fork() call not allowed in child process ...

When the package variable $Forks::Super::CHILD_FORK_OK is zero, this package does not allow the fork() method to be called from a child process. Set $Forks::Super::CHILD_FORK_OK to change this behavior.

quick timeout

A job was configured with a timeout/expiration time such that the deadline for the job occurred before the job was even launched. The job was killed immediately after it was spawned.

Job start/Job dependency <nnn> for job <nnn> is invalid. Ignoring.

A process id or job id that was specified as a depend_on or depend_start option did not correspond to a known job.

Job <nnn> reaped before parent initialization.

A child process finished quickly and was reaped by the parent process SIGCHLD handler before the parent process could even finish initializing the job state. The state of the job in the parent process might be unavailable or corrupt for a short time, but eventually it should be all right.

interprocess filehandles not available
could not open filehandle to provide child STDIN/STDOUT/STDERR
child was not able to detect STDIN file ... Child may not have any input to read.
could not open filehandle to write child STDIN
could not open filehandle to read child STDOUT/STDERR

Initialization of filehandles for a child process failed. The child process will continue, but it will be unable to receive input from the parent through the $Forks::Super::CHILD_STDIN{$pid} filehandle, or pass output to the parent through the filehandles $Forks::Super::CHILD_STDOUT{$pid} and $Forks::Super::CHILD_STDERR{$pid}.

exec option used, timeout option ignored

A fork call was made using the incompatible options exec and timeout.

INCOMPATIBILITIES

This module requires its own SIGCHLD handler, and is incompatible with any module that tries to install another SIGCHLD handler. In particular, if you are used to setting

$SIG{CHLD} = 'IGNORE'

in your code, cut it out.

Some features use the alarm function and custom SIGALRM handlers in the child processes. Using other modules that employ this functionality may cause undefined behavior. Systems and versions that do not implement the alarm function (like MSWin32 prior to Perl v5.7) will not be able to use these features.

The first time that a task is deferred, by default this module will try to install a SIGUSR1 handler. See the description of $Forks::Super::QUEUE_INTERRUPT under "MODULE VARIABLES" for changing this behavior if you intend to use a SIGUSR1 handler for something else.

DEPENDENCIES

The bg_eval function requires either YAML or JSON. If neither module is available, then using bg_eval will result in the script croak'ing.

Otherwise, there are no hard dependencies on non-core modules. Some features, especially operating-system specific functions, depend on some modules (Win32::API and Win32::Process for Wintel systems, for example), but the module will compile without those modules. Attempts to use these features without the required modules will be silently ignored.

BUGS AND LIMITATIONS

Leftover temporary files and directories

In programs that use the interprocess communication features, the module does not always do a good job of cleaning up after itself. You may find directories called .fhfork<nnn> that may or may not be empty scattered around your filesystem.

Interrupted system calls

A typical script using this module will have a lot of behind-the-scenes signal handling as child processes finish and are reaped. These frequent interruptions can affect the execution of your program. For example, in this script:

1: use Forks::Super;
2: fork(sub => sub { sleep 2 });
3: sleep 5;
4: # ... program continues ...

the sleep call in line 3 is probably going to get interrupted before 5 seconds have elapsed as the end of the child process spawned in line 2 will interrupt execution and invoke the SIGCHLD handler. In some cases there are tedious workarounds:

3a: $stop_sleeping_at = time + 5;
3b: sleep 1 while time < $stop_sleeping_at;
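
The same workaround can be wrapped in a helper that sleeps only for the time remaining. The sketch below uses core Perl only (plain fork and Time::HiRes) on a POSIX-like system. The helper name sleep_fully and the empty SIGCHLD handler are illustrative stand-ins, not part of this module:

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Sleep for at least $seconds, going back to sleep after each interruption.
sub sleep_fully {
    my ($seconds) = @_;
    my $deadline = time() + $seconds;
    while ((my $remaining = $deadline - time()) > 0) {
        sleep $remaining;    # may return early when a signal handler runs
    }
    return;
}

$SIG{CHLD} = sub { };        # stand-in for a module's SIGCHLD handler

my $pid = fork();            # plain built-in fork
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    sleep 0.2;               # child exits while the parent is sleeping
    exit 0;
}

my $start = time();
sleep_fully(1);
my $elapsed = time() - $start;
waitpid $pid, 0;
printf "Slept for %.2f seconds\n", $elapsed;
```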

It should be noted that signal handling in Perl is much improved with version 5.7.3, and the problems caused by such interruptions are much more tractable than they used to be.

The pause call itself has the limitation that it may sleep for longer than the desired time. This is because the "productive" code executed in a pause function call can take an arbitrarily long time to run.

Idiosyncratic behavior on some systems

The system implementation of fork'ing and wait'ing varies from platform to platform. This module has been extensively tested on Cygwin, Windows, and Linux, but less so on other systems. It is possible that some features will not work as advertised. Please report any problems you encounter to <mob@cpan.org> and I'll see what I can do about it.

SEE ALSO

There are reams of other modules on CPAN for managing background processes. See Parallel::*, Proc::Parallel, Proc::Fork, Proc::Launcher. Also Win32::Job.

Inspiration for bg_eval function from Acme::Fork::Lazy.

AUTHOR

Marty O'Brien, <mob@cpan.org>

LICENSE AND COPYRIGHT

Copyright (c) 2009-2010, Marty O'Brien.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.