NAME

Parallel::ForkControl - Finer grained control of processes on a Unix System

SYNOPSIS

  use Parallel::ForkControl;
  my $forker = new Parallel::ForkControl(
				WatchCount		=> 1,
				MaxKids			=> 50,
				MinKids			=> 5,
				WatchLoad		=> 1,
				MaxLoad			=> 8.00,
				Name			=> 'My Forker',
				Code			=> \&mysub
	);
  my @hosts = qw/host1 host2 host3 host4 host5/;

  my $altSub = sub { my $t = shift; ... };

  foreach my $host (@hosts) {
	if( $host eq 'alternateHost' ) {
		$forker->run( $altSub, $host );
	}
	else {
		$forker->run($host);
	}
  }

  $forker->waitforkids();  # wait for all children to finish;
  
  my $results = $forker->get_results();  # Get the Return Codes from Children
	# $results = {
	#	'29786' => {	# Kid PID
	#		'status' => 'string',
	#		'exitcode' => int,
	#		'return' => $scalarCopyofReturnValue,
	#		'signature' => $scalarFreezeOfArguments,
	#	}, ...
	# };
  $forker->clear_results();              # Reset the Results Tracker
  .....

DESCRIPTION

Parallel::ForkControl introduces a new and simple way to deal with fork()ing. The subroutine given as the 'Code' parameter is run every time the run() method is called on the fork object. Any parameters passed to the run() method are passed on to that subroutine. This allows a developer to spend less time worrying about the underlying fork() system, and just write code.

METHODS

new([ Option => Value ... ])

Constructor. Creates a Parallel::ForkControl object. Ideally, all options should be set here and not changed afterwards, although the accessors and mutators allow changes even while the run() method is executing.

Options

Name

Process name that will show up in 'ps' output. It is mostly cosmetic, but serves as an easy way to distinguish children from the parent in a process listing.

ProcessTimeOut

The maximum time any given process is allowed to run before it is interrupted. Default : 120 seconds

WatchCount

Enforce count (MaxKids) restraints on new processes. Default : 1

WatchLoad

Enforce load based (MaxLoad) restraints on process creation. NOTE: This MUST be a true value to enable throttling based on Load Averages. Default : 0

WatchMem ***

(unimplemented)

WatchCPU ***

(unimplemented)

Method

May be 'block' or 'cycle'. Block will fork off MaxKids processes, wait for all of them to die, and then fork off MaxKids more. Cycle will continually replace processes as the restraints allow. Cycle is almost ALWAYS the preferred method. Default : Cycle
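The difference between the two methods can be sketched with plain fork() and waitpid(). This is an illustration under stated assumptions, not the module's implementation: run_jobs() and its arguments are hypothetical names, and the children do no real work.

```perl
use strict;
use warnings;

$| = 1;    # unbuffered output, so children never re-flush the parent's buffer

# Hypothetical helper, NOT part of Parallel::ForkControl. Forks $job_count
# trivial children, never more than $max_kids at once, and returns the
# number of children reaped.
sub run_jobs {
    my ($method, $max_kids, $job_count) = @_;
    my %running;      # PID => 1 for every live child
    my $reaped = 0;

    for my $job (1 .. $job_count) {
        if (keys(%running) >= $max_kids) {
            if ($method eq 'block') {
                # block: drain the whole batch before forking again
                while (%running) {
                    my $pid = waitpid(-1, 0);
                    last if $pid <= 0;
                    delete $running{$pid};
                    $reaped++;
                }
            }
            else {
                # cycle: wait for one child, then reuse its slot
                my $pid = waitpid(-1, 0);
                if ($pid > 0) {
                    delete $running{$pid};
                    $reaped++;
                }
            }
        }

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        exit 0 if $pid == 0;    # child: a real job would do its work here
        $running{$pid} = 1;
    }

    # Reap whatever is still running.
    while (%running) {
        my $pid = waitpid(-1, 0);
        last if $pid <= 0;
        delete $running{$pid};
        $reaped++;
    }
    return $reaped;
}

print "block reaped: ", run_jobs('block', 3, 7), "\n";
print "cycle reaped: ", run_jobs('cycle', 3, 7), "\n";
```

With 'block', the slowest child in each batch holds up the whole next batch. With 'cycle', a free slot is refilled as soon as any one child exits, which is why it usually keeps the machine busier.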

MaxKids

The maximum number of children that may be running at any given time. Default : 5

MinKids

The minimum number of kids to keep running regardless of load/memory/CPU throttling. Default : 1

MaxLoad

The maximum one-minute load average. It is only enforced when WatchLoad is set to a true value. Default : 4.50 (WatchLoad is off by default)

MaxMem ***

(unimplemented)

MaxCPU ***

(unimplemented)

Code

This should be a subroutine reference. If you intend to pass arguments to this subroutine, it is imperative that you NOT include () in the reference. All code inside the subroutine will be run in the child process. The module provides all the necessary checks and safety nets, so your subroutine may just "return". It is neither necessary nor good practice to call exit() in this subroutine, because return codes are stored and made available to the parent process after completion. Examples:

my $code = sub {
		# do something useful
		my $t = shift;
		return $t;
};

my $forker = new Parallel::ForkControl(
			Name => 'me',
			MaxKids => 10,
			Code => $code
			# or
			#Code => \&mysub
);

sub mysub {
	my $t = shift;
	return $t;
}

Alternatively, you may pass the sub reference as the first argument of the run() method.

Accounting

By default this is turned off. If you would like to keep track of the exit codes, subroutine return values, and current status of the children forked by the run() routine, enable this option:

Accounting	=> 1

TrackArgs

By setting this to a true value, the fork controller will keep track of the arguments passed to each of the children. Using this you can see what arguments yielded which results. This argument truly only makes sense if you've enabled the Accounting option.

Check_At

This sets how many child processes are created between checks of the module's internal process table. It shouldn't be necessary to modify this value, but since the default is a little low, someone using this module for a large number of data sets might want to check at larger intervals. Default : 2

Debug

A number 0-4. The higher the number, the more debugging information you'll see. 0 means nothing. Default : 0

run([ @ARGS ])

This method calls the subroutine passed as the Code option. This method handles process throttling, creation, monitoring, and reaping. The subroutine in the Code option runs in the child process, and control is returned to the parent as soon as the child is successfully created. run() will block until it is allowed to create a process or process creation fails completely. run() returns the PID of the child on success, or undef on failure. NOTE: This is not the return code of your subroutine. I will eventually provide mapping to argument sets passed to run() with success/failure options and (idea) a "Report" option to enable some form of reporting based on that API.

waitforkids()

This method blocks until all children have finished processing.

cleanup()

Alias for waitforkids(), provided for legacy applications.

get_results( [ $pid ])

This method returns a hash reference of the arguments and return codes of the children:

$hashref = {
	'2975' =>  {	# PID of Child
		exitcode => 0,
		status => 'done',
		signature => $FrozenScalar,
		return => $ReferenceToReturnValue
	},
	....
};

The $pid is optional, but if specified, will return:

$hashref = {
	exitcode => 0,
	status => 'done',
	signature => $FrozenScalar,
	return => $ReferenceToReturnValue
};

Requires Accounting => 1 and, optionally, TrackArgs => 1.
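As a minimal sketch of how the parent might consume a results hash of this shape, the following picks out the children that exited with a non-zero code. The sample data and the failed_pids() helper are hypothetical; in real use the hash reference would come from get_results().

```perl
use strict;
use warnings;

# Hypothetical sample in the documented shape. In real use this would be
# the hash reference returned by get_results().
my $results = {
    2975 => { exitcode => 0, status => 'done' },
    2976 => { exitcode => 2, status => 'done' },
    2977 => { exitcode => 0, status => 'done' },
};

# Hypothetical helper, NOT part of the module: returns, in numeric order,
# the PIDs whose exit code was non-zero. Call it in list context.
sub failed_pids {
    my ($res) = @_;
    return sort { $a <=> $b }
           grep { ($res->{$_}{exitcode} // 0) != 0 }
           keys %$res;
}

for my $pid (failed_pids($results)) {
    printf "child %d exited with code %d\n", $pid, $results->{$pid}{exitcode};
}
```

Keying the report on exitcode rather than on status keeps the sketch independent of which status strings the module actually uses.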

clear_results()

This method clears the results hash.

kids()

In list context, this method returns the PIDs of all the children that are still alive. In scalar context, it returns the number of children still running.
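The effect of calling context can be illustrated with a small stand-in. This is a sketch only: live_kids() and the %running table are hypothetical and say nothing about how kids() is implemented internally.

```perl
use strict;
use warnings;

my %running = (2975 => 1, 2976 => 1);    # hypothetical PID table

# Hypothetical stand-in for a context-sensitive method such as kids().
sub live_kids {
    my @pids = sort { $a <=> $b } keys %running;
    return wantarray ? @pids : scalar @pids;
}

my @pids  = live_kids();    # list context: the PIDs themselves
my $count = live_kids();    # scalar context: how many are running

print "pids: @pids; count: $count\n";
```

Note that parentheses on the left-hand side, as in my ($n) = live_kids(), impose list context, so $n would receive the first PID rather than the count.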

kid_time( $PID )

This method returns the time, in epoch seconds, at which the child with the given PID was started.

EXPORT

None by default.

KNOWN ISSUES

01/08/2004 - brad@divisionbyzero.net

For some reason, I'm having to throttle process creation, as a slew of processes starting and ending at the same time seems to be causing problems on my machine. I've adjusted Check_At down to 2, which seems to catch any processes whose $SIG{CHLD} gets lost in the mess of spawning. I'm looking into a more permanent, professional solution.

SEE ALSO

perldoc -f fork, search CPAN for Parallel::ForkManager

AUTHOR

Brad Lhotsky <brad@divisionbyzero.net>

CONTRIBUTIONS BY

Mark Thomas <mark@ackers.net>

COPYRIGHT AND LICENSE

Copyright 2003 by Brad Lhotsky

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.