NAME
Parallel::ForkManager::Scaled - Run processes in parallel based on CPU usage
VERSION
Version 0.08
SYNOPSIS
use Parallel::ForkManager::Scaled;
# my $pm = Parallel::ForkManager::Scaled->new( attrib => value, ... );
my $pm = Parallel::ForkManager::Scaled->new;
# Used just like Parallel::ForkManager, so I'll paraphrase its documentation
for my $data (@all_data) {
    # $pid is set to the child process' PID
    my $pid = $pm->start and next;
    # In the child process now
    # do some work ..
    # Exit the child
    $pm->finish;
}
DESCRIPTION
This module inherits from Parallel::ForkManager and adds the ability to automatically manage the number of running processes based on how busy the system is, as measured by CPU idle time. Each time a child is about to be started with start(), a new value for max_procs may be calculated (if enough time has passed since the last calculation). If a new value is calculated, the number of processes to run is adjusted by calling set_max_procs with the new value.
If no attributes are passed to the constructor, defaults are set for you (see Attributes below).
Attributes
Attributes are just methods that may be passed to the constructor (new()) and most may be changed during the life of the returned object. They take as a parameter a new value to set for the attribute and return the current value (or new value if one was passed).
- hard_min_procs
-
The number of running processes will never be adjusted lower than this value.
default: 1
- hard_max_procs
-
The number of running processes will never be adjusted higher than this value.
default: The detected number of CPUs * 2
- soft_min_procs
- soft_max_procs
-
These are initially set to hard_min_procs and hard_max_procs respectively and are adjusted over time. They are used as the minimum and maximum number of processes when calculating adjustments.
Over time soft_min_procs and soft_max_procs should approach the same value for a consistent workload and a machine not otherwise busy.
Depending on the needs of the system, these values may also diverge if necessary to try to reach idle_target.
You may adjust these values if you wish by passing a value to the method but you probably shouldn't. :)
- initial_procs (read-only)
-
The number of processes to start running before attempting any adjustments. max_procs will be set to this value upon initialization.
default: halfway between hard_min_procs and hard_max_procs
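As a rough illustration of how these defaults relate, the arithmetic can be sketched in plain Perl. The CPU count of 4 is a made-up value standing in for ncpus, and rounding the midpoint down with int() is an assumption; the module's own rounding may differ.

```perl
use strict;
use warnings;

# Hypothetical CPU count; the module detects the real value via ncpus.
my $ncpus = 4;

my $hard_min_procs = 1;             # documented default
my $hard_max_procs = $ncpus * 2;    # documented default: CPUs * 2

# Halfway between the hard limits. Truncating with int() is an assumption.
my $initial_procs = int($hard_min_procs + ($hard_max_procs - $hard_min_procs) / 2);

print "hard_min_procs=$hard_min_procs hard_max_procs=$hard_max_procs initial_procs=$initial_procs\n";
```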
- update_frequency
-
The minimum amount of time, in seconds, that must elapse between checks of the system CPU's idle % and updates to the number of running processes.
Set this to 0 to cause a check before each call to start().
Before each call to start() the time is compared with the last time a check/update was performed. If this much time has passed, a new check will be made of how busy the CPU is and the number of processes may be adjusted.
default: 1
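The time check described for update_frequency amounts to something like the following sketch. The helper name update_is_due is hypothetical and not part of this module, and reading "this much time has passed" as greater-than-or-equal is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when at least $update_frequency seconds
# have elapsed since $last_update, otherwise 0.
sub update_is_due {
    my ($now, $last_update, $update_frequency) = @_;
    return ($now - $last_update) >= $update_frequency ? 1 : 0;
}

my $last_update = time;

# With a frequency of 0 the check is due before every start().
print "frequency 0: ", update_is_due(time, $last_update, 0), "\n";

# With the default frequency of 1, a check is due once a second has passed.
print "frequency 1, 5 seconds later: ", update_is_due($last_update + 5, $last_update, 1), "\n";
```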
- idle_target
-
Percentage of CPU idle time to try to maintain by adjusting the number of running processes between hard_min_procs and hard_max_procs.
default: 0 # try to keep the CPU 100% busy (0% idle)
- idle_threshold
-
Only make adjustments if the current CPU idle % is this distance away from idle_target. In other words, only adjust if abs(cur_idle - idle_target) > idle_threshold. This may be a fractional value (floating point).
You may notice that the default idle_target of 0 and idle_threshold of 1 would seem to indicate that the processes would never be adjusted, as idle can never be less than 0%. At the limits, the threshold is adjusted so that we will still attempt adjustments, something like this:
min_ok = max(0, idle_target - idle_threshold)
max_ok = min(100, idle_target + idle_threshold)
adjust if idle >= max_ok
adjust if idle <= min_ok
default: 1
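The threshold check at the limits can be sketched as a small function. The name should_adjust is hypothetical, and the sketch assumes the upper bound is idle_target + idle_threshold clamped to 100 and the lower bound is idle_target - idle_threshold clamped to 0; the module's actual code may differ.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: returns 1 if the idle percentage falls on or
# outside the clamped band around the target, otherwise 0.
sub should_adjust {
    my ($idle, $idle_target, $idle_threshold) = @_;
    my $min_ok = max(0,   $idle_target - $idle_threshold);
    my $max_ok = min(100, $idle_target + $idle_threshold);
    return ($idle >= $max_ok || $idle <= $min_ok) ? 1 : 0;
}

# Defaults: idle_target 0, idle_threshold 1
for my $idle (0, 0.5, 5) {
    printf "idle=%s adjust=%d\n", $idle, should_adjust($idle, 0, 1);
}
```

With the defaults, an idle of 0% still triggers an adjustment because min_ok is clamped to 0 rather than falling to -1.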
- run_on_update
-
This is a callback function that is run immediately after (possibly) calculating an adjustment, but before setting our max_procs to the new value. This allows you to override the default behavior of this module for your own nefarious purposes.
run_on_update expects a coderef which will be called with two parameters:
The object being adjusted.
The newly calculated value for max_procs or undef if there was no adjustment to be made.
The callback must return either a new value for max_procs or undef. If the returned value is undef, no change will be made to max_procs. Otherwise if a value is returned it will be used to set max_procs.
Be aware that your returned value will be constrained by soft_min_procs and soft_max_procs.
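A callback following this contract might look like the sketch below. The name capped_update and the cap of 8 are hypothetical choices for illustration; only the two-argument call and the undef-or-value return convention come from the description above.

```perl
use strict;
use warnings;

# Hypothetical upper limit chosen for this example only.
my $CAP = 8;

# Receives the object being adjusted and the proposed max_procs (or undef).
# Returns undef to leave max_procs unchanged, or the value to set.
sub capped_update {
    my ($pm, $new_procs) = @_;

    # No adjustment was calculated, so request no change.
    return undef unless defined $new_procs;

    # Never allow more than $CAP processes.
    return $new_procs > $CAP ? $CAP : $new_procs;
}

# Registration, assuming the module is installed:
#   my $pm = Parallel::ForkManager::Scaled->new(run_on_update => \&capped_update);
```

The callback ignores its first argument here, but it could also consult the object, for example by calling its idle method, before deciding.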
- tempdir
-
This is passed to the Parallel::ForkManager constructor to set tempdir. Where Parallel::ForkManager is constructed thusly:
my $pm = Parallel::ForkManager->new($procs, $tempdir);
The equivalent for this module would be:
my $pm = Parallel::ForkManager::Scaled->new(initial_procs => $procs, tempdir => $tempdir);
Methods
All methods inherited from Parallel::ForkManager plus the following:
- last_update
-
Returns the last time() a check/update was performed.
- idle
-
Returns the system's idle percentage as of last_update.
- ncpus
-
The number of CPUs detected on the system. This is just a wrapper around the cpus function from Unix::Statgrab.
- stats
-
Returns a formatted string with information about the current status. Takes a single optional parameter, the new value for max_procs to be set. If no parameter is passed, the current value of max_procs will be used.
Methods you probably don't need to use
These are not meant for general consumption but are available anyway. Probably best to avoid them :)
- update_stats_pct
-
This method will force an update of the idle statistic.
- dump_stats
-
Print the string returned by stats to STDERR. This may be used in the run_on_update callback to see diagnostics as processes are run:
$pm->run_on_update(\&Parallel::ForkManager::Scaled::dump_stats);
EXAMPLES
These examples are also provided in the examples/ directory of this distribution.
Maximize CPU usage
see: examples/prun.pl
Run shell commands that are passed into the program and try to keep the CPU busy, i.e. 0% idle.
use Parallel::ForkManager::Scaled;
my $pm = Parallel::ForkManager::Scaled->new(
run_on_update => \&Parallel::ForkManager::Scaled::dump_stats
);
# just to be sure we can saturate the CPU
$pm->hard_max_procs($pm->ncpus * 4);
$pm->set_waitpid_blocking_sleep(0);
while (<>) {
    chomp;
    $pm->start and next;
    # In the child now, run the shell process
    system $_;
    $pm->finish;
}
Dummy Load
see: examples/dummy_load.pl
This example provides a way to test the capabilities of this module. Try changing the idle_target and other settings to see the effect.
use Parallel::ForkManager::Scaled;
my $pm = Parallel::ForkManager::Scaled->new(
run_on_update => \&Parallel::ForkManager::Scaled::dump_stats,
idle_target => 50,
);
$pm->set_waitpid_blocking_sleep(0);
for my $i (0..1000) {
    $pm->start and next;
    my $start = time;
    srand($$);
    my $lifespan = 5+int(rand(10));
    # Keep the CPU busy until it's time to exit
    while (time - $start < $lifespan) {
        my $a = time;
        my $b = $a^time/3;
    }
    $pm->finish;
}
NOTES
Currently this module only works on systems where Unix::Statgrab is available, which is probably any system where the libstatgrab library can compile.
AUTHOR
Jason McCarver <slam@parasite.cc>
SEE ALSO
COPYRIGHT AND LICENSE
This software is copyright (c) 2016 by Jason McCarver.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.