NAME

Helios::Worker - base class for workers in the Helios job processing system

DESCRIPTION

Helios::Worker is the base class for all worker classes intended to be run by the Helios parallel job processing system. It encapsulates functions of the underlying TheSchwartz job queue system and provides additional methods to handle configuration, job argument parsing, logging, and other functions.

A Helios::Worker subclass must implement the work() method. It may also implement the max_retries() and retry_delay() methods if failed jobs of your class should be retried. The other Helios::Worker methods are available for use during job processing.

When job processing is completed, the job should be marked as completed successfully or as failed by using the completedJob(), failedJob(), or failedJobPermanent() methods. If your worker class supports OVERDRIVE mode operation, it should also call the shouldExitOverdrive() method at the end of its work() method and explicitly exit() if shouldExitOverdrive() returns a true value. (If supporting OVERDRIVE, your worker class should also explicitly exit() if job processing fails, to prevent errors from affecting future jobs.)

TheSchwartz METHODS

These methods are used by the underlying TheSchwartz job queuing system to determine what work is to be performed and, if a job fails, how it should be retried.

max_retries()

Controls how many times a job will be retried. Not used in the base Helios::Worker class.

retry_delay()

Controls how long (in seconds) to wait before a failed job is retried. Not used in the base Helios::Worker class.

If you want your worker class to retry a failed job twice, waiting an hour between tries, you would define max_retries() and retry_delay() as follows:

sub max_retries { return 2; }
sub retry_delay { return 3600; }
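A fixed delay is not the only option. The sketch below assumes, as the TheSchwartz::Worker documentation describes, that retry_delay() is called with the number of failures so far, and uses that count to wait longer after each failure. MyWorker is a hypothetical class name, and the Helios::Worker base class is left out only so the snippet runs without Helios installed.

```perl
use strict;
use warnings;

package MyWorker;   # hypothetical worker class
# A real worker would also have: use base 'Helios::Worker';
# (left out here so the sketch runs without Helios installed)

# Allow up to two retries of a failed job
sub max_retries { return 2; }

# Assumes TheSchwartz passes the number of failures so far:
# wait one hour after the first failure, two hours after the second, etc.
sub retry_delay {
    my ($class, $num_failures) = @_;
    $num_failures ||= 1;
    return 3600 * $num_failures;
}

package main;
```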

work()

The work() method is essentially the main() function for your worker class. When a worker process is launched by the helios.pl worker daemon, the process calls its work() method and passes it the job information for an available job.

A work() method should take the general form:

 use Error qw(:try);            # provides try/catch/otherwise
 use Sys::Syslog qw(:macros);   # provides LOG_WARNING, LOG_ERR, etc.

 sub work {
     my $class = shift;
     my $job = shift;
     my $self = $class->new();

     try {
         $self->prep();
         my $params = $self->getParams();
         my $job_args = $self->getJobArgs($job);

         # DO WORK HERE #

         $self->completedJob($job);
     } catch Helios::Error::Warning with {
         my $e = shift;
         $self->logMsg($job, LOG_WARNING, $e->text);
         $self->completedJob($job, "Job completed with warning ".$e->text);
     } catch Helios::Error::Fatal with {
         my $e = shift;
         $self->logMsg($job, LOG_ERR, $e->text);
         $self->failedJob($job, "Job failed ".$e->text);
     } otherwise {
         my $e = shift;
         $self->logMsg($job, LOG_ERR, $e->text);
         $self->failedJob($job, "Job failed with unknown error ".$e->text);
     };
 }

NOTE: Although work() is called as a class method, the Helios::Worker methods expect the worker to be instantiated as an object. Instantiating an object from the class passed in (the "my $self = $class->new();" line above) takes care of this discrepancy.

You can use the above as a template for all of your worker classes' work() methods, since they all need to perform the same basic steps: get the class and job; instantiate the class; call prep(), getParams(), and getJobArgs() to set up the worker object, retrieve configuration parameters, and parse job arguments; do whatever work is to be done, marking the job as completed if successful; and, if an error occurs, log it and mark the job as failed.

To support OVERDRIVE mode, a few additions need to be made:

 use Error qw(:try);            # provides try/catch/otherwise
 use Sys::Syslog qw(:macros);   # provides LOG_INFO, LOG_WARNING, LOG_ERR, etc.

 sub work {
     my $class = shift;
     my $job = shift;
     my $self = $class->new();

     try {
         $self->prep();
         my $params = $self->getParams();
         my $job_args = $self->getJobArgs($job);

         # DO WORK HERE #

         $self->completedJob($job);
     } catch Helios::Error::Warning with {
         my $e = shift;
         $self->logMsg($job, LOG_WARNING, $e->text);
         $self->completedJob($job, "Job completed with warning ".$e->text);
     } catch Helios::Error::Fatal with {
         my $e = shift;
         $self->logMsg($job, LOG_ERR, $e->text);
         $self->failedJob($job, "Job failed ".$e->text);
         exit(1);    # don't let a failed job affect the next one
     } otherwise {
         my $e = shift;
         $self->logMsg($job, LOG_ERR, $e->text);
         $self->failedJob($job, "Job failed with unknown error ".$e->text);
         exit(1);
     };

     # leave OVERDRIVE mode if the configuration says so
     if ($self->shouldExitOverdrive()) {
         $self->logMsg(LOG_INFO, "$class worker exited overdrive");
         exit(0);
     }
 }

Note the strategic additions of exit() calls and the shouldExitOverdrive() check at the end of the method. That should help prevent problems with one job spilling over into the next, and also allow the helios.pl daemons to better control the worker child processes.

ACCESSOR METHODS

These accessors will be needed by all subclasses of Helios::Worker.

get/setClient()
get/setJobType()
get/setParams()
get/setIniFile()
get/setHostname()
errstr()
debug()

Most of these are handled behind the scenes simply by calling the prep() method.

After calling prep(), calling getParams() will return a hashref of all the configuration parameters relevant to this worker class on this host.

If debug mode is enabled (the HELIOS_DEBUG env var is set to 1), debug() will return a true value; otherwise, it will return false. Some of the Helios::Worker methods will honor this value and log extra debugging messages either to the console or to the Helios log (helios_log_tb table). You can also use it within your own worker classes to enable or disable debugging messages or behaviors.
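The sketch below mimics only the documented behavior (prep() reads HELIOS_DEBUG, debug() reports it) so you can see how a worker might gate extra output. DebugDemo is a hypothetical stand-in class; real workers inherit prep() and debug() from Helios::Worker, whose internals may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in: real workers inherit prep() and debug() from
# Helios::Worker. This only mimics the documented HELIOS_DEBUG behavior.
package DebugDemo;

sub new {
    my $class = shift;
    return bless { debug => 0 }, $class;
}

sub prep {
    my $self = shift;
    # Debug mode is on only when HELIOS_DEBUG is set to 1
    $self->{debug} = (defined $ENV{HELIOS_DEBUG} && $ENV{HELIOS_DEBUG} eq '1') ? 1 : 0;
    return 1;
}

sub debug { return $_[0]->{debug} }

package main;

my $worker = DebugDemo->new();
$worker->prep();
# Typical use inside a worker: emit extra detail only in debug mode
print STDERR "DEBUG: worker prepared\n" if $worker->debug;
```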

CONSTRUCTOR

new()

The new() method doesn't really do much except create an object of the appropriate class. (It can be overridden, of course.)
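If you do override new(), a reasonable pattern is to let the parent build the object first and then add your own state. In this sketch BaseWorkerStub stands in for Helios::Worker, and MyService and jobs_seen are hypothetical names; it assumes the parent's new() needs no special arguments.

```perl
use strict;
use warnings;

# Stand-in for Helios::Worker so the sketch runs without Helios installed
package BaseWorkerStub;
sub new { my $class = shift; return bless {}, $class; }

package MyService;   # hypothetical worker class
use parent -norequire, 'BaseWorkerStub';   # real code: use base 'Helios::Worker';

sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(@_);   # let the parent build the object
    $self->{jobs_seen} = 0;               # then add per-worker state
    return $self;
}

package main;
```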

OTHER METHODS

prep()

The prep() method is designed to call all the various setup routines needed to get the worker ready to do useful work. It:

  • Pulls in the contents of the HELIOS_DEBUG and HELIOS_INI env vars, and sets the appropriate instance variables if necessary.

  • Calls the getParamsFromIni() method to read the appropriate configuration parameters from the INI file.

  • Calls the getParamsFromDb() method to read the appropriate configuration parameters from the Helios database.

Normally it returns a true value if successful, but if one of the getParamsFrom*() methods throws an exception, that exception will be raised to your calling routine.

getParamsFromIni([$inifile])

The getParamsFromIni() method opens the helios.ini file, grabs global params and params relevant to the current worker class, and returns them in a hash to the calling routine. It also sets the class's internal {params}, so the config parameters are available via the getParams() method.

Typically worker classes will call this once near the start of processing to pick up any relevant parameters from the helios.ini file. However, calling the prep() method takes care of this for you, and is the preferred method.
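A minimal helios.ini might look like the following. This is a sketch, assuming global parameters live in a [global] section and per-class parameters in a section named after the worker class; MyService, batch_size, and all values shown are placeholders. The dsn, user, and password keys are the ones dbConnect() falls back on.

```ini
[global]
dsn=dbi:mysql:database=helios_db;host=localhost
user=helios
password=CHANGEME

[MyService]
batch_size=100
```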

getParamsFromDb()

The getParamsFromDb() method connects to the Helios database, retrieves params relevant to the current worker class, and returns them in a hash to the calling routine. It also sets the class's internal {params}, so the config parameters are available via the getParams() method.

Typically worker classes will call this once near the start of processing to pick up any relevant parameters from the Helios database. However, calling the prep() method takes care of this for you.

There's an important subtle difference between getParamsFromIni() and getParamsFromDb(): getParamsFromIni() erases any previously set parameters from the class's internal {params} hash, while getParamsFromDb() merely updates it. This is due to the way helios.pl uses the methods: the INI file is only read once, while the database is repeatedly checked for configuration updates. For individual worker classes, the best thing to do is just call the prep() method; it will take care of things for the most part.
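The replace-versus-merge difference can be illustrated with plain hashes. ParamsDemo below is a hypothetical stand-in: the real methods take no data arguments and read helios.ini and the database themselves, while here the "retrieved" values are passed in; batch_size is likewise a made-up parameter.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a worker object; only the documented
# replace (INI) vs. merge (DB) semantics are reproduced.
package ParamsDemo;

sub new {
    my $class = shift;
    return bless { params => {} }, $class;
}

# Like getParamsFromIni(): REPLACES everything in {params}
sub getParamsFromIni {
    my ($self, %ini) = @_;
    $self->{params} = { %ini };
    return %ini;
}

# Like getParamsFromDb(): MERGES into {params}, overriding matching keys
sub getParamsFromDb {
    my ($self, %db) = @_;
    @{ $self->{params} }{ keys %db } = values %db;
    return %db;
}

sub getParams { return $_[0]->{params} }

package main;

my $demo = ParamsDemo->new();
$demo->getParamsFromIni(dsn => 'dbi:mysql:database=helios_db', batch_size => 10);
$demo->getParamsFromDb(batch_size => 50, OVERDRIVE => 1);
# {params} now holds dsn (from the INI) plus batch_size 50 and OVERDRIVE 1 (from the DB)
```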

dbConnect($dsn, $user, $password)

Method to connect to a database. If the parameters are not specified, it uses the dsn, user, and password values from the %params hash (i.e., it connects to the Helios database).

This method uses the DBI->connect_cached() method to attempt to reduce the number of open connections to a particular database.

jobsWaiting()

Scans the job queue for jobs that are ready to run. Returns the number of jobs waiting. Only meant for use with the helios.pl program.

logMsg([$job,] [$priority,] $msg)

Record a message in the log. Though originally only meant to log to a syslogd facility (via Sys::Syslog), it now also logs the message to the Helios database.

In addition to the log message, there are two optional parameters:

$job

The current job being processed. If specified, the jobid will be logged in the database along with the message.

$priority

The priority of the message as defined by syslog. These are really integers, but if you import the syslog constants [use Sys::Syslog qw(:macros)] into your namespace, your logMsg() calls will be much more readable. Refer to the "CONSTANTS" section of the Sys::Syslog manpage for a list of valid syslog constants. LOG_DEBUG, LOG_INFO, LOG_WARNING, and LOG_ERR are the most commonly used with Helios; LOG_INFO is the default.

In addition, there are two INI file options used to configure logging. These will be passed to syslogd when logMsg() calls Sys::Syslog::openlog():

log_facility

The syslog facility to log the message to.

log_options

Any logging options to specify to syslogd. Again, see the Sys::Syslog manpage.
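For example, to log to the local3 facility and include the process id in each syslog line, you might set the following. This is a sketch, assuming these keys are read from the [global] section; the values are handed to Sys::Syslog::openlog(), which accepts comma-separated option names such as pid and ndelay.

```ini
[global]
log_facility=local3
log_options=pid,ndelay
```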

For database logging, the host, process id, and worker class are automatically recorded in the database with your log message. If you supplied either a TheSchwartz::Job object or a priority, the jobid and/or priority will also be recorded with your message.
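Because both leading parameters are optional, logMsg() has to work out which arguments it was given. The helper below is a hypothetical sketch of one way to do that, not Helios's actual implementation: it treats the last argument as the message, a leading blessed object as the job, and any remaining argument as the priority, defaulting to LOG_INFO.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use Sys::Syslog qw(:macros);   # LOG_INFO, LOG_WARNING, LOG_ERR, ...

# Hypothetical helper: unpacks arguments shaped like
# logMsg([$job,] [$priority,] $msg).
sub unpack_log_args {
    my @args = @_;
    my $msg      = pop @args;                                        # message is always last
    my $job      = (@args && blessed $args[0]) ? shift @args : undef; # optional job object
    my $priority = @args ? shift @args : LOG_INFO;                   # optional, default LOG_INFO
    return ($job, $priority, $msg);
}

my $job = bless {}, 'StubJob';   # stand-in for a TheSchwartz::Job object
my @a = unpack_log_args("worker started");               # (undef, LOG_INFO, "worker started")
my @b = unpack_log_args(LOG_ERR, "cannot connect");      # (undef, LOG_ERR, "cannot connect")
my @c = unpack_log_args($job, LOG_WARNING, "slow job");  # ($job, LOG_WARNING, "slow job")
```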

parseArgXML($xml)

Given a string of XML, parse it into a mixed hash/arrayref structure. This uses XML::Simple.

getJobArgs($job)

Call getJobArgs() to pick the Helios job arguments (the first element of the $job->arg() array) out of the TheSchwartz::Job object, parse the XML into a Perl data structure (via XML::Simple), and return the structure to the calling routine.

This is really a convenience method created because

$args = $self->parseArgXML( $job->arg()->[0] );

looks nastier than it really needs to be.

shouldExitOverdrive()

Determine whether or not to exit if OVERDRIVE mode is enabled. The config params will be checked for HOLD, HALT, or OVERDRIVE values. If HALT is defined, HOLD == 1, or OVERDRIVE == 0, this method returns a true value, indicating the worker should exit().
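The documented rule can be restated as a plain function over a params hashref. This is a hypothetical sketch, not Helios's source; in particular, treating a missing OVERDRIVE setting like OVERDRIVE == 0 is an assumption.

```perl
use strict;
use warnings;

# Hypothetical restatement of the documented shouldExitOverdrive() rule.
sub should_exit_overdrive {
    my ($params) = @_;
    return 1 if defined $params->{HALT};                           # HALT is defined
    return 1 if defined $params->{HOLD} && $params->{HOLD} == 1;   # HOLD == 1
    # Assumption: a missing OVERDRIVE setting is treated like OVERDRIVE == 0
    return 1 if !defined $params->{OVERDRIVE} || $params->{OVERDRIVE} == 0;
    return 0;                                                      # keep running
}

# In a worker, the params would come from $self->getParams()
my $keep_going = !should_exit_overdrive({ OVERDRIVE => 1, HOLD => 0 });
```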

JOB CONTROL METHODS

TheSchwartz::Job methods do not provide adequate logging of job completion for our purposes, so these methods encapsulate extra 'Helios' things we want to do in addition to the normal 'TheSchwartz' things.

completedJob($job)

Marks $job as completed successfully.

failedJob($job [, $error])

Marks $job as failed. Allows job to be retried if your worker class supports that (see max_retries()).

failedJobPermanent($job [, $error])

Marks $job as permanently failed (no more retries allowed).

SEE ALSO

TheSchwartz, XML::Simple, Config::IniFiles

AUTHOR

Andrew Johnson, <ajohnson@ittoolbox.com>

COPYRIGHT AND LICENSE

Copyright (C) 2008 by CEB Toolbox, Inc.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.0 or, at your option, any later version of Perl 5 you may have available.

WARRANTY

This software comes with no warranty of any kind.