NAME

HPCI::Group

SYNOPSIS

Role for building a cluster-specific driver for a group of stages. This should only be used internally to the HPCI module - code that uses this driver will not load this module (or the driver module) explicitly.

It describes the user interface for a generic group, hiding (as much as possible) the specifics of the actual cluster that is being used. The driver module that consumes this role will arrange to translate the generic interface into the particular interface conventions of the specific cluster that it accesses.

An (internally defined) cluster-specific group object is defined with:

package HPCD::$cluster::Group;
use Moose;

### required method definitions

with 'HPCI::Group' => { StageClass => 'HPCD::$cluster::Stage' },
    # any other roles required ...
    ;

### cluster-specific method definition if any ...

DESCRIPTION

This role provides the generic interface for a group object which can configure and run a collection of stages (jobs) on machines in a cluster. It is written to be independent of the specifics of any particular cluster interface. The cluster-specific module that consumes this role is not accessed directly by the user program. Instead, the user program obtains a group driver object of the appropriate cluster-specific type by calling the "class method" HPCI->group with an appropriate cluster argument.

ATTRIBUTES

cluster

The type of cluster that will be used to execute the group of stages. This value is passed on by the HPCI->group method when it creates a new group. Since it also uses that value to select the type of group object that is created, it is somewhat redundant.

name (optional)

The name of this group of stages. Defaults to 'default_group_name'.

storage_classes (optional)

HPCI has two conceptual storage types that it expects to be available.

Long-term storage is storage that is reliably preserved over time. The data in long-term storage is accessible to all nodes on the cluster, but possibly only accessible through special commands.

Working storage is storage that is directly accessible to a node through normal file system access methods. It can be a private disk that is not accessible to other nodes, or it can be a shared file system that is available to other nodes.

It is fairly common, and most convenient, if the working storage also qualifies as long-term storage. That is the default expectation if HPCI is not told otherwise.

However, some types of cluster can have their nodes rebuilt at intervals, losing the data on their local disks. Some types of cluster have a shared file system that is not strongly persistent, but which can be rebuilt at intervals. Some types of cluster have shared file systems that have size limitations that mean that some or all of the data sets for stage processes cannot be stored there.

In such cases, some or all of the files must have a long-term storage location that is different from the more convenient working storage location that will be used when stages are running. Depending upon the environment, the long-term storage will use something like:

network accessed storage

A storage facility that allows files to be uploaded and downloaded through a network connection (from any node in the cluster).

parent managed storage

The job controlling program may have long-term storage of its own that is not accessible to other nodes in the cluster. If there is a large enough shared file system (that for some reason cannot be used as long-term storage) the parent HPCI program can copy files between that storage and the shared storage as needed to make the files available and to preserve the results.

bundled storage

In a cloud layout there is often no file system shared amongst all of the nodes and the parent process. In this type of cluster, a submitted stage will include some sort of bundle containing a collection of data and control files (possibly even the entire operating system) to be used by the stage, and a similar sort of bundle is recovered to provide the results of running a stage. (This could be, for example, a docker image.)

The attribute storage_classes defines the available storage classes that can be used for stage files.

In most cases, all files for all stages will be of the same storage class, but some cluster configurations will have multiple storage choices and can have, for some stages, the need to use more than one of the storage classes for different files within the same job.

To cater to both of these, the storage_classes attribute can either be a single arrayref which will be used for all files, or it can be a hash of named arrayrefs, with the name being used to select the class for each individual file. A file (described in the files attribute of the stage class) can either be a scalar string specifying the pathname of the working storage location that will be used, or it can be a one element hashref, with the key selecting the storage class and the value providing the working storage pathname. If the hash of named arrayrefs is used, one of the elements of the hash should have the key default - that will be used for files which do not provide an explicit storage class key.

The default value for this attribute is:

[ 'HPCI::File' ]

The HPCI::File class defines the usage for the common case in which there is no need for a long-term storage area that is different from the working storage area.

Classes that provide a separate long-term storage area will usually require additional arguments for specifying access control information (such as url, username, password) and how to map the working storage pathname into the corresponding location in the long-term storage.

See the documentation for HPCI::File for details on writing new classes.
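As an illustration of the two accepted shapes, the following sketch shows a single arrayref, a hash of named arrayrefs, and one way a file entry could be matched to a class. The class name My::NetStore, its url argument, and the helper pick_storage_class are hypothetical, invented for this example. HPCI performs the real lookup internally and may do it differently.

```perl
use strict;
use warnings;

# Shape 1: one arrayref, used for every file.
my $single = [ 'HPCI::File' ];

# Shape 2: named arrayrefs; 'default' covers files with no explicit key.
# My::NetStore and its url argument are hypothetical.
my $named = {
    default => [ 'HPCI::File' ],
    archive => [ 'My::NetStore', url => 'https://storage.example.org/' ],
};

# Hypothetical helper: returns (class spec, working pathname) for a file
# entry, which is either a plain pathname or a one-element hashref.
sub pick_storage_class {
    my ($classes, $file) = @_;
    my ($key, $path) = ref $file eq 'HASH' ? %$file : ('default', $file);
    my $spec = ref $classes eq 'HASH' ? $classes->{$key} : $classes;
    die "no storage class named '$key'\n" unless $spec;
    return ($spec, $path);
}

my ($spec, $path) = pick_storage_class($named, { archive => '/work/big.bam' });
print "$path uses $spec->[0]\n";
```

A plain string such as '/work/small.txt' would fall through to the default entry, which is why the hash form should always include that key.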

storage_class (internal)

The name of the key to use to select a storage class from the storage_classes attribute for files that do not have an explicit class given.

Defaults to 'default'.

_unique_name (internal)

The name of this group of stages. Used as the default value for the group_dir attribute.

base_dir (optional)

The directory that will contain all generated output (unless that output is specifically directed to some other location). The default is the current directory.

group_dir (optional)

The directory which will contain all output pertaining to the entire group. By default, this is a new directory under base_dir which is given a name combining the name of the group and the timestamp when the group was created (e.g. EXAMPLEGROUP-YYMMDD-HHMMSS).
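The default naming pattern can be pictured with the sketch below. The helper default_group_dir is hypothetical and only mirrors the documented NAME-YYMMDD-HHMMSS layout; the code HPCI actually uses to build the name is internal and may differ in detail.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use File::Spec;

# Hypothetical helper mirroring the documented default layout:
# <base_dir>/<group name>-YYMMDD-HHMMSS
sub default_group_dir {
    my ($base_dir, $name, $time) = @_;
    my $stamp = strftime('%y%m%d-%H%M%S', localtime($time // time()));
    return File::Spec->catdir($base_dir, "$name-$stamp");
}

print default_group_dir('.', 'EXAMPLEGROUP'), "\n";
```

Because the timestamp has one-second resolution, a pattern like this keeps the output of separate runs apart as long as two groups with the same name are not created within the same second.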

connect (optional)

This can contain a URL to be used by the driver for types of cluster where it is necessary to connect to the cluster in some way. It can be omitted for local clusters that are directly accessible.

login, password (optional)

These can contain the identifier and password to be used by the driver for types of cluster which require authorization.

max_concurrent (optional)

The maximum number of stages to be running concurrently. If 0 (which is the default), then there is no limit applied directly by HPCI (although the underlying cluster-specific driver might apply limits of its own).

stage_defaults

This attribute can be given a hash reference containing values that will be passed to every stage created.

status (provided internally)

After the execute method has been called, this attribute contains the return result from the execution. This is a hash (indexed by stage name). The value for each stage is an array of return statuses. (Usually, this array has only one element, but there will be more if the stage was retried. The final element of the array is almost always the one that you wish to look at.) Each return status is a hash - it will always contain an element with the key 'exit_status' giving the exit status of the stage. Additional entries will be found in the hash for cluster-specific return results. Thus, to check the exit status of a particular stage you would code either:

$result = $group->execute;
if ($result->{SOMESTAGENAME}[-1]{exit_status}) {
    die "SOMESTAGENAME failed!";
}

or:

$group->execute;
# ...
if ($group->status->{SOMESTAGENAME}[-1]{exit_status}) {
    die "SOMESTAGENAME failed!";
}
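To check every stage rather than a single named one, the same structure can be walked in a loop. The sketch below uses a mock hash in place of the value returned by $group->status, so the stage names and exit values are invented for illustration.

```perl
use strict;
use warnings;

# Mock of the structure described above: stage name => array of per-attempt
# status hashes. The stage names and values are invented for illustration.
my $status = {
    align  => [ { exit_status => 1 }, { exit_status => 0 } ],    # retried once
    report => [ { exit_status => 2 } ],
};

# Collect stages whose final attempt exited non-zero.
my @failed = grep { $status->{$_}[-1]{exit_status} } sort keys %$status;

for my $stage (sort keys %$status) {
    my @attempts = @{ $status->{$stage} };
    printf "%-8s attempts=%d final exit_status=%d\n",
        $stage, scalar @attempts, $attempts[-1]{exit_status};
}
print "failed: @failed\n" if @failed;
```

Only the last attempt is consulted, so a stage that failed once and then succeeded on retry is not reported as failed.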

file_system_delay

Shared file systems can have a delay period during which an action on the file system is not yet visible to another node sharing that file system. This is common for NFS shared file systems, for example.

The file_system_delay attribute can be given a non-zero number of seconds to indicate the amount of time to wait to ensure that actions taken on another node are visible.

This is used for internal actions such as validating that required stage output files have been created.
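The kind of check this delay supports can be sketched as a bounded wait for a file to become visible. The helper wait_for_file is hypothetical and is not HPCI's implementation; HPCI's own handling is internal and may, for instance, simply wait for the full delay before checking.

```perl
use strict;
use warnings;
use Time::HiRes ();
use File::Temp qw(tempfile);

# Hypothetical helper: poll for a file until it appears or the delay
# (in seconds) has elapsed. Returns true if the file was seen.
sub wait_for_file {
    my ($path, $delay) = @_;
    my $deadline = Time::HiRes::time() + $delay;
    while (1) {
        return 1 if -e $path;
        return 0 if Time::HiRes::time() >= $deadline;
        Time::HiRes::sleep(0.2);
    }
}

# Demonstrate with a temporary file standing in for a stage output file.
my (undef, $path) = tempfile(UNLINK => 1);
print wait_for_file($path, 5) ? "visible\n" : "still missing\n";
```

A delay of 0 reduces this to a single existence test, which matches the case of a file system with no visibility lag.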

METHODS

$group->stage( name=>'stagename', ... )

Creates a stage and adds it to the group. See HPCI::Stage for the generic parameters you may provide for a stage; and see HPCD::$cluster::Stage for the cluster-specific parameters for the actual type of cluster you are using.

Note: this is the only way to add a stage object to the group. In particular, you cannot create a stage object separately and add it to the group - this is done to ensure that the created stage object is consistent with the actual group object and that you don't have to change code in multiple places if you switch to using a different cluster type for the group. (If you want to mix stages for multiple cluster types within your program, you should either create two groups that execute independently, or else create a stage that itself creates a group and manages the stages for the second type of cluster.)

The name parameter is required and must be unique - two stages within the same group may not have the same name.

The method returns the stage object that was created, although most code will not need it directly. (Whenever you need to refer to a stage to add dependencies, you can use its name instead of a reference to the object.)

$group->add_deps

$group->add_deps(
    dep      => 'a_dep',                  ## one of these two
    deps     => ['dep1', 'dep2', ...],
    pre_req  => 'a_pre_req',              ## and one of these two
    pre_reqs => ['pre_req1', 'pre_req2', ...],
);

# A scalar value, either provided alone or in a list, can be
# any of:
#    - an existing stage object
#    - a string - the exact value of some existing stage's name
#    - a regexp - all of the stages whose name matches the regexp

The add_deps method marks the pre_req (or all of the pre_reqs) as being prerequisites to the dep (or all of the deps). When the group is executed, stages may be run in parallel, but a dependent stage will not be permitted to start executing until all of its prerequisite stages have completed successfully.

It is permitted to list the same dependency multiple times. This can be convenient in that you do not need to be careful about providing non-overlapping groups when you specify sets of prerequisites.

So, you could write:

$group->add_deps( pre_req=>'stage1', deps=>[qw(stage2 stage3)] );
$group->add_deps( pre_reqs=>[qw(stage1 stage2)], dep=>'stage3' );

instead of:

$group->add_deps( pre_req=>'stage1', deps=>qr(^stage[23]$) );
$group->add_deps( pre_req=>'stage2', dep=>'stage3' );

or:

$group->add_deps( pre_req=>'stage1', dep=>'stage2' );
$group->add_deps( pre_req=>'stage2', dep=>'stage3' );

All three forms will provide the same ordering. The last is clearer for this simple sequence, but when there are many stages it may be easier to specify collections of dependencies at once.

However, you must be careful to avoid dependency loops. A loop is a chain of dependent stages that includes the same stage more than once (stage1 -> stage2 -> stage1). Since a dependency indicates that the prerequisite stage must finish executing before the dependent stage can start executing, this loop would mean that stage1 cannot start until stage2 has completed, but also that stage2 cannot start until stage1 has completed. So, neither one can ever start and neither will ever complete.

Such a loop will eventually be detected, when the group has reached a point where there are no stages running, and no stages can be started - but there could have been a lot of time wasted executing stages that were not part of the loop before this is noticed and the run aborted.
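To avoid wasting that time, a program can check its own dependency list before calling execute. The sketch below is a hypothetical pre-check that is not part of HPCI: it takes [pre_req, dep] name pairs and repeatedly "runs" every stage with no unfinished prerequisites, so anything left over sits on, or behind, a loop.

```perl
use strict;
use warnings;

# Hypothetical pre-check, not part of HPCI: given [pre_req, dep] name pairs,
# return the stages that can never start because they sit on (or behind) a loop.
sub stuck_stages {
    my @edges = @_;
    my (%waiting_on, %dependents, %seen_edge);
    for my $edge (@edges) {
        my ($pre, $dep) = @$edge;
        $waiting_on{$_} //= 0 for $pre, $dep;
        next if $seen_edge{"$pre\0$dep"}++;    # repeated dependencies are harmless
        $waiting_on{$dep}++;
        push @{ $dependents{$pre} }, $dep;
    }
    # Repeatedly "run" every stage that has no unfinished prerequisites.
    my @ready = grep { !$waiting_on{$_} } keys %waiting_on;
    while (defined(my $done = shift @ready)) {
        delete $waiting_on{$done};
        for my $dep (@{ $dependents{$done} || [] }) {
            push @ready, $dep if --$waiting_on{$dep} == 0;
        }
    }
    return sort keys %waiting_on;    # whatever is left can never start
}

my @stuck = stuck_stages(
    [ stage1 => 'stage2' ],
    [ stage2 => 'stage1' ],
    [ stage0 => 'stage3' ],
);
print @stuck ? "cannot start: @stuck\n" : "no loops\n";
```

The check works on stage names only, so the program has to record the same pairs it passes to add_deps, with any regexps already expanded to names.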

Each stage argument passed can be either a reference to the stage object or the name of the stage, or a regexp that selects all of the stages whose name matches the regexp. (If no stage name matches a regexp, then no stages are selected. This allows using a regexp to match against an optional stage without having to check whether that optional stage was actually used in this run. The downside is that a mistyped regexp will give no complaint when it matches nothing. However, it is not possible to give a complaint when a mistyped regexp matches more stages than the user intended, so checking the regexp carefully is necessary in any case.)
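The name and regexp cases of these selection rules can be pictured with a small sketch. The helper select_stages is hypothetical and operates on a plain list of names; stage objects are left out because they require HPCI itself. Treating an unknown exact name as an error is an assumption made for this example, not documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: expand one selector into stage names. A qr// regexp
# selects every matching name (possibly none); a string must match exactly.
sub select_stages {
    my ($selector, @names) = @_;
    return grep { $_ =~ $selector } @names if ref $selector eq 'Regexp';
    my @exact = grep { $_ eq $selector } @names;
    # Assumption for this sketch: an unknown exact name is an error.
    die "no stage named '$selector'\n" unless @exact;
    return @exact;
}

my @names = qw(stage1 stage2 stage3 cleanup);
print join(',', select_stages(qr/^stage[23]$/, @names)), "\n";
print join(',', select_stages('cleanup', @names)), "\n";

# A regexp for an optional stage that was not created selects nothing.
my @none = select_stages(qr/^optional/, @names);
print scalar(@none), " matched\n";
```

Anchoring the regexp with ^ and $ as shown keeps it from picking up stages whose names merely contain the pattern.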

$group->execute

Execute the stages in the group. Does not return until every stage has either completed or been skipped because some other stage failed, or until the attempt is aborted.

AUTHOR

Christopher Lalansingh - Boutros Lab

John Macdonald - Boutros Lab

ACKNOWLEDGEMENTS

Paul Boutros, PhD, PI - Boutros Lab

The Ontario Institute for Cancer Research