NAME

HPCI::Sub

SYNOPSIS

Role for methods and attributes common to subgroup and stage classes (i.e. those subordinate to a group-like class).

ATTRIBUTES

cluster (internally provided)

The type of cluster that will be used to execute the subgroup or stage. This value is passed on by the $group->stage or $group->subgroup method when it creates a new child. Because that method also uses the value to select the type of stage object that is created, the attribute is somewhat redundant.

group

The group or subgroup object that has this object as a child.

The group that this object belongs to is automatically provided to initialize this attribute. You don't need to initialize it explicitly, and since its use is expected to be internal, you won't have much, if any, need to use it either.

name

The name of this subgroup or stage.

All stages and subgroups must have names that are different from each other, from all of their (grand)parent groups and from all of the siblings of their (grand)parent groups. Stages or subgroups may have the same name only if they are at most cousins.

base_dir (optional)

The directory that will contain all generated output (unless that output is specifically directed to some other location). The default is the current directory.

files

A hash that can contain lists of files.

Throughout this hash, there are filenames contained within hash elements that describe the processing required for that file. Whenever a filename is needed, it can either be a string containing a pathname, or it can be an HPCI::File object (or subclass), or it can be a HashRef. Often, it will be the string form, which will be converted to an object internally.

The top level of the hash has keys 'in' (for input), 'out' (for output), 'skip', 'skipstage', 'rename', and 'delete'. (The same file might be listed under multiple keys.)

The values for these keys are:

'in'

a hashref with possible keys:

'req' (for required input files)
'opt' (for optional input files)

The value for either of these can be either a filename or a list of filenames.

When a subgroup or stage containing any files/in entries is ready to be executed, HPCI checks whether the listed files exist. If one that is req does not exist, the stage or subgroup is aborted. Any files that do exist can get additional validation as specified in the associated HPCI::File subclass for the file - for example, checksum files can be generated or verified.

'out'

a hashref with possible keys:

'req' (for required output files)
'opt' (for optional output files)

The value for either of these can be either a filename or a list of filenames.

When a subgroup or stage containing any files/out entries completes execution, HPCI checks whether the listed files have been created or modified. If one that is req does not exist or has not been modified, the stage or subgroup is aborted. Any files that were created by this stage or subgroup can get additional processing as specified in the associated HPCI::File subclass for the file - for example, checksum files can be generated.
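As an illustration, a files hash with 'in' and 'out' entries might look like the following sketch. The filenames are hypothetical, and it is an assumption here that a "list of filenames" is written as an arrayref.

```perl
use strict;
use warnings;

# Hypothetical filenames, for illustration only.
my $files = {
    in => {
        req => [ 'sample.fastq', 'reference.fa' ],    # must exist before execution
        opt => 'known_sites.vcf',                     # a single filename is also allowed
    },
    out => {
        req => 'aligned.bam',                         # must be created or modified
        opt => [ 'aligned.bam.bai' ],                 # may be absent
    },
};
```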

'skip' (or, deprecated, 'skipstage')

The traditional name 'skipstage' for this element has been deprecated and has been replaced by the new name 'skip'. The old name became confusing when subgroups were added and could also contain a files list that applied to the entire subgroup. The 'skip' list defines the conditions under which the stage or subgroup can be skipped. This will be useful in a restart of an HPCI program that previously completed some stages - the 'skip' elements allow HPCI to determine whether this subgroup or stage completed successfully on the earlier run and hence does not need to be re-executed.

It contains either:

  • an arrayref

  • a hashref with the keys 'pre' (for pre-requisites) and 'lists'

The arrayref (either the arrayref value of 'skip' or the arrayref value for the 'lists' hash element) can contain either a list of files, or a hashref with keys 'pre' and 'files'.

The 'pre' value (if present) at the top level is a list of files which are pre-requisites for all of the lists. If a list has its own 'pre' list, those files are only pre-requisites for the files in that list.
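The two forms described above might be written as in the following sketch. This is one reading of the description, not a definitive layout: the filenames are hypothetical, and each list is assumed to be an arrayref.

```perl
use strict;
use warnings;

# Form 1: an arrayref of target lists (hypothetical filenames).
my $skip_simple = [
    [ 'aligned.bam', 'aligned.bam.bai' ],
];

# Form 2: a hashref with a top-level 'pre' and the target 'lists'.
my $skip_with_pre = {
    pre   => [ 'reference.fa' ],          # pre-requisite for every list below
    lists => [
        [ 'aligned.bam' ],                # plain list of target files
        {
            pre   => [ 'sample.fastq' ],  # pre-requisite for this list only
            files => [ 'aligned.cram' ],
        },
    ],
};
```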

'rename'

a list of pairs of filenames

The file named as the first element in each pair (if it exists) is renamed to the second filename in the pair. It is not considered an error for the first file in a pair to not exist - if you want to ensure that a file exists, include it as an 'out'->'req' file as well.

'delete'

can be either:

a scalar filename
a list of filenames

These will be removed if the stage completes successfully. It is not considered an error if any of these files does not exist - include them in the 'out'->'req' files list if you wish to ensure that they do.
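A 'rename' and 'delete' specification might look like the following sketch. The filenames are hypothetical, and representing each pair as a two-element arrayref is an assumption rather than something the text above confirms.

```perl
use strict;
use warnings;

# Hypothetical filenames, for illustration only.
my $files = {
    out    => { req => 'result.tmp' },              # ensure the file to be renamed exists
    rename => [ [ 'result.tmp', 'result.txt' ] ],   # assumed form: [ old name, new name ] pairs
    delete => [ 'scratch.dat', 'scratch.idx' ],     # a single scalar filename is also allowed
};
```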

The contents are used at various times:

the stage/subgroup is ready to be executed
  • if a 'skip' key is present then checking is done to decide whether the stage or subgroup needs to be executed or can be skipped (treating it as a successful completion)

    the main content of this key is a list of lists of filenames (the target files) - if any of these lists has all of its files existing, then the stage can be skipped

    if there is a top level and/or a list level 'pre' list, then all of the files in the pre list(s) must also exist and be older than the target files (the files in the top level 'pre' list are checked against all of the target lists, the files in a target level 'pre' list are only checked against that target).

    skip checking is always done by the parent process, in hopes of avoiding the need to create the stage.

  • all 'in'->'req' files must exist; if any is missing, the stage is aborted. If the files exist, then the child stage will be set up (if needed) to download those files from the long-term storage.

  • all 'in'->'opt' are checked by the parent. If any exists, then the child stage will be set up (if needed) to download them from the long-term storage.

  • Any additional processing of 'in' files that are present (such as validating or auto-generating checksums) is also done at this time, and may also cause the stage or subgroup to be aborted.

the stage or subgroup has completed execution
  • all 'out'->'req' files must exist and they must have been updated during the execution of the stage (otherwise the stage is treated as failing)

  • any 'out'->'opt' files which exist must have been updated during the execution of the stage (otherwise the stage is treated as failing)

  • any additional processing of 'out' files which have been updated (such as generating a checksum) is also done

  • clusters that require special treatment of files can take copying actions to collect any 'out' files that have been updated and return them to the original node

  • if the stage completed successfully, any files listed as 'rename' are renamed to their new name

  • if the stage completed successfully, any files listed as 'delete' are removed
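The skip decision described earlier in this section can be sketched in plain Perl as follows. This is not HPCI's own code: can_skip and its helpers are hypothetical names, a flat list of plain filenames is assumed to mean a single target list, and "older" is assumed to mean a strictly smaller modification time.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Accept a single filename or an arrayref of filenames; return a flat list.
sub as_list {
    my ($value) = @_;
    return () unless defined $value;
    return ref($value) eq 'ARRAY' ? @$value : ($value);
}

# Modification time of a file, in seconds since the epoch.
sub mtime {
    my ($file) = @_;
    my @st = stat $file;
    return $st[9];
}

# Return true if the 'skip' specification says the work is already done.
sub can_skip {
    my ($skip) = @_;

    # Hashref form carries a top-level 'pre' plus 'lists';
    # arrayref form is just the lists.
    my ( $top_pre, $lists ) =
        ref($skip) eq 'HASH'
        ? ( $skip->{pre}, $skip->{lists} )
        : ( undef, $skip );

    my @entries = as_list($lists);

    # Assumption: a flat list of plain filenames is one target list.
    @entries = ( [@entries] ) unless grep { ref } @entries;

  ENTRY: for my $entry (@entries) {
        my ( $pre, $targets ) =
            ref($entry) eq 'HASH'
            ? ( $entry->{pre}, $entry->{files} )
            : ( undef, $entry );
        my @targets = as_list($targets);
        my @pre     = ( as_list($top_pre), as_list($pre) );
        next ENTRY unless @targets;

        # Every target and every pre-requisite must exist.
        for my $file ( @targets, @pre ) {
            next ENTRY unless -e $file;
        }

        # Every pre-requisite must be older than the oldest target.
        my $oldest_target = min map { mtime($_) } @targets;
        for my $file (@pre) {
            next ENTRY unless mtime($file) < $oldest_target;
        }

        return 1;    # one fully satisfied list is enough
    }
    return 0;
}
```

Calling can_skip( { pre => ['reference.fa'], lists => [ ['aligned.bam'] ] } ) would return true only if both files exist and reference.fa is older than aligned.bam.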

connect (optional)

This can contain a URL to be used by the driver for types of cluster where it is necessary to connect to the cluster in some way. It can be omitted for local clusters that are directly accessible.

login, password (optional)

These can contain the login identifier and password to be used by the driver for types of cluster which require authorization.

max_concurrent (optional)

The maximum number of stages to be running concurrently. If 0 (which is the default), then there is no limit applied directly by HPCI (although the underlying cluster-specific driver might apply limits of its own).

file_class (internal)

The default storage class attribute for files that do not have an explicit class given. This is the name of a class. The default is to use the value from the parent (sub-)group.

METHODS

$group->add_file_params

Augment the file_params list with additional files. Provide either a hashref or a list of key/value pairs; in either case, each pair has a filename as the key and its params as the value.

AUTHOR

Christopher Lalansingh - Boutros Lab

John Macdonald - Boutros Lab

ACKNOWLEDGEMENTS

Paul Boutros, PhD, PI - Boutros Lab

The Ontario Institute for Cancer Research