HPC::Runner::Command::submit_jobs::Utils::Scheduler

Command Line Options

#TODO Move this over to docs

config

Config file to pass to the command line as --config /path/to/file. It should be a YAML or other config format supported by Config::Any. This is optional; parameters can also be passed straight on the command line.

example.yml

---
infile: "/path/to/commands/testcommand.in"
outdir: "path/to/testdir"
module:
    - "R2"
    - "shared"

infile

Infile of commands, separated by newlines. The usual bash convention of escaping a newline with a backslash is also supported.

example.in

cmd1
#Multiline command
cmd2 --input --input \
--someotherinput
wait
#Wait tells slurm to make sure previous commands have exited with exit status 0.
cmd3  ##very heavy job
newnode
#cmd3 is a very heavy job, so let's start the next job on a new node
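
As a sketch, an infile like this would be submitted with something along the lines of (again assuming the hpcrunner.pl front end):

hpcrunner.pl submit_jobs --infile example.in --outdir path/to/testdir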

jobname

Specify a job name, and jobs will be named 001_jobname, 002_jobname, 003_jobname, and so on.
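
For example, submitting with a jobname of 'parse' (an illustrative name, assuming the hpcrunner.pl front end) should produce jobs named 001_parse, 002_parse, and so on:

hpcrunner.pl submit_jobs --infile example.in --jobname parse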

This is kept separate from Base because submit_jobs and execute_job handle it differently.

max_array_size

The maximum size of a single job array.

use_batches

The default is to submit using job arrays.

If specified, each batch is submitted individually rather than as part of a job array.

Example:

#HPC jobname=gzip
#HPC commands_per_node=1
gzip 1
gzip 2
gzip 3

Batches:

sbatch 001_gzip.sh
sbatch 002_gzip.sh
sbatch 003_gzip.sh

Arrays:

sbatch --array=1-3 gzip.sh
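
For example, assuming the hpcrunner.pl front end, the same infile could be submitted as individual batches rather than as a job array with:

hpcrunner.pl submit_jobs --infile example.in --use_batches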

afterok

The afterok switch in slurm. --afterok 123 will tell slurm to start this job after job 123 has completed successfully.
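
For example (assuming the hpcrunner.pl front end):

hpcrunner.pl submit_jobs --infile example.in --afterok 123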

no_submit_to_slurm

Bool value for whether or not to submit to slurm. If you are looking to debug your files or this script, you will want to turn submission off: pass --no_submit_to_slurm on the command line, or call $self->no_submit_to_slurm(0); within your code.

DEPRECATED - use --dry_run instead
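
For example, a debugging run that skips submission to the scheduler (assuming the hpcrunner.pl front end):

hpcrunner.pl submit_jobs --infile example.in --dry_run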

serial

Option to run all jobs serially, one after the other, with no parallelism. The default is to use 4 procs.

use_custom

Supply your own command instead of mcerunner/threadsrunner/etc

Internal Attributes

scheduler_ids

Our current scheduler job dependencies

job_stats

Object describing the number of jobs, number of batches per job, etc

deps

Call as

#HPC deps=job01,job02
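
As a sketch, an infile declaring that dependency might look like the following, where the job names are illustrative and cmd1/cmd2/cmd3 stand in for real commands:

#HPC jobname=job01
cmd1

#HPC jobname=job02
cmd2

#HPC jobname=job03
#HPC deps=job01,job02
cmd3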

current_job

Keep track of our currently running job

current_batch

Keep track of our current batch

template

Template object for writing the slurm batch submission script

cmd_counter

Keep track of the number of commands; when we get to more than commands_per_node, we restart so that subsequent commands are submitted to a new node. This is the number of commands within a batch. Each new batch resets it.

total_cmd_counter

Keep track of the total number of commands; unlike cmd_counter, this is not reset with each new batch.

batch_counter

Keep track of how many batches we have submitted to slurm

job_counter

Keep track of how many jobs we have submitted to slurm

batch

List of commands to submit to slurm

jobs

Contains all of our info for jobs

{
    job03 => {
        deps => ['job01', 'job02'],
        schedulerIds => ['123.hpc.inst.edu'],
        submitted => 1/0,
        batch => 'String of whole commands',
        cmds => [
            'cmd1',
            'cmd2',
        ]
    },
    schedule => ['job01', 'job02', 'job03']
}

graph_job_deps

Hashref of job deps to pass to Algorithm::Dependency

Job03 depends on job01 and job02

{ 'job03' => ['job01', 'job02'] }

Subroutines

Workflow

There are a lot of things happening here

parse_file_slurm
#we also resolve the dependency tree and write out the batch files in here
schedule_jobs
iterate_schedule

for $job (@scheduled_jobs)
    (set current_job)
    process_jobs
    if !use_batches
        submit_job #submit the whole job if using job arrays - which is the default
    pre_process_batch
        (current_job, current_batch)
        scheduler_ids_by_batch
        if use_batches
            submit_job
        else
            run scontrol to update our jobs by job array id

run

check_jobname

Check to see if the user has chosen the default jobname, 'job'

check_add_to_jobs

Make sure each jobname has an entry. We set the defaults as the global configuration.

increase_jobname

Increment the jobname: job_001, job_002, and so on. Used for graph_job_deps

check_files

Check to make sure the outdir exists. If it doesn't, the entire path will be created

iterate_schedule

Iterate over the schedule generated by schedule_jobs

iterate_job_deps

Check to see if we are actually submitting

Make sure each dep has already been submitted

Return job schedulerIds

process_jobs

pre_process_batch

Log info for the job to the screen

work

Process the batch. Submit to the scheduler (slurm/pbs/etc). Take care of the counters.

process_batch

Create the slurm submission script from the slurm template. Write out the template, the submission job, and the infile for the parallel runner.

post_process_batch_indexes

Put the scheduler_id in each batch

post_process_jobs