HPC::Runner::Command::submit_jobs::Utils::Scheduler
Command Line Options
config
Config file to pass on the command line as --config /path/to/file. It should be YAML or another format supported by Config::Any. This is optional; parameters can also be passed directly on the command line.
example.yml
---
infile: "/path/to/commands/testcommand.in"
outdir: "path/to/testdir"
module:
- "R2"
- "shared"
infile
A file of commands separated by newlines. The usual bash convention of escaping a newline with a backslash is also supported.
example.in
cmd1
#Multiline command
cmd2 --input --input \
--someotherinput
wait
#Wait tells slurm to make sure previous commands have exited with exit status 0.
cmd3 ##very heavy job
newnode
#cmd3 is a very heavy job so let's start the next job on a new node
jobname
Specify a job name, and jobs will be named 001_jobname, 002_jobname, 003_jobname, and so on.
This is kept separate from Base because submit_jobs and execute_job handle job names differently.
max_array_size
The maximum number of tasks allowed in a single job array submission.
use_batches
The default is to submit using job arrays. If --use_batches is specified, each batch is submitted individually instead.
Example:
#HPC jobname=gzip
#HPC commands_per_node=1
gzip 1
gzip 2
gzip 3
Batches:
sbatch 001_gzip.sh
sbatch 002_gzip.sh
sbatch 003_gzip.sh
Arrays:
sbatch --array=1-3 gzip.sh
afterok
The afterok dependency switch in slurm. --afterok 123 tells slurm to start this job only after job 123 has completed successfully.
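For example, again assuming the hpcrunner.pl front end:
hpcrunner.pl submit_jobs --infile example.in --afterok 123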
no_submit_to_slurm
Boolean value for whether or not to submit to slurm. If you are looking to debug your files or this script, enable it: pass --no_submit_to_slurm on the command line or call $self->no_submit_to_slurm(1) within your code.
DEPRECATED - use --dry_run instead.
serial
Option to run all jobs serially, one after the other, with no parallelism. The default is to use 4 processes.
use_custom
Supply your own command instead of mcerunner/threadsrunner/etc
Internal Attributes
scheduler_ids
Our current scheduler job dependencies
job_stats
Object describing the number of jobs, number of batches per job, etc
deps
Call as
#HPC deps=job01,job02
current_job
Keep track of our currently running job
current_batch
Keep track of our current batch
template
Template object for writing the slurm batch submission script
cmd_counter
Keep track of the number of commands within the current batch. When it grows past commands_per_node we start a new batch, so subsequent commands are submitted to a new node. Each new batch resets the counter.
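As a self-contained sketch of this rule (illustrative values and variable names; not the module's actual code):
use strict;
use warnings;

my $commands_per_node = 2;    # assumed example value
my @cmds = ( 'cmd1', 'cmd2', 'cmd3', 'cmd4', 'cmd5' );

my $cmd_counter = 0;
my ( @batch, @batches );
for my $cmd (@cmds) {
    push @batch, $cmd;
    if ( ++$cmd_counter >= $commands_per_node ) {
        push @batches, [@batch];    # batch is full - the next one goes to a new node
        @batch       = ();
        $cmd_counter = 0;           # each new batch resets the counter
    }
}
push @batches, [@batch] if @batch;  # leftover commands form the final batch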
total_cmd_counter
Keep track of the total number of commands across all batches.
batch_counter
Keep track of how many batches we have submitted to slurm
job_counter
Keep track of how many jobs we have submitted to slurm
batch
List of commands to submit to slurm
jobs
Contains all of our info for jobs
{
    job03 => {
        deps         => [ 'job01', 'job02' ],
        schedulerIds => ['123.hpc.inst.edu'],
        submitted    => 1/0,
        batch        => 'String of whole commands',
        cmds         => [
            'cmd1',
            'cmd2',
        ],
    },
    schedule => [ 'job01', 'job02', 'job03' ],
}
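As a minimal sketch (field names taken from the example above; the loop itself is illustrative, not this module's code), the structure can be walked in schedule order like this:
use strict;
use warnings;

my $jobs = {
    job03 => {
        deps         => [ 'job01', 'job02' ],
        schedulerIds => ['123.hpc.inst.edu'],
        submitted    => 1,
        cmds         => [ 'cmd1', 'cmd2' ],
    },
    schedule => [ 'job01', 'job02', 'job03' ],
};

for my $jobname ( @{ $jobs->{schedule} } ) {
    my $job = $jobs->{$jobname} or next;    # job01/job02 omitted in this sketch
    print "$jobname depends on: @{ $job->{deps} }\n";
}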
graph_job_deps
Hashref of job dependencies to pass to Algorithm::Dependency.
Job03 depends on job01 and job02:
{ 'job03' => ['job01', 'job02'] }
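A minimal sketch of how such a hashref can be resolved into a schedule with Algorithm::Dependency's hash-of-arrays source (illustrative; not this module's actual code):
use strict;
use warnings;
use Algorithm::Dependency::Ordered;
use Algorithm::Dependency::Source::HoA;

# Every job must appear as a key, even if it has no dependencies.
my $deps = {
    job01 => [],
    job02 => [],
    job03 => [ 'job01', 'job02' ],
};

my $source = Algorithm::Dependency::Source::HoA->new($deps);
my $dep    = Algorithm::Dependency::Ordered->new( source => $source );
my $order  = $dep->schedule_all;    # e.g. [ 'job01', 'job02', 'job03' ]
print "@$order\n";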
Subroutines
Workflow
There is a lot happening here. The overall flow is:
parse_file_slurm
    # we also resolve the dependency tree and write out the batch files in here
schedule_jobs
iterate_schedule
    for $job (@scheduled_jobs)
        (set current_job)
        process_jobs
            if !use_batches
                submit_job    # submit the whole job if using job arrays - which is the default
        pre_process_batch
            (current_job, current_batch)
            scheduler_ids_by_batch
                if use_batches
                    submit_job
                else
                    run scontrol to update our jobs by job array id
run
check_jobname
Check to see if the user has chosen the default jobname, 'job'
check_add_to_jobs
Make sure each jobname has an entry. The global configuration supplies the defaults.
increase_jobname
Increment the jobname: job_001, job_002, and so on. Used for graph_job_deps.
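Illustratively (a hypothetical one-liner, not the module's implementation):
use strict;
use warnings;

my @names = map { sprintf 'job_%03d', $_ } 1 .. 3;
print "@names\n";    # job_001 job_002 job_003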
check_files
Check to make sure the outdir exists. If it doesn't, the entire path will be created.
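The described behavior matches what core File::Path's make_path provides; as an equivalent sketch (the $outdir value is the example from above, not hard-coded in the module):
use strict;
use warnings;
use File::Path qw(make_path);

my $outdir = 'path/to/testdir';          # example value from the config above
make_path($outdir) unless -d $outdir;    # creates all intermediate directories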
iterate_schedule
Iterate over the schedule generated by schedule_jobs
iterate_job_deps
Check to see if we are actually submitting
Make sure each dep has already been submitted
Return job schedulerIds
process_jobs
pre_process_batch
Log info for the job to the screen
work
Process the batch, submit to the scheduler (slurm/pbs/etc), and take care of the counters.
process_batch
Create the slurm submission script from the slurm template. Write out the template, submission job, and infile for the parallel runner.
post_process_batch_indexes
Put the scheduler_id in each batch