NAME
HPC::Runner::Slurm - Job Submission to Slurm
VERSION
Version 0.01
SYNOPSIS
In-depth documentation is at https://wcmc-q.atlassian.net/wiki/display/HPCSLURM/HPC-Runner-Slurm .
package Main;
use Moose;
extends 'HPC::Runner::Slurm';
Main->new_with_options(infile => '/path/to/commands');
This module is a wrapper around sbatch and can be used to submit arbitrary bash commands to slurm.
It has two levels of management. The first is the main sbatch command, and the second is the actual job, which runs commands in parallel, controlled by HPC::Runner::Threads or HPC::Runner::MCE.
It supports job dependencies. Put in the command 'wait' to tell slurm that some job or jobs depend on the completion of other jobs. Put in the command 'newnode' to tell HPC::Runner::Slurm to submit the job to a new node.
The only required option is --infile.
Submit Script
cmd1
cmd2 && cmd3
cmd4 \
--option cmd4
#Tell HPC::Runner::Slurm to put in some job dependencies.
wait
cmd5
#Tell HPC::Runner::Slurm to pass things off to a new node; this job doesn't depend on the previous one
newnode
cmd6
User Options
User options can be passed to the script on the command line, as in script --opt1, or in a config file. MooseX::SimpleConfig is used to load the config file.
configfile
Config file to pass on the command line as --configfile /path/to/file. It should be YAML or XML (untested). This is optional; parameters can be passed straight to the command line.
example.yml
---
infile: "/path/to/commands/testcommand.in"
outdir: "path/to/testdir"
module:
- "R2"
- "shared"
infile
File of commands, separated by newlines.
example.in
cmd1
cmd2 --input --input \
--someotherinput
wait
#Wait tells slurm to make sure previous commands have exited with exit status 0.
cmd3 ##very heavy job
newnode
#cmd3 is a very heavy job so let's start the next job on a new node
module
Modules to load with slurm. Use the same names you would pass to 'module load'.
For example, R2 becomes 'module load R2'.
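The mapping from module names to 'module load' lines can be sketched as follows. This is a minimal illustration, not the module's actual code; the helper name module_load_lines is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of module names into 'module load' lines.
sub module_load_lines {
    my @modules = @_;
    return map { "module load $_" } @modules;
}

# The module names are the ones used in example.yml.
print "$_\n" for module_load_lines('R2', 'shared');
```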
jobname
Specify a job name, and jobs will be jobname_1, jobname_2, jobname_x
afterok
The afterok switch in slurm. --afterok 123 will tell slurm to start this job after job 123 has completed successfully.
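In sbatch itself, an afterok dependency is written as --dependency=afterok:<jobid>[:<jobid>...]. The sketch below shows one way such a command line could be assembled. It is an illustration under assumptions, not necessarily how this module builds its command; the helper name sbatch_args and the script name job_2.sh are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build the sbatch argument list for a batch script,
# optionally making it depend on the successful completion of earlier jobs.
sub sbatch_args {
    my ($script, @after_ok) = @_;
    my @args = ('sbatch');
    push @args, '--dependency=afterok:' . join(':', @after_ok) if @after_ok;
    push @args, $script;
    return @args;
}

# Only prints the command line; nothing is submitted.
print join(' ', sbatch_args('job_2.sh', 123)), "\n";
```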
cpus_per_task
Slurm item --cpus_per_task. Defaults to 8, which is probably fine.
commands_per_node
--commands_per_node defaults to 8, which is probably fine
partition
Specify the partition. Defaults to the partition that has the most nodes.
TODO: allow multiple partitions to be specified.
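Choosing the partition with the most nodes could look roughly like the sketch below. It assumes input in the two-column form printed by sinfo -h -o "%P %D" (partition name, node count); whether the module queries sinfo this way is an assumption, and the helper name largest_partition and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: given text in "<partition> <node count>" form,
# return the name of the partition with the most nodes.
sub largest_partition {
    my ($sinfo_text) = @_;
    my %nodes;
    for my $line (split /\n/, $sinfo_text) {
        my ($partition, $count) = $line =~ /^\s*(\S+)\s+(\d+)\s*$/ or next;
        $partition =~ s/\*$//;         # sinfo marks the default partition with '*'
        $nodes{$partition} += $count;  # sum in case a partition is listed twice
    }
    # Highest node count first; ties are broken alphabetically.
    my ($best) = sort { $nodes{$b} <=> $nodes{$a} || $a cmp $b } keys %nodes;
    return $best;
}

# Made-up sample data, not output from a real cluster.
my $sample = "defq* 4\nbigmem 2\ngpu 6\n";
print largest_partition($sample), "\n";
```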
nodelist
Defaults to the nodes on the defq queue
submit_to_slurm
Bool value indicating whether or not to submit to slurm. If you are looking to debug your files, or this script, you will want to set this to zero. To skip submission, pass --nosubmit_to_slurm on the command line or call $self->submit_to_slurm(0); within your code.
template_file
actual template file
One is generated here for you, but you can always supply your own with --template_file /path/to/template
serial
Option to run all jobs serially, one after the other, with no parallelism. Without this option, the default is to use 4 procs.
user
user running the script. Passed to slurm for mail information
use_threads
Bool value to indicate whether or not to use threads. The default is to use processes.
If using threads your perl must be compiled to use threads!
use_processes
Bool value to indicate whether or not to use processes. The default is to use processes.
Internal Variables
You should not need to mess with any of these.
template
template object for writing slurm batch submission script
cmd_counter
Keep track of the number of commands. When we get to more than commands_per_node, restart the count so that we submit to a new node.
node_counter
Keep track of which node we are on
batch_counter
Keep track of how many batches we have submitted to slurm
node
Node we are running on
cmd
Current command specified by infile
batch
List of commands to submit to slurm
cmdfile
File of commands for mcerunner/parallelrunner. It is cleared at the end of each slurm submission.
slurmfile
File generated from slurm template
jobref
Array of arrays detailing slurm job IDs. Index -1 holds the most recent job submissions, and there will be an index -2 if there are any job dependencies.
wait
Boolean value indicates any job dependencies
SUBROUTINES/METHODS
run()
First sub called.
Calling system module load * does not work within a screen session!
check_files()
Check to make sure the outdir exists. If it doesn't exist the entire path will be created
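Creating a directory together with any missing parents can be done with make_path from the core File::Path module. The sketch below illustrates the idea only; the helper name ensure_outdir is hypothetical, and the module's own check_files() may differ.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: create $outdir, including any missing parent
# directories, and report whether the directory exists afterwards.
sub ensure_outdir {
    my ($outdir) = @_;
    make_path($outdir) unless -d $outdir;
    return -d $outdir ? 1 : 0;
}

# Demonstrate inside a temporary directory that is removed on exit.
my $base = tempdir(CLEANUP => 1);
print ensure_outdir("$base/path/to/testdir") ? "outdir ready\n" : "outdir missing\n";
```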
get_nodes
Get the nodes from sinfo if not supplied
If the nodelist is supplied, the partition must be supplied as well.
parse_file_slurm
Parse the file looking for the following conditions:
lines ending in `\`, wait, newnode
Batch commands in groups of $self->cpus_per_task, or smaller as wait and newnode indicate.
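The parsing rules above can be sketched in a few lines of Perl. This is a simplified illustration rather than the module's implementation. It makes three assumptions: the batch size is passed in as a hypothetical $per_batch parameter, comment and blank lines between commands are skipped, and 'newnode' clears any dependency set by an earlier 'wait'.

```perl
use strict;
use warnings;

# Sketch only: split infile text into batches of commands. Each batch is
# { cmds => [...], depends => 0|1 }; depends is true after a 'wait' line.
sub parse_commands {
    my ($text, $per_batch) = @_;
    my (@batches, @current, $pending);
    my $depends = 0;

    # Close the batch being built and set the dependency flag for the next.
    my $flush = sub {
        my ($next_depends) = @_;
        push @batches, { cmds => [@current], depends => $depends } if @current;
        @current = ();
        $depends = $next_depends;
    };

    for my $line (split /\n/, $text) {
        if (defined $pending) {                # inside a continued command
            $pending .= "\n$line";
        }
        else {
            next if $line =~ /^\s*(#|$)/;      # assumption: skip comments, blanks
            if ($line =~ /^\s*wait\s*$/)    { $flush->(1); next }
            if ($line =~ /^\s*newnode\s*$/) { $flush->(0); next }
            $pending = $line;
        }
        next if $pending =~ /\\\s*$/;          # ends in '\': keep reading
        push @current, $pending;
        undef $pending;
        $flush->($depends) if @current >= $per_batch;
    }
    push @current, $pending if defined $pending;   # dangling continuation
    $flush->($depends);
    return @batches;
}

my $infile = <<'END';
cmd1
cmd2 && cmd3
cmd4 \
--option cmd4
wait
cmd5
newnode
cmd6
END

my @batches = parse_commands($infile, 8);
for my $i (0 .. $#batches) {
    printf "batch %d: %d command(s)%s\n", $i + 1,
        scalar @{ $batches[$i]{cmds} },
        $batches[$i]{depends} ? ' (waits for earlier jobs)' : '';
}
```

In this sketch a full batch keeps the current dependency flag for the batch that follows it, so commands after a 'wait' stay dependent even when they spill over into a second batch.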
work
Get the node #may be removed but we'll try it out
Process the batch
Submit to slurm
Take care of the counters
process_batch()
Create the slurm submission script from the slurm template.
Write out template, submission job, and infile for parallel runner.
submit_slurm()
Submit jobs to slurm queue using sbatch.
This subroutine was taken almost entirely from the following PerlMonks discussion. All that I did was add in some logging.
http://www.perlmonks.org/?node_id=151886
You can use the script at the top of that page to test the runner. Just download it, make it executable, and put it in the infile as
perl command.pl 1
perl command.pl 2
#so on and so forth
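On success, sbatch prints a line of the form "Submitted batch job <id>", and that ID is what later dependencies refer to. A minimal sketch of extracting it follows. The helper name parse_job_id is hypothetical, and how this module actually captures and stores the ID is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the job ID from sbatch's standard output.
# Returns undef when no ID is found, for example after a failed submission.
sub parse_job_id {
    my ($output) = @_;
    return $output =~ /Submitted batch job (\d+)/ ? $1 : undef;
}

# Sample text in sbatch's usual format; nothing is submitted here.
my $id = parse_job_id("Submitted batch job 123\n");
print defined $id ? "job id: $id\n" : "no job id found\n";
```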
AUTHOR
Jillian Rowe, <jillian.e.rowe at gmail.com>
BUGS
Please report any bugs or feature requests to bug-runner-init at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=HPC-Runner-Slurmm. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc HPC::Runner::Slurm
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
This module was originally developed at and for Weill Cornell Medical College in Qatar. With approval from WCMC-Q, this information was generalized and put on GitHub, for which the authors would like to express their gratitude.
LICENSE AND COPYRIGHT
Copyright 2014 Jillian Rowe.
This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:
http://www.perlfoundation.org/artistic_license_2_0
Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.
If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.
This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.
This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.
Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.