Introduction
Most bioinformatics workflows start with a set of samples and process those samples in one or more steps. Most are also bash-based, and no one wants to reinvent the wheel by rewriting their code in Perl, Python, or whatever.
These docs are also available at http://jerowe.github.io/BioX-Workflow-Docs/showcase.html
Once your configuration is all set, process your entire workflow by running:
biox-workflow.pl --workflow workflow.yml > workflow.sh
Alternatively, to select an exact rule:
biox-workflow.pl --workflow workflow.yml --select_rules bowtie2 > rule1.sh
To match a set of rules using a regexp:
# Matches all rules that contain 'gatk', such as 'gatk_realign_indels' or 'rule_gatk'
biox-workflow.pl --workflow workflow.yml --match_rules gatk > gatk.sh
# Matches only those rules beginning with 'gatk'
biox-workflow.pl --workflow workflow.yml --match_rules "^gatk" > gatk.sh
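To see the difference between the two patterns above without running a workflow, here is a minimal shell sketch. It uses grep -E as a stand-in for the regexp match and does not call biox-workflow.pl at all. The helper name matching_rules and the list of rule names are assumptions made up for illustration.

```shell
# Illustrative only: grep -E stands in for the regexp match.
# matching_rules is a hypothetical helper, not part of BioX::Workflow.

# Usage: matching_rules PATTERN RULE...
# Prints each rule name that matches PATTERN, one per line.
matching_rules() {
    pattern="$1"
    shift
    printf '%s\n' "$@" | grep -E -- "$pattern"
}

# Unanchored: matches 'gatk' anywhere in the rule name
matching_rules 'gatk' bowtie2 gatk_realign_indels rule_gatk

# Anchored: matches only rule names that begin with 'gatk'
matching_rules '^gatk' bowtie2 gatk_realign_indels rule_gatk
```

The first call prints both gatk_realign_indels and rule_gatk. The second prints only gatk_realign_indels.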
In Depth
Samples
BioX::Workflow assumes you have a set of inputs, known as samples, and that these inputs carry on through your pipeline. There are some exceptions to this, which we will explore with the resample option.
BioX::Workflow also assumes your samples are files or directories. They may also represent people, frogs, or cells, but first and foremost they are files.
For example, with our samples test1.vcf and test2.vcf, we want to bgzip and annotate using SnpEff, and then parse the output using vcf-to-table.pl (shameless plug for BioX::Wrapper::Annovar).
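One way to picture the sample-driven model is a loop that emits the same command once per sample. The following is a minimal shell sketch of that idea, not the code or output of BioX::Workflow itself. The helper name emit_sample_commands is hypothetical, and only the bgzip step for the samples test1.vcf and test2.vcf is shown.

```shell
# Illustrative only: prints one bgzip command per sample to show that every
# sample passes through the same step. Nothing is executed or compressed.
# emit_sample_commands is a hypothetical helper, not part of BioX::Workflow.

# Usage: emit_sample_commands SAMPLE...
emit_sample_commands() {
    for sample in "$@"; do
        # bgzip -c writes to stdout, so the original .vcf is left in place
        printf 'bgzip -c %s.vcf > %s.vcf.gz\n' "$sample" "$sample"
    done
}

emit_sample_commands test1 test2
```

Printing the commands instead of running them mirrors the usage shown in the introduction, where the generated text is redirected into a shell script.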
Structure
BioX::Workflow also makes several assumptions about your output structure. It assumes each of your processes/rules outputs to a distinct directory. These directories are created automatically and named after your process. Each of the assumptions BioX::Workflow makes can be overridden either globally or locally.
It also assumes the indir of each rule is the outdir of the previous rule.
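The chaining of indir and outdir can be sketched in a few lines of shell. This is an illustration of the convention only. The helper name print_rule_dirs, the root directory data, the starting directory data/raw, and the rule names are all assumptions, and the directory layout BioX::Workflow actually produces may differ.

```shell
# Illustrative only: shows each rule writing to its own directory and the
# next rule reading from it. All paths and rule names are hypothetical.

# Usage: print_rule_dirs ROOT RULE...
# Prints one tab-separated line per rule: name, indir, outdir.
print_rule_dirs() {
    root="$1"
    shift
    indir="$root/raw"          # assumed location of the original samples
    for rule in "$@"; do
        outdir="$root/$rule"   # one distinct directory per rule, named after it
        printf '%s\tindir=%s\toutdir=%s\n' "$rule" "$indir" "$outdir"
        indir="$outdir"        # the next rule reads what this rule wrote
    done
}

print_rule_dirs data bgzip snpeff vcf_to_table
```

In this sketch, the outdir of bgzip (data/bgzip) is the indir of snpeff, and the outdir of snpeff is the indir of vcf_to_table.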
All the things can be modified!
All the variables can be modified from their defaults in order to enable custom control of your workflow.