BioX::Wrapper::Gemini
Wrapper around Gemini for processing files
Attributes
Moose Attributes
vcfs
VCF files can be given individually as well.
#Option is an ArrayRef and can be given as either
--vcfs 1.vcf,2.vcf,3.vcf
#or
--vcfs 1.vcf --vcfs 2.vcf --vcfs 3.vcf
Don't mix the two forms in a single invocation.
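As a rough illustration of the two forms, the sketch below uses the core Getopt::Long module to collect --vcfs into an array and then split any comma-separated values. It is an assumption about how such an option could be parsed, not the module's actual implementation, and the helper name parse_vcfs is hypothetical.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: collect every --vcfs value, then split on commas.
sub parse_vcfs {
    my @args = @_;
    my @vcfs;
    GetOptionsFromArray( \@args, 'vcfs=s@' => \@vcfs )
        or die "Could not parse options\n";
    return map { split /,/ } @vcfs;
}

# Comma-separated form
my @comma = parse_vcfs( '--vcfs', '1.vcf,2.vcf,3.vcf' );

# Repeated-option form
my @repeated = parse_vcfs( '--vcfs', '1.vcf', '--vcfs', '2.vcf', '--vcfs', '3.vcf' );

print "@comma\n";
print "@repeated\n";
```

Both calls yield the same three-element list, which is why either form can populate an ArrayRef option.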
If these VCFs are uncompressed, they will be compressed in place. Please make sure that either this location has read/write access, or create a symbolic link to someplace that does.
Every time you leave genomics data uncompressed, a kitten dies!
uncomvcfs
Vcfs that are uncompressed
ref
Supply a path to a reference genome
Default is to assume there is an environment variable $REFGENOME.
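The fallback described above can be sketched in plain Perl as follows. The helper name resolve_ref and the example paths are hypothetical; the module itself presumably expresses this as a Moose attribute default, which is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: prefer an explicit --ref value, otherwise fall back
# to the REFGENOME environment variable.
sub resolve_ref {
    my ($cli_ref) = @_;
    my $ref = defined $cli_ref ? $cli_ref : $ENV{REFGENOME};
    die "No reference genome: pass --ref or set \$REFGENOME\n"
        unless defined $ref;
    return $ref;
}

local $ENV{REFGENOME} = '/data/genomes/hg19.fa';    # example path only
print resolve_ref(), "\n";                          # falls back to the environment
print resolve_ref('/other/ref.fa'), "\n";           # explicit value wins
```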
snpeff
Base directory of snpeff
The default assumes there is an environment variable $SNPEFF, set to the base directory of the snpeff installation.
snpeff_opt
Options to run snpeff with
ped
If all vcf files are being loaded into the gemini db with the same pedigree file, simply change the --db_load_opts to correspond to your file.
If each vcf file has its own pedigree, make sure the pedigree file matches the basename of the vcf.
Basenames are captured like so:
my @gzipbase = map { basename($_, ".vcf.gz") } @gzipped ;
my @notgzipbase = map { basename($_, ".vcf") } @notgzipped ;
With the extension being .vcf.gz/.vcf
Invoke this with --ped
Exact specifications should be found here:
http://gemini.readthedocs.org/en/latest/content/preprocessing.html#describing-samples-with-a-ped-file
ped_dir
If using the --ped option, you must specify this if your pedigree files are not in the same directory as the --indir option.
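To make the basename matching concrete, here is a minimal sketch of how the pedigree file for a given VCF might be located under ped_dir. The helper name ped_for_vcf, the example paths, and the ".ped" extension are assumptions for illustration only.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: derive the pedigree path expected for one VCF.
# The ".ped" extension is an assumption, not confirmed by the module.
sub ped_for_vcf {
    my ( $vcf, $ped_dir ) = @_;

    # Strip either extension, as in the basename capture shown earlier.
    my $base = basename( $vcf, '.vcf.gz', '.vcf' );
    return File::Spec->catfile( $ped_dir, "$base.ped" );
}

print ped_for_vcf( '/data/vcfs/family1.vcf.gz', '/data/peds' ), "\n";
print ped_for_vcf( '/data/vcfs/family2.vcf',    '/data/peds' ), "\n";
```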
db_load_opts
Options for loading VCF file into gemini sqlite db
Default is --skip_cadd -t snpEff
Subroutines
check_files
Check to make sure either an indir or vcfs are supplied
find_vcfs
Use File::Find::Rule to find the vcfs
Make sure they are all gzipped first. If there are any .vcf files without a corresponding .vcf.gz, bgzip those.
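The selection rule above can be sketched with core Perl alone. The helper name needs_bgzip is hypothetical, and the file list is hard-coded here so the example runs without touching the filesystem.

```perl
use strict;
use warnings;

# Hypothetical helper: given a list of paths, return the uncompressed .vcf
# files that have no matching .vcf.gz in the same list.
# In practice the list might come from something like
#   File::Find::Rule->file->name('*.vcf', '*.vcf.gz')->in($indir)
sub needs_bgzip {
    my @files   = @_;
    my %gzipped = map { $_ => 1 } grep { /\.vcf\.gz$/ } @files;
    return grep { /\.vcf$/ && !$gzipped{"$_.gz"} } @files;
}

my @found = ( 'a.vcf', 'a.vcf.gz', 'b.vcf', 'c.vcf.gz' );
print join( "\n", needs_bgzip(@found) ), "\n";    # prints b.vcf
```

Only b.vcf is reported, because a.vcf already has a compressed counterpart and c.vcf.gz is already compressed.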
bgzip
Run bgzip command on files found in find_vcfs
norml
Normalize VCFs using vt and annotate them using snpEff.
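For orientation, the sketch below assembles a normalize-and-annotate pipeline as a single shell command string, loosely following the preprocessing steps in the Gemini documentation. The helper name norm_cmd, the file names, and the snpEff options and genome database shown are assumptions; the command the module actually runs may differ. Nothing is executed.

```perl
use strict;
use warnings;

# Hypothetical helper: build the pipeline as a string, one stage per line.
sub norm_cmd {
    my (%arg) = @_;
    return join ' | ',
        "zless $arg{vcf}",                  # stream the VCF
        "vt decompose -s -",                # split multiallelic sites
        "vt normalize -r $arg{ref} -",      # left-align against the reference
        "java -jar $arg{snpeff}/snpEff.jar $arg{snpeff_opt}",    # annotate
        "bgzip -c > $arg{out}";             # recompress the result
}

print norm_cmd(
    vcf        => 'family1.vcf.gz',
    ref        => '/data/genomes/hg19.fa',            # example path only
    snpeff     => '/opt/snpEff',                      # example path only
    snpeff_opt => '-formatEff -classic GRCh37.75',    # assumed options
    out        => 'family1.norm.snpeff.vcf.gz',
), "\n";
```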
db_load
Load the annotated VCF files into the gemini db.
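As a minimal sketch of what a load step could look like, the helper below builds a "gemini load" command for one VCF, appending the db_load_opts string and an optional pedigree file. The helper name load_cmd and the file names are hypothetical, and the exact command line the module generates is not confirmed here. Nothing is executed.

```perl
use strict;
use warnings;

# Hypothetical helper: build a "gemini load" command for one VCF.
# $ped is optional; when given it is passed with -p.
sub load_cmd {
    my ( $vcf, $db, $opts, $ped ) = @_;
    my @cmd = ( 'gemini', 'load', '-v', $vcf, $opts );
    push @cmd, '-p', $ped if defined $ped;
    return join ' ', @cmd, $db;
}

my $opts = '--skip_cadd -t snpEff';    # the documented default for db_load_opts
print load_cmd( 'family1.norm.snpeff.vcf.gz', 'family1.db', $opts ), "\n";
print load_cmd( 'family1.norm.snpeff.vcf.gz', 'family1.db', $opts, 'family1.ped' ), "\n";
```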
run
Subroutine that starts everything off
NAME
BioX::Wrapper::Gemini - A simple wrapper around the Python Gemini library for annotating VCF files.
SYNOPSIS
Basic Usage
gemini_wrapper.pl --indir /path/to/vcfs --outdir /location/we/can/write/to
Using the API
BioX::Wrapper::Gemini is written using Moose and can be extended in all the usual fashions.
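package My::Gemini;    # hypothetical subclass name
use Moose;
extends 'BioX::Wrapper::Gemini';

# Method modifier: runs after the parent's db_load
after 'db_load' => sub {
    my $self = shift;
    # Run some commands
    # SCIENCE!
};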
DESCRIPTION
BioX::Wrapper::Gemini is a simple wrapper around the Python Gemini library for annotating VCF files.
AUTHOR
Jillian Rowe <jillian.e.rowe@gmail.com>
COPYRIGHT
Copyright 2015- Jillian Rowe
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.