BioX::Wrapper::Gemini

Wrapper around Gemini for processing files

Attributes

Moose Attributes

vcfs

VCF files can also be given individually instead of through --indir.

# Option is an ArrayRef and can be given as either

--vcfs 1.vcf,2.vcf,3.vcf

# or

--vcfs 1.vcf --vcfs 2.vcf --vcfs 3.vcf

Do not mix the two forms in a single invocation.
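As an illustration of how the two forms end up as the same list, here is a minimal sketch. The helper normalize_vcfs is hypothetical and is not part of BioX::Wrapper::Gemini; the module's own option parsing may work differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): flatten --vcfs values
# into one list of paths. Accepts either one comma-separated string
# or several separate values.
sub normalize_vcfs {
    my @values = @_;
    return grep { length } map { split(/,/, $_) } @values;
}

my @from_commas   = normalize_vcfs('1.vcf,2.vcf,3.vcf');
my @from_repeated = normalize_vcfs('1.vcf', '2.vcf', '3.vcf');

print join(' ', @from_commas),   "\n";
print join(' ', @from_repeated), "\n";
```

Both calls yield the same three paths, which is why either form of --vcfs describes the same input.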

If these VCFs are uncompressed, they will be compressed in place. Please make sure that either this location has read/write access, or create a symbolic link to it from a location that does.

Every time you leave genomics data uncompressed a kitten dies!

uncomvcfs

Vcfs that are uncompressed

ref

Supply a path to a reference genome

The default assumes there is an environment variable $REFGENOME.

snpeff

Base directory of snpeff

The default assumes there is an environment variable $SNPEFF that points to the base directory of the snpeff installation.
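The fallback behaviour described for ref and snpeff can be sketched as follows. The helper option_or_env and the example paths are assumptions made for illustration; the module itself implements these as Moose attribute defaults, whose exact code is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): prefer an explicit
# value, otherwise fall back to the named environment variable.
sub option_or_env {
    my ($value, $env_name) = @_;
    return $value            if defined $value;
    return $ENV{$env_name}   if defined $ENV{$env_name};
    die "Supply a value or set \$$env_name\n";
}

# Placeholder paths, for the example only.
local $ENV{REFGENOME} = '/data/ref/genome.fa';
local $ENV{SNPEFF}    = '/opt/snpEff';

my $ref    = option_or_env(undef,            'REFGENOME');  # taken from the environment
my $snpeff = option_or_env('/custom/snpEff', 'SNPEFF');     # explicit value wins

print "$ref\n$snpeff\n";
```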

snpeff_opt

Options to run snpeff with

ped

If all vcf files are being loaded into the gemini db with the same pedigree file, simply change the --db_load_opts to correspond to your file.

If each vcf file has its own pedigree, make sure the pedigree file matches the basename of the vcf.

Basenames are captured like so:

my @gzipbase = map {  basename($_, ".vcf.gz") }  @gzipped ;
my @notgzipbase = map {  basename($_, ".vcf") }  @notgzipped ;

The extension stripped from the file name is .vcf.gz or .vcf.
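To make the basename matching concrete, the following sketch derives the pedigree path that would be expected for a given VCF. The helper ped_for_vcf, the .ped extension, and the example directories are assumptions for illustration only.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper (not part of the module): work out which
# pedigree file belongs to one VCF.
# Assumption: the pedigree file is named <basename>.ped.
sub ped_for_vcf {
    my ($vcf, $ped_dir) = @_;
    # Strip either extension, as in the basename calls shown above
    my $base = basename($vcf, '.vcf.gz', '.vcf');
    return File::Spec->catfile($ped_dir, "$base.ped");
}

print ped_for_vcf('/data/vcfs/family1.vcf.gz', '/data/peds'), "\n";
print ped_for_vcf('/data/vcfs/family2.vcf',    '/data/peds'), "\n";
```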

Invoke this with --ped

The exact specification of the PED format can be found here:

http://gemini.readthedocs.org/en/latest/content/preprocessing.html#describing-samples-with-a-ped-file

ped_dir

If you use the --ped option and your pedigree files are not in the same directory as the --indir option, you must set ped_dir.

db_load_opts

Options for loading VCF file into gemini sqlite db

Default is --skip_cadd -t snpEff
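As a rough sketch of how these options could end up on a gemini load command line, consider the following. The helper gemini_load_cmd, the file names, and the database name are hypothetical; the command the module actually generates may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): assemble a
# `gemini load` command line for one VCF.
# Assumption: the caller chooses the database name.
sub gemini_load_cmd {
    my (%args) = @_;
    my @cmd = ('gemini', 'load', '-v', $args{vcf});
    push @cmd, '-p', $args{ped} if defined $args{ped};   # per-VCF pedigree
    push @cmd, split(' ', $args{db_load_opts});          # options passed through as given
    push @cmd, $args{db};
    return join ' ', @cmd;
}

my $cmd = gemini_load_cmd(
    vcf          => 'family1.vcf.gz',
    ped          => 'family1.ped',
    db_load_opts => '--skip_cadd -t snpEff',
    db           => 'family1.db',
);
print "$cmd\n";
```

Keeping the load options as one free-form string means a shared pedigree file can be supplied through db_load_opts without any change to the code.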

Subroutines

check_files

Check to make sure either an indir or vcfs are supplied

find_vcfs

Use File::Find::Rule to find the vcfs

Make sure they are all gzipped first. If there are any .vcf$ files without a corresponding .vcf.gz$, bgzip those
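The selection rule can be sketched on a plain list of file names, leaving the File::Find::Rule search itself aside. The helper needs_bgzip is hypothetical and only illustrates the rule described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): given a list of paths,
# return the plain .vcf files that have no .vcf.gz counterpart
# in the same list.
sub needs_bgzip {
    my @files   = @_;
    my %gzipped = map { $_ => 1 } grep { /\.vcf\.gz$/ } @files;
    return grep { /\.vcf$/ && !$gzipped{"$_.gz"} } @files;
}

my @todo = needs_bgzip('a.vcf', 'a.vcf.gz', 'b.vcf', 'c.vcf.gz');
print "@todo\n";    # prints "b.vcf"
```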

bgzip

Run bgzip command on files found in find_vcfs

norml

Normalize the VCFs using vt and annotate them using snpEff.
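A pipeline of this kind might be assembled as in the sketch below. The helper norm_cmd, the paths, the output name, and the snpEff options and genome build are all assumptions for illustration; the command the module actually builds is not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): build a shell pipeline
# that decomposes and normalizes a VCF with vt, annotates it with
# snpEff, and bgzips the result.
# Assumption: snpEff options and output name are supplied by the caller.
sub norm_cmd {
    my (%args) = @_;
    return join ' | ',
        "zcat $args{vcf}",
        'vt decompose -s -',
        "vt normalize -r $args{ref} -",
        "java -jar $args{snpeff}/snpEff.jar $args{snpeff_opt}",
        "bgzip -c > $args{out}";
}

my $norm = norm_cmd(
    vcf        => 'family1.vcf.gz',
    ref        => '/data/ref/genome.fa',            # placeholder path
    snpeff     => '/opt/snpEff',                    # placeholder path
    snpeff_opt => '-formatEff -classic GRCh37.75',  # example options only
    out        => 'family1.norm.snpeff.vcf.gz',
);
print "$norm\n";
```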

db_load

Load the VCF files into the gemini sqlite db.

run

Subroutine that starts everything off

NAME

BioX::Wrapper::Gemini - A simple wrapper around the python Gemini library for annotating VCF files.

SYNOPSIS

Basic Usage

gemini_wrapper.pl --indir /path/to/vcfs --outdir /location/we/can/write/to

Using the API

BioX::Wrapper::Gemini is written using Moose and can be extended in all the usual fashions.

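package My::Gemini;    # hypothetical subclass name

use Moose;
extends 'BioX::Wrapper::Gemini';

# Method modifier: runs after the parent's db_load has finished
after 'db_load' => sub {
    my $self = shift;
    # Run some commands
    # SCIENCE!
};

1;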

DESCRIPTION

BioX::Wrapper::Gemini is a simple wrapper around the python Gemini library for annotating VCF files.

AUTHOR

Jillian Rowe <jillian.e.rowe@gmail.com>

COPYRIGHT

Copyright 2015- Jillian Rowe

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO