NAME

Bio::Lite - Lightweight and fast module with a simplified API to ease scripting in bioinformatics

VERSION

version 0.002

SYNOPSIS

# Reverse complementing a sequence
my $seq = reverseComplemente("ATGC");

# Reading a FASTQ file
my $it = seqFileIterator('file.fastq','fastq');
while(my $entry = $it->()) {
  print "Sequence name   : $entry->{name}
         Sequence        : $entry->{seq}
         Sequence quality: $entry->{qual}","\n";
}

# Reading paired-end files more easily
my $it = pairedEndSeqFileIterator($file);
while (my $entry = $it->()) {
  print "Read_1 : $entry->{read1}->{seq}
         Read_2 : $entry->{read2}->{seq}";
}

# Parsing a GFF file
my $it = gffFileIterator($file);
while (my $annot = $it->()) {
  print "chr    : $annot->{chr}
         start  : $annot->{start}
         end    : $annot->{end}";
}

DESCRIPTION

Bio::Lite is a set of subroutines that aims to address the same needs as the BioPerl distribution in a FAST and SIMPLE way.

Bio::Lite does not use complex data structures or objects, which would slow execution.

All subroutines can be imported with a single "use Bio::Lite".

Bio::Lite is a lightweight single module with NO DEPENDENCIES.

UTILS

reverseComplemente

Reverse complement the (nucleotide) sequence given as argument.

Example:

my $seq_revcomp = reverseComplemente($seq);

reverseComplemente is more than 100x faster than BioPerl's revcom_as_string().
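As an illustration of the operation itself, reverse complementing can be sketched in plain Perl with reverse and tr///. This is a minimal sketch, not necessarily how Bio::Lite implements it; the name revcomp_sketch is hypothetical and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Lite: reverse complement a DNA string.
sub revcomp_sketch {
    my ($seq) = @_;
    my $rc = reverse $seq;          # scalar assignment: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # complement each base, preserving case
    return $rc;
}

print revcomp_sketch("ATGC"), "\n";   # prints GCAT
```

Characters other than A, C, G and T (for example N) are left unchanged by this sketch.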

PARSING

These are tools that read (bio) files such as:

Sequence files : FASTA, FASTQ
Annotation files : GFF3, GTF2, BED6, BED12, ...
Alignment files : SAM, BAM

seqFileIterator

Open FASTA or FASTQ files (which can be gzipped). seqFileIterator detects the format automatically from the file extension, but you can force it by passing the format as a second parameter: 'fasta' or 'fastq'.

Example:

my $it = seqFileIterator('file.fastq','fastq');
while(my $entry = $it->()) {
  print "Sequence name   : $entry->{name}
         Sequence        : $entry->{seq}
         Sequence quality: $entry->{qual}","\n";
}

Return: HashRef

{ name => 'sequence_identifier',
  seq  => 'sequence_value',
  qual => 'sequence_quality', # only defined for FASTQ files
}

seqFileIterator is more than 50x faster than BioPerl's Bio::SeqIO for FASTQ files, and 4x faster for FASTA files.
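The iterator pattern used above (a code reference that returns one record per call and a false value at end of file) can be sketched as a closure over a file handle. This is only a sketch under assumptions: it handles four-line FASTQ records only, the name fastq_iterator_sketch is hypothetical, and it is not the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Lite: returns a closure that yields
# one FASTQ record (as a hashref) per call, and nothing at end of file.
sub fastq_iterator_sketch {
    my ($fh) = @_;
    return sub {
        my $header = <$fh>;
        return unless defined $header;   # end of file
        my $seq  = <$fh>;
        my $plus = <$fh>;                # '+' separator line, ignored
        my $qual = <$fh>;
        return unless defined $qual;     # truncated record
        chomp($header, $seq, $qual);
        $header =~ s/^\@//;              # drop the leading '@'
        return { name => $header, seq => $seq, qual => $qual };
    };
}

# In-memory FASTQ data standing in for a real file.
my $data = "\@read1\nATGC\n+\nIIII\n\@read2\nGGCA\n+\nHHHH\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $it = fastq_iterator_sketch($fh);
while (my $entry = $it->()) {
    print "$entry->{name}\t$entry->{seq}\t$entry->{qual}\n";
}
close $fh;
```

Returning plain hashes from a closure avoids creating one object per record, which is consistent with the design goal stated in the DESCRIPTION.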

pairedEndSeqFileIterator

Open paired-end sequence files using seqFileIterator().

Paired-end files are generated by Next Generation Sequencing technologies (like Illumina), where two reads are sequenced from the same DNA fragment and saved in separate files.

Example:

my $it = pairedEndSeqFileIterator($file);
while (my $entry = $it->()) {
  print "Read_1 : $entry->{read1}->{seq}
         Read_2 : $entry->{read2}->{seq}";
}

Return: HashRef

{ read1 => 'see seqFileIterator() return',
  read2 => 'see seqFileIterator() return'
}

pairedEndSeqFileIterator has no equivalent in BioPerl.
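The idea of pairing two record streams can be sketched by wrapping two iterators in a third closure. This is a minimal sketch, assuming both inputs list their reads in the same order; paired_iterator_sketch and list_iterator are hypothetical names that are not part of Bio::Lite.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Lite: combine two record iterators
# into one iterator that yields { read1 => ..., read2 => ... }.
sub paired_iterator_sketch {
    my ($it1, $it2) = @_;
    return sub {
        my $read1 = $it1->();
        my $read2 = $it2->();
        # Stop as soon as either side is exhausted.
        return unless defined($read1) && defined($read2);
        return { read1 => $read1, read2 => $read2 };
    };
}

# Toy iterator over in-memory records, standing in for a sequence file.
sub list_iterator {
    my @records = @_;
    return sub { return shift @records };
}

my $it = paired_iterator_sketch(
    list_iterator({ name => 'frag1/1', seq => 'ATGC' }),
    list_iterator({ name => 'frag1/2', seq => 'GGCA' }),
);
while (my $entry = $it->()) {
    print "Read_1 : $entry->{read1}->{seq}\n";
    print "Read_2 : $entry->{read2}->{seq}\n";
}
```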

gffFileIterator

Handles the GFF3 and GTF2 file formats.

Example:

my $it = gffFileIterator($file);
while (my $annot = $it->()) {
  print "chr    : $annot->{chr}
         start  : $annot->{start}
         end    : $annot->{end}";
}

Return: HashRef with the parsed annotation

{ chr => ...,
  source => ...,
  feature => ...,
  start => ...,
  end => ...,
  strand => ...,
  frame => ...,
  attributes => { id => val, ...}
}

gffFileIterator is 5x faster than BioPerl's Bio::Tools::GFF.
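To show how a single annotation line maps onto the hash described above, here is a minimal sketch of parsing one GFF3 line. It is an assumption-laden illustration: it covers only GFF3 key=value attributes (not the GTF2 syntax), the name parse_gff3_line_sketch is hypothetical, and the actual parser in Bio::Lite may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Lite: parse one GFF3 line into a
# hashref. Returns nothing for comment or blank lines.
sub parse_gff3_line_sketch {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;
    # The nine tab-separated GFF3 columns; the score column is not kept.
    my ($chr, $source, $feature, $start, $end, $score, $strand, $frame, $attr)
        = split /\t/, $line;
    my %attributes;
    for my $pair (split /;/, $attr // '') {
        my ($key, $val) = split /=/, $pair, 2;   # GFF3 uses key=value pairs
        $attributes{$key} = $val if defined $val;
    }
    return {
        chr => $chr, source => $source, feature => $feature,
        start => $start, end => $end, strand => $strand, frame => $frame,
        attributes => \%attributes,
    };
}

my $annot = parse_gff3_line_sketch(
    "chr1\thavana\tgene\t11869\t14409\t.\t+\t.\tID=gene1;Name=DDX11L1\n");
print "chr: $annot->{chr} start: $annot->{start} end: $annot->{end} ",
      "ID: $annot->{attributes}{ID}\n";
```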

FILES IO

getReadingFileHandle

Return a file handle for reading the file given as argument. Displays an error if the file cannot be opened, and handles gzipped files (detected from the .gz file extension).

Example:

my $fh = getReadingFileHandle('file.txt.gz');
while(<$fh>) {
  print $_;
}
close $fh;
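Extension-based handling of gzipped input can be sketched with the core module IO::Uncompress::Gunzip. This is one possible approach and not necessarily the one Bio::Lite uses; open_for_reading_sketch is a hypothetical name, and the temporary file exists only to make the sketch runnable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper, NOT part of Bio::Lite: open a plain or gzipped file
# for reading, choosing the method from the .gz extension.
sub open_for_reading_sketch {
    my ($file) = @_;
    my $fh;
    if ($file =~ /\.gz$/) {
        $fh = IO::Uncompress::Gunzip->new($file)
            or die "Cannot gunzip '$file': $GunzipError\n";
    } else {
        open($fh, '<', $file) or die "Cannot open '$file': $!\n";
    }
    return $fh;
}

# Create a small gzipped file so the sketch can be run as-is.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.txt.gz";
my $text = "Hello world\n";
gzip(\$text => $file) or die "gzip failed: $GzipError\n";

my $fh = open_for_reading_sketch($file);
while (my $line = <$fh>) {
    print $line;
}
close $fh;
```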

getWritingFileHandle

Return a file handle for writing to the file given as argument. Displays an error if the file cannot be opened, and handles gzipped files (detected from the .gz file extension).

Example:

my $fh = getWritingFileHandle('file.txt.gz');
print $fh "Hello world\n";
close $fh;

TODO

AUTHOR

Jérôme Audoux <jerome.audoux@gmail.com>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2014 by Jérôme Audoux.

This is free software, licensed under:

The GNU General Public License, Version 3, June 2007