NAME
Bio::Lite - Lightweight and fast module with a simplified API to ease scripting in bioinformatics
VERSION
version 0.002
SYNOPSIS
# Reverse complementing a sequence
my $seq = reverseComplemente("ATGC");
# Reading a FASTQ file
my $it = seqFileIterator('file.fastq','fastq');
while(my $entry = $it->()) {
print "Sequence name : $entry->{name}
Sequence : $entry->{seq}
Sequence quality: $entry->{qual}","\n";
}
# Reading paired-end files easier
my $it = pairedEndSeqFileIterator($file);
while (my $entry = $it->()) {
print "Read_1 : $entry->{read1}->{seq}
Read_2 : $entry->{read2}->{seq}";
}
# Parsing a GFF file
my $it = gffFileIterator($file);
while (my $annot = $it->()) {
print "chr : $annot->{chr}
start : $annot->{start}
end : $annot->{end}";
}
DESCRIPTION
Bio::Lite is a set of subroutines that aims to cover the same needs as the BioPerl distribution in a FAST and SIMPLE way.
Bio::Lite does not use complex data structures or objects, which would slow down execution.
All subroutines can be imported with a single "use Bio::Lite".
Bio::Lite is a lightweight, single-file module with NO DEPENDENCIES.
UTILS
reverseComplemente
Reverse complement the (nucleotide) sequence given as argument.
Example:
my $seq_revcomp = reverseComplemente($seq);
reverseComplemente is more than 100x faster than BioPerl's revcom_as_string().
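To illustrate the operation, here is a minimal pure-Perl sketch of reverse complementing. The name revcomp_sketch is hypothetical and this is not taken from Bio::Lite's source; it only assumes plain A, C, G, T input (IUPAC ambiguity codes are left unchanged).

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only, not Bio::Lite's implementation.
sub revcomp_sketch {
    my ($seq) = @_;
    my $rc = reverse $seq;           # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;    # complement each base, preserving case
    return $rc;
}

print revcomp_sketch("ATGC"), "\n";  # prints GCAT
```

Combining a string reverse with a single tr/// pass avoids any per-base loop or object construction, which is likely where most of the speed of this kind of approach comes from.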
PARSING
These tools read common bioinformatics file formats, such as:
- Sequence files : FASTA, FASTQ
- Annotation files : GFF3, GTF2, BED6, BED12, ...
- Alignment files : SAM, BAM
seqFileIterator
Open FASTA or FASTQ files (optionally gzipped). seqFileIterator detects the format from the file extension, but you can force it by passing the format, 'fasta' or 'fastq', as a second parameter.
Example:
my $it = seqFileIterator('file.fastq','fastq');
while(my $entry = $it->()) {
print "Sequence name : $entry->{name}
Sequence : $entry->{seq}
Sequence quality: $entry->{qual}","\n";
}
Return: HashRef
{ name => 'sequence_identifier',
seq => 'sequence_value',
qual => 'sequence_quality', # only defined for FASTQ files
}
seqFileIterator is more than 50x faster than BioPerl's Bio::SeqIO for FASTQ files.
seqFileIterator is 4x faster than BioPerl's Bio::SeqIO for FASTA files.
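The iterator pattern used above (a code reference that returns one hashref per call and a false value at end of input) can be sketched as follows. This is an illustration under assumptions, not Bio::Lite's source: fastq_iterator_sketch is a hypothetical name, it takes an already opened file handle, it only handles strict four-line FASTQ records, and it stores the whole header line (minus the leading '@') as the name.

```perl
use strict;
use warnings;

# Hypothetical closure-based FASTQ iterator (four-line records only).
sub fastq_iterator_sketch {
    my ($fh) = @_;
    return sub {
        my $header = <$fh>;
        return unless defined $header;   # end of input
        my $seq  = <$fh>;
        my $plus = <$fh>;                # separator line, ignored
        my $qual = <$fh>;
        return unless defined $qual;     # truncated record: stop iterating
        chomp($header, $seq, $qual);
        $header =~ s/^\@//;              # drop the leading '@'
        return { name => $header, seq => $seq, qual => $qual };
    };
}

# Demonstration on an in-memory file handle, so no file is needed.
my $data = "\@read1\nACGT\n+\nIIII\n\@read2\nTTGA\n+\nFFFF\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";
my $it = fastq_iterator_sketch($fh);
while (my $entry = $it->()) {
    print "$entry->{name}\t$entry->{seq}\t$entry->{qual}\n";
}
close $fh;
```

Returning plain hashrefs from a closure keeps the per-record cost close to that of reading four lines, with no object to build for each entry.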
pairedEndSeqFileIterator
Open paired-end sequence files using seqFileIterator().
Paired-end files are generated by Next Generation Sequencing technologies (like Illumina), where two reads are sequenced from the same DNA fragment and saved in separate files.
Example:
my $it = pairedEndSeqFileIterator($file);
while (my $entry = $it->()) {
print "Read_1 : $entry->{read1}->{seq}
Read_2 : $entry->{read2}->{seq}";
}
Return: HashRef
{ read1 => 'see seqFileIterator() return',
read2 => 'see seqFileIterator() return'
}
pairedEndSeqFileIterator has no equivalent in BioPerl.
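One way to build such a paired iterator is to wrap two single-file iterators and advance them in lockstep. The sketch below is an assumption about the general technique, not Bio::Lite's code: paired_iterator_sketch and array_iterator are hypothetical names, and the toy in-memory iterators stand in for two seqFileIterator() calls.

```perl
use strict;
use warnings;

# Hypothetical helper: combine two iterators into one that yields pairs.
sub paired_iterator_sketch {
    my ($it1, $it2) = @_;
    return sub {
        my $r1 = $it1->();
        my $r2 = $it2->();
        # Stop as soon as either side runs out of records.
        return unless defined($r1) && defined($r2);
        return { read1 => $r1, read2 => $r2 };
    };
}

# Toy iterator over in-memory records (stands in for a file iterator).
sub array_iterator {
    my @records = @_;
    return sub { return shift @records };
}

my $pairs = paired_iterator_sketch(
    array_iterator({ name => 'frag1/1', seq => 'ACGT' }),
    array_iterator({ name => 'frag1/2', seq => 'TTGA' }),
);
while (my $pair = $pairs->()) {
    print "$pair->{read1}{seq}\t$pair->{read2}{seq}\n";
}
```

This sketch relies on both inputs listing reads in the same order and does not check that the read names match.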
gffFileIterator
Parse files in GFF3 and GTF2 format.
Example:
my $it = gffFileIterator($file);
while (my $annot = $it->()) {
print "chr : $annot->{chr}
start : $annot->{start}
end : $annot->{end}";
}
Return: HashRef with the parsed annotation
{ chr => ...,
source => ...,
feature => ...,
start => ...,
end => ...,
strand => ...,
frame => ...,
attributes => { id => val, ...}
}
gffFileIterator is 5x faster than BioPerl's Bio::Tools::GFF.
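To show how one annotation line maps to the hashref above, here is a minimal sketch that parses a single GFF3 line. It is an illustration, not Bio::Lite's parser: parse_gff3_line_sketch is a hypothetical name, only GFF3-style "key=value;" attributes are handled (GTF2 attributes use a different syntax), and the score column is read but not returned, matching the keys listed above.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one tab-separated GFF3 line into a hashref.
sub parse_gff3_line_sketch {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;   # skip comments and blank lines
    my ($chr, $source, $feature, $start, $end, $score, $strand, $frame, $attr)
        = split /\t/, $line;
    my %attributes;
    for my $pair (split /;/, ($attr // '')) {
        my ($key, $val) = split /=/, $pair, 2;  # split on the first '=' only
        $attributes{$key} = $val if defined $key && defined $val;
    }
    return {
        chr        => $chr,
        source     => $source,
        feature    => $feature,
        start      => $start,
        end        => $end,
        strand     => $strand,
        frame      => $frame,
        attributes => \%attributes,
    };
}

my $annot = parse_gff3_line_sketch(
    "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=abc\n"
);
print "$annot->{chr}:$annot->{start}-$annot->{end} $annot->{attributes}{ID}\n";
```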
FILES IO
getReadingFileHandle
Return a file handle for reading the file given as argument. Reports an error if the file cannot be opened and handles gzipped files (detected from the .gz file extension).
Example:
my $fh = getReadingFileHandle('file.txt.gz');
while(<$fh>) {
print $_;
}
close $fh;
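Extension-based gzip handling can be sketched as below. This is an assumption-laden illustration, not Bio::Lite's implementation: reading_handle_sketch is a hypothetical name, and the sketch relies on the IO::Uncompress::Gunzip module shipped with Perl, which may differ from how Bio::Lite itself opens compressed files.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: pick the opener based on the .gz file extension.
sub reading_handle_sketch {
    my ($file) = @_;
    if ($file =~ /\.gz$/) {
        my $gz = IO::Uncompress::Gunzip->new($file)
            or die "Cannot open gzipped file $file: $GunzipError\n";
        return $gz;
    }
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    return $fh;
}

# Demonstration: write a small gzipped file in a temporary directory,
# then read it back line by line.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.txt.gz";
my $text = "Hello world\n";
gzip(\$text => $path) or die "gzip failed: $GzipError\n";

my $in = reading_handle_sketch($path);
while (defined(my $line = <$in>)) {
    print $line;
}
close $in;
```

The caller reads from the returned handle the same way whether or not the file is compressed.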
getWritingFileHandle
Return a file handle for writing to the file given as argument. Reports an error if the file cannot be opened and handles gzipped files (detected from the .gz file extension).
Example:
my $fh = getWritingFileHandle('file.txt.gz');
print $fh "Hello world\n";
close $fh;
TODO
AUTHOR
Jérôme Audoux <jerome.audoux@gmail.com>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2014 by Jérôme Audoux.
This is free software, licensed under:
The GNU General Public License, Version 3, June 2007