NAME
Bio::FastaStream - Perl extension for bioinformatics: parsing information from FASTA sequences.
SYNOPSIS
use Bio::FastaStream;
my $fasta = '/path/to/file.fasta';
my $seq = Bio::FastaStream->new($fasta);
ABSTRACT
Bio::FastaStream is a Perl module that parses information out of a FASTA sequence.
DESCRIPTION
This Perl module is a simple utility that makes common bioinformatics tasks easier. It extracts several pieces of information from a given FASTA sequence, such as:
accession number
description
sequence itself
length of sequence
CRC64 checksum (as used by SWISS-PROT)
seq2xml
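A minimal sketch of how the header line, the sequence and its length could be separated out of a single record follows. parse_record is a hypothetical helper, not part of Bio::FastaStream; its text and seq_length keys merely mirror the hash shown under "structure".

```perl
use strict;
use warnings;

# Hypothetical helper (not Bio::FastaStream's code): split one FASTA record
# into its header line and its sequence, and compute the sequence length.
sub parse_record {
    my ($record) = @_;
    my @lines = grep { /\S/ } split /\r?\n/, $record;    # drop blank lines
    my $header = shift @lines;
    die "not a FASTA record\n" unless defined $header && $header =~ /^>/;
    $header =~ s/^>//;                                     # strip the leading '>'
    my $sequence = join '', @lines;                        # concatenate sequence lines
    $sequence =~ s/\s+//g;                                 # remove stray whitespace
    return {
        header     => $header,
        text       => $sequence,
        seq_length => length $sequence,
    };
}

my $record = <<'END';
>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
END

my $parsed = parse_record($record);
print "$parsed->{seq_length}\n";    # prints 120
```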
METHODS
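new
my $seq = Bio::FastaStream->new($fasta);
creates a new object. $fasta is either the path to a FASTA file, as in the SYNOPSIS, or a string containing FASTA records, as in the EXAMPLE.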
getAccessionNr
my $accession = $seq->getAccessionNr();
returns the accession number of the FASTA sequence
getDescription
my $description = $seq->getDescription();
returns the description from the first (header) line of the FASTA record, without the accession number
getSequence
my $sequence = $seq->getSequence();
returns the sequence
getCrc64
my $crc64_checksum = $seq->getCrc64();
returns the CRC64 checksum of the sequence. This checksum corresponds to the CRC64 checksum used by SWISS-PROT.
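For readers who want to verify checksums independently, below is a sketch of the CRC64 variant used by SWISS-PROT: a table-driven CRC over the bit-reversed ISO 3309 polynomial 0xD800000000000000, starting from 0 with no final XOR. It is an independent re-implementation that assumes a Perl built with 64-bit integers; crc64 is a hypothetical name, and this is not the code of Bio::FastaStream or SWISS::CRC64. Comparing its output for the example sequence with the crc64 entry of the example hash is a quick sanity check.

```perl
use strict;
use warnings;
use Config;

# Independent sketch of the SWISS-PROT style CRC64 (reflected polynomial
# x^64 + x^4 + x^3 + x + 1, initial value 0, no final XOR).
# Not the module's own implementation.
die "this sketch needs 64-bit integers\n" unless $Config{ivsize} >= 8;

my $POLY = 0xD8 << 56;    # 0xD800000000000000, the bit-reversed polynomial

# Precompute the CRC of every possible byte value.
my @TABLE;
for my $i (0 .. 255) {
    my $part = $i;
    for (1 .. 8) {
        # shift right one bit; if a 1 fell off, fold in the polynomial
        $part = ($part & 1) ? (($part >> 1) ^ $POLY) : ($part >> 1);
    }
    $TABLE[$i] = $part;
}

sub crc64 {
    my ($sequence) = @_;
    my $crc = 0;
    for my $byte (unpack 'C*', $sequence) {
        $crc = $TABLE[($crc ^ $byte) & 0xFF] ^ ($crc >> 8);
    }
    return sprintf '%016X', $crc;    # 16 upper-case hex digits, as in the example hash
}
```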
addDBRef
$seq->addDBRef(DB, REFERENCE_AC);
DB is the name of the referenced database
REFERENCE_AC is the accession number in the referenced database
seq2file
$seq->seq2file(FILENAME);
FILENAME is the path of the file to which the sequence is written.
allIndexesOf
my $indexes = $seq->allIndexesOf(EXPR);
returns a reference to an array containing all positions (indexes) of EXPR in the sequence
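The description of allIndexesOf does not say whether EXPR is a plain string or a regular expression, nor whether positions are 0- or 1-based. The sketch below assumes a plain substring and 0-based positions (Perl's own index convention), and it reports overlapping matches; all_indexes_of is a hypothetical stand-in, not the module's method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for allIndexesOf. Assumptions: EXPR is a plain
# substring (not a regex), positions are 0-based, overlaps are reported.
sub all_indexes_of {
    my ($sequence, $expr) = @_;
    my @indexes;
    return \@indexes unless length $expr;    # avoid an endless loop on ''
    my $pos = index $sequence, $expr;
    while ($pos != -1) {
        push @indexes, $pos;
        $pos = index $sequence, $expr, $pos + 1;    # +1 allows overlapping matches
    }
    return \@indexes;    # array reference, like allIndexesOf
}

my $indexes = all_indexes_of('QVTLRESGPALVKPTQTLTLTCTF', 'TL');
print "@$indexes\n";    # prints "2 16 18"
```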
getSequenceLength
my $length = $seq->getSequenceLength();
returns the length of the sequence
getDBRefs
my $hashref = $seq->getDBRefs();
returns a hash reference that maps each referenced database to the accession number in that database, e.g. $hashref = {'SWISS-PROT' => 'P01815'};
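The relationship between addDBRef and getDBRefs can be pictured with a toy class. ToySeq is hypothetical and is not Bio::FastaStream's code; it simply stores each DB => REFERENCE_AC pair in a dbrefs hash like the one shown under "structure", and it assumes one accession number per database (a second call for the same database overwrites the first).

```perl
use strict;
use warnings;

# Toy class (hypothetical, not Bio::FastaStream's code) showing one way
# addDBRef/getDBRefs could be backed by a 'dbrefs' hash:
# database name => accession number in that database.
package ToySeq;

sub new {
    my ($class, %args) = @_;
    return bless { dbrefs => {}, %args }, $class;
}

sub addDBRef {
    my ($self, $db, $reference_ac) = @_;
    $self->{dbrefs}{$db} = $reference_ac;    # assumption: later calls overwrite
    return;
}

sub getDBRefs { return $_[0]{dbrefs} }    # hash reference

package main;

my $seq = ToySeq->new(accession_nr => 'P01815');
$seq->addDBRef('SWISS-PROT', 'P01815');
my $hashref = $seq->getDBRefs();
print "$_ => $hashref->{$_}\n" for sort keys %$hashref;
```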
getFASTA
my $fasta_sequence = $seq->getFASTA();
returns the sequence in FASTA format
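getFASTA and seq2file both produce a FASTA record. A sketch of how such output could be generated follows; to_fasta and write_fasta are hypothetical helpers, and the 60-residue line width is an assumption (it matches the line length of the example input, but the module may wrap differently).

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild a FASTA record from header and sequence,
# wrapping the sequence at $width residues per line (default 60, an assumption).
sub to_fasta {
    my ($header, $sequence, $width) = @_;
    $width //= 60;
    my @lines = $sequence =~ /(.{1,$width})/g;    # chunks of at most $width characters
    return join("\n", ">$header", @lines) . "\n";
}

# Hypothetical seq2file-like helper: write the record to FILENAME.
sub write_fasta {
    my ($filename, $header, $sequence) = @_;
    open my $fh, '>', $filename or die "cannot open $filename: $!";
    print {$fh} to_fasta($header, $sequence);
    close $fh or die "cannot close $filename: $!";
    return;
}
```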
EXAMPLE
use Bio::FastaStream;
my $fasta = qq~>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
~;
my $streamobj = Bio::FastaStream->new($fasta);
while (my $obj = $streamobj->nextSeq()) {
    print $obj->getAccessionNr(), "\n", $obj->getCrc64(), "\n";
}
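The example relies on nextSeq to walk through the records of the input one at a time. A minimal sketch of such an iterator, assuming records are separated by lines starting with '>', is shown here; make_record_iterator is hypothetical, and the second record uses a placeholder sequence of X residues rather than real data.

```perl
use strict;
use warnings;

# Hypothetical nextSeq-style iterator over a string holding several FASTA
# records: each call returns the next record (header plus sequence lines)
# or undef when the input is exhausted. Not Bio::FastaStream's implementation.
sub make_record_iterator {
    my ($fasta) = @_;
    # split before every line that starts with '>', keeping the '>'
    my @records = grep { /^>/ } split /(?=^>)/m, $fasta;
    return sub { return shift @records };
}

my $input = <<'END';
>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
>P02656 APC3_HUMAN Apolipoprotein C-III precursor (Apo-CIII).
XXXXXXXXXX
END

my $next = make_record_iterator($input);
while (my $record = $next->()) {
    my ($header) = $record =~ /^>(.*)$/m;    # first line without '>'
    print "$header\n";
}
```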
ADDITIONAL INFORMATION
accepted formats
This module can parse the following formats:
- >P02656 APC3_HUMAN Apolipoprotein C-III precursor (Apo-CIII).
- >IPI:IPI00166553|REFSEQ_XP:XP_290586|ENSEMBL:ENSP00000331094|TREMBL:Q8N3H0 T Hypothetical protein
- >sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
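How a description line is split into accession number and description is only documented for the sp|...| style (see the hash under "structure"). The sketch below handles all three listed styles; for the plain and IPI styles, which token counts as the accession number and what remains as the description are assumptions by analogy, and parse_header is a hypothetical helper, not the module's parser.

```perl
use strict;
use warnings;

# Hypothetical header parser for the three description-line styles above.
# Returns (accession_nr, description). Only the sp|...| case is backed by
# the documented example; the other two splits are assumptions.
sub parse_header {
    my ($header) = @_;
    $header =~ s/^>//;
    # >sp|AC|ENTRY_NAME description
    return ($1, $2) if $header =~ /^sp\|([^|]+)\|\S+\s+(.*)$/;
    # >IPI:AC|DB:AC|... description
    return ($1, $2) if $header =~ /^IPI:([^|\s]+)\S*\s+(.*)$/;
    # >AC ENTRY_NAME description
    return ($1, $2) if $header =~ /^(\S+)\s+\S+\s+(.*)$/;
    return ($header, '');    # fallback: treat the whole line as the accession
}

my ($ac, $desc) = parse_header(
    '>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).');
print "$ac\n$desc\n";
```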
structure
The hash built for the example record has the following structure (Data::Dumper output):
$VAR1 = {
'seq_length' => 120,
'accession_nr' => 'P01815',
'text' => 'QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKYYNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS',
'crc64' => '158A8B29AE7EEB98',
'dbrefs' => {},
'description' => 'Ig heavy chain V-II region COR - Homo sapiens (Human).'
}
If something is missing, please contact me.
BUGS
There are no known bugs. If you experience any problems, please contact me.
SEE ALSO
http://modules.renee-baecker.de (not available yet; this site is under construction)
The CRC64 routine is based on the SWISS::CRC64 module.
MODIFICATIONS
Additional FASTA description-line formats are accepted.
AUTHOR
Renee Baecker, <module@renee-baecker.de>
Feel free to contact me.
COPYRIGHT AND LICENSE
Copyright 2004 by Renee Baecker
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.