NAME

Bio::FastaStream - Perl extension for bioinformatics that parses sequence information.

SYNOPSIS

use Bio::FastaStream;
my $fasta = '/path/to/file.fasta';
my $seq = Bio::FastaStream->new($fasta);

ABSTRACT

Bio::FastaStream is a Perl module that parses information out of a FASTA sequence.

DESCRIPTION

This Perl module is a simple utility that makes routine bioinformatics work easier. It extracts several pieces of information from a given FASTA sequence, such as:

  • accession number

  • description

  • sequence itself

  • length of sequence

  • crc64 checksum (as it is used by SWISS-PROT)

  • seq2xml

METHODS

new

getAccessionNr

my $accession = $seq->getAccessionNr();

returns the accession number of the FASTA sequence

getDescription

my $description = $seq->getDescription();

returns the description from the first line of the FASTA record (without the accession number)

getSequence

my $sequence = $seq->getSequence();

returns the sequence

getCrc64

my $crc64_checksum = $seq->getCrc64();

returns the crc64 checksum of the sequence. This checksum corresponds to the crc64 checksum used by SWISS-PROT.
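
As a rough illustration of the kind of checksum meant here, the following is a minimal table-driven CRC-64 sketch. It assumes the polynomial constant 0xD8000000 (high word), an initial value of 0, and 32-bit high/low halves so that it also runs on Perls without 64-bit integers. The name crc64 and the table variables are hypothetical; this is not the module's own routine, and whether it reproduces the module's output for every input is an assumption.

```perl
use strict;
use warnings;

# Assumed polynomial constant (high 32 bits); the low 32 bits are zero.
use constant POLY_HIGH => 0xD8000000;

# Precompute the 256-entry lookup table, kept as two 32-bit halves.
my (@table_high, @table_low);
for my $i (0 .. 255) {
    my ($high, $low) = (0, $i);
    for (1 .. 8) {
        my $carry = $low & 1;              # bit shifted out of the low half
        $low >>= 1;
        $low |= 0x80000000 if $high & 1;   # move lowest bit of high into low
        $high >>= 1;
        $high ^= POLY_HIGH if $carry;
    }
    $table_high[$i] = $high;
    $table_low[$i]  = $low;
}

# Hypothetical helper: returns the checksum as 16 uppercase hex digits.
sub crc64 {
    my ($text) = @_;
    my ($high, $low) = (0, 0);
    for my $byte (unpack 'C*', $text) {
        my $index = ($low ^ $byte) & 0xFF;
        # Shift the 64-bit value right by 8 bits, then fold in the table entry.
        $low  = (($low >> 8) | (($high & 0xFF) << 24)) ^ $table_low[$index];
        $high = ($high >> 8) ^ $table_high[$index];
    }
    return sprintf '%08X%08X', $high, $low;
}

print crc64('QVTLRESGPALVKPTQTLTL'), "\n";
```

Splitting the value into two 32-bit halves avoids non-portable 64-bit hex literals, at the cost of slightly more bookkeeping per byte.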

addDBRef

$seq->addDBRef(DB, REFERENCE_AC);

DB is the name of the referenced database

REFERENCE_AC is the accession number in the referenced database

seq2file

$seq->seq2file(FILENAME);

FILENAME is the path of the file where the sequence has to be stored.

allIndexesOf

my $indexes = $seq->allIndexesOf(EXPR);

returns a reference to an array that contains all indexes of EXPR in the sequence
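
To make the return value concrete, here is a small standalone sketch of such a search. It assumes that EXPR is a literal substring, that indexes are 0-based, and that overlapping matches are reported; the documentation above does not state any of these, and all_indexes_of is a hypothetical name rather than the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in: returns an array reference with every 0-based
# offset at which the literal string $expr occurs in $sequence.
sub all_indexes_of {
    my ($sequence, $expr) = @_;
    my @indexes;
    return \@indexes unless defined $expr && length $expr;

    my $pos = index($sequence, $expr);
    while ($pos != -1) {
        push @indexes, $pos;
        # Continue one character later so overlapping matches are found too.
        $pos = index($sequence, $expr, $pos + 1);
    }
    return \@indexes;
}

my $hits = all_indexes_of('QVTLRESGPALVKPTQTLTL', 'TL');
print join(',', @$hits), "\n";   # prints 2,16,18
```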

getSequenceLength

my $length = $seq->getSequenceLength();

returns the length of the sequence

getDBRefs

my $hashref = $seq->getDBRefs();

returns a hash reference. The hash maps each referenced database name to its accession number, for example {'SWISS-PROT' => 'P01815'}.

getFASTA

my $fasta_sequence = $seq->getFASTA();

returns the sequence in FASTA-format
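
For readers unfamiliar with the layout, a FASTA record is a header line starting with ">" followed by the sequence, usually wrapped to a fixed width. The sketch below builds such a record; the helper name to_fasta and the default width of 60 residues per line are assumptions, not something this documentation specifies.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the documented API): builds a FASTA
# record from a header and a raw sequence, wrapping the sequence at
# $width residues per line. The default of 60 is an assumption.
sub to_fasta {
    my ($header, $sequence, $width) = @_;
    $width = 60 unless defined $width && $width > 0;

    # In list context, a global match without capture groups returns
    # every matched chunk of up to $width characters.
    my @lines = $sequence =~ /.{1,$width}/g;

    return join("\n", ">$header", @lines) . "\n";
}

print to_fasta('sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR',
               'QVTLRESGPALVKPTQTLTL', 10);
```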

EXAMPLE

	use Bio::FastaStream;
	my $fasta = qq~>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
	QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
	YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
	~;

	my $streamobj = Bio::FastaStream->new($fasta);

	while (my $obj = $streamobj->nextSeq()) {
		print $obj->getAccessionNr(), "\n", $obj->getCrc64(), "\n";
	}

ADDITIONAL INFORMATION

accepted formats

This module can parse the following formats:

>P02656 APC3_HUMAN Apolipoprotein C-III precursor (Apo-CIII).
>IPI:IPI00166553|REFSEQ_XP:XP_290586|ENSEMBL:ENSP00000331094|TREMBL:Q8N3H0 T Hypothetical protein
>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).

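One way to pull an accession number and a description out of the three header styles listed above is sketched here. The rules are assumptions inferred from those examples: the second field of a db|ACC|NAME identifier, the value of the first DB:ACC pair, or otherwise the first whitespace-delimited token, with everything after the first whitespace taken as the description. parse_header is a hypothetical helper, not the module's own parser.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (accession, description) for one header line.
sub parse_header {
    my ($line) = @_;
    $line =~ s/^>//;        # drop the leading FASTA marker
    $line =~ s/\s+\z//;     # drop trailing whitespace and newline

    my ($id, $description) = split /\s+/, $line, 2;
    return ('', '') unless defined $id && length $id;
    $description = '' unless defined $description;

    my $accession;
    if ($id =~ /^[A-Za-z]+\|([^|]+)\|/) {    # sp|P01815|HV2B_HUMAN
        $accession = $1;
    }
    elsif ($id =~ /^[^:|]+:([^|]+)/) {       # IPI:IPI00166553|REFSEQ_XP:...
        $accession = $1;
    }
    else {                                   # P02656
        $accession = $id;
    }
    return ($accession, $description);
}

my ($acc, $desc) = parse_header(
    '>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).'
);
print "$acc\n$desc\n";
```
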
structure

The structure of the hash for the example is:

$VAR1 = {
         'seq_length' => 120,
         'accession_nr' => 'P01815',
         'text' => 'QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKYYNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS',
         'crc64' => '158A8B29AE7EEB98',
         'dbrefs' => {},
         'description' => 'Ig heavy chain V-II region COR - Homo sapiens (Human).'
       }

If anything is missing, please contact me.

BUGS

No bugs are known. If you experience any problems, please contact me.

SEE ALSO

http://modules.renee-baecker.de # not available yet - this site is under construction

The crc64 routine is based on the SWISS::CRC64 module.

MODIFICATIONS

More formats of FASTA description lines are accepted.

AUTHOR

Renee Baecker, <module@renee-baecker.de>

Feel free to contact me.

COPYRIGHT AND LICENSE

Copyright 2004 by Renee Baecker

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.