NAME
FASTASequence - Perl extension for bioinformatics
SYNOPSIS
use FASTASequence;
my $fasta = qq~>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
~;
my $seq = FASTASequence->new($fasta);
DESCRIPTION
This Perl module is a small utility for common bioinformatics tasks. It parses a given FASTA sequence and extracts several pieces of information, such as:
accession number
description
sequence itself
length of sequence
crc64 checksum (as it is used by SWISS-PROT)
METHODS
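new
my $seq = FASTASequence->new($fasta);
creates a new FASTASequence object from a string in FASTA format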
getAccessionNr
my $accession = $seq->getAccessionNr();
returns the accession number of the FASTA sequence
getDescription
my $description = $seq->getDescription();
returns the description from the first (header) line of the FASTA record, without the accession number
getSequence
my $sequence = $seq->getSequence();
returns the sequence
getCrc64
my $crc64_checksum = $seq->getCrc64();
returns the crc64 checksum of the sequence. This checksum corresponds to the crc64 checksum used by SWISS-PROT.
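The following is a minimal sketch of how such a checksum can be computed, not the module's own code. It assumes the bit-reflected ISO 3309 polynomial (0xD8000000 in the high 32-bit word) and an initial value of zero, which is how the SWISS-PROT crc64 is commonly described. The name crc64_sketch is hypothetical, and the output has not been compared against getCrc64.

```perl
use strict;
use warnings;

# Assumption: high 32 bits of the reflected polynomial 0xD800000000000000.
use constant POLY_HIGH => 0xD8000000;

# Lookup tables, kept as two 32-bit halves so no 64-bit integers are needed.
my (@table_high, @table_low);
for my $i (0 .. 255) {
    my ($low, $high) = ($i, 0);
    for (1 .. 8) {
        my $carry = $low & 1;
        $low >>= 1;
        $low |= 0x80000000 if $high & 1;    # move the lowest high bit into the low half
        $high >>= 1;
        $high ^= POLY_HIGH if $carry;
    }
    $table_high[$i] = $high;
    $table_low[$i]  = $low;
}

# Hypothetical helper: returns the checksum as 16 upper-case hex digits.
sub crc64_sketch {
    my ($text) = @_;
    my ($high, $low) = (0, 0);
    for my $byte (unpack 'C*', $text) {
        my $index = ($low ^ $byte) & 0xFF;
        my $carry = ($high & 0xFF) << 24;   # bits shifted from the high half into the low half
        $high = ($high >> 8) ^ $table_high[$index];
        $low  = (($low >> 8) | $carry) ^ $table_low[$index];
    }
    return sprintf '%08X%08X', $high, $low;
}

print crc64_sketch('QVTLRESGPALVKPTQ'), "\n";
```

The result has the same 16-digit hexadecimal form as the value returned by getCrc64.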
addDBRef
$seq->addDBRef(DB, REFERENCE_AC);
DB is the name of the referenced database
REFERENCE_AC is the accession number in the referenced database
seq2file
$seq->seq2file(FILENAME, OPTIONS);
FILENAME is the path of the file in which the sequence is to be stored.
OPTIONS is a hash, which contains the options:
allIndexesOf
my $indexes = $seq->allIndexesOf(EXPR);
returns a reference to an array that contains all indexes of EXPR in the sequence
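As an illustration of the idea, here is a minimal sketch that collects all offsets of a substring with Perl's built-in index function. It assumes that EXPR is a literal string and that indexes are 0-based. The helper all_indexes_of is hypothetical and is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: every 0-based offset at which $needle occurs in $sequence.
sub all_indexes_of {
    my ($sequence, $needle) = @_;
    my @indexes;
    return \@indexes if !defined $needle || $needle eq '';
    my $pos = index($sequence, $needle);
    while ($pos != -1) {
        push @indexes, $pos;
        # continue one character further, so overlapping matches are found too
        $pos = index($sequence, $needle, $pos + 1);
    }
    return \@indexes;
}

my $hits = all_indexes_of('MCVGWCC', 'C');
print join(', ', map { $_ + 1 } @$hits), "\n";   # 1-based positions: 2, 6, 7
```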
getSequenceLength
my $length = $seq->getSequenceLength();
returns the length of the sequence
getDBRefs
my $hashref = $seq->getDBRefs();
returns a hash reference. The hash contains all database references, e.g. {'SWISS-PROT' => 'P01815'}
getFASTA
my $fasta_sequence = $seq->getFASTA();
returns the sequence in FASTA format
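For readers unfamiliar with the layout, a FASTA record can be assembled as sketched below. This is only an assumption about the general format: the header layout (">accession description") and the line width of 60 are taken from the example record in this document, and the helper to_fasta_sketch is hypothetical, so the string returned by getFASTA may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: build a FASTA record with a fixed line width.
sub to_fasta_sketch {
    my ($accession, $description, $sequence, $width) = @_;
    $width ||= 60;                                   # assumed default line width
    my @lines = $sequence =~ /(.{1,$width})/g;       # cut the sequence into chunks
    return join "\n", ">$accession $description", @lines, '';
}

print to_fasta_sketch('P01815', 'Ig heavy chain V-II region COR', 'QVTLRESGPALVKPTQTLTLTC', 10);
```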
EXAMPLE
use FASTASequence;
my $fasta = qq~>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
~;
my $seq = FASTASequence->new($fasta);
print 'The sequence of '.$seq->getAccessionNr().' is '.$seq->getSequence(),"\n";
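# allIndexesOf returns an array reference; dereference it to count the hits
my $cys = $seq->allIndexesOf('C');
print 'This sequence contains '.scalar(@$cys).' cysteines at the following positions: ';
# add 1 to each index to print 1-based positions
print join(', ', map { $_ + 1 } @$cys), "\n";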
ADDITIONAL INFORMATION
accepted formats
This module can parse the following formats:
- >P02656 APC3_HUMAN Apolipoprotein C-III precursor (Apo-CIII).
- >IPI:IPI00166553|REFSEQ_XP:XP_290586|ENSEMBL:ENSP00000331094|TREMBL:Q8N3H0 T Hypothetical protein
- >sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
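How the three header styles can be told apart is sketched below. This is not the module's parser. The helper parse_header_sketch is hypothetical, and several choices are assumptions: the description is everything after the first whitespace, and for "DB:AC|DB:AC" headers the first pair is taken as the accession while the remaining pairs are treated as database references.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FASTA header line into accession,
# description and extra DB => AC references.
sub parse_header_sketch {
    my ($header) = @_;
    $header =~ s/^>//;
    $header =~ s/\s+\z//;
    my ($id, $description) = split /\s+/, $header, 2;
    $description = '' unless defined $description;

    my ($accession, %dbrefs);
    if ($id =~ /^\w+\|([^|]+)\|/) {
        # >sp|P01815|HV2B_HUMAN ...
        $accession = $1;
    }
    elsif ($id =~ /:/) {
        # >IPI:IPI00166553|REFSEQ_XP:XP_290586|...
        my @pairs = split /\|/, $id;
        (undef, $accession) = split /:/, shift(@pairs), 2;
        for my $pair (@pairs) {
            my ($db, $ac) = split /:/, $pair, 2;
            $dbrefs{$db} = $ac;
        }
    }
    else {
        # >P02656 APC3_HUMAN ...
        $accession = $id;
    }
    return {
        accession_nr => $accession,
        description  => $description,
        dbrefs       => \%dbrefs,
    };
}

my $info = parse_header_sketch('>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).');
print "$info->{accession_nr}: $info->{description}\n";
```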
structure
The structure of the hash for the example is:
$VAR1 = {
'seq_length' => 120,
'accession_nr' => 'P01815',
'text' => 'QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKYYNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS',
'crc64' => '158A8B29AE7EEB98',
'dbrefs' => {},
'description' => 'Ig heavy chain V-II region COR - Homo sapiens (Human).'
}
If anything is missing, please contact me.
BUGS
There are no known bugs. If you experience any problems, please contact me.
SEE ALSO
http://perl-modules.renee-baecker.de
The crc64 routine is based on the SWISS::CRC64 module.
AUTHOR
Renee Baecker, <module@renee-baecker.de>
COPYRIGHT AND LICENSE
Copyright 2004 by Renee Baecker
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.