NAME

Bio::Seq - bioperl sequence object

SYNOPSIS

Object Creation

$seq = Bio::Seq->new;

$seq = Bio::Seq->new(-seq=>'ACTGTGGCGTCAACTG');

$seq = Bio::Seq->new(-seq=>$sequence_string);

$seq = Bio::Seq->new(-seq=>@character_list);


$seq = Bio::Seq->new($file,$seq,$id,$desc,$names,
                    $numbering,$type,$ffmt,$descffmt);

Object Creation from files

There are two ways to create Bio::Seq objects from files. One is using internal Sequence reading routines in this object, which can handle a few formats. The second is to use the newer SeqIO system, which can handle slightly more formats, can handle multiple sequences in one file, and can be easily extended to new formats.

Prefer the new style; it gives you more flexibility and stability.

  # old style, deprecated

  $seq = Bio::Seq->new($filename); # guesses Fasta format

  $seq = Bio::Seq->new(-file  => 'seqfile.aa',
                       -desc  => 'Sample Bio::Seq sequence',
                       -start => '1',
                       -ffmt  => 'Fasta',
                       -type  => 'Amino',
                       );

  # new style, better, but somewhat more wordy
  # notice this loops over multiple sequences

  $stream = Bio::SeqIO->new(-file => 'myfile', -format => 'Fasta');

  while ( $seq = $stream->next_seq() ) {
       # $seq is a Bio::Seq object
  }

Object Manipulation

$seq->[METHOD];

$result = $seq->[METHOD];



Accessors
--------------------------------------------------------
A wide variety of methods give easy and flexible access to the
contents of sequence objects.

The following accessors can be invoked on a sequence object:

ary()        - access sequence (or slice of sequence) as an array
str()        - access sequence (or slice of sequence) as a string
getseq()     - access sequence (or slice) as string or array
seq_len()    - access sequence length
id()         - access/change object id 
desc()       - access/change object description
names()      - access/change object names
start()      - access/change start point of the sequence (see note below) 
end()        - access/change end point of the sequence (see note below)
numbering()  - access/change sequence numbering offset (deprecated)
origin()     - access/change sequence origin
type()       - access/change sequence type
setseq()     - change sequence

Deprecated format changes.

ffmt()       - access/change default output format
descffmt()   - access/change description format


Methods
--------------------------------------------------------
The following methods can be invoked upon a sequence object

copy()        - returns an exact copy of an object
alphabet_ok() - check sequence against genetic alphabet  
alphabet()    - returns the genetic alphabet currently in use
layout()      - sequence formatter for output
revcom()      - reverse complement of sequence
complement()  - complement of sequence  
reverse()     - reverse of sequence
Dna_to_Rna()  - translate Dna seq to Rna
Rna_to_Dna()  - translate Rna seq to Dna
translate()   - protein translation of Dna/Rna sequence

 
copy, revcom and translate all return new Bio::Seq objects. This
makes it easy to use these objects in other Bioperl modules and/or
to use the new SeqIO system for format dumping.

complement, reverse, Dna_to_Rna and Rna_to_Dna all return strings,
as it is less likely that you want these things as real Seq objects.
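
The string-returning operations listed above can be approximated on a plain Perl string. The following is a minimal sketch, not the module's own code: the helper names (complement_str, revcom_str, dna_to_rna_str) are hypothetical, and only the four unambiguous DNA bases are handled.

```perl
use strict;
use warnings;

# Hypothetical helpers operating on plain strings, not Bio::Seq objects.

# Complement each base; only A, C, G, T (either case) are mapped here.
sub complement_str {
    my ($dna) = @_;
    (my $comp = $dna) =~ tr/ACGTacgt/TGCAtgca/;
    return $comp;
}

# Reverse complement: complement, then reverse the string.
sub revcom_str {
    my ($dna) = @_;
    return scalar reverse complement_str($dna);
}

# DNA to RNA: replace thymine with uracil.
sub dna_to_rna_str {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/Tt/Uu/;
    return $rna;
}

print revcom_str('ACTGTGGCGTCAACTG'), "\n";
```

Working on strings keeps the sketch independent of any particular Bio::Seq release; ambiguity codes would need extra entries in the tr/// tables.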

OBJECT IN TRANSITION

The Bio::Seq object is by far the oldest object in the bioperl set of modules, and it shows: four or five people have developed its methods, and much of the documentation covers general bioperl issues. The bioperl core group is committed to eventually rewriting the Bio::Seq object on more sensible design principles, but this rewrite will

a) be heavily tested against old uses of the code
b) aim to be as backwardly compatible as possible
c) be well signposted when it occurs.

For more information, see the Sequence Object project page on the bioperl web site:

http://bio.perl.org/Projects/Sequence/

INSTALLATION

This module is included with the central Bioperl distribution:

http://bio.perl.org/Core/Latest
ftp://bio.perl.org/pub/DIST

Follow the installation instructions included in the README file.

DESCRIPTION

This module is the generic sequence object which lies at the core of the bioperl project. It stores Dna, Rna, or Protein sequence information and annotation. It has associated methods to perform various manipulations of sequences and support for reading and writing sequence data in a variety of file formats.

Bio::Seq has completely superseded Bio::PreSeq.pm.

The older PreSeq.pm code can be found at Chris Dagdigian's site: http://www.sonsorol.org/dag/bioperl/top.html

Sequence Types

Currently the following sequence types are recognized:

Dna
Rna
Amino

Alphabets

This module uses the standard extended single-letter genetic alphabets to represent nucleotide and amino acid sequences.

In addition to the standard alphabet, the following symbols are also acceptable in a biosequence:

?  (a missing nucleotide or amino acid)
-  (gap in sequence)

Extended Dna / Rna alphabet

(includes symbols for nucleotide ambiguity)
------------------------------------------
Symbol       Meaning      Nucleic Acid
------------------------------------------
 A            A           Adenine
 C            C           Cytosine
 G            G           Guanine
 T            T           Thymine
 U            U           Uracil
 M          A or C  
 R          A or G   
 W          A or T    
 S          C or G     
 Y          C or T     
 K          G or T     
 V        A or C or G  
 H        A or C or T  
 D        A or G or T  
 B        C or G or T   
 X      G or A or T or C 
 N      G or A or T or C 


IUPAC-IUB SYMBOLS FOR NUCLEOTIDE NOMENCLATURE:
  Cornish-Bowden (1985) Nucl. Acids Res. 13: 3021-3030.
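
One practical use of the ambiguity table is to turn an ambiguous motif into a regular expression. The sketch below encodes the table as a hash and expands each symbol into a character class. The names %AMBIGUITY and ambiguity_regex are hypothetical and are not part of Bio::Seq.

```perl
use strict;
use warnings;

# Nucleotide ambiguity codes from the table above, as sets of bases.
my %AMBIGUITY = (
    A => 'A', C => 'C', G => 'G', T => 'T', U => 'U',
    M => 'AC', R => 'AG', W => 'AT', S => 'CG', Y => 'CT', K => 'GT',
    V => 'ACG', H => 'ACT', D => 'AGT', B => 'CGT',
    X => 'ACGT', N => 'ACGT',
);

# Hypothetical helper: turn an ambiguous pattern into a regex string.
sub ambiguity_regex {
    my ($pattern) = @_;
    my $re = '';
    for my $symbol ( split //, uc $pattern ) {
        my $bases = $AMBIGUITY{$symbol};
        die "Unknown symbol '$symbol'\n" unless defined $bases;
        # Single bases stay literal; ambiguous symbols become a class.
        $re .= length($bases) > 1 ? "[$bases]" : $bases;
    }
    return $re;
}

my $re = ambiguity_regex('GAYR');    # builds GA[CT][AG]
print "match\n" if 'TTGACGTT' =~ /$re/;
```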

Amino Acid alphabet

------------------------------------------
Symbol           Meaning   
------------------------------------------
A        Alanine
B        Aspartic Acid, Asparagine
C        Cysteine
D        Aspartic Acid
E        Glutamic Acid
F        Phenylalanine
G        Glycine
H        Histidine
I        Isoleucine
K        Lysine
L        Leucine
M        Methionine
N        Asparagine
P        Proline
Q        Glutamine
R        Arginine
S        Serine
T        Threonine
V        Valine
W        Tryptophan
X        Unknown
Y        Tyrosine
Z        Glutamic Acid, Glutamine
*        Terminator


IUPAC-IUB AMINO ACID SYMBOLS:
  Biochem J. 1984 Apr 15; 219(2): 345-373
  Eur J Biochem. 1993 Apr 1; 213(1): 2
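
The amino acid symbols above, including '*' for a terminator, are what a translation of Dna or Rna produces. The following sketch translates frame 1 of a plain string using the standard genetic code. It is an illustration only and is not how translate() is implemented in this module; translate_str and the table variables are hypothetical names.

```perl
use strict;
use warnings;

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions.
my $AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my @BASES = qw(T C A G);

my %CODON;
my $i = 0;
for my $b1 (@BASES) {
    for my $b2 (@BASES) {
        for my $b3 (@BASES) {
            $CODON{"$b1$b2$b3"} = substr( $AMINO, $i++, 1 );
        }
    }
}

# Hypothetical helper: translate frame 1 of a DNA or RNA string.
# A trailing partial codon is ignored; codons containing ambiguity
# codes become 'X'.
sub translate_str {
    my ($nuc) = @_;
    ( my $dna = uc $nuc ) =~ tr/U/T/;
    my $protein = '';
    for ( my $pos = 0 ; $pos + 3 <= length($dna) ; $pos += 3 ) {
        my $codon = substr( $dna, $pos, 3 );
        $protein .= exists $CODON{$codon} ? $CODON{$codon} : 'X';
    }
    return $protein;
}

print translate_str('ATGGCCTAA'), "\n";
```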

Sequence IO Formats

You are encouraged to use the SeqIO system of IO, which in essence looks like:

use Bio::SeqIO;

$instream = Bio::SeqIO->new( -file => 'my.file', -format => 'Fasta' );
$outstream = Bio::SeqIO->new( -fh => \*STDOUT, -format => 'Raw' );

while ( $seq = $instream->next_seq ) {
   $outstream->write_seq($seq);
}

The available formats can be found by listing the SeqIO directory
in the distribution that this comes with (as new SeqIO formats are
very easy to add, it is better to go to the directory, not try to list them
here).

Notice that the SeqIO system will only convert information which the Seq object stores. The Seq object is a lightweight object, and does not contain annotation or feature table information. This information is stored in a development object, called AnnSeq, which will be available in the 0.06 releases and later.

USAGE

Using Bio::Seq in your perl programs

Seq.pm is invoked via the perl 'use' command

use Bio::Seq;

Creating a biosequence object

The "constructor" method in Bio::Seq.pm is the new() function.

The proper syntax for accessing the new() function in Seq.pm is as follows:

$myseq = Bio::Seq->new;

Of course, objects are only useful if they have something in them, so you would probably want to pass along some additional information or arguments to the constructor. The foundation of any biosequence object is of course the sequence itself.

You can address new() with a sequence directly:

$myseq = Bio::Seq->new(-seq=>'AACTGGCGTTCGTG');

Or you can pass in a string or a list:

$myseq = Bio::Seq->new(-seq=>$sequence_string);
$myseq = Bio::Seq->new(-seq=>@sequence_list);

It is also possible to create a new sequence object based on a sequence contained in a file. You can tell the constructor where to find the sequence file by passing in the 'file' parameter:

$myseq  = Bio::Seq->new(-file=>'seqfile.gcg');

Because there are so many different conventions or formats for storing sequence information in files, it would be polite (although not absolutely necessary) to tell the constructor what format the sequence file is in. We can provide that information via the file-format or 'ffmt' field. To create a sequence object based upon a GCG-formatted sequence file:

$myseq  = Bio::Seq->new(-file=>'seqfile.gcg',-ffmt=>'GCG');

We've already introduced three different object attributes or arguments that can be passed to the new() object constructor ('seq','file' and 'ffmt') so now would be a good time to introduce them all:

BioSeq Constructor Arguments

file: The "file" argument should be a string value containing path and filename information for a sequence file that is to be read into an object.

seq: The "seq" argument is for passing in sequence directly instead of reading in a sequence file. The sequence should consist of RAW info (no whitespace, newlines or formatting) and can be passed in as either an array/list or string.

id: The "id" argument should be a ONE-WORD string value giving a short name for the sequence.

desc: The "desc" argument should be a string containing a description of the sequence. This field is not limited to one word.

names: The "names" argument should be a hash or reference to a hash that contains any number of user generated key-value pairs. Various bits of identifying information can be stored here including name(s), database locations, accession numbers, URL's, etc.

type: The "type" argument should be a string value describing the sequence type, e.g. "Dna", "Rna" or "Amino".

origin: The "origin" argument should be a string value describing sequence origin info.

start: The start point of the sequence, in biological coordinates.

end: The end point of the sequence (the position of the last residue), in biological coordinates.

The start/end attributes are not strongly tied to what is actually in the sequence (i.e. $seq->start()+length($seq->getseq()) does not necessarily equal $seq->end()+1, although most of the time it should).

This is to allow some oddities to be stored in the Seq object sensibly.

The numbering convention is 'biological' coordinates, i.e. the sequence ATG would start at 1 (A) and finish at 3 (G). (NB: this is different from how perl represents ranges in sequences.)
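
To make the difference from perl's own indexing concrete, here is a minimal sketch that maps inclusive biological coordinates onto substr(), which is zero-based and takes a length. The helper bio_slice and its $offset parameter are hypothetical; $offset plays the role of the object's start attribute.

```perl
use strict;
use warnings;

# Hypothetical helper: slice a string using inclusive, biological coordinates.
# $offset is the biological position of the first residue (1 by default).
sub bio_slice {
    my ( $seq, $from, $to, $offset ) = @_;
    $offset = 1 unless defined $offset;
    # Without an end position, slice to the end of the sequence.
    $to = $offset + length($seq) - 1 unless defined $to;
    # substr() is zero-based and takes a length, not an end position.
    return substr( $seq, $from - $offset, $to - $from + 1 );
}

print bio_slice( 'ATG', 1, 3 ), "\n";                # whole sequence
print bio_slice( 'ATGCCC', 102, 104, 100 ), "\n";    # numbering starts at 100
```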

numbering() is the old name for start() and accesses the same attribute. It will eventually be removed.

numbering: (Deprecated) The "numbering" argument should be an integer value containing the sequence numbering offset value. By default all sequences are numbered starting with 1.

ffmt:

This documentation describes the old format system: you are encouraged to use the newer SeqIO system described separately in the SeqIO documentation.

The "ffmt" argument should be a string describing sequence file-format. If a sequence is being read from a file via the "file" argument, "ffmt" is used to invoke the proper parsing code. "ffmt" is also the default format for sequence output when the layout method is called. See elsewhere in this documentation for info regarding recognized sequence file-formats.

If most of these arguments were used at once to create a sequence object, it would look something like this:

#Set up the name hash
%names = (
'CloneID','DB1',
'Isolate','5',
'Tissue','Xenopus',
'Location','/usr2/users/dag/bioperl/sample.tfa'
);

$name_ref = \%names;

#Create the object
$myseq = new Bio::Seq(-file=>'sample.tfa',
                      -names=>$name_ref,
                      -type=>'Dna',
                      -origin=>'Xenopus mesoderm',
                      -start=>'1',
                      -desc=>'Sample Bio::Seq sequence',
                      -ffmt=>'Fasta');

Methods

Once an object has been created, there are defined ways to go about accessing the information -- users are encouraged to poke around "under the hood" of Seq.pm to see what is going on, but it is considered bad form to bypass the defined accessor methods and mess around with the internal code. Bypassing the defined methods "voids the warranty" of the module and can lead to problems down the road. The implied agreement between module creators and users is that the creators will strive to keep the interface standard and backwards-compatible while the users will avoid becoming dependent on bits of internal code that may change or disappear in future revisions.

Detailed information about each method described here can be found in the Appendix.

Accessing information

For each defined way to access information from a biosequence object, there is a corresponding "method" that is invoked. What follows is a brief description of each accessor method. For more detailed information see the individual annotations for each method near the end of this document.

  • Sequence

    The sequence can be accessed in several ways via the getseq() method. Depending on how it is invoked, it can return either a string or a list value.

    Both examples are appropriate:

    @sequence_list   = $myseq->getseq;
    $sequence_string = $myseq->getseq;

    Sequence "slices" can be accessed by passing start and stop integer position arguments to getseq():

    @slice = $myseq->getseq($start,$stop);
    @slice = $myseq->getseq(1,50);
    @slice = $myseq->getseq(100);

    If no stop value is passed in, getseq() will return a slice from the start position to the end of the sequence. Slices are returned in the context of the object "start" attribute, not absolute position, so be aware of the object's numbering scheme.

    Sequences can also be accessed with the ary() and str() methods. The ary() method will always return a list value and str() will always return a string. Otherwise they are functionally identical to the getseq() method.

      $sequence = $myseq->str;
      @sequence = $myseq->ary;
    
      @slice = $myseq->ary($start,$stop);
      $slice = $myseq->str($start,$stop);
  • Sequence length

    The sequence length can be accessed using the seq_len() method

    $len = $myseq->seq_len;
  • Sequence ID

    The ID field can be accessed using the id() method

    $ID = $myseq->id;
  • Description

    The object description field can be accessed using the desc() method

    $description = $myseq->desc;
  • Names

    The associative array (hash) that contains flexible information regarding alternative sequence names, database locations, accession numbers, etc. can be accessed by

    %name_hash = $myseq->names;
  • Sequence start

    The biological position of the first residue in the sequence can be accessed via start()

    $start = $myseq->start;
  • Sequence end

    The biological position of the last residue in the sequence can be accessed via end()

    $end = $myseq->end;
  • Sequence Origin

    The object origin (source organism) field can be accessed via origin()

    $seq_origin = $myseq->origin;
  • File input format / default output format

    The object format field can be accessed using the ffmt() method

    $format = $myseq->ffmt;

Changing Information in Sequence Objects

In the previous section it was shown how object attributes and values could be retrieved from a sequence object by calling upon various methods. Many of the above methods will also allow the user to CHANGE object attributes by passing in additional arguments. Detailed information on each method can be found in the Appendix.

  • Changing the sequence

    The sequence information for an object can be changed by passing a string or list value to the setseq() method. Here are some ways that sequence information can be changed

    $myseq->setseq($new_sequence_string);
    $myseq->setseq(@new_sequence_list);
    $myseq->setseq("aaccttgcctgc");

    The setseq() method checks sequence elements and warns if it finds non-standard characters. Because of this, arbitrary sequence compositions are not supported at this time. This method is considered slightly 'insecure' because the 'id','desc' and 'type' fields are not updated along with the sequence. If necessary, the user must make the appropriate changes to these fields whenever sequence information is updated or changed.

  • Changing the sequence ID

    The ID field can be changed by passing in a new ID argument to id()

    $myseq->id($new_id);
  • Changing the object description

    The object description field can be changed by passing in a new argument to desc()

    $myseq->desc($new_desc);
  • Changing the object names hash

    The associative array (hash) that contains flexible information regarding alternative sequence names, database locations, accession numbers, etc. can be changed by passing in a reference to a new hash to names()

    $hash_ref = \%name_hash;
    $myseq->names($hash_ref);
  • Changing the sequence start or end

    The default numbering offset for the sequence can be changed by passing in a new value to start() or end()

    $myseq->start(1);
    $myseq->start($new_value);
  • Sequence Origin

    The object origin field can be changed by passing in a new string value to origin()

    $myseq->origin("mitochondrial");
    $myseq->origin($origin_string);
  • File input format / default output format

    The object format field can be changed by passing in a new value to ffmt()

    $myseq->ffmt("GCG"); 
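
The access/change methods listed above all follow the same combined getter/setter pattern. The sketch below shows that pattern on a small stand-in class. My::Record is hypothetical and is not part of bioperl, and the real accessors may differ in what they return after a change.

```perl
use strict;
use warnings;

package My::Record;    # hypothetical class, not part of bioperl

sub new {
    my ( $class, %args ) = @_;
    return bless { id => $args{-id}, desc => $args{-desc} }, $class;
}

# Combined accessor: set the field when an argument is passed,
# and return the current value either way.
sub id {
    my ( $self, @new ) = @_;
    $self->{id} = $new[0] if @new;
    return $self->{id};
}

package main;

my $rec = My::Record->new( -id => 'seq1', -desc => 'example' );
print $rec->id, "\n";    # read
$rec->id('seq2');        # change
print $rec->id, "\n";
```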

Manipulating sequences

Creating, accessing and changing biosequence objects and fields is all well and good, but eventually you are going to want to actually do some work.

Included with Seq.pm are some commonly used utility methods for manipulating sequence data. So far Seq.pm contains methods for:

  • Copying a biosequence object

    using copy()

    # NB - new_obj is a Bio::Seq object
    
    $new_obj = $myseq->copy;
  • Reversing a sequence

    using reverse()

    $reversed_seq = $myseq->reverse;
  • Complementing a sequence

    The 2nd strand, or "complement" of a biosequence can be obtained by calling upon the complement() method.

    $comp_seq = $myseq->complement;
  • Reverse complementing a sequence

    using revcom()

       # NB - rev_comp is a Bio::Seq object
    
       $rev_comp = $myseq->revcom;
  • Translating Dna to Rna

    using Dna_to_Rna()

    $rna_seq = $myseq->Dna_to_Rna;
  • Translating Rna to Dna

    using Rna_to_Dna()

    $dna_seq = $myseq->Rna_to_Dna;
  • Translating Dna or Rna to protein

    using translate()

    # NB - peptide_seq is a Bio::Seq object
    
    $peptide_seq = $myseq->translate;
  • Checking the sequence alphabet

    To check if any nonstandard characters are present in a biosequence, an alphabet_ok() method is provided. The method returns "1" if everything is OK, otherwise it returns a "0".

    if($myseq->alphabet_ok) { print "OK!!\n"; }
     else { print "Not OK! \n"; }

    To get the alphabet itself, use the alphabet() method, which will return a string containing all characters in the current alphabet.

    $alph = $myseq->alphabet;

    To use restrictive alphabets that do not permit ambiguity codes, include '-strict => 1' in the parameters sent to new(). Or, for any existing sequence object, try:

    $myseq->strict(1); 
    $myseq->alphabet_ok() or die "alphabet not okay.\n";
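
The idea behind a strict versus extended alphabet check can be sketched with a regular expression over a plain Dna string. This is an illustration under stated assumptions, not the module's alphabet_ok(): the helper dna_alphabet_ok is hypothetical, and it assumes that the strict alphabet is exactly A, C, G and T.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the idea of alphabet_ok() for DNA strings.
# With $strict true, only A, C, G and T are accepted; otherwise the
# ambiguity codes, '?' (missing) and '-' (gap) are accepted as well.
sub dna_alphabet_ok {
    my ( $seq, $strict ) = @_;
    my $allowed = $strict ? 'ACGT' : 'ACGTUMRWSYKVHDBXN?-';
    # \Q...\E escapes '?' and '-' so they are literal inside the class.
    return $seq =~ /\A[\Q$allowed\E]*\z/i ? 1 : 0;
}

print dna_alphabet_ok( 'ACGTNN-', 0 ), "\n";    # extended alphabet
print dna_alphabet_ok( 'ACGTNN-', 1 ), "\n";    # strict alphabet
```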

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

vsns-bcd-perl@lists.uni-bielefeld.de          - General discussion
vsns-bcd-perl-guts@lists.uni-bielefeld.de     - Technically-oriented discussion
http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

bioperl-bugs@bio.perl.org                   
http://bio.perl.org/bioperl-bugs/           

ACKNOWLEDGEMENTS

Some pieces of the code were contributed by Steven E. Brenner, Steve Chervitz, Ewan Birney, Tim Dudgeon, David Curiel, and other Bioperlers. Thanks !!!!

REFERENCES

BioPerl Project Page http://bio.perl.org/

VERSION

Bio::Seq.pm, beta 0.051

COPYRIGHT

Copyright (c) 1996-1998 Chris Dagdigian, Georg Fuellen, Richard
Resnick, and others. All Rights Reserved. This module is free
software; you can redistribute it and/or modify it under the same
terms as Perl itself.

Appendix

The following documentation describes the various functions contained in this module. Some functions are for internal use and are not meant to be called by the user; they are preceded by an underscore ("_").

new

Title     : new
Usage     : $mySeq = Bio::Seq->new($file,$seq,$id,$desc,$names,
                        $start,$end,$type,$ffmt,$descffmt);
          :                - or -
          : $mySeq = Bio::Seq->new(-file=>$file,
                                  -seq=>$seq,
                                  -id=>$id,
                                  -desc=>$desc,
                                  -names=>$names,
                                  -start=>$start,
                                  -end=>$end,
                                  -type=>$type,
                                  -origin=>$origin,
                                  -ffmt=>$ffmt,
                                  -descffmt=>$descffmt);
Function  : The constructor for this class, returns a new object.
Example   : See usage
Returns   : Bio::Seq object
Argument  : $file: file from which the sequence data can be read; all
              the other arguments will overwrite the data read in.
              "_nofile" is recommended if no file is given.
            $seq: String or array of characters
            $id: String describing the ID the user wishes to assign.
            $desc: String giving a description of the sequence
            $names: A reference to a hash which stores {loc,name}
                    pairs of other database locations and corresponding names
                    where the sequence is located.
            $start: The offset of the sequence, as an integer
            $end: The end point of the sequence, as an integer
            $type: The type of the sequence, see type()
            $origin: The sequence origin
            $ffmt: Sequence format, see ffmt()
            $descffmt: format of $desc, see descffmt()
   

## Internal methods ##

_initialize

Title     : _initialize
Usage     : n/a (internal function)
Function  : Assigns initial parameters to a blessed object.
Example   : 
Returns   : 
Argument  : As Bio::Seq->new, allows for named or listed parameters.
            See ->new for the legal types of these values.

_seq

Title     : _seq()
Usage     : n/a, internal function
Function  : called by new() to set sequence field. Checks
          : alphabet before setting.
          :
Returns   : n/a
Argument  : sequence string

_monomer

Title     : _monomer()
Usage     : n/a, internal function
Function  : Returns the internal monomer that represents
          : sequence type.
          :
          : Sequence type is treated internally as a monomer
          : defined by the %SeqAlph hash. The type field
          : is a list of format [monomer,origin]. For any
          : output outside the module, the monomer is resolved
          : back into string form via the %TypeSeq hash.
          :
Returns   : original type setting [as monomer]
Argument  : none

_file_read

Title     : _file_read()
Usage     : n/a (Internal Function)
Function  : _file_read is called whenever the constructor is called 
          : with the name of a sequence to be read from disk.
          :
          : This function is now DEPRECATED. You should use the SeqIO
          : system instead.
          :
Example   : n/a, only called upon by _initialize()
Returns   : 
Argument  : 

## ACCESSORS ##

seq_len

Title       : seq_len()
Usage       : $len = $myseq->seq_len;
Function    : Returns a value representing the sequence
            : length
            :
Example     : see above
Arguments   : none
Returns     : integer

ary

Title     : ary
Usage     : ary([$start,[$end]])
Function  : Returns the sequence of the object as an array, or a substring
            of the sequence if $start/$end are defined. If $start is
            defined and $end isn't, the substring is from $start to the
            end of the sequence.
Example   : @slice = $myObject->ary(3,9);
Returns   : array of characters
Argument  : $start,$end (both integers). They are interpreted w.r.t. the
            specific numeration of the sequence!! ($self->{start})

str

Title     : str
Usage     : str([$start,[$end]])
Function  : Returns the sequence of the object as a string, or a slice
            of the sequence if $start/$end are defined. If $start is
            defined and $end isn't, the slice is from $start to the
            end of the sequence.
Example   : $slice = $myObject->str(3,9);
Returns   : string scalar
Argument  : $start,$end (both integers). They are interpreted w.r.t. the
            specific numeration of the sequence!! ($self->{start})

seq

Title     : seq
Usage     : seq([$start,[$end]])
Function  : Returns the sequence of the object as an array or a char
            string, depending on the value of wantarray. Will return a slice
            of the sequence if $start/$end are defined. If $start is
            defined and $end isn't, the slice is from $start to the
            end of the sequence.
Example   : @slice = $myObject->seq(3,9);
Returns   : regular array of characters, or a scalar string
Argument  : $start,$end (both integers). They are interpreted w.r.t. the
            specific numeration of the sequence!! ($self->{start})
Comments  : 

getseq

Title     : getseq
Usage     : getseq([$start,[$end]])
Function  : Returns the sequence of the object as an array or a char
            string, depending on the value of wantarray. Will return a slice
            of the sequence if $start/$end are defined. If $start is
            defined and $end isn't, the slice is from $start to the
            end of the sequence.
Example   : @slice = $myObject->getseq(3,9);
Returns   : regular array of characters, or a scalar string
Throws    : Warning about deprecated method.
Argument  : $start,$end (both integers). They are interpreted w.r.t. the
            specific numeration of the sequence!! ($self->{start})
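
Returning an array or a string "depending on the value of wantarray" means the method inspects its calling context. A minimal sketch of that technique on a plain string follows; the helper name residues is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of characters in list context
# and a single string in scalar context.
sub residues {
    my ($seq) = @_;
    return wantarray ? split( //, $seq ) : $seq;
}

my @chars  = residues('ATG');    # list context: one element per residue
my $string = residues('ATG');    # scalar context: the whole string
print scalar(@chars), " residues, string '$string'\n";
```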

id

Title     : id()
Usage     : $seq_id = $myseq->id; 
          : $myseq->id($id_string);
          :
Function  : Sets field if an ID argument string is
          : passed in. If no arguments, returns ID value for
          : object.
          :
Returns   : original ID value
Argument  : ID string

desc

Title     : desc()
Usage     : $description = $myseq->desc; 
          : $myseq->desc($desc_string);
          :
Function  : Sets field if an argument string is
          : passed in. If no arguments, returns original value for
          : object description field.
          :
Returns   : original value for description
Argument  : description string

names

Title     : names()
Usage     : %names = $myseq->names; 
          : $myseq->names($hash_ref);
          :
Function  : Sets field if a name hash reference is
          : passed in. If no arguments, returns original 
          : names hash.
          :
Returns   : hash reference (associative array)
Argument  : reference to a hash (associative array)

numbering

Title     : numbering()
Usage     : $num_start = $myseq->numbering; 
          : $myseq->numbering($value);
          :
Function  : Sets field if an argument is
          : passed in. If no arguments, returns original value.
          :
          : (Deprecated - should switch to start())
Returns   : original value 
Argument  : new value

start

Title     : start
Usage     : $start = $myseq->start(); #get
          : $myseq->start($value); #set
Function  : the set/get for the start position
Example   :
Returns   : start value 
Arguments : new value

end

Title     : end
Usage     : $end = $myseq->end(); #get
          : $myseq->end($value); #set
Function  : The set/get for the end position
Example   :
Returns   : end value 
Arguments : new value

get_nse

Title    : get_nse
Usage    : $tag = $myseq->get_nse() #
Function : gets a string like "name/start-end". This is likely
         : to be unique in an alignment/database
         : Used a lot by SimpleAlign
Example  :
Returns  : A string
Arguments: Two optional arguments - first being the name/ separator, second the
           start-end separator
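
The "name/start-end" format and its two optional separators can be sketched as a plain function. The helper make_nse is hypothetical; the default separators '/' and '-' are taken from the format string quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a "name/start-end" tag.
# The two separators are optional and default to '/' and '-'.
sub make_nse {
    my ( $name, $start, $end, $name_sep, $range_sep ) = @_;
    $name_sep  = '/' unless defined $name_sep;
    $range_sep = '-' unless defined $range_sep;
    return $name . $name_sep . $start . $range_sep . $end;
}

print make_nse( 'seq1', 1, 50 ), "\n";
print make_nse( 'seq1', 1, 50, ':', '..' ), "\n";
```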

origin

Title     : origin()
Usage     : $myseq->origin($value) 
Function  : Sets the origin field which is actually the second
          : field of the Type list. The {type} field is a 2 value list
          : with a format of ["Monomer","Origin"]
          :
Returns   : Original value
Argument  : string
Comments  : SAC: Consider renaming this method to "organism()" or "species()". 
          : "origin" is ambiguous and can be easily confused with 
          : a coordinate data (0,0).

type

Title     : type()
Usage     : $myseq->type($value) 
Function  : Sets the type field which is the first
          : field of the Type list. The {type} field is a 2 value list
          : with a format of ["Monomer","Origin"]
          :
Returns   : String containing one of the recognized sequence types:
          : 'unknown', 'dna', 'rna', 'amino', 'otherseq', 'aligned'
          : See the %Seq::SeqAlph hash for the current types.
Argument  : string containing a valid sequence type
          : SAC: case of user-supplied argument does not matter

ffmt

Title     : ffmt()
Usage     : $format = $myseq->ffmt;
          : $myseq->ffmt("Fasta");
          : 
Function  : The file format field is used by the internal
          : sequence parsing code when trying to read 
          : in a sequence file. It is also what is used
          : as a default output format if the layout
          : method is called without an argument.
          :
          : If a sequence object is created without
          : reading in a file, or if the file is read
          : in with the use of the ReadSeq package then
          : the ffmt field can be set to indicate any default
          : output-format preference.
          :
          : If a sequence is read from a file and parsed
          : by internal code (ReadSeq not used) then the ffmt
          : field should describe the format of the sequence
          : file. The ffmt field is used to send the sequence
          : to the correct internal parsing code.
          :
Returns   : original ffmt value
Argument  : recognized ffmt string value (see list of recognized 
          : formats) # SAC: What are they?! This list should be obvious.
          : Valid strings: 
          :    RAW, FASTA, GCG, IG, GENBANK, NBRF, EMBL, 
          :    MSF, PIR, GCG_SEQ, GCG_REF, STRIDER, ZUKER,
          : SAC: case of user-supplied argument does not matter

descffmt

Title     : descffmt()
Usage     : $desc = $myseq->descffmt;
          : $myseq->descffmt($new_value); 
Function  : 
          :
Returns   : original value
Argument  : $new_value (one of the formats as defined in $SeqForm).
          : SAC: case of $new_value argument does not matter.

setseq

Title     : setseq()
Usage     : $self->setseq($new_sequence);
Function  : Changes the sequence inside a bioseq object
          :
Returns   : sequence string 
Argument  : sequence string

parse

Title     : parse
Usage     : parse($ent,[$ffmt]);
Function  : Invokes the proper parsing code depending on
          : the value of the object 'ffmt' field.
Example   : $self->parse;
Returns   : n/a
Argument  : the prospective sequence to be parsed, 
          : and optionally its format so that it doesn't need to
          : be estimated
          : SAC: case of $ffmt argument does not matter.

parse_raw

Title     : parse_raw
Usage     : parse_raw;
Function  : parses $ent into the $self->{"seq"} field, using Raw
          : file format.
Example   : $self->parse_raw;
Returns   : n/a
Argument  : n/a

parse_genbank

Title    : parse_genbank

  sub parse_genbank {
      my ( $self, $ent ) = @_;
      my $seqstart = 0;    # true while inside the ORIGIN (sequence) block
      my $defstart = 0;    # true while inside a multi-line DEFINITION

      my @lines = split /\n/, $ent;
      for (@lines) {
          m/^LOCUS\s+(\S+)/ and $self->{"id"} = $1;

          # The first DEFINITION line starts the description.
          if ( m/^DEFINITION\s+(.+)/ ) {
              $self->{"desc"} = $1;
              $defstart = 1;
              next;
          }
          # Indented continuation lines extend it; any other line ends it.
          if ($defstart) {
              if ( m/^ {11}( .+)/ ) { $self->{"desc"} .= $1 }
              else                  { $defstart = 0 }
          }

          m/^ORIGIN/ and do { $seqstart = 1; next; };
          m!^//!     and $seqstart = 0;

          # Sequence lines: strip whitespace and position numbers.
          $seqstart and do { s/[\s\d]//g; $self->{"seq"} .= $_; };
      }

      return 1;
  }

parse_fasta

Title     : parse_fasta
Usage     : parse_fasta;
Function  : parses $ent into the "seq" field, using Fasta
          : file format.
          :
To-do     : use benchmark module to find best/fastest parse
          : method
          :
Example   : $self->parse_fasta;
Returns   : n/a
Argument  : n/a
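
A minimal sketch of FASTA parsing on a plain string, assuming the conventional layout: one ">" header line holding an id and an optional description, followed by sequence lines. The name parse_fasta_string and the returned hash are hypothetical; they only mirror the "id", "desc" and "seq" fields described in this document and are not the module's internal code.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper, not part of Bio::Seq.
# Parses a single FASTA entry held in a string.
sub parse_fasta_string {
    my ($ent) = @_;
    my %rec = ( id => '', desc => '', seq => '' );

    for my $line ( split /\n/, $ent ) {
        $line =~ s/\r$//;                       # tolerate DOS line endings
        if ( $line =~ /^>\s*(\S*)\s*(.*)$/ ) {
            @rec{qw(id desc)} = ( $1, $2 );     # header: id, then description
            next;
        }
        $line =~ s/\s+//g;                      # drop whitespace in sequence lines
        $rec{seq} .= $line;
    }
    return \%rec;
}

my $rec = parse_fasta_string(">sample A test sequence\nACTGTG\nGCGTCA\n");
print "$rec->{id} | $rec->{desc} | $rec->{seq}\n";
```

The sketch reads one entry only; a file with several entries would need to be split on ">" first.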

parse_gcg

Title    : parse_gcg
Usage    : used by internal code
Function : Parses the sequence out of a gcg-format string and
         : sets the object sequence field accordingly. This is
         : a simple, inefficient method for grabbing JUST the
         : sequence.
         :
To-do    : - parse out more info than just sequence 
         : - implement alphabet checking
         : - better regular expressions/efficiency
         : - carp on unexpected / wrong-format situations
         :
Version  : .01 / 16 Jan 1997 
Returns  : 1
Argument : gcg-formatted sequence string

## METHODS FOR FILE FORMAT AND OUTPUT ##

layout

Title     : layout()
Usage     : layout([$format]);
Function  : Returns the sequence in whichever format the user specifies,
          : or in the format named by the "ffmt" field if the user does
          : not specify one.
Example   : $fastaFormattedSeq = $myObj->layout("Fasta");
Returns   : varies
Argument  : $format (one of the formats as defined in $SeqForm).
          : SAC: case of $format argument does not matter.
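
The behaviour described for layout() (an explicit format wins, otherwise "ffmt" is used, and case is ignored) can be sketched with a dispatch table. Everything here is hypothetical: the %writer table, the layout_sketch name and the two toy writers stand in for the real out_* methods.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: format name (upper case) => output routine.
my %writer = (
    RAW   => sub { my ($s) = @_; return $s->{seq} },
    FASTA => sub { my ($s) = @_; return ">$s->{id} $s->{desc}\n$s->{seq}\n" },
);

# Pick the writer from the argument, falling back to the "ffmt" field.
sub layout_sketch {
    my ( $s, $format ) = @_;
    $format = $s->{ffmt} unless defined $format;
    my $code = $writer{ uc $format }
        or die "Unknown output format: $format\n";
    return $code->($s);
}

my $s = { id => 'sample', desc => 'demo', seq => 'ACGT', ffmt => 'Fasta' };
print layout_sketch( $s, 'raw' ), "\n";    # explicit format, any case
print layout_sketch($s);                   # falls back to ffmt
```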

out_raw

Title     : out_raw
Usage     : out_raw;
Function  : Returns the sequence in Raw format.
Example   : $self->out_raw;
Returns   : string sequence, in raw format
Argument  : n/a

out_fasta

Title     : out_fasta
Usage     : out_fasta;
Function  : Returns the sequence as a string in FASTA format.
Example   : $self->out_fasta;
          :
To-do     : benchmark code / find fastest method
          :
Returns   : string sequence in Fasta format
Argument  : n/a
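
A rough sketch of FASTA output on plain strings. The helper name fasta_string and the default line width of 60 columns are assumptions made for illustration; this document does not state how out_fasta wraps its lines.

```perl
use strict;
use warnings;

# Hypothetical helper: format id, description and sequence as FASTA text.
# The 60-column default width is an assumption, not taken from Bio::Seq.
sub fasta_string {
    my ( $id, $desc, $seq, $width ) = @_;
    $width ||= 60;
    my $header = ( defined $desc && length($desc) ) ? ">$id $desc" : ">$id";
    my @lines  = $seq =~ /(.{1,$width})/g;    # split into fixed-width chunks
    return join( "\n", $header, @lines ) . "\n";
}

print fasta_string( 'sample', 'Sample Bio::Seq sequence', 'ACTGTGGCGTCAACTG', 10 );
```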

alphabet_ok

Title     : alphabet_ok
Usage     : $myseq->alphabet_ok;
Function  : Checks the sequence for presence of any characters
          : that are not considered valid members of the genetic
          : alphabet. In addition to the standard genetic alphabet
          : (see documentation), "?" and "-" characters are
          :  considered valid.
          :
Example   : if($myseq->alphabet_ok) { print "OK!!\n"; }
          :     else { print "Not OK! \n"; }
          :
Note      : Does not handle '\' characters in sequence robustly
          :
Returns   : 1 if OK / 0 if not OK
Argument  : none
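
A small sketch of such a check for nucleotide sequences. It assumes the alphabet is the set of IUB/GCG codes listed in the revcom entry plus "?" and "-"; the real method may use a different alphabet for each sequence type, and the name alphabet_ok_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical check, assuming the IUB nucleotide codes plus "?" and "-".
# Returns 1 if every character is in the alphabet, 0 otherwise.
sub alphabet_ok_sketch {
    my ($seq) = @_;
    return $seq =~ /\A[ACGTUMRWSYKVHDBXN?\-]*\z/i ? 1 : 0;
}

print alphabet_ok_sketch('ACGTN-?') ? "OK\n" : "Not OK\n";
print alphabet_ok_sketch('ACGZ')    ? "OK\n" : "Not OK\n";
```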

alphabet

Title     : alphabet
Usage     : $myseq->alphabet;
Function  : Returns the characters in the alphabet in use for the sequence.
Example   : print "Alphabet: ".$myseq->alphabet;
Returns   : string containing alphabet characters
Argument  : none

GCG_checksum

Title     : GCG_checksum
Usage     : $myseq->GCG_checksum;
Function  : returns a gcg checksum for the sequence
Example   : $checksum = $myseq->GCG_checksum;
Returns   : the GCG checksum of the sequence
Argument  : none
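
For orientation, here is a sketch of the widely circulated GCG checksum calculation. It is assumed, not confirmed, that this method follows the same scheme; the function name gcg_checksum_sketch is hypothetical.

```perl
use strict;
use warnings;

# Sketch of the commonly published GCG checksum: each upper-cased character
# is weighted by its position (the weight cycles 1..57) and the total is
# taken modulo 10000. Not verified against this module's own code.
sub gcg_checksum_sketch {
    my ($seq) = @_;
    my ( $index, $sum ) = ( 0, 0 );
    for my $char ( split //, uc $seq ) {
        $index++;
        $sum += $index * ord($char);
        $index = 0 if $index == 57;
    }
    return $sum % 10000;
}

print gcg_checksum_sketch('ACGT'), "\n";    # 65*1 + 67*2 + 71*3 + 84*4 = 748
```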

trunc

Title     : trunc
Usage     : $trunc_seq = $mySeq->trunc(12,20);
Function  : Returns a truncated part of the sequence. The truncation
          : is performed by the ->str() call, so this is just a
          : convenience method for this object.
          :
Returns   : Bio::Seq object ref.
Argument  : start point, end point in biological coordinates
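
"Biological coordinates" are taken here to mean 1-based, inclusive positions. The sketch below shows that convention on a plain string, assuming the sequence numbering starts at 1; the helper name subseq_bio is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: extract residues $start..$end, where both positions
# are 1-based and inclusive.
sub subseq_bio {
    my ( $seq, $start, $end ) = @_;
    die "bad range $start..$end\n"
        if $start < 1 || $end < $start || $end > length($seq);
    return substr( $seq, $start - 1, $end - $start + 1 );
}

print subseq_bio( 'ACTGTGGCGTCAACTG', 3, 6 ), "\n";    # TGTG
```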

copy

Title     : copy
Usage     : $copyOfObj = $mySeq->copy;
Function  : Returns an identical copy of the object.
Example   :
Returns   : Bio::Seq object ref.
Argument  : n/a

revcom

Title       : revcom
Usage       : $reverse_complemented_seq = $mySeq->revcom;
Function    : Returns a Bio::Seq object with the reverse
            : complement of a nucleotide object sequence
Example     : $reverse_complemented_seq = $mySeq->revcom;
Source      : Guts from Jong's <jong@mrc-lmb.cam.ac.uk>
            : library of molbio perl routines
Note        :
            : The letter codes and complement translations
            : are those proposed by IUB (Nomenclature Committee,
            : 1985, Eur. J. Biochem. 150; 1-5) and are also
            : used by the GCG package. The IUB/GCG letter codes
            : for nucleotide ambiguity are compatible with
            : EMBL, GenBank and PIR database formats but are
            : *NOT* compatible with Staden/Sanger ambiguity
            : symbols. Staden/Sanger use different symbols to
            : represent uncertainty and frame ambiguity.
            :
            : Currently Staden/Sanger are not recognized
            : sequence types.
            :
            : GCG Documentation on sequence symbols:
URL         : http://www.neb.com/gcgdoc/GCGdoc/Appendices/appendix_iii.html
            :
Translation :
            : GCG/IUB    Meaning        Complement
            : ------------------------------------
            :  A            A                T
            :  C            C                G
            :  G            G                C
            :  T            T                A
            :  U            U                A
            :  M          A or C             K
            :  R          A or G             Y
            :  W          A or T             W
            :  S          C or G             S
            :  Y          C or T             R
            :  K          G or T             M
            :  V        A or C or G          B
            :  H        A or C or T          D
            :  D        A or G or T          H
            :  B        C or G or T          V
            :  X      G or A or T or C       X
            :  N      G or A or T or C       N
            :--------------------------------------
Revision    : 0.01 / 3 Jun 1997
Returns     : A new sequence object. To get the actual sequence, use
            : $actual_reversed_sequence = $seq->revcom()->str()
Argument    : n/a
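
The translation table above maps directly onto Perl's tr/// operator. The following is a stand-alone sketch on plain strings (the name revcom_string is hypothetical); it returns a string, whereas revcom() itself returns a sequence object.

```perl
use strict;
use warnings;

# Complement every IUB/GCG code from the table (case preserved),
# then reverse the result.
sub revcom_string {
    my ($seq) = @_;
    ( my $comp = $seq ) =~
        tr/ACGTUMRWSYKVHDBXNacgtumrwsykvhdbxn/TGCAAKYWSRMBDHVXNtgcaakywsrmbdhvxn/;
    return scalar reverse $comp;
}

print revcom_string('AACGTTN'), "\n";    # NAACGTT
```

Dropping the final reverse gives the plain complement.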

complement

Title       : complement
Usage       : $complemented_seq = $mySeq->complement;
Function    : Returns a char string containing
            : the complementary sequence (i.e. the other strand)
            : of the original sequence. The translation method
            : is identical to revcom() but the nucleotide order
            : is not reversed.
            :
            : In most cases you will want to use revcom()
            : rather than this method. Be careful!
            :
Example     :  $complemented_seq = $mySeq->complement;
            :
Source      : Guts from Jong's <jong@mrc-lmb.cam.ac.uk>
            : library of molbio perl routines
Note        :
            : The letter codes and complement translations
            : are those proposed by IUB (Nomenclature Committee,
            : 1985, Eur. J. Biochem. 150; 1-5) and are also
            : used by the GCG package. The IUB/GCG letter codes
            : for nucleotide ambiguity are compatible with
            : EMBL, GenBank and PIR database formats but are
            : *NOT* compatible with Staden/Sanger ambiguity
            : symbols. Staden/Sanger use different symbols to
            : represent uncertainty and frame ambiguity.
            :
            : Currently Staden/Sanger are not recognized
            : sequence types.
            :
            : GCG Documentation on sequence symbols:
URL         : http://www.neb.com/gcgdoc/GCGdoc/Appendices/appendix_iii.html
            :
Translation :
            : GCG/IUB    Meaning        Complement
            : ------------------------------------
            :  A            A                T
            :  C            C                G
            :  G            G                C
            :  T            T                A
            :  U            U                A
            :  M          A or C             K
            :  R          A or G             Y
            :  W          A or T             W
            :  S          C or G             S
            :  Y          C or T             R
            :  K          G or T             M
            :  V        A or C or G          B
            :  H        A or C or T          D
            :  D        A or G or T          H
            :  B        C or G or T          V
            :  X      G or A or T or C       X
            :  N      G or A or T or C       N
            :--------------------------------------
            :
Revision    : 0.01 / 6 Dec 1996
Returns     : char string
Argument    : n/a

reverse

Title     : reverse
Usage     : $reversed_seq = $mySeq->reverse;
Function  : Returns a char string containing the
          : reverse of the object sequence
          :
          : Does *NOT* complement it. If you want
          : the other strand, use $mySeq->revcom()
          : 
Example   :  $reversed_seq = $mySeq->reverse;
          :
Revision  : 0.01 / 6 Dec 1996
Returns   : char string
Argument  : n/a

Dna_to_Rna

Title     : Dna_to_Rna
Usage     : $translated_seq = $mySeq->Dna_to_Rna;
Function  : Returns a char string containing the
          : Rna translation of the Dna nucleotide sequence
          : (Replaces T with U)
          : 
Example   : $translated_seq = $mySeq->Dna_to_Rna;
          :
Source    : modified from Jong's <jong@mrc-lmb.cam.ac.uk>
          : library of molbio perl routines
          :
Revision  : 0.01 / 6 Dec 1996
Returns   : char string
Argument  : n/a

Rna_to_Dna

Title     : Rna_to_Dna
Usage     : $translated_seq = $mySeq->Rna_to_Dna;
Function  : Returns a char string containing the
          : Dna translation of the Rna nucleotide sequence
          : (Replaces U with T)
          : 
Example   : $translated_seq = $mySeq->Rna_to_Dna;
          :
Revision  : 0.01 / 16 MAR 1997
Returns   : char string
Argument  : n/a
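
Both conversions amount to a single character substitution. A stand-alone sketch on plain strings follows; the function names are hypothetical and case is preserved by assumption.

```perl
use strict;
use warnings;

# Stand-alone string versions of the two conversions.
sub dna_to_rna_string { ( my $s = shift ) =~ tr/Tt/Uu/; return $s }
sub rna_to_dna_string { ( my $s = shift ) =~ tr/Uu/Tt/; return $s }

my $rna = dna_to_rna_string('ATGCtt');
print "$rna\n";                           # AUGCuu
print rna_to_dna_string($rna), "\n";      # ATGCtt
```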

translate

Title     : translate
Usage     : $translation = $mySeq->translate;
Function  : Returns a new Bio::Seq object with the protein
          : translation from this sequence
          :
          : "*" is the default symbol for a stop codon
          : "X" is the default symbol for an unknown codon
          :
Example   : $translation = $mySeq->translate;
          :   -or- with user defined stop/unknown codon symbols:
          : $translation = $mySeq->translate($stop_symbol,$unknown_symbol);
          : 
Source    : modified from Jong's <jong@mrc-lmb.cam.ac.uk>
          : library of molbio perl routines
          :
To-do     : - allow named parameters (just like new and out_GCG )
          : - allow "frame" parameter to pick translation frame
          :
Revision  : 0.01 / 6 Dec 1996
Returns   : new Sequence object. Its id is the original id.trans
Argument  : optional stop codon symbol and unknown codon symbol
          : (see Example)
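
A compact sketch of codon-by-codon translation with user-defined stop and unknown symbols. It works on plain strings, uses the standard genetic code, reads frame 1 only and translates through stop codons; those choices, and the name translate_string, are assumptions rather than a description of the method's internals.

```perl
use strict;
use warnings;

# Standard genetic code, built from the usual TCAG ordering of the 64 codons.
my @bases = qw(T C A G);
my @amino = split //,
    'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = shift @amino;
        }
    }
}

# Hypothetical stand-alone translator for frame 1 of a DNA or RNA string.
sub translate_string {
    my ( $seq, $stop, $unknown ) = @_;
    $stop    = '*' unless defined $stop;
    $unknown = 'X' unless defined $unknown;
    ( my $dna = uc $seq ) =~ tr/U/T/;
    my $protein = '';
    for my $codon ( $dna =~ /(...)/g ) {    # trailing 1-2 bases are ignored
        my $aa = $codon_table{$codon};
        $protein .= !defined $aa ? $unknown : $aa eq '*' ? $stop : $aa;
    }
    return $protein;
}

print translate_string('ATGGCCTAA'), "\n";                # MA*
print translate_string( 'ATGNNNTGA', '!', '?' ), "\n";    # M?!
```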

dump

Title     : dump
Usage     : @results = $mySeq->dump; -or- 
          : $results = $mySeq->dump;
          :
Function  : Returns a formatted array or string (depending on how it
          : is invoked) containing the contents of a 
          : Bio::Seq object. Useful for debugging
          :
          : ***This is used by Chris Dagdigian for debugging ***
          : ***Probably should be removed before distribution***
          :
Example   :  @results = $mySeq->dump;
          :  foreach(@results){print;}
          :     -or-
          :  print $mySeq->dump;
          :
Returns   : Array or string depending on value of wantarray
Argument  : n/a
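
Returning an array or a string "depending on how it is invoked" relies on Perl's built-in wantarray. A minimal sketch, using a hypothetical dump_fields helper on a plain hash reference:

```perl
use strict;
use warnings;

# Hypothetical helper: list the fields of a hash-based object, returning
# a list of lines in list context and one joined string in scalar context.
sub dump_fields {
    my ($obj) = @_;
    my @lines = map {
        my $v = defined $obj->{$_} ? $obj->{$_} : '';
        sprintf "%-8s : %s\n", $_, $v;
    } sort keys %$obj;
    return wantarray ? @lines : join( '', @lines );
}

my $obj     = { id => 'sample', seq => 'ACGT' };
my @as_list = dump_fields($obj);    # two elements, one per field
my $as_text = dump_fields($obj);    # a single multi-line string
print $as_text;
```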

out_bad

Title     : out_bad()
Usage     : out_bad;
Function  : Throws a fatal error if we don't know the output format.
Example   : $self->out_bad;
Returns   : n/a
Argument  : n/a

out_primer

Title     : out_primer()
Usage     : $formatted_seq = $myseq->out_primer;
          : @formatted_seq = $myseq->out_primer;
          :
          : print $myseq->out_primer(-id=>'New ID',
          :                          -header=>'This is my header');
          :
Function  : outputs a sequence in primer format
          :
Note      : Not a supported output type (cannot be invoked via layout).
          : Use at your own risk :)
          : 
Example   : see usage
          :
Revision  : 0.01 / 20 Dec 1996
Returns   : string or list, depending on how it is invoked
Argument  : named list parameters for "id" and "header" are allowed

out_pir

Title     : out_pir()
Usage     : $formatted_seq = $myseq->layout("PIR");
          : $formatted_seq = $myseq->out_pir;
          : @formatted_seq = $myseq->out_pir;
          :
          : print $myseq->out_pir(-title=>'New TITLE',
          :                       -entry=>'New ENTRY',
          :                       -acc=>'User defined accession',
          :                       -date=>'User defined date',
          :                       -reference=>'User defined ref info');
          :
Function  : Returns a string or an array depending on how it
          : is invoked. Can be easily accessed via the layout()
          : method, or if more output control is desired it can
          : be called directly with the following named parameters:
          :
          :  -entry      PIR entry
          :  -title      PIR title
          :  -acc        user defined accession number
          :  -reference  user defined reference
          :  -date       user defined date/time info
          :
          : All named parameters will take precedence over any
          : default behavior. When there are no user arguments,
          : the default output is as follows:
          :
          : PIR 'ENTRY'     = sequence object "id" field
          : PIR 'TITLE'     = sequence object "desc" field
          : PIR 'DATE'      = current date/time
          : PIR 'ACC'       = not used in default output
          : PIR 'REFERENCE' = not used in default output
          :
Note      : Not tested stringently.
          :
WARNING   : Does not deal with numbering issue
          :
To-do     : - Allow user to pass in hash of additional fields/values
          : - Deal with numbering issue
          :
Example   : see usage
          :
Revision  : 0.02 / 12 Jan 1997
Returns   : string or list, depending on how it is invoked
Argument  : named list parameters are allowed, see above
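
The rule that named parameters take precedence over defaults can be sketched as a simple hash merge. The field names follow the description above; the function pir_fields and its merging logic are hypothetical and do not produce PIR-formatted text.

```perl
use strict;
use warnings;

# Hypothetical sketch of "-name => value" arguments overriding defaults.
sub pir_fields {
    my ( $obj, %arg ) = @_;
    my %field = (
        entry => $obj->{id},             # default ENTRY: object "id" field
        title => $obj->{desc},           # default TITLE: object "desc" field
        date  => scalar( localtime() ),  # default DATE: current date/time
    );
    for my $key ( keys %arg ) {
        ( my $name = $key ) =~ s/^-//;   # strip the leading dash
        $field{$name} = $arg{$key};      # user value wins over the default
    }
    return \%field;
}

my $obj = { id => 'sample', desc => 'Sample Bio::Seq sequence' };
my $f   = pir_fields( $obj, -title => 'New TITLE', -acc => 'A12345' );
print "$_ = $f->{$_}\n" for sort keys %$f;
```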

out_genbank

Title     : out_genbank()
Usage     : $formatted_seq = $myseq->out_genbank;
          : @formatted_seq = $myseq->out_genbank;
          : print $myseq->out_genbank(-id=>'New ID',
          :                           -def=>'User defined definition',
          :                           -acc=>'User defined accession',
          :                           -origin=>'User defined origin info',
          :                           -spacing=>'single',
          :                           -caps=>'up',
          :                           -date=>'DATE GOES HERE',
          :                           -type=>'mRna');
          :   
Function  : Returns a GenBank formatted sequence array or string
          : depending on the value of wantarray when invoked via layout(). 
          : If more control is desired over output format, out_genbank() 
          : can be addressed directly with the following named parameters:
          :
          : def          - Sequence definition information
          : acc          - Sequence accession number
          : origin       - Sequence origin information
          : id           - short name 
          : date         - new date info
          : type         - sequence type (Dna, mRna, Amino, etc.)
          : spacing      - "single" or "double" sequence line spacing
          : caps         - "up" or "down" sequence capitalization
          :
          : When invoked via layout() or called directly with no 
          : arguments, the following default behaviours apply:
          :  DATE = Current date and time
          :  DEFINITION = object's description field
          :  ID = object's ID field
          :  SPACING = single
          :
          : All named parameters must be strings. Passed in parameters will
          : always take precedence over any fields with default settings.
          :
Note      : Format not stringently tested for accuracy. Sequence is numbered
          : according to the integer specified in the object 'start' field
          : but the implementation has not been robustly tested.
          :
To-do     : - allow user hash reference for additional format fields
          :
Example   : see usage
          :
Revision  : 0.02 / 12 Jan 1997
Returns   : string or list, depending on how it is invoked
Argument  : named list parameters are allowed, see above

out_GCG

Title    : out_GCG
Usage    : $formatted_seq = $mySeq->layout("GCG"); 
         : @formatted_seq = $mySeq->layout("GCG");
         : 
         : print $myseq->out_GCG(-id=>'New ID',
         :                      -spacing=>'single',
         :                      -caps=>'up',
         :                      -date=>'DATE GOES HERE',
         :                      -header=>'This is a user submitted header',
         :                      -type=>'n');
         :   
Function : Returns a GCG formatted sequence array or string
         : depending on the value of wantarray when invoked via layout(). 
         : If more control is desired over output format, out_GCG() 
         : can be addressed directly with the following named parameters:
         :
         : header       - first line(s) of formatted sequence
         : id           - short name that appears before 'Length:' field
         : date         - overwrite default date info
         : type         - can be "N" or "P", for nucleotide/protein
         : spacing      - "single" or "double" sequence line spacing
         : caps         - "up" or "down" sequence capitalization
         :
         : When invoked via layout() or called directly with no 
         : arguments, the following default behaviours apply:
         :  DATE = Current date and time
         :  DEFINITION = object's description field
         :  ID = object's ID field
         :  SPACING = single
         :         
         : All named parameters must be strings. Passed in parameters will
         : always take precedence over any fields with default settings.
         :
Example  : print $mySeq->layout("GCG");
Output   :
         :Sample Bio::Seq sequence
         : sample Length: 240  Wed Nov 27 13:24:28 EST 1996  Type: N Check: 5371  ..
         :
         :       1  aaaacctatg gggtgggctc tcaagctgag accctgtgtg cacagccctc
         :      51  tggctggtgg cagtggagac gggatnnnat gacaagcctg ggggacatga
         :     101  ccccagagaa ggaacgggaa caggatgagt gagaggaggt tctaaattat
         :     151  ccattagcac aggctgccag tggtccttgc ataaatgtat agagcacaca
         :     201  ggtgggggga aagggagaga gagaagaagc cagggtataa
         :
         :
Note     : GCG formatted sequences contain a "Type:" field.
         : If Type cannot be internally determined and no
         : Type name-parameter is passed in then the Type: 
         : field is not printed.
         :
Warning  : Unconventional numbering offsets may not
         : be robustly handled
         :
Revision : 0.06 / 12 Jan 1997
Source   : Found guts of this code on bionet.gcg, unknown author
Returns  : Array or String
Argument : named list parameters are allowed, see above
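
The numbered sequence body in the sample output can be approximated as follows. This sketch covers only the sequence lines (no header, checksum or "double" spacing), and the column widths are read off the sample rather than taken from the code; gcg_body_lines is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical formatter for the sequence body only: 50 residues per line,
# in blocks of 10, each line prefixed with the position of its first residue.
sub gcg_body_lines {
    my ($seq) = @_;
    my @out;
    for ( my $pos = 0 ; $pos < length($seq) ; $pos += 50 ) {
        my $chunk  = substr( $seq, $pos, 50 );
        my @blocks = $chunk =~ /(.{1,10})/g;
        push @out, sprintf( "%8d  %s", $pos + 1, join( ' ', @blocks ) );
    }
    return @out;
}

print "$_\n" for gcg_body_lines( 'acgtacgtac' x 7 );
```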

out_nbrf

Title     : out_nbrf()
Usage     : $self->layout("NBRF") or $self->out_nbrf
          :
Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
          :
          : If the ReadSeq wrapper Parse.pm appears
          : to be configured properly it is used
          : to generate the output. 
          :
          : If Parse.pm cannot be used then this code
          : carps out with an error message.
          :
To-do     : write internal output code
          :
Version   : 1.0 /  16 MAR 1997
Example   : see Usage
Returns   : FORMATTED STRING (wantarray is not used here!)
Argument  : 

out_gcgseq

Title     : out_gcgseq
Usage     : out_gcgseq;
Function  : Returns the sequence as a string in GCG_SEQ format.
Example   : $self->out_gcgseq;
          :
Returns   : string sequence in GCG_SEQ format
Argument  : n/a
Comments  : SAC: Derived from out_fasta().
          : GCG_SEQ is a format that looks a lot like Fasta and is used
          : for building GCG sequence datasets (.seq files).
          : It also has some similarities to NBRF format.

out_gcgref

Title     : out_gcgref
Usage     : out_gcgref;
Function  : Returns the sequence as a string in GCG_REF format.
Example   : $self->out_gcgref;
          :
Returns   : string sequence in GCG_REF format
Argument  : n/a
Comments  : SAC: Derived from out_gcgseq().
          : GCG_REF is a companion format for GCG_SEQ that is used
          : for building GCG sequence datasets (.ref files).
          : The .ref file is identical to the .seq file but without the sequence.

out_ig

Title     : out_ig()
Usage     : $self->layout("IG") or $self->out_ig
          :
Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
          :
          : If the ReadSeq wrapper Parse.pm appears
          : to be configured properly it is used
          : to generate the output. 
          :
          : If Parse.pm cannot be used then this code
          : carps out with an error message.
          :
To-do     : write internal output code
          :
Version   : 1.0 /  16 MAR 1997
Example   : see Usage
Returns   : FORMATTED STRING (wantarray is not used here!)
Argument  : 

out_strider

Title     : out_strider()
Usage     : $self->layout("Strider") or $self->out_strider
          :
Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
          :
          : If the ReadSeq wrapper Parse.pm appears
          : to be configured properly it is used
          : to generate the output. 
          :
          : If Parse.pm cannot be used then this code
          : carps out with an error message.
          :
To-do     : write internal output code
          :
Version   : 1.0 /  16 MAR 1997
Example   : see Usage
Returns   : FORMATTED STRING (wantarray is not used here!)
Argument  : 

out_zuker

Title     : out_zuker()
Usage     : $self->layout("Zuker") or $self->out_zuker
          :
Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
          :
          : If the ReadSeq wrapper Parse.pm appears
          : to be configured properly it is used
          : to generate the output. 
          :
          : If Parse.pm cannot be used then this code
          : carps out with an error message.
          :
To-do     : write internal output code
          :
Version   : 1.0 /  16 MAR 1997
Example   : see Usage
Returns   : FORMATTED STRING (wantarray is not used here!)
Argument  : 

out_msf

Title     : out_msf()
Usage     : $self->layout("MSF") or $self->out_msf
          :
Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
          :
          : If the ReadSeq wrapper Parse.pm appears
          : to be configured properly it is used
          : to generate the output. 
          :
          : If Parse.pm cannot be used then this code
          : carps out with an error message.
          :
To-do     : write internal output code
          :
Version   : 1.0 /  16 MAR 1997
Example   : see Usage
Returns   : FORMATTED STRING (wantarray is not used here!)
Argument  : 

parse_unknown

Title     : parse_unknown
Usage     : parse_unknown($ent);
Function  : tries to figure out the format of $ent and then
          : calls the appropriate function to parse it into $self->{"seq"}.
Example   : $self->parse_unknown($ent);
Returns   : n/a
Argument  : $ent : the rough multi-line string to be parsed
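
Format guessing of this kind usually comes down to a few pattern tests on the raw entry. The heuristics below are purely illustrative assumptions (the module's real detection rules are not documented here), and guess_format is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical heuristics only. Returns an upper-case format name.
sub guess_format {
    my ($ent) = @_;
    return 'FASTA'   if $ent =~ /\A\s*>/;                      # ">" header line
    return 'GENBANK' if $ent =~ /^LOCUS\s/m && $ent =~ /^ORIGIN/m;
    return 'GCG'     if $ent =~ /Length:\s*\d+.*\.\.\s*$/m;    # "Length: n ... .." line
    return 'RAW'     if $ent =~ /\A[A-Za-z\s*\-?]+\z/;         # residues only
    return 'UNKNOWN';
}

print guess_format(">sample demo\nACGT\n"), "\n";    # FASTA
print guess_format("ACGT\nTTGA\n"),         "\n";    # RAW
```

A caller could then pass the result to a dispatch step, falling back to an error routine such as parse_bad when the answer is UNKNOWN.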

parse_bad

Title     : parse_bad
Usage     : parse_bad;
Function  : complains of un-parsable sequence, last-ditch attempt via
          : Parse.pm if sequence is being read from a file.
          :
Example   : $self->parse_bad;
Returns   : n/a
Argument  : n/a

version

Title     : version()
Usage     : $myseq->version;
Function  : prints Bio::Seq current version number

Bio::Seq Guts

Sequence Object

The sequence object is merely a reference to a hash containing
all or some of the following fields...

Field         Value
--------------------------------------------------------------
seq           the sequence

id            a short identifier for the sequence

desc          a description of the sequence, in descffmt file-format

names         a hash of identifiers that relate to the sequence.
              These could be database IDs, accession numbers, URLs,
              pathnames, etc. Currently there is no set format
              for the names hash and no formal definition of databases
              or names

start         start in bio-coords of the first residue of the sequence

end           end in bio-coords of the last residue of the sequence

type          the sequence type. Is actually a 2 value list of format
              ["monomer","origin"] where monomer is one of the
              recognized sequence types and origin is a string
              description of the sequence's origin (mitochondrial, etc)

ffmt          file-format for the sequence

descffmt      file-format of the description string
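
As an illustration of the layout above, a blessed hash with these fields might be built as follows. The class name My::SeqSketch, the default values and the constructor arguments shown are assumptions for the sake of the example, not the actual Bio::Seq constructor.

```perl
use strict;
use warnings;

package My::SeqSketch;    # hypothetical class name, not Bio::Seq itself

sub new {
    my ( $class, %arg ) = @_;
    my $seq  = defined $arg{-seq} ? $arg{-seq} : '';
    my $self = {
        seq      => $seq,
        id       => $arg{-id},
        desc     => $arg{-desc},
        names    => {},                   # free-form identifier hash
        start    => 1,                    # assumed default numbering
        end      => length($seq),
        type     => [ 'Dna', undef ],     # [ monomer, origin ]
        ffmt     => 'RAW',
        descffmt => undef,
    };
    return bless $self, $class;
}

package main;

my $seq = My::SeqSketch->new( -seq => 'ACTGTGGCGTCAACTG', -id => 'sample' );
print "$seq->{id}: $seq->{start}..$seq->{end}\n";    # sample: 1..16
```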