NAME

Bio::Seq - bioperl sequence object

SYNOPSIS

Object Creation

 $seq = Bio::Seq->new;
 
 $seq = Bio::Seq->new($filename);
 
 $seq = Bio::Seq->new(-seq=>'ACTGTGGCGTCAACTG');
 
 $seq = Bio::Seq->new(-seq=>$sequence_string);
 
 $seq = Bio::Seq->new(-seq=>@character_list);
 
 $seq = Bio::Seq->new(-file=>'seqfile.aa',
		      -desc=>'Sample Bio::Seq sequence',
		      -start=>'1',
		      -type=>'Amino',
		      -ffmt=>'Fasta');
 
 $seq = Bio::Seq->new($file,$seq,$id,$desc,$names,
                     $numbering,$type,$ffmt,$descffmt);

Object Manipulation

$seq->[METHOD];

$result = $seq->[METHOD];



Accessors
--------------------------------------------------------
There are a wide variety of methods designed to give easy
and flexible access to the contents of sequence objects

The following accessors can be invoked upon a sequence object

ary()        - access sequence (or slice of sequence) as an array
str()        - access sequence (or slice of sequence) as a string
getseq()     - access sequence (or slice) as string or array
seq_len()    - access sequence length
id()         - access/change object id 
desc()       - access/change object description
names()      - access/change object names
start()      - access/change start point of the sequence (see note below) 
end()        - access/change end point of the sequence (see note below)
numbering()  - access/change sequence numbering offset (deprecated)
origin()     - access/change sequence origin
type()       - access/change sequence type
ffmt()       - access/change default output format
descffmt()   - access/change description format
setseq()     - change sequence


Methods
--------------------------------------------------------
The following methods can be invoked upon a sequence object

copy()        - returns an exact copy of an object
alphabet_ok() - check sequence against genetic alphabet  
alphabet()    - returns the genetic alphabet currently in use
layout()      - sequence formatter for output
revcom()      - reverse complement of sequence
complement()  - complement of sequence  
reverse()     - reverse of sequence
Dna_to_Rna()  - translate Dna seq to Rna
Rna_to_Dna()  - translate Rna seq to Dna
translate()   - protein translation of Dna/Rna sequence

INSTALLATION

This module is included with the central Bioperl distribution:

http://bio.perl.org/Core/Latest
ftp://bio.perl.org/pub/DIST

Follow the installation instructions included in the README file.

DESCRIPTION

This module is the generic sequence object which lies at the core of the bioperl project. It stores Dna, Rna, or Protein sequence information and annotation. It has associated methods to perform various manipulations of sequences, and support for reading and writing sequence data in a variety of file formats.

Bio::Seq has completely superseded Bio::PreSeq.pm.

The older PreSeq.pm code can be found at Chris Dagdigian's site: http://www.sonsorol.org/dag/bioperl/top.html

  • BASED ON PreSeq.pm, THIS VERSION OF Seq.pm HAS BEEN INTEGRATED INTO THE BIOPERL FRAMEWORK.

    For a complete description of these changes, see the comments at the top of the source.

Sequence Types

Currently the following sequence types are recognized:

Dna
Rna
Amino

Alphabets

This module uses the standard extended single-letter genetic alphabets to represent nucleotide and amino acid sequences.

In addition to the standard alphabet, the following symbols are also acceptable in a biosequence:

?  (a missing nucleotide or amino acid)
-  (gap in sequence)

Extended Dna / Rna alphabet

(includes symbols for nucleotide ambiguity)
------------------------------------------
Symbol       Meaning      Nucleic Acid
------------------------------------------
 A            A           Adenine
 C            C           Cytosine
 G            G           Guanine
 T            T           Thymine
 U            U           Uracil
 M          A or C  
 R          A or G   
 W          A or T    
 S          C or G     
 Y          C or T     
 K          G or T     
 V        A or C or G  
 H        A or C or T  
 D        A or G or T  
 B        C or G or T   
 X      G or A or T or C 
 N      G or A or T or C 


IUPAC-IUB SYMBOLS FOR NUCLEOTIDE NOMENCLATURE:
  Cornish-Bowden (1985) Nucl. Acids Res. 13: 3021-3030.
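
As a rough illustration only, the ambiguity table above can be held in a plain Perl hash and used to expand a symbol or to check a string against the extended alphabet. The names %IUPAC_DNA, expand_symbol() and looks_like_nucleotide() are hypothetical helpers made up for this sketch; they are not part of Bio::Seq and do not show how alphabet_ok() is implemented.

```perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes, transcribed from the table above.
# %IUPAC_DNA and both subs are illustrative names, not Bio::Seq API.
my %IUPAC_DNA = (
    A => 'A',    C => 'C',    G => 'G',   T => 'T',   U => 'U',
    M => 'AC',   R => 'AG',   W => 'AT',  S => 'CG',  Y => 'CT',  K => 'GT',
    V => 'ACG',  H => 'ACT',  D => 'AGT', B => 'CGT',
    X => 'ACGT', N => 'ACGT',
);

# Return the list of bases a symbol may stand for (empty list if unknown).
sub expand_symbol {
    my ($symbol) = @_;
    my $bases = $IUPAC_DNA{ uc $symbol };
    return defined $bases ? split(//, $bases) : ();
}

# True if every character is an IUPAC symbol, '?' (missing) or '-' (gap).
sub looks_like_nucleotide {
    my ($seq) = @_;
    my $allowed = join '', keys %IUPAC_DNA;
    return $seq =~ /\A[\Q$allowed\E?-]+\z/i ? 1 : 0;
}

print join(',', expand_symbol('R')), "\n";        # A,G
print looks_like_nucleotide('ACGTN-?RY'), "\n";   # 1
print looks_like_nucleotide('ACGJ'), "\n";        # 0
```

The check is case-insensitive, which is an assumption of this sketch rather than documented behaviour of the module.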

Amino Acid alphabet

------------------------------------------
Symbol           Meaning   
------------------------------------------
A        Alanine
B        Aspartic Acid, Asparagine
C        Cysteine
D        Aspartic Acid
E        Glutamic Acid
F        Phenylalanine
G        Glycine
H        Histidine
I        Isoleucine
K        Lysine
L        Leucine
M        Methionine
N        Asparagine
P        Proline
Q        Glutamine
R        Arginine
S        Serine
T        Threonine
V        Valine
W        Tryptophan
X        Unknown
Y        Tyrosine
Z        Glutamic Acid, Glutamine
*        Terminator


IUPAC-IUB AMINO ACID SYMBOLS:
  Biochem J. 1984 Apr 15; 219(2): 345-373
  Eur J Biochem. 1993 Apr 1; 213(1): 2

Output Formats

The following output formats are currently supported: Raw, Fasta, GCG, GenBank, PIR

Input Formats

In addition to "raw" sequence files, Seq.pm is currently only able to read in Fasta and GCG formatted single sequence files. Support for additional formats is forthcoming.

Seq.pm has the ability to make use of D.G. Gilbert's ReadSeq program when reading in sequence files. ReadSeq has the ability to read and interconvert between many different biological sequence formats.

When readseq is present and Seq.pm has been properly configured to use it, ReadSeq will be invoked when internal parsing code fails to recognize the sequence.

Formats which readseq currently understands:

- IG/Stanford
- GenBank/GB
- NBRF
- EMBL
- GCG
- DnaStrider
- Fitch format
- Pearson/Fasta
- Zuker format
- Olsen format
- Phylip3.2
- Phylip
- Plain/Raw
* MSF
* PAUP's multiple sequence (NEXUS) format
* PIR/CODATA format used by PIR
* ASN.1 format used by NCBI

Note: Formats indicated with a '*' allow for multiple
      sequences to be contained within one file. At this
      time, the behaviour of Seq.pm with regard to these
      multiple-sequence files has not been specified.

Readseq is freely distributed and is available in shell archive (.shar) form via FTP from ftp.bio.indiana.edu (129.79.224.25) in the molbio/readseq directory. (URL) ftp://ftp.bio.indiana.edu/molbio/readseq/

If ReadSeq is not available or Seq.pm is not configured to use it, internal parsing mechanisms will be used.

Currently supported filetypes for input: Raw, Fasta, GCG

USAGE

Installation

Seq.pm requires the use of other bioperl modules, particularly the Bio::Root framework. This module should be installed along with the rest of the bioperl code.

Why modules and object-oriented code?

Perl5 is nice in that it allows users to use OO-style programming only in the situations where they feel like doing so.

  • Simple interfaces to complex tasks.

    From the perspective of novice or occasional perl users, objects are useful because they can offer direct and simple ways to do things that in reality may be somewhat complex or arcane. Users interact with and manipulate objects via specific, documented methods and never have to worry about what is going on "behind the scenes." Many perl programmers have devoted significant amounts of time and effort creating easy-to-use "wrappers" around complex or abstract tasks. Visit the CPAN Module list at (URL) http://www.perl.com/perl/CPAN/CPAN.html to see the fruits of their labor.

  • Reusability.

    From the perspective of a perl power-user, object-oriented programming allows programmers to write code that is easily scalable and reusable. This allows powerful applications to be built rapidly and with a minimum of waste or repeated effort.

Using Bio::Seq in your perl programs

Seq.pm is invoked via the perl 'use' command:

use Bio::Seq;

Creating a biosequence object

The "constructor" method in Seq.pm is the new() function.

The proper syntax for accessing the new() function in Seq.pm is as follows:

$myseq = Bio::Seq->new;

Of course, objects are only useful if they have something in them, so you would probably want to pass along some additional information or arguments to the constructor. The foundation of any biosequence object is of course the sequence itself.

You can address new() with a sequence directly:

$myseq = Bio::Seq->new(-seq=>'AACTGGCGTTCGTG');

Or you can pass in a string or a list:

$myseq = Bio::Seq->new(-seq=>$sequence_string);
$myseq = Bio::Seq->new(-seq=>@sequence_list);

It is also possible to create a new sequence object based on a sequence contained in a file. You can tell the constructor where to find the sequence file by passing in the 'file' parameter:

$myseq  = Bio::Seq->new(-file=>'seqfile.gcg');

Because there are so many different conventions or formats for storing sequence information in files, it would be polite (although not absolutely necessary) to tell the constructor what format the sequence file is in. We can provide that information via the file-format or 'ffmt' field. To create a sequence object based upon a GCG-formatted sequence file:

$myseq  = Bio::Seq->new(-file=>'seqfile.gcg',-ffmt=>'GCG');

We've already introduced three different object attributes or arguments that can be passed to the new() object constructor ('seq','file' and 'ffmt') so now would be a good time to introduce them all:

BioSeq Constructor Arguments

file: The "file" argument should be a string value containing path and filename information for a sequence file that is to be read into an object.

seq: The "seq" argument is for passing in sequence directly instead of reading in a sequence file. The sequence should consist of RAW info (no whitespace, newlines or formatting) and can be passed in as either an array/list or string.

id: The "id" argument should be a ONE-WORD string value giving a short name for the sequence.

desc: The "desc" argument should be a string containing a description of the sequence. This field is not limited to one word.

names: The "names" argument should be a hash or reference to a hash that contains any number of user generated key-value pairs. Various bits of identifying information can be stored here including name(s), database locations, accession numbers, URL's, etc.

type: The "type" argument should be a string value describing the sequence type, e.g. "Dna", "Rna" or "Amino".

origin: The "origin" argument should be a string value describing sequence origin info

start: The start point, in biological coordinates of the sequence

end: The end point, in biological coordinates of the last residue in the sequence

start/end attributes are not strongly tied to what is actually in the sequence (i.e., $seq->start() + length($seq->getseq()) - 1 doesn't necessarily equal $seq->end(), although most of the time it should).

This is to allow some oddities to be stored in the Seq object sensibly.

The numbering convention is 'biological' coordinates, i.e. the sequence ATG would start at 1 (A) and finish at 3 (G). (NB: this is different from how perl represents ranges in strings and arrays, which are indexed from 0.)
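
To make the difference from Perl's zero-based indexing concrete, here is a minimal sketch of converting an inclusive biological range into substr() arguments. bio_slice() is a hypothetical helper written for this example; it is not a Bio::Seq method and only approximates what the slice accessors are described as doing.

```perl
use strict;
use warnings;

# Extract residues $from..$to (inclusive, biological coordinates) from $seq,
# where the first residue of $seq is numbered $start (1 by default).
# bio_slice is a hypothetical helper, not part of Bio::Seq.
sub bio_slice {
    my ($seq, $from, $to, $start) = @_;
    $start = 1 unless defined $start;
    $to = $start + length($seq) - 1 unless defined $to;
    my $offset = $from - $start;    # zero-based offset for substr
    my $length = $to - $from + 1;   # both end points are included
    return substr($seq, $offset, $length);
}

print bio_slice('ATGCCGTA', 1, 3), "\n";            # ATG
print bio_slice('ATGCCGTA', 4), "\n";               # CCGTA
print bio_slice('ATGCCGTA', 101, 103, 100), "\n";   # TGC
```

In the last call the sequence is numbered from 100, so position 101 is the second residue. The sketch does no range checking.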

numbering() is equivalent to start() (old version). Eventually it will be removed. numbering() accesses the same attribute as start()

numbering: (Deprecated) The "numbering" argument should be an integer value containing the sequence numbering offset value. By default all sequences are numbered starting with 1.

ffmt: The "ffmt" argument should be a string describing sequence file-format. If a sequence is being read from a file via the "file" argument, "ffmt" is used to invoke the proper parsing code. "ffmt" is also the default format for sequence output when the layout method is called. See elsewhere in this documentation for info regarding recognized sequence file-formats.

If most of these arguments were used at once to create a sequence object, it would look something like this:

#Set up the name hash
%names = (
'CloneID','DB1',
'Isolate','5',
'Tissue','Xenopus',
'Location','/usr2/users/dag/bioperl/sample.tfa'
);

$name_ref = \%names;

#Create the object
$myseq = new Bio::Seq(-file=>'sample.tfa',
                      -names=>$name_ref,
                      -type=>'Dna',
                      -origin=>'Xenopus mesoderm',
                      -start=>'1',
                      -desc=>'Sample Bio::Seq sequence',
                      -ffmt=>'Fasta');

Methods

Once an object has been created, there are defined ways to go about accessing the information. Users are encouraged to poke around "under the hood" of Seq.pm to see what is going on, but it is considered bad form to bypass the defined accessor methods and mess around with the internal code. Bypassing the defined methods "voids the warranty" of the module and can lead to problems down the road. The implied agreement between module creators and users is that the creators will strive to keep the interface standard and backwards-compatible while the users will avoid becoming dependent on bits of internal code that may change or disappear in future revisions.

Detailed information about each method described here can be found in the Appendix.

Accessing information

For each defined way to access information from a biosequence object, there is a corresponding "method" that is invoked. What follows is a brief description of each accessor method. For more detailed information see the individual annotations for each method near the end of this document.

  • Sequence

    The sequence can be accessed in several ways via the getseq() method. Depending on how it is invoked, it can return either a string or a list value.

    Both examples are appropriate:

    @sequence_list   = $myseq->getseq;
    $sequence_string = $myseq->getseq;

    Sequence "slices" can be accessed by passing start and stop integer position arguments to getseq():

    @slice = $myseq->getseq($start,$stop);
    @slice = $myseq->getseq(1,50);
    @slice = $myseq->getseq(100);

    If no stop value is passed in, getseq() will return a slice from the start position to the end of the sequence. Slices are returned in the context of the object "start" attribute, not absolute position, so be aware of the object's numbering scheme.

    Sequences can also be accessed with the ary() and str() methods. The ary() method will always return a list value and str() will always return a string. Otherwise they are functionally identical to the getseq() method.

      $sequence = $myseq->str;
      @sequence = $myseq->ary;
    
      @slice = $myseq->ary($start,$stop);
      $slice = $myseq->str($start,$stop);
  • Sequence length

    The sequence length can be accessed using the seq_len() method

    $len = $myseq->seq_len;
  • Sequence ID

    The ID field can be accessed using the id() method

    $ID = $myseq->id;
  • Description

    The object description field can be accessed using the desc() method

    $description = $myseq->desc;
  • Names

    The associative array (hash) that contains flexible information regarding alternative sequence names, database locations, accession numbers, etc. can be accessed by

    %name_hash = $myseq->names;
  • Sequence start

    The biological position of the first residue in the sequence can be accessed via start()

    $start = $myseq->start;
  • Sequence end

    The biological position of the last residue in the sequence can be accessed via end()

    $end = $myseq->end;
  • Sequence Origin

    The object origin (source organism) field can be accessed via origin()

    $seq_origin = $myseq->origin;
  • File input format / default output format

    The object format field can be accessed using the ffmt() method

    $format = $myseq->ffmt;

Changing Information in Sequence Objects

In the previous section it was shown how object attributes and values could be retrieved from a sequence object by calling upon various methods. Many of the above methods will also allow the user to CHANGE object attributes by passing in additional arguments. Detailed information on each method can be found in the Appendix.

  • Changing the sequence

    The sequence information for an object can be changed by passing a string or list value to the setseq() method. Here are some ways that sequence information can be changed

    $myseq->setseq($new_sequence_string);
    $myseq->setseq(@new_sequence_list);
    $myseq->setseq("aaccttgcctgc");

    The setseq() method checks sequence elements and warns if it finds non-standard characters. Because of this, arbitrary sequence compositions are not supported at this time. This method is considered slightly 'insecure' because the 'id','desc' and 'type' fields are not updated along with the sequence. If necessary, the user must make the appropriate changes to these fields whenever sequence information is updated or changed.

  • Changing the sequence ID

    The ID field can be changed by passing in a new ID argument to id()

    $myseq->id($new_id);
  • Changing the object description

    The object description field can be changed by passing in a new argument to desc()

    $myseq->desc($new_desc);
  • Changing the object names hash

    The associative array (hash) that contains flexible information regarding alternative sequence names, database locations, accession numbers, etc. can be changed by passing in a reference to a new hash to names()

    $hash_ref = \%name_hash;
    $myseq->names($hash_ref);
  • Changing the sequence start or end

    The default numbering offset for the sequence can be changed by passing in a new value to start() or end()

    $myseq->start(1);
    $myseq->start($new_value);
  • Sequence Origin

    The object origin field can be changed by passing in a new string value to origin()

    $myseq->origin("mitochondrial");
    $myseq->origin($origin_string);
  • File input format / default output format

    The object format field can be changed by passing in a new value to ffmt()

    $myseq->ffmt("GCG"); 

Manipulating sequences

Creating, accessing and changing biosequence objects and fields is all well and good, but eventually you are going to want to actually do some work.

Included with Seq.pm are some commonly used utility methods for manipulating sequence data. So far Seq.pm contains methods for:

  • Copying a biosequence object

    using copy()

    $new_obj = $myseq->copy;
  • Reversing a sequence

    using reverse()

    $reversed_seq = $myseq->reverse;
  • Complementing a sequence

    The 2nd strand, or "complement" of a biosequence can be obtained by calling upon the complement() method.

    $comp_seq = $myseq->complement;
  • Reverse complementing a sequence

    using revcom()

    $rev_comp = $myseq->revcom;
  • Translating Dna to Rna

    using Dna_to_Rna()

    $rna_seq = $myseq->Dna_to_Rna;
  • Translating Rna to Dna

    using Rna_to_Dna()

    $dna_seq = $myseq->Rna_to_Dna;
  • Translating Dna or Rna to protein

    using translate()

    $peptide_seq = $myseq->translate;
  • Checking the sequence alphabet

    To check if any nonstandard characters are present in a biosequence, an alphabet_ok() method is provided. The method returns "1" if everything is OK, otherwise it returns a "0".

    if($myseq->alphabet_ok) { print "OK!!\n"; }
     else { print "Not OK! \n"; }

    To get the alphabet itself, use the alphabet() method, which will return a string containing all characters in the current alphabet.

    $alph = $myseq->alphabet;

    To use restrictive alphabets that do not permit ambiguity codes, include '-strict => 1' in the parameters sent to new(). Or, for any existing sequence object, try:

    $myseq->strict(1); 
    $myseq->alphabet_ok() or die "alphabet not okay.\n";
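
For readers who want to see what the simpler manipulations amount to, the reverse, complement, reverse complement and Dna-to-Rna operations listed above can be approximated on a bare string with Perl's built-in reverse and tr///. This is a sketch on plain strings under the assumption that only A, C, G and T occur; it does not use Bio::Seq, ignores ambiguity codes, and is not a description of how the module's own methods are written.

```perl
use strict;
use warnings;

my $dna = 'ATGGCCTAA';

# Reverse: scalar reverse flips the order of the characters.
my $reversed = scalar reverse $dna;

# Complement: swap each base for its pairing partner, preserving case.
(my $complement = $dna) =~ tr/ACGTacgt/TGCAtgca/;

# Reverse complement: reverse first, then complement.
(my $revcom = scalar reverse $dna) =~ tr/ACGTacgt/TGCAtgca/;

# DNA to RNA: replace thymine with uracil.
(my $rna = $dna) =~ tr/Tt/Uu/;

print "$reversed\n";     # AATCCGGTA
print "$complement\n";   # TACCGGATT
print "$revcom\n";       # TTAGGCCAT
print "$rna\n";          # AUGGCCUAA
```

Protein translation is left out because it needs a full codon table.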

Sequence Output

There are several methods for outputting formatted sequences. For your convenience, a "meta-output" method called layout() also exists.

If layout() is called without any arguments, it calls upon the output methods as defined by the "ffmt" field.

print $myseq->layout;

The "ffmt" field is mainly used to describe the format of a sequence being read in from a file. It is also used as the default format for all sequence output. If these differ (i.e., the format that the sequence was read in is not desired as a default output style) then "ffmt" should be set manually via the ffmt() accessor method. Of course, after reading the sequence in you are free to change "ffmt" at will.

layout() can also be called with specific formats:

$gcg_formatted_seq = $myseq->layout("GCG");
$fasta_seq = $myseq->layout("Fasta");

Calling output methods directly

Many output methods accept unique named parameters/arguments that allow a greater degree of control over output format and style. To take advantage of these abilities, the formatting methods must be called directly. See the appendix notes describing each output format for detailed information.

print $myseq->out_GCG(-date=>"10 May 1996",
                      -caps=>"up");

Most output methods will return either a string or list value depending on how they are invoked. Check the detailed method documentation in the Appendix to be sure.

  @formatted_seqlist = $myseq->out_genbank(-id=>'New ID',
                                           -def=>'User defined definition',
                                           -acc=>'User defined accession');

  $formatted_seqstring = $myseq->out_genbank(-id=>'New ID',
                                             -def=>'User defined definition',
                                             -acc=>'User defined accession');

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

vsns-bcd-perl@lists.uni-bielefeld.de          - General discussion
vsns-bcd-perl-guts@lists.uni-bielefeld.de     - Technically-oriented discussion
http://bio.perl.org/MailList.html             - About the mailing lists

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

bioperl-bugs@bio.perl.org                   
http://bio.perl.org/bioperl-bugs/           

ACKNOWLEDGEMENTS

Some pieces of the code were contributed by Steven E. Brenner, Steve Chervitz, Ewan Birney, Tim Dudgeon, David Curiel, and other Bioperlers. Thanks !!!!

SEE ALSO

UnivAln.pm - The biosequence alignment object
Parse.pm   - The perl interface to ReadSeq

REFERENCES

BioPerl Project Page http://bio.perl.org/

VERSION

Bio::Seq.pm, beta 0.051

COPYRIGHT

Copyright (c) 1996-1998 Chris Dagdigian, Georg Fuellen, Richard Resnick.
All Rights Reserved. This module is free software; you can redistribute 
it and/or modify it under the same terms as Perl itself.

Appendix

The following documentation describes the various functions contained in this module. Some functions are for internal use and are not meant to be called by the user; they are preceded by an underscore ("_").

new

Title     : new
Usage     : $mySeq = Bio::Seq->new($file,$seq,$id,$desc,$names,
                        $start,$end,$type,$ffmt,$descffmt);
          :                - or -
          : $mySeq = Bio::Seq->new(-file=>$file,
                                  -seq=>$seq,
                                  -id=>$id,
                                  -desc=>$desc,
                                  -names=>$names,
                                  -start=>$start,
                                  -end=>$end,
                                  -type=>$type,
                                  -origin=>$origin,
                                  -ffmt=>$ffmt,
                                  -descffmt=>$descffmt);
Function  : The constructor for this class, returns a new object.
Example   : See usage
Returns   : Bio::Seq object
Argument  : $file: file from which the sequence data can be read; all
              the other arguments will overwrite the data read in.
              "_nofile" is recommended if no file is given.
            $seq: String or array of characters
            $id: String describing the ID the user wishes to assign.
            $desc: String giving a description of the sequence
            $names: A reference to a hash which stores {loc,name}
                    pairs of other database locations and corresponding names
                    where the sequence is located.
            $start: The offset of the sequence, as an integer
            $end: The end point of the sequence, as an integer
            $type: The type of the sequence, see type()
            $origin: The sequence origin
            $ffmt: Sequence format, see ffmt()
            $descffmt: format of $desc, see descffmt()
   

## Internal methods ##

_initialize

Title     : _initialize
Usage     : n/a (internal function)
Function  : Assigns initial parameters to a blessed object.
Example   : 
Returns   : 
Argument  : As Bio::Seq->new, allows for named or listed parameters.
            See ->new for the legal types of these values.

_seq

Title     : _seq()
Usage     : n/a, internal function
Function  : called by new() to set sequence field. Checks
          : alphabet before setting.
          :
Returns   : n/a
Argument  : sequence string

_monomer

Title     : _monomer()
Usage     : n/a, internal function
Function  : Returns the internal monomer that represents
          : sequence type.
          :
          : Sequence type is treated internally as a monomer
          : defined by the %SeqAlph hash. The type field
          : is a list of format [monomer,origin]. For any
          : output outside the module, the monomer is resolved
          : back into string form via the %TypeSeq hash.
          :
Returns   : original type setting [as monomer]
Argument  : none

_file_read

Title     : _file_read()
Usage     : n/a (Internal Function)
Function  : _file_read is called whenever the constructor is called 
          : with the name of a sequence to be read from disk.
          :
          : This function is now DEPRECATED. You should use the SeqIO
          : system instead.
          :
Example   : n/a, only called upon by _initialize()
Returns   : 
Argument  : 

## ACCESSORS ##

seq_len

Title       : seq_len()
Usage       : $len = $myseq->seq_len;
Function    : Returns a value representing the sequence
            : length
            :
Example     : see above
Arguments   : none
Returns     : integer

ary

Title     : ary
Usage     : ary([$start,[$end]])
Function  : Returns the sequence of the object as an array, or a substring
            of the sequence if $start/$end are defined. If $start is
            defined and $end isn't, the substring is from $start to the
            end of the sequence.
Example   : @slice = $myObject->ary(3,9);
Returns   : array of characters
Argument  : $start,$end (both integers). They are interpreted w.r.t. the
            specific numeration of the sequence!! ($self->{start})

str

Title     : str
Usage     : str([$start,[$end]])
Function  : Returns the sequence of the object as a string, or a slice
            of the sequence if $start/$end are defined. If $start is
            defined and $end isn't, the slice is from $start to the
            end of the sequence.
Example   : $slice = $myObject->str(3,9);
Returns   : string scalar
Argument  : $start,$end (both integers). They are interpreted w.r.t. the
            specific numeration of the sequence!! ($self->{start})

seq

Title     : seq
Usage     : seq([$start,[$end]])
Function  : Returns the sequence of the object as an array or a char
            string, depending on the value of wantarray. Will return a slice
            of the sequence if $start/$end are defined. If $start is
            defined and $end isn't, the slice is from $start to the
            end of the sequence.
Example   : @slice = $myObject->seq(3,9);
Returns   : regular array of characters, or a scalar string
Argument  : $start,$end (both integers). They are interpreted w.r.t. the
            specific numeration of the sequence!! ($self->{start})
Comments  : 

getseq

Title     : getseq
Usage     : getseq([$start,[$end]])
Function  : Returns the sequence of the object as an array or a char
            string, depending on the value of wantarray. Will return a slice
            of the sequence if $start/$end are defined. If $start is
            defined and $end isn't, the slice is from $start to the
            end of the sequence.
Example   : @slice = $myObject->getseq(3,9);
Returns   : regular array of characters, or a scalar string
Throws    : Warning about deprecated method.
Argument  : $start,$end (both integers). They are interpreted w.r.t. the
            specific numeration of the sequence!! ($self->{start})
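
The seq() and getseq() entries above say the return value depends on wantarray. As a minimal sketch of that general Perl idiom (not the module's actual code), the hypothetical sub context_seq() below returns a list of characters in list context and a string in scalar context.

```perl
use strict;
use warnings;

# context_seq is a hypothetical stand-in for a context-sensitive accessor.
sub context_seq {
    my ($seq) = @_;
    # wantarray is true when the caller expects a list.
    return wantarray ? split(//, $seq) : $seq;
}

my @chars  = context_seq('ACGT');   # list context: one element per residue
my $string = context_seq('ACGT');   # scalar context: the whole string

print scalar(@chars), "\n";   # 4
print "$string\n";            # ACGT
```

This is why the assignment target matters in the usage examples: assigning to an array and assigning to a scalar select different return forms from the same call.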

id

Title     : id()
Usage     : $seq_id = $myseq->id; 
          : $myseq->id($id_string);
          :
Function  : Sets field if an ID argument string is
          : passed in. If no arguments, returns ID value for
          : object.
          :
Returns   : original ID value
Argument  : ID string

desc

Title     : desc()
Usage     : $description = $myseq->desc; 
          : $myseq->desc($desc_string);
          :
Function  : Sets field if an argument string is
          : passed in. If no arguments, returns original value for
          : object description field.
          :
Returns   : original value for description
Argument  : description string

names

Title     : names()
Usage     : %names = $myseq->names; 
          : $myseq->names($hash_ref);
          :
Function  : Sets field if a name hash reference is
          : passed in. If no arguments, returns original 
          : names hash.
          :
Returns   : hash reference (associative array)
Argument  : reference to a hash (associative array)

numbering

Title     : numbering()
Usage     : $num_start = $myseq->start; 
          : $myseq->start($value);
          :
Function  : Sets field if an argument is
          : passed in. If no arguments, returns original value.
          :
          : (Deprecated - should switch to start())
Returns   : original value 
Argument  : new value

start

Title     : start
Usage     : $start = $myseq->start(); #get
          : $myseq->start($value); #set
Function  : the set/get for the start position
Example   :
Returns   : start value 
Arguments : new value

end

Title     : end
Usage     : $end = $myseq->end(); #get
          : $myseq->end($value); #set
Function  : The set/get for the end position
Example   :
Returns   : end value 
Arguments : new value

get_nse

Title    : get_nse
Usage    : $tag = $myseq->get_nse() #
Function : gets a string like "name/start-end". This is likely
         : to be unique in an alignment/database
         : Used a lot by SimpleAlign
Example  :
Returns  : A string
Arguments: Two optional arguments - first being the name/ separator, second the
           start-end separator
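
To show the shape of the string described above, here is a small sketch that builds a "name/start-end" tag with the two optional separators. make_nse() and the sample name 'HBA_HUMAN' are invented for this example; the real get_nse() is a method that takes its values from the object rather than from arguments.

```perl
use strict;
use warnings;

# Build a "name/start-end" tag. Both separators are optional.
# make_nse is a hypothetical helper, not the Bio::Seq method itself.
sub make_nse {
    my ($name, $start, $end, $name_sep, $range_sep) = @_;
    $name_sep  = '/' unless defined $name_sep;
    $range_sep = '-' unless defined $range_sep;
    return $name . $name_sep . $start . $range_sep . $end;
}

print make_nse('HBA_HUMAN', 1, 141), "\n";              # HBA_HUMAN/1-141
print make_nse('HBA_HUMAN', 1, 141, ':', '..'), "\n";   # HBA_HUMAN:1..141
```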

origin

Title     : origin()
Usage     : $myseq->origin($value) 
Function  : Sets the origin field which is actually the second
          : field of the Type list. The {type} field is a 2 value list
          : with a format of ["Monomer","Origin"]
          :
Returns   : Original value
Argument  : string
Comments  : SAC: Consider renaming this method to "organism()" or "species()". 
          : "origin" is ambiguous and can be easily confused with 
          : coordinate data (0,0).

type

Title     : type()
Usage     : $myseq->type($value) 
Function  : Sets the type field which is the first
          : field of the Type list. The {type} field is a 2 value list
          : with a format of ["Monomer","Origin"]
          :
Returns   : String containing one of the recognized sequence types:
          : 'unknown', 'dna', 'rna', 'amino', 'otherseq', 'aligned'
          : See the %Seq::SeqAlph hash for the current types.
Argument  : string containing a valid sequence type
          : SAC: case of user-supplied argument does not matter

ffmt

Title     : ffmt()
Usage     : $format = $myseq->ffmt;
          : $myseq->ffmt("Fasta");
          : 
Function  : The file format field is used by the internal
          : sequence parsing code when trying to read 
          : in a sequence file. It is also what is used
          : as a default output format if the layout
          : method is called without an argument.
          :
          : If a sequence object is created without
          : reading in a file, or if the file is read
          : in with the use of the ReadSeq package then
          : the ffmt field can be set to indicate any default
          : output-format preference.
          :
          : If a sequence is read from a file and parsed
          : by internal code (ReadSeq not used) then the ffmt
          : field should describe the format of the sequence
          : file. The ffmt field is used to send the sequence
          : to the correct internal parsing code.
          :
Returns   : original ffmt value
Argument  : recognized ffmt string value. Valid strings:
          :    RAW, FASTA, GCG, IG, GENBANK, NBRF, EMBL,
          :    MSF, PIR, GCG_SEQ, GCG_REF, STRIDER, ZUKER
          : SAC: case of user-supplied argument does not matter

descffmt

Title     : descffmt()
Usage     : $desc = $myseq->descffmt;
          : $myseq->descffmt($new_value); 
Function  : The set/get for the description format field
          :
Returns   : original value
Argument  : $new_value (one of the formats as defined in $SeqForm).
          : SAC: case of $new_value argument does not matter.

setseq

Title     : setseq()
Usage     : $self->setseq($new_sequence);
Function  : Changes the sequence inside a bioseq object
          :
Returns   : sequence string 
Argument  : sequence string

parse

Title     : parse
Usage     : parse($ent,[$ffmt]);
Function  : Invokes the proper parsing code depending on
          : the value of the object 'ffmt' field.
Example   : $self->parse;
Returns   : n/a
Argument  : the prospective sequence to be parsed, 
          : and optionally its format so that it doesn't need to
          : be estimated
          : SAC: case of $ffmt argument does not matter.

parse_raw

Title     : parse_raw
Usage     : parse_raw;
Function  : parses $ent into the $self->{"seq"} field, using Raw
          : file format.
Example   : $self->parse_raw;
Returns   : n/a
Argument  : n/a

parse_genbank

Title    : parse_genbank

 sub parse_genbank {
   my ($self, $ent) = @_;
   my ($seqstart, $defstart) = (0, 0);   # state flags (0 = off, 1 = on)

   for ( split /\n/, $ent ) {
     # the LOCUS line carries the identifier
     m/^LOCUS\s+(\S+)/ and $self->{"id"} = $1;

     # the DEFINITION line starts the description ...
     if ( m/^DEFINITION\s+(.+)/ ) {
       $self->{"desc"} = $1;
       $defstart = 1;
       next;
     }
     # ... and indented lines that follow continue it
     if ( $defstart ) {
       if ( m/^ {11}( .+)/ ) { $self->{"desc"} .= $1; next; }
       $defstart = 0;
     }

     # sequence lines lie between ORIGIN and the closing //
     if ( m/^ORIGIN/ ) { $seqstart = 1; next; }
     m!^//! and $seqstart = 0;
     if ( $seqstart ) { s/[\s\d]//g; $self->{"seq"} .= $_; }
   }

   return 1;
 }

parse_fasta

Title     : parse_fasta
Usage     : parse_fasta;
Function  : parses $ent into the "seq" field, using Fasta
          : file format.
          :
To-do     : use benchmark module to find best/fastest parse
          : method
          :
Example   : $self->parse_fasta;
Returns   : n/a
Argument  : n/a
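
As a rough illustration of Fasta parsing, the stand-alone sketch below
splits the header line into an id and a description and concatenates the
remaining lines into the sequence. The function name parse_fasta_sketch
and its return list are assumptions; the module's own parse_fasta works
on the object fields and may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-alone parser: returns (id, desc, seq) for the
# first record found in the multi-line string $ent.
sub parse_fasta_sketch {
    my ($ent) = @_;
    my ($id, $desc, $seq) = ('', '', '');
    my $seen_header = 0;
    for my $line ( split /\n/, $ent ) {
        if ( $line =~ /^>\s*(\S*)\s*(.*)/ ) {
            last if $seen_header;          # stop at the second record
            ($id, $desc) = ($1, $2);
            $desc =~ s/\s+$//;             # trim trailing whitespace
            $seen_header = 1;
            next;
        }
        $line =~ s/\s+//g;                 # drop blanks inside sequence lines
        $seq .= $line;
    }
    return ($id, $desc, $seq);
}

my ($id, $desc, $seq) =
    parse_fasta_sketch(">sample Sample sequence\nACTG TGGC\nGTCA\n");
print "$id | $desc | $seq\n";
```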

parse_gcg

Title    : parse_gcg
Usage    : used by internal code
Function : Parses the sequence out of a gcg-format string and
         : sets the object sequence field accordingly. This is
         : a simple, inefficient method for grabbing JUST the
         : sequence.
         :
To-do    : - parse out more info than just sequence 
         : - implement alphabet checking
         : - better regular expressions/efficiency
         : - carp on unexpected / wrong-format situations
         :
Version  : .01 / 16 Jan 1997 
Returns  : 1
Argument : gcg-formatted sequence string

## METHODS FOR FILE FORMAT AND OUTPUT ##

layout

Title     : layout()
Usage     : layout([$format]);
Function  : Returns the sequence in whichever format the user specifies,
          : or in the format named by the "ffmt" field if the user does
          : not specify one.
Example   : $fastaFormattedSeq = $myObj->layout("Fasta");
Returns   : varies
Argument  : $format (one of the formats as defined in $SeqForm).
          : SAC: case of $format argument does not matter.
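
One way to picture the format selection is a case-insensitive dispatch
table that falls back to the "ffmt" field. The table %formatter, the two
toy formatters and the name layout_sketch are all hypothetical; they only
illustrate the lookup, not the real output routines.

```perl
use strict;
use warnings;

# Hypothetical formatters standing in for out_raw(), out_fasta(), etc.
my %formatter = (
    raw   => sub { my ($s) = @_; return $s->{seq} . "\n" },
    fasta => sub { my ($s) = @_; return ">$s->{id} $s->{desc}\n$s->{seq}\n" },
);

sub layout_sketch {
    my ($seq, $format) = @_;
    $format = $seq->{ffmt} unless defined $format;   # fall back to "ffmt"
    my $code = $formatter{ lc $format }              # case does not matter
        or die "Unknown output format '$format'\n";
    return $code->($seq);
}

my $seq = { id => 'sample', desc => 'demo', seq => 'ACTG', ffmt => 'Fasta' };
print layout_sketch($seq);          # no argument: uses the ffmt field
print layout_sketch($seq, 'RAW');   # explicit format, any case
```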

out_raw

Title     : out_raw
Usage     : out_raw;
Function  : Returns the sequence in Raw format.
Example   : $self->out_raw;
Returns   : string sequence, in raw format
Argument  : n/a

out_fasta

Title     : out_fasta
Usage     : out_fasta;
Function  : Returns the sequence as a string in FASTA format.
Example   : $self->out_fasta;
          :
To-do     : benchmark code / find fastest method
          :
Returns   : string sequence in Fasta format
Argument  : n/a

alphabet_ok

Title     : alphabet_ok
Usage     : $myseq->alphabet_ok;
Function  : Checks the sequence for presence of any characters
          : that are not considered valid members of the genetic
          : alphabet. In addition to the standard genetic alphabet
          : (see documentation), "?" and "-" characters are
          :  considered valid.
          :
Example   : if($myseq->alphabet_ok) { print "OK!!\n"; }
          :     else { print "Not OK! \n"; }
          :
Note      : Does not handle '\' characters in sequence robustly
          :
Returns   : 1 if OK / 0 if not OK
Argument  : none
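
A minimal sketch of such a check, assuming the alphabet is passed in as a
plain string of allowed letters (the function name and that calling
convention are not taken from the module). quotemeta() is used so that
characters such as '\' in the alphabet cannot break the pattern.

```perl
use strict;
use warnings;

# Hypothetical checker: returns 1 if every character of $seq is in
# $alphabet or is one of the extra symbols "?" and "-", else 0.
sub alphabet_ok_sketch {
    my ($seq, $alphabet) = @_;
    my $class = quotemeta( $alphabet . '?-' );   # escape regex metacharacters
    return $seq =~ /\A[$class]*\z/i ? 1 : 0;     # case-insensitive match
}

my $iub = 'ACGTUMRWSYKVHDBXN';                   # IUB nucleotide codes
print alphabet_ok_sketch('ACGTN-?', $iub) ? "OK\n" : "Not OK\n";
print alphabet_ok_sketch('ACGZ',    $iub) ? "OK\n" : "Not OK\n";
```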

alphabet

Title     : alphabet
Usage     : $myseq->alphabet;
Function  : Returns the characters in the alphabet in use for the sequence.
Example   : print "Alphabet: ".$myseq->alphabet;
Returns   : string containing alphabet characters
Argument  : none

GCG_checksum

Title     : GCG_checksum
Usage     : $myseq->GCG_checksum;
Function  : returns a gcg checksum for the sequence
Example   : 
Returns   : 
Argument  : none
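
For orientation, the commonly circulated form of the GCG checksum weights
each upper-cased character code by a position counter that wraps after 57
and keeps the sum modulo 10000. The sketch below assumes that form;
whether GCG_checksum in this module follows it exactly is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the widely used GCG checksum.
sub gcg_checksum_sketch {
    my ($seq) = @_;
    my ($index, $check) = (0, 0);
    for my $char ( split //, uc $seq ) {
        $index++;
        $check += $index * ord($char);   # weight by position in the cycle
        $index = 0 if $index == 57;      # position counter wraps after 57
    }
    return $check % 10000;
}

print gcg_checksum_sketch('ACTGTGGCGTCAACTG'), "\n";
```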

trunc

Title     : trunc
Usage     : $trunc_seq = $mySeq->trunc(12,20);
Function  : Returns a truncated part of the sequence. The truncation
          : itself is done by the ->str() call, so this is just a
          : convenience call for this object.

Returns   : Bio::Seq object ref.
Argument  : start point, end point in biological coordinates
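
The sketch below shows the coordinate arithmetic, assuming that
"biological coordinates" means 1-based, inclusive positions on a sequence
that starts at 1. It works on a plain string; the real trunc() returns a
Bio::Seq object, and subseq_sketch is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: extracts positions $start..$end (1-based,
# inclusive) from a plain sequence string.
sub subseq_sketch {
    my ($seq, $start, $end) = @_;
    die "bad range $start-$end\n"
        if $start < 1 || $end < $start || $end > length $seq;
    return substr( $seq, $start - 1, $end - $start + 1 );
}

print subseq_sketch('ACTGTGGCGTCAACTG', 12, 16), "\n";
```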

copy

Title     : copy
Usage     : $copyOfObj = $mySeq->copy;
Function  : Returns an identical copy of the object.
Example   :
Returns   : Bio::Seq object ref.
Argument  : n/a

revcom

Title       : revcom
Usage       : $reverse_complemented_seq = $mySeq->revcom;
Function    : Returns a new sequence object containing the reverse
            : complement of a nucleotide object sequence
Example     : $reverse_complemented_seq = $mySeq->revcom;
Source      : Guts from Jong's <jong@mrc-lmb.cam.ac.uk>
            : library of molbio perl routines
Note        :
            : The letter codes and complement translations
            : are those proposed by IUB (Nomenclature Committee,
            : 1985, Eur. J. Biochem. 150; 1-5) and are also
            : used by the GCG package. The IUB/GCG letter codes
            : for nucleotide ambiguity are compatible with
            : EMBL, GenBank and PIR database formats but are
            : *NOT* compatible with Staden/Sanger ambiguity
            : symbols. Staden/Sanger use different symbols to
            : represent uncertainty and frame ambiguity.
            :
            : Currently Staden/Sanger are not recognized
            : sequence types.
            :
            : GCG Documentation on sequence symbols:
URL         : http://www.neb.com/gcgdoc/GCGdoc/Appendices/appendix_iii.html
            :
Translation :
            : GCG/IUB    Meaning        Complement
            : ------------------------------------
            :  A            A                T
            :  C            C                G
            :  G            G                C
            :  T            T                A
            :  U            U                A
            :  M          A or C             K
            :  R          A or G             Y
            :  W          A or T             W
            :  S          C or G             S
            :  Y          C or T             R
            :  K          G or T             M
            :  V        A or C or G          B
            :  H        A or C or T          D
            :  D        A or G or T          H
            :  B        C or G or T          V
            :  X      G or A or T or C       X
            :  N      G or A or T or C       N
            :--------------------------------------
Revision    : 0.01 / 3 Jun 1997
Returns     : A new sequence object (fixed by eb).
            : To get the actual sequence string, use
            : $actual_reversed_sequence = $seq->revcom()->str()
Argument    : n/a
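
The translation table above maps directly onto Perl's tr/// operator.
The following sketch works on plain strings and uses made-up function
names; it illustrates the table, not the module's internal code, and
unlike revcom() it does not return a sequence object.

```perl
use strict;
use warnings;

# Hypothetical helpers operating on plain strings.
# The two tr/// lists follow the GCG/IUB table row by row, in both cases.
sub complement_sketch {
    my ($seq) = @_;
    (my $comp = $seq) =~
        tr/ACGTUMRWSYKVHDBXNacgtumrwsykvhdbxn/TGCAAKYWSRMBDHVXNtgcaakywsrmbdhvxn/;
    return $comp;
}

sub revcom_sketch {
    my ($seq) = @_;
    return scalar reverse complement_sketch($seq);   # complement, then reverse
}

print complement_sketch('AACG'), "\n";   # same order, other strand's letters
print revcom_sketch('AACG'), "\n";       # other strand read 5' to 3'
```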

complement

Title       : complement
Usage       : $complemented_seq = $mySeq->complement;
Function    : Returns a char string containing 
            : the complementary sequence (eg; other strand)
            : of the original sequence. The translation method
            : is identical to revcom() but the nucleotide order
            : is not reversed. 
            :
            : To be honest *most* of the time you will want
            : to use revcom not this. Be careful!
            :
Example     :  $complemented_seq = $mySeq->complement;
            :
Source      : Guts from Jong's <jong@mrc-lmb.cam.ac.uk>
            : library of molbio perl routines
Note        :
            : The letter codes and complement translations
            : are those proposed by IUB (Nomenclature Committee,
            : 1985, Eur. J. Biochem. 150; 1-5) and are also
            : used by the GCG package. The IUB/GCG letter codes
            : for nucleotide ambiguity are compatible with
            : EMBL, GenBank and PIR database formats but are
            : *NOT* compatible with Staden/Sanger ambiguity
            : symbols. Staden/Sanger use different symbols to
            : represent uncertainty and frame ambiguity.
            :
            : Currently Staden/Sanger are not recognized
            : sequence types.
            :
            : GCG Documentation on sequence symbols:
URL         : http://www.neb.com/gcgdoc/GCGdoc/Appendices/appendix_iii.html
            :
Translation :
            : GCG/IUB    Meaning        Complement
            : ------------------------------------
            :  A            A                T
            :  C            C                G
            :  G            G                C
            :  T            T                A
            :  U            U                A
            :  M          A or C             K
            :  R          A or G             Y
            :  W          A or T             W
            :  S          C or G             S
            :  Y          C or T             R
            :  K          G or T             M
            :  V        A or C or G          B
            :  H        A or C or T          D
            :  D        A or G or T          H
            :  B        C or G or T          V
            :  X      G or A or T or C       X
            :  N      G or A or T or C       N
            :--------------------------------------
            :
Revision    : 0.01 / 6 Dec 1996
Returns     : char string
Argument    : n/a

reverse

Title     : reverse
Usage     : $reversed_seq = $mySeq->reverse;
Function  : Returns a char string containing the
          : reverse of the object sequence
          :
          : Does *NOT* complement it. If you want
          : the other strand, use $mySeq->revcom()
          : 
Example   :  $reversed_seq = $mySeq->reverse;
          :
Revision  : 0.01 / 6 Dec 1996
Returns   : char string
Argument  : n/a

Dna_to_Rna

Title     : Dna_to_Rna
Usage     : $translated_seq = $mySeq->Dna_to_Rna;
Function  : Returns a char string containing the
          : Rna translation of the Dna nucleotide sequence
          : (Replaces T with U)
          : 
Example   : $translated_seq = $mySeq->Dna_to_Rna;
          :
Source    : modified from Jong's <jong@mrc-lmb.cam.ac.uk>
          : library of molbio perl routines
          :
Revision  : 0.01 / 6 Dec 1996
Returns   : char string
Argument  : n/a

Rna_to_Dna

Title     : Rna_to_Dna
Usage     : $translated_seq = $mySeq->Rna_to_Dna;
Function  : Returns a char string containing the
          : Dna translation of the Rna nucleotide sequence
          : (Replaces U with T)
          : 
Example   : $translated_seq = $mySeq->Rna_to_Dna;
          :
Revision  : 0.01 / 16 MAR 1997
Returns   : char string
Argument  : n/a
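
Both conversions amount to a single transliteration. A minimal sketch on
plain strings, with hypothetical function names and with lower-case
letters handled as well (an assumption about the desired behaviour):

```perl
use strict;
use warnings;

# Hypothetical string-level equivalents of Dna_to_Rna and Rna_to_Dna.
sub dna_to_rna_sketch { (my $s = $_[0]) =~ tr/Tt/Uu/; return $s }
sub rna_to_dna_sketch { (my $s = $_[0]) =~ tr/Uu/Tt/; return $s }

print dna_to_rna_sketch('ACTGt'), "\n";
print rna_to_dna_sketch('ACUGu'), "\n";
```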

translate

Title     : translate
Usage     : $translation = $mySeq->translate;
Function  : Returns a new Bio::Seq object with the protein
          : translation from this sequence
          :
          : "*" is the default symbol for a stop codon
          : "X" is the default symbol for an unknown codon
          :
Example   : $translation = $mySeq->translate;
          :   -or- with user defined stop/unknown codon symbols:
          : $translation = $mySeq->translate($stop_symbol,$unknown_symbol);
          : 
Source    : modified from Jong's <jong@mrc-lmb.cam.ac.uk>
          : library of molbio perl routines
          :
To-do     : - allow named parameters (just like new and out_GCG )
          : - allow "frame" parameter to pick translation frame
          :
Revision  : 0.01 / 6 Dec 1996
Returns   : new Sequence object. Its id is the original id.trans
Argument  : optional stop-codon symbol, optional unknown-codon symbol
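
To make the role of the two symbols concrete, here is a sketch of a
frame-1 translation on a plain string using the standard genetic code.
The function name, the table construction and the handling of a trailing
partial codon (it is ignored) are assumptions, and the sketch returns a
string rather than a new sequence object.

```perl
use strict;
use warnings;

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon{"$b1$b2$b3"} = substr( $aas, $i++, 1 );
        }
    }
}

# Hypothetical helper: translates frame 1 of $dna into a protein string.
sub translate_sketch {
    my ($dna, $stop, $unknown) = @_;
    $stop    = '*' unless defined $stop;      # default stop symbol
    $unknown = 'X' unless defined $unknown;   # default unknown-codon symbol
    (my $seq = uc $dna) =~ tr/U/T/;           # accept RNA input too
    my $protein = '';
    for ( my $p = 0; $p + 3 <= length($seq); $p += 3 ) {
        my $aa = $codon{ substr( $seq, $p, 3 ) };
        $protein .= !defined $aa ? $unknown
                  : $aa eq '*'   ? $stop
                  :                $aa;
    }
    return $protein;
}

print translate_sketch('ATGGCCTAA'), "\n";
print translate_sketch('ATGNNNTAA', '.', '?'), "\n";
```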

dump

Title     : dump
Usage     : @results = $mySeq->dump; -or- 
          : $results = $mySeq->dump;
          :
Function  : Returns a formatted array or string (depending on how it
          : is invoked) containing the contents of a 
          : Bio::Seq object. Useful for debugging
          :
          : ***This is used by Chris Dagdigian for debugging ***
          : ***Probably should be removed before distribution***
          :
Example   :  @results = $mySeq->dump;
          :  foreach(@results){print;}
          :     -or-
          :  print $myseq->dump;
          :
Returns   : Array or string depending on value of wantarray
Argument  : n/a
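
The "array or string" behaviour relies on Perl's wantarray. A minimal
sketch, assuming a plain hash of fields and an invented line layout
(dump_sketch is not the module's dump):

```perl
use strict;
use warnings;

# Hypothetical dumper: one formatted line per field; returns the list of
# lines in list context and a single joined string in scalar context.
sub dump_sketch {
    my ($obj) = @_;
    my @lines = map { sprintf( "%-10s: %s\n", $_, $obj->{$_} ) }
                sort keys %$obj;
    return wantarray ? @lines : join( '', @lines );
}

my $fields  = { id => 'sample', seq => 'ACTG' };
my @results = dump_sketch($fields);   # list context
my $results = dump_sketch($fields);   # scalar context
print scalar(@results), " lines\n";
print $results;
```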

out_bad

Title     : out_bad()
Usage     : out_bad;
Function  : Throws a fatal error if we don't know the output format.
Example   : $self->out_bad;
Returns   : n/a
Argument  : n/a

out_primer

Title     : out_primer()
Usage     : $formatted_seq = $myseq->out_primer;
          : @formatted_seq = $myseq->out_primer;
          :
          : print $myseq->out_primer(-id=>'New ID',
          :                          -header=>'This is my header');
          :
Function  : outputs a sequence in primer format
          :
Note      : Not a supported output type (cannot be invoked via layout)
          : Use at your own risk :)
          : 
Example   : see usage
          :
Revision  : 0.01 / 20 Dec 1996
Returns   : string or list, depending on how it is invoked
Argument  : named list parameters for "id" and "header" are allowed

out_pir

Title     : out_pir()
Usage     : $formatted_seq = $myseq->layout("PIR");
          : $formatted_seq = $myseq->out_pir;
          : @formatted_seq = $myseq->out_pir;
          :
          : print $myseq->out_pir(-title=>'New TITLE',
          :                       -entry=>'New ENTRY',
          :                       -acc=>'User defined accession',
          :                       -date=>'User defined date',
          :                       -reference=>'User defined ref info');
          :
Function  : Returns a string or an array depending on how it
          : is invoked. Can be easily accessed via the layout()
          : method, or if more output control is desired it can
          : be called directly with the following named parameters:
          :
          :  -entry      PIR entry
          :  -title      PIR title
          :  -acc        user defined accession number
          :  -reference  user defined reference
          :  -date       user defined date/time info
          :
          : All named parameters will take precedence over any
          : default behavior. When there are no user arguments,
          : the default output is as follows:
          :
          : PIR 'ENTRY'     = sequence object "id" field
          : PIR 'TITLE'     = sequence object "desc" field
          : PIR 'DATE'      = current date/time
          : PIR 'ACC'       = not used in default output
          : PIR 'REFERENCE' = not used in default output
          :
Note      : Not tested stringently.
          :
WARNING   : Does not deal with numbering issue
          :
To-do     : - Allow user to pass in hash of additional fields/values
          : - Deal with numbering issue
          :
Example   : see usage
          :
Revision  : 0.02 / 12 Jan 1997
Returns   : string or list, depending on how it is invoked
Argument  : named list parameters are allowed, see above

out_genbank

Title     : out_genbank()
Usage     : $formatted_seq = $myseq->out_genbank;
          : @formatted_seq = $myseq->out_genbank;
          : print $myseq->out_genbank(-id=>'New ID',
          :                           -def=>'User defined definition',
          :                           -acc=>'User defined accession',
          :                           -origin=>'User defined origin info',
          :                           -spacing=>'single',
          :                           -caps=>'up',
          :                           -date=>'DATE GOES HERE',
          :                           -type=>'mRna');
          :   
Function  : Returns a GenBank formatted sequence array or string
          : depending on the value of wantarray when invoked via layout(). 
          : If more control is desired over output format, out_genbank() 
          : can be addressed directly with the following named parameters:
          :
          : def          - Sequence definition information
          : acc          - Sequence accession number
          : origin       - Sequence origin information
          : id           - short name 
          : date         - new date info
          : type         - sequence type (Dna, mRna, Amino, etc.)
          : spacing      - "single" or "double" sequence line spacing
          : caps         - "up" or "down" sequence capitalization
          :
          : When invoked via layout() or called directly with no 
          : arguments, the following default behaviours apply:
          :  DATE = Current date and time
          :  DEFINITION = object's description field
          :  ID = object's ID field
          :  SPACING = single
          :
          : All named parameters must be strings. Passed in parameters will
          : always take precedence over any fields with default settings.
          :
Note      : Format not stringently tested for accuracy. Sequence is numbered
          : according to the integer specified in the object 'start' field
          : but the implementation has not been robustly tested.
          :
To-do     : - allow user hash reference for additional format fields
          :
Example   : see usage
          :
Revision  : 0.02 / 12 Jan 1997
Returns   : string or list, depending on how it is invoked
Argument  : named list parameters are allowed, see above

out_GCG

Title    : out_GCG
Usage    : $formatted_seq = $mySeq->layout("GCG"); 
         : @formatted_seq = $mySeq->layout("GCG");
         : 
         : print $myseq->out_GCG(-id=>'New ID',
         :                      -spacing=>'single',
         :                      -caps=>'up',
         :                      -date=>'DATE GOES HERE',
         :                      -header=>'This is a user submitted header',
         :                      -type=>'n');
         :   
Function : Returns a GCG formatted sequence array or string
         : depending on the value of wantarray when invoked via layout(). 
         : If more control is desired over output format, out_GCG() 
         : can be addressed directly with the following named parameters:
         :
         : header       - first line(s) of formatted sequence
         : id           - short name that appears before 'Length:' field
         : date         - overwrite default date info
         : type         - can be "N" or "P", for nucleotide/protein
         : spacing      - "single" or "double" sequence line spacing
         : caps         - "up" or "down" sequence capitalization
         :
         : When invoked via layout() or called directly with no 
         : arguments, the following default behaviours apply:
         :  DATE = Current date and time
         :  DEFINITION = object's description field
         :  ID = object's ID field
         :  SPACING = single
         :         
         : All named parameters must be strings. Passed in parameters will
         : always take precedence over any fields with default settings.
         :
Example  :  
Output   :
         :Sample Bio::Seq sequence
         : sample Length: 240  Wed Nov 27 13:24:28 EST 1996  Type: N Check: 5371  ..
         :
         :       1  aaaacctatg gggtgggctc tcaagctgag accctgtgtg cacagccctc
         :      51  tggctggtgg cagtggagac gggatnnnat gacaagcctg ggggacatga
         :     101  ccccagagaa ggaacgggaa caggatgagt gagaggaggt tctaaattat
         :     151  ccattagcac aggctgccag tggtccttgc ataaatgtat agagcacaca
         :     201  ggtgggggga aagggagaga gagaagaagc cagggtataa
         :
         :
Note     : GCG formatted sequences contain a "Type:" field.
         : If Type cannot be internally determined and no
         : Type name-parameter is passed in then the Type: 
         : field is not printed.
         :
Warning  : Unconventional numbering offsets may not
         : be robustly handled
         :
Revision : 0.06 / 12 Jan 1997
Source   : Found guts of this code on bionet.gcg, unknown author
Returns  : Array or String
Argument : named list parameters are allowed, see above
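
The body of the sample output (numbered lines of 50 residues in blocks of
10) can be approximated as below. The field widths and the function name
are guesses read off the sample; the header line, checksum and the
spacing/caps options are left out.

```perl
use strict;
use warnings;

# Hypothetical formatter for the sequence lines only: 50 residues per
# line, grouped in blocks of 10, prefixed with the 1-based position of
# the first residue on the line (field width 8 is an assumption).
sub gcg_lines_sketch {
    my ($seq) = @_;
    my @lines;
    for ( my $pos = 0; $pos < length($seq); $pos += 50 ) {
        my $chunk  = substr( $seq, $pos, 50 );
        my @blocks = $chunk =~ /(.{1,10})/g;
        push @lines, sprintf( "%8d  %s\n", $pos + 1, join( ' ', @blocks ) );
    }
    return @lines;
}

print gcg_lines_sketch( 'acgt' x 15 );   # 60 residues give two lines
```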

out_nbrf

Title     : out_nbrf()
Usage     : $self->layout("NBRF") or $self->out_nbrf
          :
Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
          :
          : If the ReadSeq wrapper Parse.pm appears 
          : to be configured properly it is used
          : to generate the output. 
          :
          : If Parse.pm cannot be used then this code
          : carps out with an error message.
          :
To-do     : write internal output code
          :
Version   : 1.0 /  16 MAR 1997
Example   : see Usage
Returns   : FORMATTED STRING (wantarray is not used here!)
Argument  : 

out_gcgseq

Title     : out_gcgseq
Usage     : out_gcgseq;
Function  : Returns the sequence as a string in GCG_SEQ format.
Example   : $self->out_gcgseq;
          :
Returns   : string sequence in GCG_SEQ format
Argument  : n/a
Comments  : SAC: Derived from out_fasta().
          : GCG_SEQ is a format that looks a lot like Fasta and is used
          : for building GCG sequence datasets (.seq files).
          : It also has some similarities to NBRF format.

out_gcgref

Title     : out_gcgref
Usage     : out_gcgref;
Function  : Returns the sequence as a string in GCG_REF format.
Example   : $self->out_gcgref;
          :
Returns   : string sequence in GCG_REF format
Argument  : n/a
Comments  : SAC: Derived from out_gcgseq().
          : GCG_REF is a companion format for GCG_SEQ that is used
          : for building GCG sequence datasets (.ref files).
          : The .ref file is identical to .seq file but without the sequence.

out_ig

Title     : out_ig()
Usage     : $self->layout("IG") or $self->out_ig
          :
Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
          :
          : If the ReadSeq wrapper Parse.pm appears 
          : to be configured properly it is used
          : to generate the output. 
          :
          : If Parse.pm cannot be used then this code
          : carps out with an error message.
          :
To-do     : write internal output code
          :
Version   : 1.0 /  16 MAR 1997
Example   : see Usage
Returns   : FORMATTED STRING (wantarray is not used here!)
Argument  : 

out_strider

Title     : out_strider()
Usage     : $self->layout("Strider") or $self->out_strider
          :
Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
          :
          : If the ReadSeq wrapper Parse.pm appears 
          : to be configured properly it is used
          : to generate the output. 
          :
          : If Parse.pm cannot be used then this code
          : carps out with an error message.
          :
To-do     : write internal output code
          :
Version   : 1.0 /  16 MAR 1997
Example   : see Usage
Returns   : FORMATTED STRING (wantarray is not used here!)
Argument  : 

out_zuker

Title     : out_zuker()
Usage     : $self->layout("Zuker") or $self->out_zuker
          :
Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
          :
          : If the ReadSeq wrapper Parse.pm appears 
          : to be configured properly it is used
          : to generate the output. 
          :
          : If Parse.pm cannot be used then this code
          : carps out with an error message.
          :
To-do     : write internal output code
          :
Version   : 1.0 /  16 MAR 1997
Example   : see Usage
Returns   : FORMATTED STRING (wantarray is not used here!)
Argument  : 

out_msf

Title     : out_msf()
Usage     : $self->layout("MSF") or $self->out_msf
          :
Function  : FORMAT NOT INTERNALLY IMPLEMENTED YET!!!
          :
          : If the ReadSeq wrapper Parse.pm appears 
          : to be configured properly it is used
          : to generate the output. 
          :
          : If Parse.pm cannot be used then this code
          : carps out with an error message.
          :
To-do     : write internal output code
          :
Version   : 1.0 /  16 MAR 1997
Example   : see Usage
Returns   : FORMATTED STRING (wantarray is not used here!)
Argument  : 

parse_unknown

Title     : parse_unknown
Usage     : parse_unknown($ent);
Function  : tries to figure out the format of $ent and then
          : calls the appropriate function to parse it into $self->{"seq"}.
Example   : $self->parse_unknown;
Returns   : n/a
Argument  : $ent : the rough multi-line string to be parsed

parse_bad

Title     : parse_bad
Usage     : parse_bad;
Function  : complains of un-parsable sequence, last-ditch attempt via
          : Parse.pm if sequence is being read from a file.
          :
Example   : $self->parse_bad;
Returns   : n/a
Argument  : n/a

version

Title     : version()
Usage     : $myseq->version;
Function  : prints Bio::Seq current version number

Bio::Seq Guts

Sequence Object

The sequence object is merely a reference to a hash containing
all or some of the following fields...

Field         Value
--------------------------------------------------------------
seq           the sequence

id            a short identifier for the sequence

desc          a description of the sequence, in descffmt file-format

names         a hash of identifiers that relate to the sequence.
              These could be database IDs, accession numbers, URLs,
              pathnames, etc. Currently there is no set format
              for the names hash and no formal definition of databases 
              or names

start         start in bio-coords of the first residue of the sequence

end           end in bio-coords of the last residue of the sequence

type          the sequence type. Is actually a 2 value list of format
              ["monomer","origin"] where monomer is one of the
              recognized sequence types and origin is a string
              description of the sequence's origin (mitochondrial, etc)

ffmt          file-format for the sequence

descffmt      file-format of the description string
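
As an illustration of the layout described above, a hash with these
fields might look like the following. All values are invented for the
example, the key used inside "names" is made up, "descffmt" is left
undefined because its valid values are not listed here, and the real
object is a blessed reference created by new().

```perl
use strict;
use warnings;

my $sequence = 'ACTGTGGCGTCAACTG';

# Hypothetical, unblessed stand-in for a sequence object.
my $seq_obj = {
    seq      => $sequence,
    id       => 'sample',
    desc     => 'Sample Bio::Seq sequence',
    names    => { accession => 'X00000' },     # made-up identifier
    start    => 1,                             # bio-coord of first residue
    end      => length($sequence),             # bio-coord of last residue
    type     => [ 'dna', 'mitochondrial' ],    # ["monomer", "origin"]
    ffmt     => 'Fasta',
    descffmt => undef,                         # not specified in this sketch
};

my ($monomer, $origin) = @{ $seq_obj->{type} };
print "$seq_obj->{id}: $monomer ($origin), ",
      "$seq_obj->{start}-$seq_obj->{end}\n";
```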