NAME

Bio::AlignIO::stockholm - stockholm sequence input/output stream

SYNOPSIS

# Do not use this module directly.  Use it via the L<Bio::AlignIO> class.

use Bio::AlignIO;
use strict;

my $in = Bio::AlignIO->new(-format => 'stockholm',
                           -file   => 't/data/testaln.stockholm');
while( my $aln = $in->next_aln ) {
    # work with the Bio::Align::AlignI object in $aln here
}

DESCRIPTION

This object can transform Bio::Align::AlignI objects to and from Stockholm flat file databases. The module was completely refactored from the original Stockholm parser to handle annotation data, and it now includes a write_aln() method for (almost) complete Stockholm format output.

Stockholm alignment records normally contain additional sequence-based and alignment-based annotation:

GF Lines (alignment feature/annotation):
#=GF <featurename> <Generic per-file annotation, free text>
Placed above the alignment

GC Lines (alignment consensus):
#=GC <featurename> <Generic per-column annotation, exactly 1
     character per column>
Placed below the alignment

GS Lines (sequence annotations):
#=GS <seqname> <featurename> <Generic per-sequence annotation, free
     text>

GR Lines (sequence meta data):
#=GR <seqname> <featurename> <Generic per-sequence AND per-column
     mark up, exactly 1 character per column>
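
The four markup line types above can be told apart by their prefix and field count. The following is a minimal pure-Perl sketch, not the parser used by this module: the function name classify_stockholm_line and the returned hash keys are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::AlignIO::stockholm): classify one
# line of a Stockholm record. Returns a hash reference describing the
# line, or undef for a blank line.
sub classify_stockholm_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*$/;

    # format header, e.g. "# STOCKHOLM 1.0"
    return { type => 'header', version => $1 }
        if $line =~ /^#\s+STOCKHOLM\s+(\S+)/;
    # any other '#' line that is not '#=' markup is treated as a comment
    return { type => 'comment', value => $line }
        if $line =~ /^#(?!=)/;
    # record terminator
    return { type => 'end' } if $line =~ m{^//};

    # GF: feature name, then free text
    return { type => 'GF', feature => $1, value => $2 }
        if $line =~ /^#=GF\s+(\S+)\s+(.*)$/;
    # GC: feature name, then one character per column
    return { type => 'GC', feature => $1, value => $2 }
        if $line =~ /^#=GC\s+(\S+)\s+(\S+)$/;
    # GS: sequence name, feature name, then free text
    return { type => 'GS', seqname => $1, feature => $2, value => $3 }
        if $line =~ /^#=GS\s+(\S+)\s+(\S+)\s+(.*)$/;
    # GR: sequence name, feature name, then one character per column
    return { type => 'GR', seqname => $1, feature => $2, value => $3 }
        if $line =~ /^#=GR\s+(\S+)\s+(\S+)\s+(\S+)$/;

    # anything else with two fields is taken to be an alignment row
    return { type => 'sequence', seqname => $1, value => $2 }
        if $line =~ /^(\S+)\s+(\S+)$/;
    return { type => 'unknown', value => $line };
}

# Example: report the type of each line in a small made-up record
my @sample = (
    '# STOCKHOLM 1.0',
    '#=GF ID myID12345',
    '#=GS seq1/1-10 AC P12345',
    'seq1/1-10 ACDEFGHIKL',
    '#=GR seq1/1-10 SS HHHH..EEEE',
    '#=GC SS_cons HHHH..EEEE',
    '//',
);
for my $line (@sample) {
    my $info = classify_stockholm_line($line);
    printf "%-8s %s\n", $info->{type}, $line;
}
```

The sample record is invented for the example. Real files may wrap alignments over several blocks, which this sketch does not attempt to reassemble.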

Currently, sequence annotations (those designated with GS tags) are parsed only for accession numbers and descriptions. It is intended that full parsing will be added at some point in the near future along with a builder option for optionally parsing alignment annotation and meta data.

The following methods/tags are currently used for storing and writing the alignment annotation data.

  Tag        SimpleAlign
               Method  
  ----------------------------------------------------------------------
   AC        accession  
   ID        id  
   DE        description
  ----------------------------------------------------------------------

  Tag        Bio::Annotation   TagName                    Parameters
             Class
  ----------------------------------------------------------------------
   AU        SimpleValue       record_authors             value
   SE        SimpleValue       seed_source                value
   GA        SimpleValue       gathering_threshold        value
   NC        SimpleValue       noise_cutoff               value
   TC        SimpleValue       trusted_cutoff             value
   TP        SimpleValue       entry_type                 value
   SQ        SimpleValue       num_sequences              value
   PI        SimpleValue       previous_ids               value
   DC        Comment           database_comment           comment
   CC        Comment           alignment_comment          comment
   DR        Target            dblink                     database
                                                          primary_id
                                                          comment
   AM        SimpleValue       build_method               value
   NE        SimpleValue       pfam_family_accession      value
   NL        SimpleValue       sequence_start_stop        value
   SS        SimpleValue       sec_structure_source       value
   BM        SimpleValue       build_model                value
   RN        Reference         reference                  *
   RC        Reference         reference                  comment
   RM        Reference         reference                  pubmed
   RT        Reference         reference                  title
   RA        Reference         reference                  authors
   RL        Reference         reference                  location
  ----------------------------------------------------------------------
* RN is generated based on the number of Bio::Annotation::Reference objects.

Custom annotation

Some users may want to add custom annotation beyond those mapped above. Currently there are two methods to do so; however, the methods used for adding such annotation may change in the future, particularly if alignment Writer classes are introduced. In particular, do not rely on changing the global variables @WRITEORDER or %WRITEMAP as these may be made private at some point.

1) Use (and abuse) the 'custom' tag. The tagname for the object can differ from the tagname used to store the object in the AnnotationCollection.

use Bio::Annotation::AnnotationFactory;

# AnnotationCollection from the SimpleAlign object
my $coll = $aln->annotation;
my $factory = Bio::Annotation::AnnotationFactory->new(-type =>
    'Bio::Annotation::SimpleValue');
# $str holds the annotation text to store under 'mytag'
my $rfann = $factory->create_object(-value => $str,
                                    -tagname => 'mytag');
$coll->add_Annotation('custom', $rfann);
$rfann = $factory->create_object(-value => 'foo',
                                -tagname => 'bar');
$coll->add_Annotation('custom', $rfann);

OUTPUT:

# STOCKHOLM 1.0

#=GF ID myID12345
#=GF mytag katnayygqelggvnhdyddlakfyfgaglealdffnnkeaaakiinwvaEDTTRGKIQDLV??
#=GF mytag TPtd~????LDPETQALLV???????????????????????NAIYFKGRWE?????????~??
#=GF mytag ??HEF?A?EMDTKPY??DFQH?TNen?????GRI??????V???KVAM??MF?????????N??
#=GF mytag ???DD?VFGYAEL????DE???????L??D??????A??TALELAY??????????????????
#=GF mytag ?????????????KG??????Sa???TSMLILLP???????????????D??????????????
#=GF mytag ???????????EGTr?????AGLGKLLQ??QL????????SREef??DLNK??L???AH????R
#=GF mytag ????????????L????????????????????????????????????????R?????????R
#=GF mytag ??QQ???????V???????AVRLPKFSFefefdlkeplknlgmhqafdpnsdvfklmdqavlvi
#=GF mytag gdlqhayafkvd????????????????????????????????????????????????????
#=GF mytag ????????????????????????????????????????????????????????????????
#=GF mytag ????????????????????????????????????????????????????????????????
#=GF mytag ????????????????????????????????????????????????????????????????
#=GF mytag ?????????????INVDEAG?TEAAAATAAKFVPLSLppkt??????????????????PIEFV
#=GF mytag ADRPFAFAIR??????E?PAT?G????SILFIGHVEDPTP?msv?
#=GF bar foo
...

2) Modify the global @WRITEORDER and %WRITEMAP.

# AnnotationCollection from the SimpleAlign object
my $coll = $aln->annotation;

# add to WRITEORDER
push @Bio::AlignIO::stockholm::WRITEORDER, 'my_stuff';

# make sure new tag maps to something
$Bio::AlignIO::stockholm::WRITEMAP{my_stuff} = 'Hobbit/SimpleValue';

# $factory is the SimpleValue AnnotationFactory created in the first example
my $rfann = $factory->create_object(-value => 'Frodo',
                                    -tagname => 'Hobbit');
$coll->add_Annotation('my_stuff', $rfann);
$rfann = $factory->create_object(-value => 'Bilbo',
                                 -tagname => 'Hobbit');
$coll->add_Annotation('my_stuff', $rfann);

OUTPUT:

# STOCKHOLM 1.0

#=GF ID myID12345
#=GF Hobbit Frodo
#=GF Hobbit Bilbo
....

FEEDBACK

Support

Please direct usage questions or support issues to the mailing list:

bioperl-l@bioperl.org

rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:

http://bugzilla.open-bio.org/

AUTHORS - Chris Fields, Peter Schattner

Email: cjfields-at-uiuc-dot-edu, schattner@alum.mit.edu

CONTRIBUTORS

Andreas Kahari, ak-at-ebi.ac.uk
Jason Stajich, jason-at-bioperl.org

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are usually preceded by an underscore (_).

new

 Title   : new
 Usage   : my $alignio = Bio::AlignIO->new(-format => 'stockholm',
                                           -file   => '>file');
 Function: Initialize a new L<Bio::AlignIO::stockholm> reader or writer
 Returns : L<Bio::AlignIO> object
 Args    : -line_length :  length of the line for the alignment block
           -alphabet    :  symbol alphabet to set the sequences to.  If not set,
                           the parser will try to guess based on the alignment
                           accession (if present), defaulting to 'dna'.
           -spaces      :  (optional, def = 1) boolean; when true, a blank line
                           is printed between the "# STOCKHOLM 1.0" header and
                           the annotation, and between the annotation and the
                           alignment.

next_aln

Title   : next_aln
Usage   : $aln = $stream->next_aln()
Function: returns the next alignment in the stream.
Returns : L<Bio::Align::AlignI> object
Args    : NONE

write_aln

Title   : write_aln
Usage   : $stream->write_aln(@aln)
Function: writes the alignment object(s) in @aln to the stream in stockholm format
Returns : 1 for success and 0 for error
Args    : one or more L<Bio::Align::AlignI> objects

line_length

Title   : line_length
Usage   : $obj->line_length($newval)
Function: Set the alignment output line length
Returns : value of line_length
Args    : newvalue (optional)

alphabet

Title   : alphabet
Usage   : $obj->alphabet('dna')
Function: Set the sequence data alphabet
Returns : sequence data type
Args    : newvalue (optional)

spaces

Title   : spaces
Usage   : $obj->spaces(1)
Function: Set the 'spaces' flag, which prints extra newlines between the
          header and the annotation and the annotation and the alignment
Returns : value of the spaces flag
Args    : newvalue (optional)
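
To illustrate the effect of the 'spaces' flag, here is a small pure-Perl sketch of the layout it describes. It is not the code this module uses to write records: the helper name layout_record and its arguments are hypothetical, and the handling of edge cases (for example, a record without annotation) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::AlignIO::stockholm): join the
# header, annotation lines and alignment lines of one record. When the
# 'spaces' argument is true (assumed default, as documented for -spaces),
# a blank line separates the header from the annotation and the
# annotation from the alignment.
sub layout_record {
    my (%args) = @_;
    my $spaces = exists $args{spaces} ? $args{spaces} : 1;

    my @out = ('# STOCKHOLM 1.0');
    push @out, '' if $spaces;
    push @out, @{ $args{annotation} };
    push @out, '' if $spaces;
    push @out, @{ $args{alignment} };
    push @out, '//';
    return join("\n", @out) . "\n";
}

# Example: the same made-up record with and without the extra blank lines
my %record = (
    annotation => ['#=GF ID myID12345'],
    alignment  => ['seq1/1-4 ACGT', 'seq2/1-4 AC-T'],
);
print layout_record(%record, spaces => 1);
print layout_record(%record, spaces => 0);
```

With the real module, the equivalent setting would be made through the -spaces argument to new() or the spaces() accessor described above.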

alignhandler

Title   : alignhandler
Usage   : $stream->alignhandler($handler)
Function: Get/Set the Bio::HandlerBaseI object
Returns : Bio::HandlerBaseI 
Args    : Bio::HandlerBaseI