NAME

FAST::Bio::SeqIO::seqxml - SeqXML sequence input/output stream

SYNOPSIS

# Do not use this module directly.  Use it via the FAST::Bio::SeqIO class.

use FAST::Bio::SeqIO;

# read a SeqXML file
my $seqio = FAST::Bio::SeqIO->new(-format => 'seqxml',
                            -file   => 'my_seqs.xml');

while (my $seq_object = $seqio->next_seq) {
    print join("\t", 
               $seq_object->display_id,
               $seq_object->description,
               $seq_object->seq,           
              ), "\n";
}

# write a SeqXML file
#
# Note that you can (optionally) specify the source
# (usually a database) and source version.
my $seqwriter = FAST::Bio::SeqIO->new(-format        => 'seqxml',
                                -file          => ">outfile.xml",
                                -source        => 'Ensembl',
                                -sourceVersion => '56');
$seqwriter->write_seq($seq_object);

# once you've written all of your seqs, you may want to do
# an explicit close to get the closing </seqXML> tag
$seqwriter->close; 

DESCRIPTION

This object can transform FAST::Bio::Seq objects to and from SeqXML format. For more information on the SeqXML standard, visit http://www.seqxml.org.

In short, SeqXML is a lightweight sequence format that takes advantage of the validation capabilities of XML while not overburdening you with a strict and complicated schema.

This module is based in part (particularly the XML-parsing part) on FAST::Bio::TreeIO::phyloxml by Mira Han.

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

bioperl-l@bioperl.org                  - General discussion
http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

Support

Please direct usage questions or support issues to the mailing list:

bioperl-l@bioperl.org

rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:

https://redmine.open-bio.org/projects/bioperl/

AUTHORS - Dave Messina

Email: dmessina@cpan.org

CONTRIBUTORS

APPENDIX

The rest of the documentation details each of the object methods. Names of internal methods usually begin with an underscore (_).

_initialize

Title   : _initialize
Usage   : $self->_initialize(@args) 
Function: constructor (for internal use only).

          Besides the usual SeqIO arguments (-file, -fh, etc.),
          FAST::Bio::SeqIO::seqxml accepts three arguments which are used
          when writing out a seqxml file. They are all optional.
Returns : none
Args    : -source         => source string (usually a database name)
          -sourceVersion  => version number of the source
          -seqXMLversion  => the version of seqXML that will be used
Throws  : Exception if XML::LibXML::Reader or XML::Writer
          is not initialized

next_seq

Title   : next_seq
Usage   : $seq = $stream->next_seq()
Function: returns the next sequence in the stream
Returns : L<FAST::Bio::Seq> object, or nothing if no more available
Args    : none

write_seq

Title   : write_seq
Usage   : $stream->write_seq(@seq)
Function: writes each sequence object in @seq to the stream
Returns : 1 for success and 0 for error
Args    : Array of 1 or more L<FAST::Bio::PrimarySeqI> objects

_initialize_seqxml_node_methods

Title   : _initialize_seqxml_node_methods
Usage   : $self->_initialize_seqxml_node_methods
Function: sets up code ref mapping of each seqXML node type
          to a method for processing that node type 
Returns : none
Args    : none
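
The code ref mapping described above can be pictured as a dispatch table. The following is a minimal standalone sketch, not the module's actual code: the %handler_for hash, the dispatch helper, and the handler bodies are hypothetical names invented for illustration. Only the element_* and end_element_* naming pattern, the warning on unexpected element names, and the no-op default for closing tags are taken from this documentation.

```perl
use strict;
use warnings;

# Hypothetical handlers that only record what they saw; the real methods
# work on the state of the XML reader.
my @log;
my %handler_for = (
    element_entry     => sub { push @log, "start entry $_[0]" },
    element_DNAseq    => sub { push @log, "start DNAseq" },
    end_element_entry => sub { push @log, "end entry" },
);

# Look up a handler by node kind ('element' or 'end_element') and name.
sub dispatch {
    my ($kind, $name, @args) = @_;
    my $handler = $handler_for{"${kind}_$name"};
    if (!$handler) {
        return if $kind eq 'end_element';       # unknown closing tag: no-op
        warn "unexpected element <$name>\n";    # unknown opening tag: warn
        return;
    }
    return $handler->(@args);
}

dispatch('element',     'entry', 'seq1');
dispatch('element',     'DNAseq');
dispatch('end_element', 'DNAseq');    # no handler registered: ignored
dispatch('end_element', 'entry');

print "$_\n" for @log;
```

Keeping the mapping in one hash means that supporting a new node type only requires registering one more code ref.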

schemaLocation

Title   : schemaLocation
Usage   : $self->schemaLocation
Function: gets/sets the schema location in the <seqXML> header
Returns : the schema location string
Args    : To set the schemaLocation, call with a schemaLocation as the argument.

source

Title   : source
Usage   : $self->source
Function: gets/sets the data source in the <seqXML> header
Returns : the data source string
Args    : To set the source, call with a source string as the argument.

sourceVersion

Title   : sourceVersion
Usage   : $self->sourceVersion
Function: gets/sets the data source version in the <seqXML> header
Returns : the data source version string
Args    : To set the source version, call with a source version string
          as the argument.

seqXMLversion

Title   : seqXMLversion
Usage   : $self->seqXMLversion
Function: gets/sets the seqXML version in the <seqXML> header
Returns : the seqXML version string.
Args    : To set the seqXML version, call with a seqXML version string
          as the argument.
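
The four accessors above (schemaLocation, source, sourceVersion, seqXMLversion) all follow the combined getter/setter convention. A minimal sketch of that convention, assuming a hypothetical Demo::Header class and hash keys that are not part of this module:

```perl
use strict;
use warnings;

package Demo::Header;    # hypothetical stand-in, not FAST::Bio::SeqIO::seqxml

sub new { return bless {}, shift }

# Combined getter/setter: called with an argument it stores the value;
# in either case it returns the current value.
sub source {
    my ($self, $value) = @_;
    $self->{source} = $value if defined $value;
    return $self->{source};
}

sub sourceVersion {
    my ($self, $value) = @_;
    $self->{sourceVersion} = $value if defined $value;
    return $self->{sourceVersion};
}

package main;

my $header = Demo::Header->new;
$header->source('Ensembl');          # set
$header->sourceVersion('56');        # set
print $header->source, ' ', $header->sourceVersion, "\n";   # get
```

With the real module, the same calls would be made on the SeqIO object itself, for example $seqwriter->source('Ensembl').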

Methods for parsing the XML document

processXMLNode

Title   : processXMLNode
Usage   : $seqio->processXMLNode
Function: reads the XML node and processes according to the node type
Returns : none
Args    : none
Throws  : Exception on unexpected XML node type, warnings on unexpected
          XML element names.

processAttribute

Title   : processAttribute
Usage   : $seqio->processAttribute(\%hash_for_attribute);
Function: reads the attributes of the current element into a hash
Returns : none
Args    : hash reference where the attributes will be stored.

parseHeader

Title   : parseHeader
Usage   : $self->parseHeader();
Function: reads the opening <seqXML> block and grabs the metadata from it,
          namely the source, sourceVersion, and seqXMLversion.
Returns : none
Args    : none
Throws  : Exception if it hits an <entry> tag, because that means it's
          missed the <seqXML> tag and read too far into the file.

element_seqXML

Title   : element_seqXML
Usage   : $self->element_seqXML
Function: processes the opening <seqXML> node
Returns : none
Args    : none

element_entry

Title   : element_entry
Usage   : $self->element_entry
Function: processes a sequence <entry> node
Returns : none
Args    : none
Throws  : Exception if sequence ID is not present in <entry> element

element_species

Title   : element_species
Usage   : $self->element_species
Function: processes a <species> node, creating a FAST::Bio::Species object
Returns : none
Args    : none
Throws  : Exception if <species> tag exists but is empty,
          or if the attributes 'name' or 'ncbiTaxID' are undefined

element_description

Title   : element_description
Usage   : $self->element_description
Function: processes a sequence <description> node;
          a no-op -- description text is read by
          processXMLNode
Returns : none
Args    : none

element_RNAseq

Title   : element_RNAseq
Usage   : $self->element_RNAseq
Function: processes a sequence <RNAseq> node
Returns : none
Args    : none

element_DNAseq

Title   : element_DNAseq
Usage   : $self->element_DNAseq
Function: processes a sequence <DNAseq> node
Returns : none
Args    : none

element_AAseq

Title   : element_AAseq
Usage   : $self->element_AAseq
Function: processes a sequence <AAseq> node
Returns : none
Args    : none

element_DBRef

Title   : element_DBRef
Usage   : $self->element_DBRef
Function: processes a sequence <DBRef> node,
          creating a FAST::Bio::Annotation::DBLink object
Returns : none
Args    : none

element_property

Title   : element_property
Usage   : $self->element_property
Function: processes a sequence <property> node, creating a
          FAST::Bio::Annotation::SimpleValue object
Returns : none
Args    : none

end_element_RNAseq

Title   : end_element_RNAseq
Usage   : $self->end_element_RNAseq
Function: processes the closing </RNAseq> node
Returns : none
Args    : none

end_element_DNAseq

Title   : end_element_DNAseq
Usage   : $self->end_element_DNAseq
Function: processes the closing </DNAseq> node
Returns : none
Args    : none

end_element_AAseq

Title   : end_element_AAseq
Usage   : $self->end_element_AAseq
Function: processes the closing </AAseq> node
Returns : none
Args    : none

end_element_entry

Title   : end_element_entry
Usage   : $self->end_element_entry
Function: processes the closing </entry> node, creating the Seq object
Returns : a FAST::Bio::Seq object
Args    : none
Throws  : Exception if sequence, sequence ID, or alphabet are missing
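
The required-field check described above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: check_entry and its field names (id, seq, alphabet) are hypothetical, and a plain die stands in for whatever exception mechanism the module uses.

```perl
use strict;
use warnings;

# Hypothetical helper: validates the fields collected while parsing
# one <entry> and returns them as a hash reference.
sub check_entry {
    my (%field) = @_;
    for my $required (qw(id seq alphabet)) {
        die "entry is missing '$required'\n"
            unless defined $field{$required} && length $field{$required};
    }
    return {%field};
}

my $ok = check_entry(id => 'seq1', seq => 'ACGT', alphabet => 'dna');

# A missing sequence is reported as an exception, which eval can trap.
my $bad   = eval { check_entry(id => 'seq2', alphabet => 'dna') };
my $error = $@;
print "rejected: $error" unless $bad;
```

Callers that read files of uncertain quality may want to wrap next_seq in a similar eval block so that one malformed entry can be reported cleanly.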

end_element_default

Title   : end_element_default
Usage   : $self->end_element_default
Function: processes all other closing tags;
          a no-op.
Returns : none
Args    : none

DESTROY

Title   : DESTROY
Usage   : called automatically by Perl just before the object
          goes out of scope
Function: performs a write flush
Returns : none
Args    : none

close

Title   : close
Usage   : $seqio_obj->close()
Function: writes closing </seqXML> tag.

          close() will be called automatically by Perl when your
          program exits, but if you want to use the seqXML file
          you've written before then, you'll need to do an explicit
          close first to get the final </seqXML> tag.
Returns : none
Args    : none
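
The interplay between an explicit close and automatic cleanup can be sketched with a small standalone class. Demo::Writer, its methods, and the simplified <entry> output are hypothetical and exist only to illustrate the pattern; they are not the module's code, and the output is not a complete SeqXML document.

```perl
use strict;
use warnings;

package Demo::Writer;    # hypothetical stand-in for a SeqXML writer

sub new {
    my ($class, $fh) = @_;
    print {$fh} "<seqXML>\n";
    return bless { fh => $fh, closed => 0 }, $class;
}

sub write_entry {
    my ($self, $id) = @_;
    print { $self->{fh} } qq{  <entry id="$id"/>\n};
    return 1;
}

# Write the closing tag exactly once, however often close() is called.
sub close {
    my ($self) = @_;
    return if $self->{closed};
    print { $self->{fh} } "</seqXML>\n";
    $self->{closed} = 1;
    return;
}

# Safety net: finish the document if the caller never closed it.
sub DESTROY { $_[0]->close }

package main;

my $xml = '';
open my $fh, '>', \$xml or die "cannot open in-memory handle: $!";

my $writer = Demo::Writer->new($fh);
$writer->write_entry('seq1');
$writer->close;     # the document is complete from here on
undef $writer;      # DESTROY runs, but close() is now a no-op

close $fh;
print $xml;
```

Because the closing tag is written only once, it is safe to call close explicitly and still rely on the automatic cleanup as a fallback.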