NAME
Bio::Phylo::NeXML::DOM - XML DOM support for Bio::Phylo
SYNOPSIS
    use Bio::Phylo::IO 'parse';
    use Bio::Phylo::NeXML::DOM;

    Bio::Phylo::NeXML::DOM->new( -format => 'twig' );
    my $project = parse( -file => 'my.nex', -format => 'nexus' );
    my $nex_twig = $project->doc();
DESCRIPTION
This module adds to_dom methods to Bio::Phylo::NeXML::Writable classes, which provide NeXML-valid objects for document object model manipulation. DOM formats currently available are XML::Twig and XML::LibXML. For any XMLWritable object, use to_dom in place of to_xml to create DOM nodes.
The doc() method is also added to the Bio::Phylo::Project class. It returns a NeXML document as a DOM object populated by the current contents of the Bio::Phylo::Project object.
MOTIVATION
The NeXML parsing/writing capability of Bio::Phylo goes a long way towards wider adoption of this useful standard.
However, while Bio::Phylo can write NeXML-valid XML, the way in which it does this natively is somewhat hard-coded and therefore restricted, and is essentially oriented toward text file output. As such, there is a mismatch between the sophisticated Bio::Phylo data structure and its own ability to manipulate and serialize that structure in sophisticated but interoperable ways. Finer manipulations of XML-represented data are possible through a variety of Perl packages that can store and control XML according to a document object model (DOM). Many of these packages allow extremely flexible computation over large datasets stored in XML format, and allow XML-related facilities such as XPath and XSLT to be used programmatically.
The purpose of Bio::Phylo::NeXML::DOM is to introduce integrated DOM object creation and manipulation to Bio::Phylo, both to make DOM computation in Bio::Phylo more convenient, and also to provide a platform for potentially more sophisticated Bio::Phylo modules to come.
DESIGN
Besides the notion that DOM capability should be optional for the user, there are two main design ideas. First, for each Bio::Phylo object that can be parsed/written as NeXML (i.e., for each Bio::Phylo::NeXML::Writable object), we provide an analogous method for creating a representative DOM object, or element. These elements can be aggregated in a DOM document object, whose native stringifying method can be used to generate valid NeXML.
Second, we allow flexibility and extensibility in the choice of the underlying DOM package, while maintaining a consistent DOM interface that is similar in semantic and syntactic style to the accessors and mutators that act on the Bio::Phylo objects themselves. This is achieved through the DOM::DocumentI and DOM::ElementI interfaces, which define a minimal subset of DOM accessors and mutators, together with their inputs and outputs. Concrete instances of these interface classes provide the bindings between the abstract methods and their counterparts in the desired DOM implementation. Currently, there are bindings for two popular packages, XML::Twig and XML::LibXML.
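The interface-plus-binding arrangement described above can be sketched in plain Perl. This is a minimal illustration only, not the module's actual code: the package names My::ElementI and My::Element::Hash are hypothetical, and a plain hash stands in for a real backend such as XML::Twig or XML::LibXML.

```perl
use strict;
use warnings;

# Hypothetical abstract interface: each method dies unless a binding
# overrides it.
package My::ElementI;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

for my $method (qw(get_tagname set_tagname to_xml_string)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$method" } = sub {
        my $self = shift;
        die ref($self) . " does not implement $method\n";
    };
}

# Hypothetical concrete binding. A plain hash stands in for the
# underlying DOM implementation.
package My::Element::Hash;
our @ISA = ('My::ElementI');

sub get_tagname { return $_[0]{tag} }

sub set_tagname {
    my ( $self, $tag ) = @_;
    $self->{tag} = $tag;
    return $self;
}

sub to_xml_string {
    my $self = shift;
    return "<$self->{tag}/>";
}

package main;

my $elt = My::Element::Hash->new( tag => 'otu' );
$elt->set_tagname('otus');
print $elt->to_xml_string, "\n";    # prints "<otus/>"
```

Calling code only ever uses the interface methods, so a different binding can be substituted without changing the caller.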
Another priority was simplicity of use; most of the details remain under the hood in practice. The Bio/Phylo/Util/DOM.pm file defines the to_dom() method for each XMLWritable package, as well as the Bio::Phylo::NeXML::DOM package proper. The DOM object is a factory that is used to create Element and Document objects; it is an inside-out object that subclasses Bio::Phylo. To curb the proliferation of method arguments, a DOM factory instance (set by the latest invocation of Bio::Phylo::NeXML::DOM->new()) is maintained in a package global. This is used by default for object creation with DOM methods if a DOM factory object is not explicitly provided in the argument list.
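The package-global default can be pictured with the following plain-Perl sketch. It illustrates the pattern only and is not the module's implementation: My::Factory, get_default() and make_node() are hypothetical names introduced here.

```perl
use strict;
use warnings;

package My::Factory;

# Package global holding the most recently constructed factory.
our $DEFAULT;

sub new {
    my ( $class, %args ) = @_;
    my $self = bless { format => $args{-format} || 'twig' }, $class;
    $DEFAULT = $self;    # the latest invocation wins
    return $self;
}

sub get_default { return $DEFAULT }
sub get_format  { return $_[0]{format} }

package main;

# A consumer that accepts an optional factory argument and falls back
# to the package-global default when none is given.
sub make_node {
    my ($factory) = @_;
    $factory ||= My::Factory->get_default
        or die "no factory has been constructed yet\n";
    return $factory->get_format;
}

my $first  = My::Factory->new( -format => 'libxml' );
my $second = My::Factory->new();    # becomes the default

print make_node(), "\n";            # prints "twig"
print make_node($first), "\n";      # prints "libxml"
```

The trade-off is the usual one for shared state: shorter argument lists, at the cost that a later call to the constructor silently changes the default seen by other callers.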
The underlying DOM implementation is set with the DOM factory constructor's single argument, -format. Even this can be left out; the default implementation is XML::Twig, which is already required by Bio::Phylo. Thus, for example, one can use the DOM to convert a Nexus file to a DOM representation as follows:
    Bio::Phylo::NeXML::DOM->new();
    my $project = parse( -file => 'my.nex', -format => 'nexus' );
    my $nex_twig = $project->doc();
    # The end.
Underlying DOM packages are loaded at runtime as specified by the -format argument. Packages for unused formats do not need to be installed.
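Runtime loading of this kind is commonly done in Perl with require inside an eval. The sketch below shows the general technique under stated assumptions: the %BACKEND_FOR mapping and the load_backend() function are hypothetical and are not part of Bio::Phylo.

```perl
use strict;
use warnings;

# Hypothetical mapping from a -format value to a backend package.
my %BACKEND_FOR = (
    twig   => 'XML::Twig',
    libxml => 'XML::LibXML',
);

# Load the backend for $format at runtime. Returns the package name,
# or dies if the format is unknown or the package is not installed.
sub load_backend {
    my ($format) = @_;
    my $package = $BACKEND_FOR{ lc $format }
        or die "Unknown DOM format '$format'\n";

    # require with a variable needs a file path, not a package name.
    ( my $file = "$package.pm" ) =~ s{::}{/}g;
    eval { require $file; 1 }
        or die "Backend $package for format '$format' is not installed\n";
    return $package;
}

# Only the requested backend is ever loaded; the other may be absent.
my $package = eval { load_backend('twig') };
print defined $package ? "loaded $package\n" : "skipped: $@";
```

Because nothing is loaded until a format is requested, a missing package only matters to users who ask for that format.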
INTERFACE METHODS
The minimal DOM interface specifies the following methods. Details can be obtained from the Element and Document POD.
Bio::Phylo::NeXML::DOM::Element - DOM Element abstract class
get_tagname()
set_tagname()
get_attributes()
set_attributes()
clear_attributes()
get_text()
set_text()
clear_text()
get_parent()
get_children()
get_first_child()
get_last_child()
get_next_sibling()
get_prev_sibling()
get_elements_by_tagname()
set_child()
prune_child()
to_xml_string()
Bio::Phylo::NeXML::DOM::Document - DOM Document
get_encoding()
set_encoding()
get_root()
set_root()
get_element_by_id()
get_elements_by_tagname()
to_xml_string()
to_xml_file()
METHODS
CONSTRUCTOR
- new()
    Type    : Constructor
    Title   : new
    Usage   : $dom = Bio::Phylo::NeXML::DOM->new(-format => $format)
    Function: Create a new DOM factory
    Returns : DOM object
    Args    : optional: -format => DOM format (defaults to 'twig')
FACTORY METHODS
- create_element()
    Type    : Factory method
    Title   : create_element
    Usage   : $elt = $dom->create_element()
    Function: Create a new XML DOM element
    Returns : DOM element
    Args    : Optional:
              -tag  => $tag_name
              -attr => \%attr_hash
- parse_element()
    Type    : Factory method
    Title   : parse_element
    Usage   : $elt = $dom->parse_element($text)
    Function: Create a new XML DOM element from XML text
    Returns : DOM element
    Args    : An XML String
- create_document()
    Type    : Creator
    Title   : create_document
    Usage   : $doc = $dom->create_document()
    Function: Create a new XML DOM document
    Returns : DOM document
    Args    : Package-specific args
- parse_document()
    Type    : Factory method
    Title   : parse_document
    Usage   : $doc = $dom->parse_document($text)
    Function: Create a new XML DOM document from XML text
    Returns : DOM document
    Args    : An XML String
MUTATORS
- set_format()
    Type    : Mutator
    Title   : set_format
    Usage   : $dom->set_format($format)
    Returns : format designator as string
    Args    : format designator as string
ACCESSORS
- get_format()
    Type    : Accessor
    Title   : get_format
    Usage   : $dom->get_format()
    Function: Get the format designator for this object
    Returns : format designator as string
    Args    : none
- get_dom()
    Type    : Static accessor
    Title   : get_dom
    Usage   : __PACKAGE__->get_dom()
    Function: Get the singleton DOM object
    Returns : instance of this __PACKAGE__
    Args    : none
SEE ALSO
There is a mailing list at https://groups.google.com/forum/#!forum/bio-phylo for any user or developer questions and discussions.
The DOM creator abstract classes: Bio::Phylo::NeXML::DOM::Element, Bio::Phylo::NeXML::DOM::Document
CITATION
If you use Bio::Phylo in published research, please cite it:
Rutger A Vos, Jason Caravas, Klaas Hartmann, Mark A Jensen and Chase Miller, 2011. Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12:63. http://dx.doi.org/10.1186/1471-2105-12-63
AUTHOR
Mark A. Jensen (maj -at- fortinbras -dot- us), refactored by Rutger Vos
TODO
The Bio::Phylo::Annotation class is not yet DOMized.