PerlMol Quick Tutorial
Introduction
The modules in the PerlMol toolkit are designed to simplify the handling of molecules from Perl programs in a general and extensible way. These modules are object-oriented; however, this tutorial assumes little or no knowledge of object-oriented programming in Perl.
This document shows some of the more common methods included in the PerlMol toolkit, in a reasonable order for a quick introduction. For more details see the perldoc pages for each module.
How to read a molecule from a file
The following code will read a PDB file:
use Chemistry::Mol;
use Chemistry::File::PDB;
my $mol = Chemistry::Mol->read("test.pdb");
The first two lines (which only need to be used once in a given program) tell Perl that you want to use
the specified modules The third line reads the file and returns a Mol object.
To read other formats such as MDL molfiles, you need to use
the corresponding module, such as Chemistry::File::MDLMol. Readers for several formats are under development.
The Mol object
Chemistry::Mol->read
returns a Mol object. An object is a data structure of a given class that has methods (i.e. subroutines) associated with it. To access or modify an object's properties, you call the methods through the "arrow syntax":
my $name = $mol->name; # return the name of the molecule
$mol->name("water"); # set the name of the molecule to "water"
A Mol object contains essentialy a list of atoms, a list of bonds, and a few generic properties such as name, type, and id. The atoms and bonds themselves are also objects.
Writing a molecule file
To write a molecule to a file, just use the write
method:
$mol->write("test.pdb"); # see note below
Make sure you use
d the right module. Note: at the time of this writing, the Chemistry::File::PDB file does not yet implement the writing feature. Please stay tuned!
Selecting atoms in a molecule
You can get an array of all the atoms by calling the atoms method without parameters, or a specific atom by giving its index:
@all_atoms = $mol->atoms;
$atom3 = $mol->atoms(3);
You can select atoms that match an arbitrary expression by using Perl's built-in grep
function:
# get all oxygen atoms within 3.0 Angstroms of atom 37
@close_oxygens = grep {
$_->symbol eq 'O'
and $_->distance($mol->atoms(37)) < 3.0
} $mol->atoms;
The grep
function loops through all the atoms returned by $mol->atoms, aliasing each to $_ at each iteration, and returns only those for which the expression in braces is true.
Using grep
is the general way of finding atoms; however, since finding atoms by name is very common, a convenience method is available for that purpose.
$HB1 = $mol->atoms_by_name('HB1');
@H_atoms = $mol->atoms_by_name('H.*'); # the name is treated as a regex
Since the atom name is not generally unique, even the first example above might match more than one atom. In that case, only the first one found is returned. In the second case, since you are asigning to an array, all matching atoms are returned.
The Atom object
Atoms are usually the most interesting objects in a molecule. Some of their main properties are Z, symbol, and coords.
$atom->Z(8); # set atomic number to 8
$symbol = $atom->symbol;
$coords = $atom->coords;
Atom coordinates
The coordinates returned by $atom->coords are a Math::VectorReal object. You can print these objects and use them to do vector algebra:
$c1 = $atom1->coords;
$c2 = $atom2->coords;
$dot_product = $c1 . $c2; # returns a scalar
$cross_product = $c1 x $c2; # returns a vector
$delta = $c2 - $c1; # returns a vector
$distance = $delta->length; # returns a scalar
print $c1; # prints something like "[ 1.0E0 2.0E0 3.0E0 ]"
($x, $y, $z) = $c1->array; # set $x, $y, $z to the components of $c1
Since one is very often interested in calculating the distance between atoms, Atom objects provide a distance
method to save some typing:
$d = $atom1->distance($atom2);
$d2 = $atom1->distance($molecule2);
In the second case, the value obtained is the minimum distance between the atom and the molecule. This can be useful for things such as finding the water molecules closest to a given atom.
The Bond object
A bond object is a list of atoms with an associated bond order. In most cases, a bond has exactly two atoms, but we don't want to exclude possibilities such as three-center bonds. You can get the list of atoms in a bond by using the atoms
method.
@atoms_in_bond = $bond->atoms;
The other interesting method for Bond objects is length
, which returns the distance between the two atoms in a bond (this method requires that the bond have two atoms).
my $bondlength = $bond->length;
In addition to these properties, Bond objects have the generic properties described below. The most important of these, as far as bonds are concerned, is type
.
Generic properties
There are three generic properties that all PerlMol objects have:
- id
-
Each object must have a unique ID. In most cases you don't have to worry about it, because it is assigned automatically unless you specify it. You can use the by_id method to select an object:
$atom = $mol->by_id("a42");
In general, ids are preferable to indices because they don't change if you delete or move atoms or other objects.
- name
-
The name of the object does not have any meaning from the point of view of the core modules, but most file types have the concept of molecule name, and some (such as PDB) have the concept of atom names.
- type
-
Again, the meaning of type is not universally defined, but it would likely be used to specify atom types and bond orders.
Besides these, the user can specify arbitrary attributes, as discussed in the next section.
User-specified attributes
The core PerlMol classes define very few, very generic properties for atoms and molecules. This was chosen as a "minimum common denominator" because every file format and program has different ideas about the names, values and meaning of these properties. For example, some programs only allow bond orders of 1, 2, and 3; some also have "aromatic" bonds; some use calculated non-integer bond orders. PerlMol tries not to commit to any particular convention, but it allows you to specify whatever attributes you want for any object (be it a molecule, an atom, or a bond). This is done through the attr
method.
$mol->attr("melting point", "273.15"); # set m.p.
$color = $atom->attr("color"); # get atom color
The core modules store these values but they don't know what they mean and they don't care about them. Attributes can have whatever name you want, and they can be of any type. However, by convention, non-core modules that need additional attributes should prefix their name with a namespace, followed by a slash. (This is done to avoid modules fighting over the same attribute name.) For example, atoms created by the PDB reader module (Chemistry::File::PDB) have the "pdb/residue" attribute.
my $mol = Chemistry::Mol->read("test.pdb");
my $atom = $mol->atoms(1234);
print $atom->attr("pdb/residue"); # prints "ALA123"
Macromolecules
So far we have assumed that we are dealing with molecules of the Mol class. However, one of the interesting things about object-oriented programming is that classes can be extended. For dealing with macromolecules, we have the MacroMol class, which extends the Mol class. This means that in practice you can use a MacroMol object exactly as you would use a Mol object, but with some added functionality. In fact, the PDB reader can return MacroMol instead of Mol objects by just changing the first example like this:
use Chemistry::MacroMol;
use Chemistry::File::PDB;
my $macromol = Chemistry::MacroMol->read("test.pdb");
Now the question is, what is the "added functionality" that MacroMol objects have on top of the original Mol object?
The MacroMol object
For the purposes of this module, a macromolecule is considered to be a big molecule where atoms are divided in Domains. A domain is just a subset of the atoms in the molecule; in a protein, a domain would be just a residue. Note that in principle it is not required that all atoms belong to one and only one domain; you could have atoms that belong to several domains or atoms that don't belong to any domain. (But in the case of PDB files, it happens that all atoms belong to one domain.)
You can select domains in a molecule in a way similar to that used for atoms and bonds, in this case through the domains
method:
my @all_domains = $macromol->domains;
my $domain = $macromol->domains(57);
The Domain object
A domain is a substructure of a larger molecule. Other than having a parent molecule, a domain is just like a molecule. In other words, the Domain class extends the Mol class; it is basically a collection of atoms and bonds.
my @atoms_in_domain = $domain->atoms;
my $atom5_in_domain = $domain->atoms(5);
If you want to get at a given atom in a given domain in a macromolecule, you can "chain" the method calls without having to save the Domain object in a temporary variable:
my $domain57_atom5 = $macromol->domains(57)->atoms(5);
my $res233_HA = $macromol->domains(233)->atoms_by_name('HA');
The second example is a good way of selecting an atom from a PDB file when you know the residue number and atom name.
See Also
The perldoc pages for Chemistry::Mol, Chemistry::Atom, Chemistry::Bond, Chemistry::File, Chemistry::MacroMol, Chemistry::Domain