Chemistry::MidasPattern - Select atoms in macromolecules
use Chemistry::MidasPattern;
use Chemistry::File::PDB;
# read a molecule
my $mol = Chemistry::MacroMol->read("test.pdb");
# define a pattern matching carbons alpha and beta
# in all valine residues
my $str = ':VAL@CA,CB';
my $patt = Chemistry::MidasPattern->new($str);
# apply the pattern to the molecule
$patt->match($mol);
# extract the results
for my $atom ($patt->atom_map) {
    printf "%s\t%s\n", $atom->attr("pdb/residue_name"), $atom->name;
}
printf "FOUND %d atoms\n", scalar($patt->atom_map);
This module partially implements a pattern-matching engine for selecting atoms in macromolecules using Midas/Chimera patterns. See the Midas/Chimera documentation for a detailed description of this language.
This module shares the same interface as Chemistry::Pattern; to perform a pattern matching operation on a molecule, follow these steps.
1) Create a pattern object, by parsing a string. Let's assume that the pattern object is stored in $patt and that the molecule is $mol.
2) Execute the pattern on the molecule by calling $patt->match($mol).
3) If $patt->match() returns true, extract the "map" that relates the pattern to the molecule by calling $patt->atom_map. This method returns a list of the atoms in the molecule that are matched by the pattern. Thus $patt->atom_map(1) would be analogous to the $1 special variable used for regular expression matching. The difference between Chemistry::Pattern and Perl regular expressions is that atoms are always captured, and that each atom always uses one "slot".
The current implementation does not have the concept of a model, only of residues and atoms.
What follows is not exactly a formal grammar specification, but it should give a general idea:
SELECTOR = ((:RESIDUE)*(@ATOM)*)*
The star here means "zero or more", and the parentheses are used to delimit the effect of the star. The : and @ are used verbatim.
RESIDUE can be a name (e.g., LYS), a sequence number (e.g., 108), a range (e.g., 1-10), or a comma-separated list of RESIDUEs (e.g. 1-10,6,LYS).
ATOM is an atom name, a serial number (this is a non-standard extension) or a comma-separated list of ATOMs.
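The RESIDUE forms described above can be illustrated with a small sketch. It is not the module's own parser: parse_residue_spec and residue_matches are hypothetical helpers written only for this illustration, names are compared literally (wildcards are ignored), and an upper bound of * is assumed to mean "up to the last residue".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Chemistry::MidasPattern): split a
# comma-separated RESIDUE list into typed items.
sub parse_residue_spec {
    my ($spec) = @_;
    my @items;
    for my $item (split /,/, $spec) {
        if ($item =~ /\A(\d+)-(\d+|\*)\z/) {
            # a range; "*" as the upper bound is assumed to mean "to the end"
            push @items, { type => 'range', from => $1, to => $2 };
        }
        elsif ($item =~ /\A\d+\z/) {
            push @items, { type => 'number', value => $item };
        }
        else {
            push @items, { type => 'name', value => $item };
        }
    }
    return @items;
}

# Hypothetical helper: true if a residue (name, sequence number)
# satisfies at least one of the parsed items.
sub residue_matches {
    my ($name, $number, @items) = @_;
    for my $item (@items) {
        my $type = $item->{type};
        if ($type eq 'name') {
            return 1 if $name eq $item->{value};
        }
        elsif ($type eq 'number') {
            return 1 if $number == $item->{value};
        }
        else {
            next if $number < $item->{from};
            return 1 if $item->{to} eq '*' || $number <= $item->{to};
        }
    }
    return 0;
}

my @items = parse_residue_spec('1-10,6,LYS');
for my $residue ([ALA => 7], [LYS => 108], [ALA => 108]) {
    my ($name, $number) = @$residue;
    printf "%s %d: %s\n", $name, $number,
        residue_matches($name, $number, @items) ? "selected" : "not selected";
}
```

With the list 1-10,6,LYS, residue ALA 7 is selected through the range, LYS 108 through the name, and ALA 108 is not selected.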
Names can have wildcards: * matches the whole name; ? matches one character; and = matches zero or more characters. An @ATOM specification is associated with the closest preceding residue specification.
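One way to picture the wildcard rules is as a translation into a Perl regular expression. The following is a minimal sketch under that assumption; name_pattern_to_regex is a hypothetical helper, not a function provided by this module, and the module may well implement wildcards differently.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a Midas-style name pattern into an anchored regex.
#   *  (alone) matches any whole name
#   ?  matches exactly one character
#   =  matches zero or more characters
sub name_pattern_to_regex {
    my ($pattern) = @_;
    return qr/\A.*\z/s if $pattern eq '*';
    my $re = '';
    for my $ch (split //, $pattern) {
        if    ($ch eq '?') { $re .= '.'  }
        elsif ($ch eq '=') { $re .= '.*' }
        else               { $re .= quotemeta $ch }   # everything else is literal
    }
    return qr/\A$re\z/s;
}

my @names = qw(N CA C O CB CG1 CG2);
for my $pattern ('C=', 'C?', '*') {
    my $re = name_pattern_to_regex($pattern);
    printf "%-3s %s\n", $pattern, join ' ', grep { $_ =~ $re } @names;
}
```

For the valine atom names used here, C= keeps CA, C, CB, CG1 and CG2, while C? keeps only the two-letter names CA and CB.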
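Atoms within a certain distance of those that are matched by a selector can be selected by using the za< operator, written as SELECTOR za< DISTANCE, where DISTANCE is a number in Angstroms. The za> operator, written the same way, selects the atoms that are not within that distance.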
The results of two or more selectors can be intersected using the & operator, as in SELECTOR & SELECTOR.
:ARG                All arginine atoms
:ARG@*              All arginine atoms
@CA                 All alpha carbons
:*@CA               All alpha carbons
:ARG@CA             Arginine alpha carbons
:VAL@C=             Valine carbons
:VAL@C?             Valine carbons with two-letter names
:ARG,VAL@CA         Arginine and valine alpha carbons
:ARG:VAL@CA         All arginine atoms and valine alpha carbons
:ARG@CA,CB          Arginine alpha and beta carbons
:ARG@CA@CB          Arginine alpha and beta carbons
:1-10               Atoms in residues 1 to 10
:11-*               Atoms in residues 11 to the last one
:1-10@CA & :ARG     Alpha carbons in residues 1-10 which are
                    also arginines
@123                Atom 123
@123 za<5.0         Atoms within 5.0 Angstroms of atom 123
@123 za>30.0        Atoms not within 30.0 Angstroms of atom 123
@CA & @123 za<5.0   Alpha carbons within 5.0 Angstroms of atom 123
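Several of the examples above depend on the rule that an @ATOM specification belongs to the closest preceding residue specification. The sketch below shows one way to read a simple selector under that rule. It is an illustration only: group_selector is a hypothetical helper, it handles neither the & operator nor zone specifiers, and it does not reflect how the module parses patterns internally.

```perl
use strict;
use warnings;

# Hypothetical helper: break a simple selector into residue groups, each
# with the atom specifications attached to it.
sub group_selector {
    my ($selector) = @_;
    my @groups;
    while ($selector =~ /([:\@])([^:\@\s]+)/g) {
        my ($sigil, $spec) = ($1, $2);
        if ($sigil eq ':') {
            # every residue specification starts a new group
            push @groups, { residue => $spec, atoms => [] };
        }
        else {
            # an atom specification with no residue before it applies to all residues
            push @groups, { residue => '*', atoms => [] } unless @groups;
            push @{ $groups[-1]{atoms} }, split /,/, $spec;
        }
    }
    # a residue with no atom specification selects all of its atoms
    for my $group (@groups) {
        $group->{atoms} = ['*'] unless @{ $group->{atoms} };
    }
    return @groups;
}

for my $selector (':ARG:VAL@CA', ':ARG@CA@CB', '@CA') {
    my @parts = map { "$_->{residue} => " . join(',', @{ $_->{atoms} }) }
                group_selector($selector);
    printf "%-12s %s\n", $selector, join('; ', @parts);
}
```

Read this way, :ARG:VAL@CA gives two groups (all atoms of ARG, and CA of VAL), whereas :ARG@CA@CB gives a single ARG group with both CA and CB.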
If a feature does not appear in any of the examples, it is probably not implemented. Unimplemented features include the zr< zone specifier, atom properties, and Chimera extensions such as chains.
The zone specifiers (selection by distance) currently use a brute-force N^2 algorithm. You can optimize an & expression by putting the selectors least likely to match first; for example:
:1-20 za<10.0 & :38   atoms in residue 38 within 10 A of atoms
                      in residues 1-20 (slow)
:38 & :1-20 za<10.0   atoms in residue 38 within 10 A of atoms
                      in residues 1-20 (not so slow)
In the first case, the N^2 search measures the distance between every atom in the molecule and every atom in residues 1-20, and then intersects the results with the atom list of residue 38; the second case only measures the distance between every atom in residue 38 and every atom in residues 1-20. The second way is much, much faster for large systems.
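The difference in cost can be made concrete by counting distance evaluations. The sketch below is not the module's code: it uses plain hashes as hypothetical atom records, an invented 40-residue toy system with five atoms per residue, and a hypothetical within() function that compares every candidate with every reference atom, as the description above suggests.

```perl
use strict;
use warnings;

# Hypothetical toy system: 40 residues with 5 atoms each, laid out on a grid.
my @atoms;
for my $res (1 .. 40) {
    for my $i (0 .. 4) {
        push @atoms, { res => $res, xyz => [ $res * 0.5, $i, 0 ] };
    }
}

my $evaluations = 0;    # number of distance evaluations performed so far

# Hypothetical brute-force zone search: return the candidates that lie
# within $cutoff of at least one reference atom.
sub within {
    my ($cutoff, $candidates, $references) = @_;
    my @hits;
    for my $c (@$candidates) {
        my $close = 0;
        for my $r (@$references) {
            $evaluations++;
            my $d2 = 0;
            $d2 += ($c->{xyz}[$_] - $r->{xyz}[$_]) ** 2 for 0 .. 2;
            $close = 1 if $d2 < $cutoff ** 2;
        }
        push @hits, $c if $close;
    }
    return @hits;
}

my @refs  = grep { $_->{res} <= 20 } @atoms;    # residues 1-20
my @res38 = grep { $_->{res} == 38 } @atoms;    # residue 38

# Zone first, intersection second: every atom is a candidate.
$evaluations = 0;
my @slow = grep { $_->{res} == 38 } within(10.0, \@atoms, \@refs);
my $slow_count = $evaluations;

# Intersection first: only the atoms of residue 38 are candidates.
$evaluations = 0;
my @fast = within(10.0, \@res38, \@refs);
my $fast_count = $evaluations;

printf "zone first:    %d atoms, %d distance evaluations\n", scalar(@slow), $slow_count;
printf "residue first: %d atoms, %d distance evaluations\n", scalar(@fast), $fast_count;
```

On this toy system both orders select the same five atoms, but the first needs 200 x 100 = 20000 distance evaluations and the second only 5 x 100 = 500.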
Some day, a future version may implement a smarter algorithm...
Chemistry::File::MidasPattern, Chemistry::Pattern
The PerlMol website
Ivan Tubert
Copyright (c) 2004 Ivan Tubert. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.