NAME
Chemistry::MidasPattern - Select atoms in macromolecules
SYNOPSIS
    use Chemistry::MidasPattern;
    use Chemistry::File::PDB;

    # read a molecule
    my $mol = Chemistry::MacroMol->read("test.pdb");

    # define a pattern matching carbons alpha and beta
    # in all valine residues
    my $str = ':VAL@CA,CB';
    my $patt = Chemistry::MidasPattern->new($str);

    # apply the pattern to the molecule
    $patt->match($mol);

    # extract the results
    for my $atom ($patt->atom_map) {
        printf "%s\t%s\n", $atom->attr("pdb/residue_name"), $atom->name;
    }
    printf "FOUND %d atoms\n", scalar($patt->atom_map);
DESCRIPTION
This module partially implements a pattern matching engine for selecting atoms in macromolecules by using Midas/Chimera patterns. See http://www.cmpharm.ucsf.edu/~troyer/troff2html/midas/Midas-uh-3.html#sh-2.1 for a detailed description of this language.
This module shares the same interface as Chemistry::Pattern; to perform a pattern matching operation on a molecule, follow these steps.
1) Create a pattern object, by parsing a string. Let's assume that the pattern object is stored in $patt and that the molecule is $mol.
2) Execute the pattern on the molecule by calling $patt->match($mol).
3) If $patt->match() returns true, extract the "map" that relates the pattern to the molecule by calling $patt->atom_map. This method returns a list of the atoms in the molecule that are matched by the pattern. Thus $patt->atom_map(1) would be analogous to the $1 special variable used for regular expression matching. The difference between Chemistry::Pattern and Perl regular expressions is that atoms are always captured, and that each atom always uses one "slot".
MIDAS ATOM SPECIFICATION LANGUAGE QUICK SUMMARY
The current implementation does not have the concept of a model, only of residues and atoms.
What follows is not exactly a formal grammar specification, but it should give a general idea:
SELECTOR = ((:RESIDUE)*(@ATOM)*)*
The star here means "zero or more", and the parentheses are used to delimit the effect of the star. The : and @ are used verbatim.
RESIDUE can be a name (e.g., LYS), a sequence number (e.g., 108), a range (e.g., 1-10), or a comma-separated list of RESIDUEs (e.g. 1-10,6,LYS).
ATOM is an atom name, a serial number (this is a non-standard extension) or a comma-separated list of ATOMs.
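The grammar above can be read as a sequence of groups, each made of one residue list and the atom lists attached to it. The following is only a sketch of that reading, not the module's actual parser: split_selector is a hypothetical helper, and it assumes that an @ATOM term with no preceding residue applies to all residues (as the equivalence of @CA and :*@CA in the examples suggests) and that a group with no atom terms means every atom in those residues.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Chemistry::MidasPattern: split a SELECTOR
# string into groups of { residues => [...], atoms => [...] }.
sub split_selector {
    my ($str) = @_;
    my @groups;
    # Walk the string one ":..." or "@..." term at a time.
    while ($str =~ /\G\s*([:\@])([^:\@\s]+)/gc) {
        my ($sigil, $body) = ($1, $2);
        my @terms = split /,/, $body;
        if ($sigil eq ':') {
            # A residue term always opens a new group.
            push @groups, { residues => [@terms], atoms => [] };
        }
        else {
            # Assumption: a leading @ATOM term applies to all residues.
            push @groups, { residues => ['*'], atoms => [] } unless @groups;
            # Attach the atom terms to the closest preceding residue term.
            push @{ $groups[-1]{atoms} }, @terms;
        }
    }
    return @groups;
}

for my $group (split_selector(':ARG:VAL@CA')) {
    my $atoms = @{ $group->{atoms} } ? join(',', @{ $group->{atoms} }) : '(all)';
    printf "residues %s, atoms %s\n", join(',', @{ $group->{residues} }), $atoms;
}
```

Under these assumptions, :ARG:VAL@CA yields two groups (all atoms of ARG, then CA of VAL), while :ARG,VAL@CA yields a single group in which CA applies to both residue names.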
Names can have wildcards: * matches the whole name; ? matches one character; and = matches zero or more characters. An @ATOM specification is associated with the closest preceding residue specification.
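One way to picture the wildcard rules is as a translation into a Perl regular expression. This is a sketch under that assumption only; midas_name_to_regex is a hypothetical helper and the module may implement wildcards differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Chemistry::MidasPattern: turn a Midas
# name pattern into an anchored Perl regular expression.
sub midas_name_to_regex {
    my ($name) = @_;
    # A lone * matches any whole name.
    return qr/\A.*\z/s if $name eq '*';
    my $body = '';
    for my $char (split //, $name) {
        $body .= $char eq '?' ? '.'      # exactly one character
               : $char eq '=' ? '.*'     # zero or more characters
               :                quotemeta($char);
    }
    return qr/\A$body\z/;
}

# Made-up atom names for illustration.
for my $atom_name (qw(C CA CB CG1 N)) {
    print "$atom_name matches C?\n" if $atom_name =~ midas_name_to_regex('C?');
}
```

With these made-up names, only CA and CB satisfy C?, whereas C= would also accept C and CG1.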
DISTANCE_SELECTOR = SELECTOR za< DISTANCE
Atoms within a certain distance of those that are matched by a selector can be selected by using the za< operator, where DISTANCE is a number in Angstroms.
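The effect of za< can be sketched with plain hashes standing in for atoms. The atom layout, the distance and zone_within helpers, and the strict "less than" comparison are all assumptions made for illustration; they are not taken from the module's source.

```perl
use strict;
use warnings;

# Euclidean distance between two hypothetical atom records.
sub distance {
    my ($p, $q) = @_;
    return sqrt( ($p->{x} - $q->{x})**2
               + ($p->{y} - $q->{y})**2
               + ($p->{z} - $q->{z})**2 );
}

# Return every atom of @$all that lies within $cutoff of at least one
# atom in @$selected (brute force: every pair may be examined).
sub zone_within {
    my ($all, $selected, $cutoff) = @_;
    my @hits;
    ATOM: for my $atom (@$all) {
        for my $ref (@$selected) {
            if (distance($atom, $ref) < $cutoff) {
                push @hits, $atom;
                next ATOM;    # one close reference atom is enough
            }
        }
    }
    return @hits;
}

# Made-up coordinates in Angstroms.
my @atoms = (
    { serial => 1, x => 0,  y => 0, z => 0 },
    { serial => 2, x => 3,  y => 0, z => 0 },
    { serial => 3, x => 0,  y => 4, z => 0 },
    { serial => 4, x => 10, y => 0, z => 0 },
);
my @near = zone_within(\@atoms, [ $atoms[0] ], 5.0);
print join(' ', map { $_->{serial} } @near), "\n";    # prints "1 2 3"
```

In this sketch the reference atom itself is part of the result because its distance to itself is zero.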
EXPR = ( SELECTOR | DISTANCE_SELECTOR ) (& (SELECTOR | DISTANCE_SELECTOR))*
The result of two or more selectors can be intersected using the & operator.
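Conceptually, & keeps only the atoms that every selector matched. A minimal sketch of that set intersection, using made-up atom serial numbers and a hypothetical intersect helper rather than the module's own code:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the items of the first list that also occur in
# the second list, preserving the order of the first list.
sub intersect {
    my ($first, $second) = @_;
    my %in_second;
    @in_second{@$second} = ();
    return grep { exists $in_second{$_} } @$first;
}

my @alpha_carbons = (2, 10, 18, 26);        # e.g. what @CA might match
my @near_atom_123 = (9, 10, 11, 18, 19);    # e.g. what @123 za<5.0 might match
my @both = intersect(\@alpha_carbons, \@near_atom_123);
print "@both\n";    # prints "10 18"
```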
EXAMPLES
    :ARG                All arginine atoms
    :ARG@*              All arginine atoms
    @CA                 All alpha carbons
    :*@CA               All alpha carbons
    :ARG@CA             Arginine alpha carbons
    :VAL@C=             Valine carbons
    :VAL@C?             Valine carbons with two-letter names
    :ARG,VAL@CA         Arginine and valine alpha carbons
    :ARG:VAL@CA         All arginine atoms and valine alpha carbons
    :ARG@CA,CB          Arginine alpha and beta carbons
    :ARG@CA@CB          Arginine alpha and beta carbons
    :1-10               Atoms in residues 1 to 10
    :48-*               Atoms in residues 48 to the last one
    :30-40@CA & :ARG    Alpha carbons in residues 30-40 which are
                        also arginines.
    @123                Atom 123
    @123 za<5.0         Atoms within 5.0 Angstroms of atom 123
    @123 za>30.0        Atoms not within 30.0 Angstroms of atom 123
    @CA & @123 za<5.0   Alpha carbons within 5.0 Angstroms of atom 123
CAVEATS
If a feature does not appear in any of the examples, it is probably not implemented. Unimplemented features include the zr< zone specifier, atom properties, and Chimera extensions such as chains.
The zone specifiers (selection by distance) currently use a brute-force N^2 algorithm. You can optimize an & expression by putting the most unlikely selectors first; for example
    :1-20 za<10.0 & :38     atoms in residue 38 within 10 A of atoms
                            in residues 1-20 (slow)
    :38 & :1-20 za<10.0     atoms in residue 38 within 10 A of atoms
                            in residues 1-20 (not so slow)
In the first case, the N^2 search measures the distance between every atom in the molecule and every atom in residues 1-20, and then intersects the results with the atom list of residue 38; the second case only measures the distance between every atom in residue 38 and every atom in residues 1-20. The second way is much, much faster for large systems.
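To get a feel for the difference, the number of distance evaluations can be estimated for each ordering. The atom counts below are invented for illustration and the figures are upper bounds for a brute-force pairwise search; they are not measurements of the module.

```perl
use strict;
use warnings;

# Made-up sizes, for illustration only.
my $atoms_in_molecule = 5000;
my $atoms_in_res_1_20 = 160;
my $atoms_in_res_38   = 8;

# Zone first: every atom in the molecule is compared with residues 1-20.
my $zone_first = $atoms_in_molecule * $atoms_in_res_1_20;

# Residue 38 first: only its atoms are compared with residues 1-20.
my $residue_first = $atoms_in_res_38 * $atoms_in_res_1_20;

printf "zone first:    up to %d distance evaluations\n", $zone_first;
printf "residue first: up to %d distance evaluations\n", $residue_first;
```

With these numbers the first ordering needs up to 800000 evaluations and the second up to 1280.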
Some day, a future version may implement a smarter algorithm...
VERSION
0.10
SEE ALSO
Chemistry::File::MidasPattern, Chemistry::Pattern
The PerlMol website http://www.perlmol.org/
AUTHOR
Ivan Tubert <itub@cpan.org>
COPYRIGHT
Copyright (c) 2004 Ivan Tubert. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.