NAME

Chemistry::Pattern - Chemical substructure pattern matching

SYNOPSIS

use Chemistry::Pattern;
use Chemistry::Mol;
use Chemistry::File::SMILES;

# Create a pattern and a molecule from SMILES strings
my $mol_str = "C1CCCC1C(Cl)=O";
my $patt_str = "C(=O)Cl";
my $mol = Chemistry::Mol->parse($mol_str, format => 'smiles');
my $patt = Chemistry::Pattern->parse($patt_str, format => 'smiles');

# try to match the pattern
while ($patt->match($mol)) {
    my @matched_atoms = $patt->atom_map;
    print "Matched: (@matched_atoms)\n";
    # should print something like "Matched: (a6 a8 a7)"
}

DESCRIPTION

This module implements basic pattern matching for molecules. The Chemistry::Pattern class is a subclass of Chemistry::Mol, so patterns have all the properties of molecules and can be read from the same file formats. Some formats, such as SMARTS, are used exclusively to describe patterns.

To perform a pattern matching operation on a molecule, follow these steps.

1) Create a pattern object, either by parsing a file or string, or by adding atoms and bonds by hand using Chemistry::Mol methods. Note that atoms and bonds in a pattern should be Chemistry::Pattern::Atom and Chemistry::Pattern::Bond objects. Let's assume that the pattern object is stored in $patt and that the molecule is $mol.

2) Execute the pattern on the molecule by calling $patt->match($mol).

3) If $patt->match() returns true, extract the "map" that relates the pattern to the molecule by calling $patt->atom_map or $patt->bond_map. These methods return a list of the atoms or bonds in the molecule that are matched by the corresponding atoms or bonds in the pattern. Thus $patt->atom_map(1) would be analogous to the $1 special variable used for regular expression matching. One difference between Chemistry::Pattern and Perl regular expressions is that atoms and bonds are always captured.

4) If more than one match for the molecule is desired, repeat from step (2) until match() returns false.
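The steps above can be sketched as a small loop that collects every match. This is only an illustration: all_matches() is a hypothetical helper that is not part of Chemistry::Pattern, and the SMILES strings are the ones used in the SYNOPSIS.

```perl
use strict;
use warnings;
use Chemistry::Pattern;
use Chemistry::Mol;
use Chemistry::File::SMILES;

# all_matches() is a hypothetical helper, not part of the module.
# It returns one array reference of matched atoms per match.
sub all_matches {
    my ($patt, $mol) = @_;
    my @matches;
    while ($patt->match($mol)) {             # steps 2 and 4
        push @matches, [ $patt->atom_map ];  # step 3
    }
    return @matches;
}

# Step 1: create the pattern and the molecule by parsing SMILES strings
my $mol  = Chemistry::Mol->parse("C1CCCC1C(Cl)=O", format => 'smiles');
my $patt = Chemistry::Pattern->parse("C(=O)Cl", format => 'smiles');

for my $match (all_matches($patt, $mol)) {
    print "Matched atoms: (@$match)\n";
}
```

Copying the list returned by atom_map into a fresh array reference keeps each match available after the next call to match().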

METHODS

Chemistry::Pattern->new(name => value, ...)

Create a new empty pattern. This is just like the Chemistry::Mol constructor, with one additional option: "options", which expects a hash reference (the options themselves are described under the options() method).

$pattern->options(option => value,...)

Sets the options that control how the pattern is matched. Available options:

overlap

If true, matches may overlap. For example, the CC pattern could match twice on propane if this option is true, but only once if it is false. This option is true by default.

permute

Sometimes there is more than one way of matching the same set of pattern atoms on the same set of molecule atoms. If this option is true, match() also returns these "redundant" matches. For example, the CC pattern could match ethane with two different permutations (forwards and backwards). This option is false by default.
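As a rough sketch of how these options might be explored, the following counts the matches of the CC pattern under different settings. The count_matches() helper is hypothetical and exists only for this illustration; the molecules are the propane and ethane examples mentioned above, and no particular counts are asserted here beyond what the option descriptions state.

```perl
use strict;
use warnings;
use Chemistry::Pattern;
use Chemistry::Mol;
use Chemistry::File::SMILES;

# count_matches() is a hypothetical helper, not part of the module.
# It parses both SMILES strings, applies any options, and counts matches.
sub count_matches {
    my ($patt_str, $mol_str, %opts) = @_;
    my $mol  = Chemistry::Mol->parse($mol_str, format => 'smiles');
    my $patt = Chemistry::Pattern->parse($patt_str, format => 'smiles');
    $patt->options(%opts) if %opts;   # leave the defaults alone otherwise
    my $count = 0;
    $count++ while $patt->match($mol);
    return $count;
}

# CC on propane, with and without overlapping matches
print "overlap on:  ", count_matches("CC", "CCC", overlap => 1), "\n";
print "overlap off: ", count_matches("CC", "CCC", overlap => 0), "\n";

# CC on ethane, with and without redundant permutations
print "permute off: ", count_matches("CC", "CC", permute => 0), "\n";
print "permute on:  ", count_matches("CC", "CC", permute => 1), "\n";
```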

$mol->new_atom(name => value, ...)

Shorthand for $mol->add_atom(Chemistry::Atom->new(name => value, ...));

$mol->new_bond(name => value, ...)

Shorthand for $mol->add_bond(Chemistry::Bond->new(name => value, ...));

$pattern->atom_map

Returns the list of atoms that matched the last time $pattern->match was called.

$pattern->bond_map

Returns the list of bonds that matched the last time $pattern->match was called.

$pattern->match($mol)

Returns true if the pattern matches the molecule. If called again for the same molecule, continues matching where it left off (similar to a regular expression with the /g modifier in scalar context). When there are no matches left, returns false.

To find out which atoms and bonds matched, use the atom_map and bond_map methods.
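For readers less familiar with the regular expression behavior that match() is compared to, here is a small example in plain Perl. It uses no Chemistry modules, and the input string is made up for illustration. Each evaluation of the /g match in scalar context resumes after the previous match and returns false once nothing is left, much as match() does for a molecule.

```perl
use strict;
use warnings;

my $text = "CCl CBr CI";
my @found;

# In scalar context, /g resumes after the previous match on each
# evaluation and returns false when no further match exists.
while ($text =~ /C(Cl|Br|I)/g) {
    push @found, $1;   # $1 plays the role that atom_map(1) plays for a pattern
}

print "Captured: @found\n";   # prints "Captured: Cl Br I"
```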

VERSION

0.15

SEE ALSO

Chemistry::Pattern::Atom, Chemistry::Pattern::Bond, Chemistry::Mol, Chemistry::File

The PerlMol website http://www.perlmol.org/

AUTHOR

Ivan Tubert <itub@cpan.org>

COPYRIGHT

Copyright (c) 2004 Ivan Tubert. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.