NAME

mok - an awk for molecules

SYNOPSIS

mok [OPTION]...  'CODE' FILE...   

DESCRIPTION

mok reads every molecule found in the files given on the command line and executes CODE once for each molecule. CODE is written in Perl and has access to all the methods of the PerlMol toolkit.

This mini-language is intended to provide a powerful environment for writing "molecular one-liners" for extracting and munging chemical information. It was inspired by the AWK programming language by Aho, Kernighan, and Weinberger, the SMARTS molecular pattern description language by Daylight, Inc., and the Perl programming language by Larry Wall.

Mok takes its name from Ookla the Mok, an unforgettable character from the animated TV series "Thundarr the Barbarian", and from shortening "molecular awk". For more details about the mok mini-language, see LANGUAGE SPECIFICATION below.

OPTIONS

-c CLASS

Use CLASS instead of Chemistry::Mol to read molecules

-f FILE

Run the code from FILE instead of the command line

-h

Print usage information and exit

-t TYPE

Assume that every file has the specified TYPE. Available types depend on which Chemistry::File modules are installed, but currently available types include mdl, sdf, smiles, formula, mopac, pdb.
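
As a rough sketch of how these options combine, the commands below force a file type and read the code from a script file. The file names compounds.txt and count.mok are hypothetical, and the first command assumes that the smiles file type is installed.

```shell
# Treat every input file as SMILES, whatever its extension
mok -t smiles 'print $mol->formula, "\n"' compounds.txt

# Run the mok code stored in count.mok instead of passing CODE on the command line
mok -f count.mok *.sdf
```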

LANGUAGE SPECIFICATION

A mok script consists of a sequence of pattern-action statements and optional subroutine definitions, in a manner very similar to the AWK language.

/pattern/options { action statements }
{ action statements }
sub name { statements }
BEGIN { statements }
END { statements }

When the whole program consists of one unconditional action block, the braces may be omitted.

Program execution is as follows:

1) The BEGIN block is executed as soon as it's compiled, before any other actions are taken.

2) For each molecule in the files given on the command line, each pattern is applied in turn; if the pattern matches, the corresponding statement block is executed. The pattern is optional; statement blocks without a pattern are executed unconditionally. Subroutines are only executed when called explicitly.

3) Finally, the END block is executed.
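
The three phases above can be illustrated with a minimal sketch that counts the molecules read. The variable $n is an ordinary Perl global chosen for this illustration; it is not a mok special variable.

```shell
# BEGIN runs once before any molecule is read, the unconditional
# block runs once per molecule, and END runs after the last molecule.
mok 'BEGIN { $n = 0 }
    { $n++ }
    END { print "Read $n molecules\n" }' *.sdf
```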

The statements are evaluated as Perl statements in the Chemistry::Mok::UserCode package. The following chemistry modules are conveniently loaded by default:

Chemistry::Mol;
Chemistry::Atom 'distance', 'angle', 'dihedral';
Chemistry::Bond;
Chemistry::Pattern;
Chemistry::Pattern::Atom;
Chemistry::Pattern::Bond;
Chemistry::File;
Chemistry::File::*;

Pattern Specification

The pattern must be a SMILES string readable by the Chemistry::File::SMILES module. SMARTS support is planned for the future. The pattern is specified within slashes, in a way reminiscent of AWK and Perl regular expressions. As in Perl, certain one-letter options may be included after the closing slash. An option is turned on by giving the corresponding lowercase letter and turned off by giving the corresponding uppercase letter.

g/G

Match globally (default: off). When not present, the mok interpreter only matches a molecule once; when present, it tries matching again in other parts of the molecule. For example, /C/ matches butane only once (at an unspecified atom), while /C/g matches four times (once at each atom).

o/O

Overlap (default: on). When set and matching globally, matches may overlap. For example, the pattern /CC/go could match twice on propane, but /CC/gO would match only once.

p/P

Permute (default: off). Sometimes there is more than one way of matching the same set of pattern atoms on the same set of molecule atoms. When set, these "redundant" matches are returned as well. For example, /CC/gp could match ethane with two different permutations (forwards and backwards).
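
One way to see the effect of the options is to count matches with and without them. The sketch below compares overlapping and non-overlapping global matches; the file name propane.mol and the counter variables are hypothetical, and the counts that are printed depend on the molecules in the input.

```shell
# The first block counts global matches with overlap allowed (the
# default); the second counts global matches with overlap turned off.
mok '/CC/g{ $overlap++ } /CC/gO{ $disjoint++ }
    END { print "overlapping: $overlap, non-overlapping: $disjoint\n" }' propane.mol
```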

Special Variables

When blocks with action statements are executed, the following variables are defined automatically:

$file

The current filename.

$mol

The current molecule as a Chemistry::Mol object.

$match

The current match as a Chemistry::Pattern object.

$patt

The current pattern as a SMILES string.
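
As an illustration, the following sketch uses all four variables in a single action block to print one tab-separated line per match. It relies only on methods that appear elsewhere in this document, and it assumes that the input files carry coordinates, so that the bond length is meaningful.

```shell
# For every C-S match, print the file name, the molecule name, the
# pattern, and the length of the matched bond.
mok '/CS/g{ printf "%s\t%s\t%s\t%.3f\n", $file, $mol->name, $patt,
        $match->bond_map(0)->length }' *.mol
```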

Special Blocks

Within action blocks, the following block names can be used as labels with Perl functions such as next and last:

MATCH
BLOCK
MOL
FILE
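
A possible use of these names is to stop processing a molecule after its first match. The sketch below assumes, as the names suggest, that MOL labels the loop over molecules, so that next MOL skips the remaining matches and patterns and moves on to the next molecule.

```shell
# Report each ester-containing molecule only once, even when matching
# globally, by jumping to the next molecule after the first match.
mok '/C(=O)OC/g{ print "$file: ", $mol->name, "\n"; next MOL }' *.mol
```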

EXAMPLES

mok 'print $mol->name, "\n"' *.sdf

Prints the names of all the molecules found in all the .sdf files in the current directory.

mok '/C(=O)OC/{ printf "$file: %s (%s)\n", 
    $mol->name, $mol->formula }' *.mol

Finds esters among *.mol; prints the filename, molecule name, and formula.

mok '{ $n += $mol->atoms } END { print "Total: $n atoms\n" }' *.mol

Prints the total number of atoms in all the molecules read.

mok '/CS/g{ $n++; $l += $match->bond_map(0)->length }
    END { printf "Average C-S bond length: %.3f\n", $l/$n if $n }' *.mol

Prints the average C-S bond length over all the matches found; nothing is printed when no C-S bond matches.

SEE ALSO

awk(1), perl(1), Chemistry::Mol, Chemistry::Pattern, http://dmoz.org/Arts/Animation/Cartoons/Titles/T/Thundarr_the_Barbarian/

AUTHOR

Ivan Tubert <itub@cpan.org>

COPYRIGHT

Copyright (c) 2004 Ivan Tubert. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.