NAME

Chemistry::File::SMILES - SMILES linear notation parser/writer

SYNOPSIS

#!/usr/bin/perl
use Chemistry::File::SMILES;

# parse a SMILES string
my $s = 'C1CC1C(=O)[O-]';   # cyclopropanecarboxylate
my $mol = Chemistry::Mol->parse($s, format => 'smiles');

# print a SMILES string
print $mol->print(format => 'smiles');

# print a unique (canonical) SMILES string
print $mol->print(format => 'smiles', unique => 1);

# parse a SMILES file
my @mols = Chemistry::Mol->read("file.smi", format => 'smiles');

# write a multiline SMILES file
Chemistry::Mol->write("file.smi", mols => [@mols]);

DESCRIPTION

This module parses and writes SMILES (Simplified Molecular Input Line Entry System) strings. It is a file I/O driver for the PerlMol project (http://www.perlmol.org/) and registers the 'smiles' format with Chemistry::Mol.

This parser interprets anything after whitespace as the molecule's name; for example, when the following SMILES string is parsed, $mol->name will be set to "Methyl chloride":

CCl	 Methyl chloride

The name is not included by default on output. However, if the name option is defined, the name will be included after the SMILES string, separated by a tab.

print $mol->print(format => 'smiles', name => 1);
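
As a small illustration of the name handling described above, the following sketch parses a line that carries a name and writes it back with and without the name option. It uses only the calls already shown in this document; the trailing "\n" is added by the script and is not assumed to come from the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Chemistry::File::SMILES;

# SMILES string and name separated by whitespace (a tab here)
my $line = "CCl\tMethyl chloride";
my $mol  = Chemistry::Mol->parse($line, format => 'smiles');

# the text after the whitespace becomes the molecule's name
print "name: ", $mol->name, "\n";

# without the name option only the SMILES string is written
print $mol->print(format => 'smiles'), "\n";

# with name => 1 the name follows the SMILES string, separated by a tab
print $mol->print(format => 'smiles', name => 1), "\n";
```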

Multiline SMILES and SMILES files

A file or string can contain multiple molecules, one per line.

CCl	 Methyl chloride
CO	 Methanol

Files with the extension '.smi' are assumed to have this format.
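
The same one-molecule-per-line layout can be tried on an in-memory string. This is a minimal sketch that assumes parse, like read in the synopsis, returns every molecule when called in list context; the two input lines are the ones shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Chemistry::File::SMILES;

# two molecules, one per line, each followed by its name
my $text = "CCl\tMethyl chloride\nCO\tMethanol\n";

# list context: one molecule object per input line (assumption, see above)
my @mols = Chemistry::Mol->parse($text, format => 'smiles');

# report each molecule's name next to its regenerated SMILES string
for my $mol (@mols) {
    print $mol->name, ": ", $mol->print(format => 'smiles'), "\n";
}
```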

OPTIONS

aromatic

On output, detect aromatic atoms and bonds by means of the Chemistry::Ring module, and represent the organic aromatic atoms with lowercase symbols.

unique

When used on output, canonicalize the structure if it hasn't been canonicalized already and generate a unique SMILES string. This option implies "aromatic".

kekulize

When used on input, assign single or double bond orders to "aromatic" or otherwise unspecified bonds (i.e., generate the Kekule structure). If false, the bond orders will remain single. This option is true by default. This uses assign_bond_orders from the Chemistry::Bond::Find module.
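
The three options can be compared side by side on a single aromatic input. The sketch below is only illustrative: benzene is chosen as an assumed test molecule, and the bond orders are read through the bonds and order accessors of the PerlMol molecule and bond objects. The aromatic and unique options rely on the additional modules named above being installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Chemistry::File::SMILES;

# benzene written with lowercase (aromatic) atom symbols
my $input = 'c1ccccc1';

# default: kekulize is true, so single and double bond orders are assigned
my $kekule = Chemistry::Mol->parse($input, format => 'smiles');
print "kekulized:     ", join(" ", map { $_->order } $kekule->bonds), "\n";

# kekulize => 0: the bond orders remain single
my $plain = Chemistry::Mol->parse($input, format => 'smiles', kekulize => 0);
print "not kekulized: ", join(" ", map { $_->order } $plain->bonds), "\n";

# aromatic => 1: aromatic atoms are written with lowercase symbols
print "aromatic:      ", $kekule->print(format => 'smiles', aromatic => 1), "\n";

# unique => 1: canonical SMILES string; implies aromatic
print "unique:        ", $kekule->print(format => 'smiles', unique => 1), "\n";
```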

CAVEATS

Reading branches that start before an atom, such as (OC)C, is not supported. According to some variants of the SMILES specification, such a string should be equivalent to C(OC) and COC. Many other tools don't implement this rule either.
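
Because input such as (OC)C is affected by this caveat, a caller may want to screen lines before handing them to the parser. The following is a sketch in plain Perl; starts_with_branch is a hypothetical helper written for this example and is not part of Chemistry::File::SMILES.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Chemistry::File::SMILES):
# returns 1 if the SMILES part of a line begins with a branch, e.g. "(OC)C".
sub starts_with_branch {
    my ($line) = @_;
    my ($smiles) = split " ", $line, 2;    # drop the optional name
    return (defined($smiles) && $smiles =~ /^\(/) ? 1 : 0;
}

# check a leading-branch string and two conventional spellings
for my $line ("(OC)C", "C(OC)", "COC\tDimethyl ether") {
    my $verdict = starts_with_branch($line) ? "leading branch" : "ok";
    print "$line => $verdict\n";
}
```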

VERSION

0.41

SEE ALSO

Chemistry::Mol, Chemistry::File

The SMILES Home Page at http://www.daylight.com/dayhtml/smiles/

The Daylight Theory Manual at http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

The PerlMol website http://www.perlmol.org/

AUTHOR

Ivan Tubert <itub@cpan.org>

COPYRIGHT

Copyright (c) 2004 Ivan Tubert. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.