NAME

deob_index.pl - extracts BioPerl documentation and indexes it in a database for easy retrieval

VERSION

This document describes deob_index.pl version 0.0.3

SYNOPSIS

deob_index.pl <path to BioPerl lib> <output path>

<path to BioPerl lib>

A directory path pointing to the root of the BioPerl lib tree, e.g. /export/share/lib/perl5/site_perl/5.8.7/Bio/

<output path>

The path where deob_index.pl should write its output files.

DESCRIPTION

deob_index.pl walks the entire BioPerl library tree looking for .pm and .pl files. For each file it finds, it tries to extract the module-level POD documentation (e.g. SYNOPSIS, DESCRIPTION) and store it in a Berkeley DB. It also tries to extract the documentation for each method in the module and store that in a separate Berkeley DB.
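The directory walk described above can be sketched with File::Find, which is listed under DEPENDENCIES. This is a minimal illustration, not the script's actual code: the subroutine name find_perl_files is hypothetical, and the real script may filter or order files differently.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect every .pm and .pl file beneath $root.
sub find_perl_files {
    my ($root) = @_;
    my @files;
    find(
        {
            # File::Find changes into each directory by default, so $_ is
            # the bare file name and $File::Find::name is the full path.
            wanted => sub {
                push @files, $File::Find::name
                    if -f $_ && /\.(?:pm|pl)\z/;
            },
        },
        $root
    );
    my @sorted = sort @files;
    return @sorted;
}

# Usage: list the files found under the directory given on the command line.
if (@ARGV) {
    print "$_\n" for find_perl_files( $ARGV[0] );
}
```

Each path returned this way would then be opened and scanned for POD.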

Specific parts of the documentation for a module or method may be retrieved individually using the functions available in Deobfuscator.pm. See that module for details.

While parsing each module, deob_index.pl also reports which pieces of the documentation it cannot find. For example, if a method's documentation doesn't describe the data type it returns, this script logs that information to a file. This type of automated documentation checking could be used to standardize and improve the documentation in BioPerl.

deob_index.pl creates four files:

package_list.txt

A plaintext file listing each package found in the BioPerl directory that was searched. Packages are listed by their module names, such as 'Bio::SeqIO'. This file is used by deob_interface.cgi.

packages.db

A Berkeley DB that stores package-level documentation, such as the synopsis and the description. Each key is a package name, e.g. "Bio::SeqIO", and each value is a single string in which the individual pieces of documentation are kept apart by unique record-separator strings. The individual pieces are pulled out of the string using the get_pkg_docs function in Deobfuscator.pm. See that package for details.
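The idea of storing several pieces of documentation in one value string can be sketched as follows. This is only an illustration under stated assumptions: the separator string, the field order, and the subroutine names pack_pkg_docs and unpack_pkg_docs are all hypothetical, and a plain hash stands in for the tied Berkeley DB. The real separators and retrieval logic live in deob_index.pl and Deobfuscator.pm.

```perl
use strict;
use warnings;

# ASSUMPTION: a made-up separator; the real strings are not shown here.
my $SEP = '__DEOB_FIELD__';

# Join the synopsis and description into one value string.
sub pack_pkg_docs {
    my ( $synopsis, $description ) = @_;
    return join $SEP, $synopsis, $description;
}

# Split a stored value back into its named pieces.
# The -1 limit keeps a trailing empty field instead of dropping it.
sub unpack_pkg_docs {
    my ($value) = @_;
    my ( $synopsis, $description ) = split /\Q$SEP\E/, $value, -1;
    return { synopsis => $synopsis, description => $description };
}

# A plain hash stands in for packages.db; the documentation text is invented.
my %packages = (
    'Bio::SeqIO' => pack_pkg_docs(
        'use Bio::SeqIO;',
        'Placeholder description text.'
    ),
);

my $docs = unpack_pkg_docs( $packages{'Bio::SeqIO'} );
print "SYNOPSIS: $docs->{synopsis}\n";
print "DESCRIPTION: $docs->{description}\n";
```

A separator that cannot plausibly occur in real documentation text is what makes the split unambiguous, which is presumably why the stored strings use unique separators.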

methods.db

Like packages.db, methods.db is a Berkeley DB, but it stores various pieces of information about the individual methods available to a class. Each method might have documentation about its usage, its arguments, its return values, an example, and a description of its function.

Each key is the fully-qualified method name, e.g. "Bio::SeqIO::next_seq". Each value is a string containing all of the pieces of documentation concatenated together and separated by unique strings serving as record separators. The extraction of the actual documentation in these strings is handled by the get_method_docs subroutine in the Deobfuscator.pm module. See that package for details.

Not all methods will have all of these types of documentation, and some methods will not have the different pieces of information clearly labeled and separated. For the latter type, deob_index.pl will try to store whatever free-form documentation exists, and the get_method_docs function in Deobfuscator.pm, if called without arguments, will return that documentation.

deob_index.log

This file contains detailed information about errors encountered while trying to extract documentation during the indexing process.

Each line in deob_index.log is a key-value pair describing a single parsing error.
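Because each log line is a key-value pair, the log lends itself to simple summaries, such as counting how often each error code occurs. The sketch below assumes a tab between the name and the error code; that layout is an assumption made for illustration, since the exact line format is not specified here. The subroutine name tally_error_codes and the sample lines are invented.

```perl
use strict;
use warnings;

# Count how often each error code appears in a list of log lines.
# ASSUMPTION: each line looks like "<name>\t<CODE>"; the real layout of
# deob_index.log may differ.
sub tally_error_codes {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        chomp $line;
        my ( $name, $code ) = split /\t/, $line, 2;
        next unless defined $code && length $code;    # skip malformed lines
        $count{$code}++;
    }
    return \%count;
}

# Invented sample lines, for illustration only.
my @sample = (
    "Bio::Example::first_method\tEXAMPLE\n",
    "Bio::Example::second_method\tEXAMPLE\n",
    "Bio::Example\tSYNOPSIS\n",
);

my $tally = tally_error_codes(@sample);
printf "%-10s %d\n", $_, $tally->{$_} for sort keys %$tally;
```

To summarize a real log, the lines would be read from the file handle instead of a sample array, and the split pattern adjusted to whatever delimiter the log actually uses.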

DIAGNOSTICS

These are the parsing error codes reported in 'deob_index.log'.

Package errors

PKG_NAME

couldn't find the name of the package

SYNOPSIS

couldn't find the synopsis

DESC

couldn't find the description

METHODS

couldn't find any methods

PKG_DUP

This package name occurs more than once

Method errors

FUNCTION

couldn't find the function description

EXAMPLE

couldn't find the example

ARGS

couldn't find the method's arguments

USAGE

couldn't find the usage statement

RETURNS

couldn't find the return values

FREEFORM

This method's documentation doesn't conform to the BioPerl standard of having clearly-labeled fields for title, function, example, args, usage, and returns.

METH_DUP

This method name occurs more than once

CONFIGURATION AND ENVIRONMENT

This software requires:

A working installation of the Berkeley DB

The Berkeley DB comes standard with most UNIX distributions, so you may already have it installed. See http://www.sleepycat.com for more information.

BioPerl

deob_index.pl recursively navigates a directory of BioPerl modules. Note that the BioPerl module directory need not be "installed"; any old location will do. See http://www.bioperl.org for the latest version.

DEPENDENCIES

version, File::Find, DB_File

INCOMPATIBILITIES

None reported.

BUGS AND LIMITATIONS

No bugs have been reported.

deob_index.pl currently expects the sections of POD in a BioPerl module to be in a particular order, namely: NAME, SYNOPSIS, DESCRIPTION, CONSTRUCTORS, ... , APPENDIX. Those sections are expected to be marked with =head1 POD tags, and the documentation for each method is expected to be in =head2 sections in the APPENDIX. The order of SYNOPSIS and DESCRIPTION can be flipped, but this behavior should not be taken as encouragement to do so.
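To check whether a module follows this layout before indexing it, the section headings can be pulled out with a pair of regular expressions. This is a rough sketch under assumptions, not the parser deob_index.pl uses: the subroutine names head1_sections and appendix_methods are hypothetical, and the sample POD is invented.

```perl
use strict;
use warnings;

# Return the =head1 titles of a POD string, in order of appearance.
sub head1_sections {
    my ($pod) = @_;
    my @titles = $pod =~ /^=head1[ \t]+(.+?)[ \t]*$/mg;
    return @titles;
}

# Return the =head2 titles found between "=head1 APPENDIX" and the next
# =head1 (or the end of the text).
sub appendix_methods {
    my ($pod) = @_;
    my ($appendix) = $pod =~ /^=head1[ \t]+APPENDIX\b(.*)\z/ms;
    return () unless defined $appendix;
    $appendix =~ s/^=head1\b.*\z//ms;    # drop anything after the APPENDIX
    my @methods = $appendix =~ /^=head2[ \t]+(.+?)[ \t]*$/mg;
    return @methods;
}

# Invented sample POD, assembled from strings so that this script itself
# contains no literal POD directives.
my $pod = join "\n",
    '=head1 NAME',        '', 'Bio::Example - invented module', '',
    '=head1 SYNOPSIS',    '', '  my $obj = Bio::Example->new;', '',
    '=head1 DESCRIPTION', '', 'Placeholder text.', '',
    '=head1 APPENDIX',    '',
    '=head2 new',         '', ' Title   : new', '',
    '=head2 run',         '', ' Title   : run', '',
    '=cut', '';

print 'Sections: ', join( ', ', head1_sections($pod) ),   "\n";
print 'Methods:  ', join( ', ', appendix_methods($pod) ), "\n";
```

Comparing the list from head1_sections against the expected order would flag a non-conforming module before it shows up in the error log.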

Most, but not all, BioPerl modules conform to this standard. deob_index.pl reports those that do not as errors. Although the consistency of this standard is desirable for end users of the documentation, this code probably needs to be a little more flexible (patches welcome!).

This software has only been tested in a UNIX environment.

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other BioPerl modules. Please send your comments and suggestions to one of the BioPerl mailing lists. Your participation is much appreciated.

bioperl-l@bioperl.org                       - General discussion
http://www.bioperl.org/wiki/Mailing_lists   - About the mailing lists

Reporting Bugs

Report bugs to the BioPerl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:

http://bugzilla.bioperl.org/

SEE ALSO

Deobfuscator, deob_interface.cgi, deob_detail.cgi

AUTHOR

Dave Messina <dave-pause@davemessina.net>

CONTRIBUTORS

Laura Kavanaugh
David Curiel

ACKNOWLEDGMENTS

This software was originally developed at the Cold Spring Harbor Laboratory's Advanced Bioinformatics Course, held Oct 12-25, 2005. Many thanks to David Curiel, who provided much-needed guidance and assistance on this project.

LICENSE AND COPYRIGHT

Copyright (C) 2005-6 Laura Kavanaugh and Dave Messina. All Rights Reserved.

This module is free software; you may redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

DISCLAIMER

This software is provided "as is" without warranty of any kind.