NAME
deob_index.pl - extracts BioPerl documentation and indexes it in a database for easy retrieval
VERSION
This document describes deob_index.pl version 0.0.3
SYNOPSIS
deob_index.pl <path to BioPerl lib> <output path>
- <path to BioPerl lib>
-
a directory path pointing to the root of the BioPerl lib tree. e.g. /export/share/lib/perl5/site_perl/5.8.7/Bio/
- <output path>
-
where you would like deob_index.pl to put its output files.
DESCRIPTION
deob_index.pl goes through the entire BioPerl library tree looking for .pm and .pl files. For each one it finds, it tries to extract module-level POD documentation (e.g. SYNOPSIS, DESCRIPTION) and store it in a BerkeleyDB. It also tries to extract documentation for each method in the module and store that in a separate BerkeleyDB.
Specific parts of the documentation for a module or method may be retrieved individually using the functions available in Deobfuscator.pm. See that module for details.
While going through and trying to parse each module, deob_index.pl also reports what pieces of the documentation it can't find. For example, if a method's documentation doesn't describe the data type it returns, this script logs that information to a file. This type of automated documentation- checking could be used to standardize and improve the documentation in BioPerl.
deob_index.pl creates four files:
package_list.txt
-
A plaintext file listing each package found in the BioPerl directory that was searched. Packages are listed by their module names, such as 'Bio::SeqIO'. This file is used by deob_interface.cgi.
packages.db
-
A Berkeley DB, which stores package-level documentation, such as the synopsis and the description. Each key is a package name, e.g. "Bio::SeqIO", and each value string is composed of the individual pieces of the documentation kept separate by unique string record separators. The individual pieces of documentation are pulled out of the string using the get_pkg_docs function in Deobfuscator.pm. See that package for details.
methods.db
-
Like packages.db, methods.db is also a Berkeley DB, except it stores various pieces of information about individual methods available to a class. Each method might have documentation about its usage, its arguments, its return values, an example, and a description of its function.
Each key is the fully-qualified method name, e.g. "Bio::SeqIO::next_seq". Each value is a string containing all of the pieces of documentation concatenated together and separated by unique strings serving as record separators. The extraction of the actual documentation in these strings is handled by the get_method_docs subroutine in the Deobfuscator.pm module. See that package for details.
Not all methods will have all of these types of documentation, and some methods will not have the different pieces of information clearly labeled and separated. For the latter type, deob_index.pl will try to store whatever free-form documentation that does exist, and the get_method_docs function in Deobfuscator.pm, if called without arguments, will return that documentation.
deob_index.log
-
This file contains detailed information about errors encountered while trying to extract documentation during the indexing process.
Each line in deob_index.log is a key-value pair describing a single parsing error.
DIAGNOSTICS
These are the parsing error codes reported in 'deob_index.log'.
Package errors
PKG_NAME
-
couldn't find the name of the package
SYNOPSIS
-
couldn't find the synopsis
DESC
-
couldn't find the description
METHODS
-
couldn't find any methods
PKG_DUP
-
This package name occurs more than once
Method errors
FUNCTION
-
couldn't find the function description
EXAMPLE
-
couldn't find the example
ARGS
-
couldn't find the method's arguments
USAGE
-
couldn't find the usage statement
RETURNS
-
couldn't find the return values
FREEFORM
-
This method's documentation doesn't conform to the BioPerl standard of having clearly-labeled fields for title, function, example, args, usage, and returns.
METH_DUP
-
This method name occurs more than once
CONFIGURATION AND ENVIRONMENT
This software requires:
- A working installation of the Berkeley DB
-
The Berkeley DB comes standard with most UNIX distributions, so you may already have it installed. See http://www.sleepycat.com for more information.
- BioPerl
-
deob_index.pl recursively navigates a directory of BioPerl modules. Note that the BioPerl module directory need not be "installed"; any old location will do. See http://www.bioperl.org for the latest version.
DEPENDENCIES
INCOMPATIBILITIES
None reported.
BUGS AND LIMITATIONS
No bugs have been reported.
deob_index.pl currently expects the sections of POD in a BioPerl module to be in a particular order, namely: NAME, SYNOPSIS, DESCRIPTION, CONSTRUCTORS, ... , APPENDIX. Those sections are expected to be marked with =head1 POD tags, and the documentation for each method is expected to be in =head2 sections in the APPENDIX. The order of SYNOPSIS and DESCRIPTION can be flipped, but this behavior should not be taken as encouragement to do so.
Most, but not all BioPerl modules conform to this standard. Those that do not will cause deob_index.pl to report them as errors. Although the consistency of this standard is desirable for end-users of the documentation, this code probably needs to be a little bit more flexible (patches welcome!).
This software has only been tested in a UNIX environment.
FEEDBACK
Mailing Lists
User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://www.bioperl.org/wiki/Mailing_lists - About the mailing lists
Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track the bugs and their resolution. Bug reports can be submitted via the web:
http://bugzilla.bioperl.org/
SEE ALSO
Deobfuscator, deob_interface.cgi, deob_detail.cgi
AUTHOR
Dave Messina <dave-pause@davemessina.net>
CONTRIBUTORS
ACKNOWLEDGMENTS
This software was developed originally at the Cold Spring Harbor Laboratory's Advanced Bioinformatics Course between Oct 12-25, 2005. Many thanks to David Curiel, who provided much-needed guidance and assistance on this project.
LICENSE AND COPYRIGHT
Copyright (C) 2005-6 Laura Kavanaugh and Dave Messina. All Rights Reserved.
This module is free software; you may redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
DISCLAIMER
This software is provided "as is" without warranty of any kind.