NAME
Mecom - A Perl module for protein contact interfaces evolutive analysis
VERSION
Version 1.11
SYNOPSIS
# Create the object
my $coe = Mecom->new(
pdb => 'pdb/files/path/2occ.pdb',
alignment => 'aln/files/path/chainM.aln',
chain => 'M',
);
# Run calcs
$coe->run;
# Write HTML Report
open REP, ">report.html";
print REP $coe->run_report;
close REP;
DESCRIPTION
This module integrates a workflow aimed to address the evolvability of the contact interfaces within a protein complex. The method Mecom->run
launchs the whole analysis. Also, such workflow is divided into the following steps:
- Step 1, Structural analysis:
Mecom->run_struct
- Step 2, Sub-sets filtering:
Mecom->run_filtering
- Step 3, Sub-alignments building:
Mecom->run_subalign
- Step 4, Evolutionary calcs:
Mecom->run_yang
- Step 5, Statistical analysis:
Mecom->run_stats1
A detailed explanation about these methods is reported below.
REQUERIMENTS
CONSTRUCTOR
new()
$obj = Mecom->new(%input_data);
The new class method construct a new Mecom object. The returned object can be used to perform several evolutive analysis. new
accepts the following parameters in an input hash as above used %input_data
:
pdbfilepath (required if contactfile is missing)
A valid pdb file path to be opened for reading.
contactfilepath (required if pdb is missing)
A valid contact file path. This file must contain the structural information retrieved by a previous analysis on the same chain
alignfilepath (required)
A valid DNA multiple alignment file path. The alignment must correspond with the specified chain and must be at least as long as the pdb chain (x3)
chain (required)
A given subunits within the studied complex
pth (default 4 Angstroms)
Proximity threshold. The maximun distance between two residues to be considered as a contact pair
sth (default 0.05)
Exposure threshold. The maximun exposure fraction to be considered as a buried residue.
sthmargin (default 0)
An error margin for sth. For instance: if is set to 0.01, residues with exposure higher than 0.06 will be considered as exposed, those with exposure lower than 0.04 will be buried and those residues with exposure between 0.04 and 0.06 will not be considered
contactwith
A string with valid chain identificators separated by commas:
$contactwith = "A,B,D";
if it is set, the program will only consider as contact residues those in close proximity with the specified chains. The others will be excluded.
informat (default fasta)
Specify the format of the input alignment file. Supported formats include fasta, genbank, embl, swiss (SwissProt), Entrez Gene and tracefile formats such as abi (ABI) and scf. There are many more, for a complete listing see the SeqIO HOWTO (http://bioperl.open-bio.org/wiki/HOWTO:SeqIO).
If no format is specified and a filename is given then the module will attempt to deduce the format from the filename suffix. If there is no suffix that Bioperl understands then it will attempt to guess the format based on file content. If this is unsuccessful then SeqIO will throw a fatal error.
The format name is case-insensitive: 'FASTA', 'Fasta' and 'fasta' are all valid.
Currently, the tracefile formats (except for SCF) require installation of the external Staden "io_lib" package, as well as the Bio::SeqIO::staden::read package available from the bioperl-ext repository.
oformat (default clustalw)
Specify the format of the output sub-alignments. As above.
gc (default 0)
The genetic code. The attribute must be one of the following integers, which correspond with the indicated genetic code:
0: Standar 1: Mammailan mitochondrial 2: Yeast mitochondrial 3: Mold mitochondiral 4: Invertebrate mitochondrial 5: Ciliate nuclear 6: Echinoderm mitochondrial 7: Euplotid mitochondrial 8: Alternative yeast nuclear 9: Ascidian mitochondrial 10: Blepharisma nuclear
These codes correspond to transl_table 1 to 11 of GENEBANK
ocontact (default ocontact)
A valid file path to write the structural results
dsspbin (default dssp)
The path to the DSSP binary
MAIN METHODS
run()
Title : run
Usage : $obj->run
Function: Launch the whole workflow analysis
Returns :
Args :
run_struct()
Title : run_struct
Usage : $obj->run_struct
Function: Launch structural analysis and stores the result in the attribute:
"structdata"
Returns : True if success
Args :
run_filtering()
Title : run_struct
Usage : $obj->run_filtering
Function: Build different categories of sets (Contact, NonContact ...)
and set the attribute "lists" with the result
Returns : True if success
Args :
run_subalign()
Title : run_subalign
Usage : $obj->run_subalign
Function: Build new alignments from the input chain alignment and the categories
built by run_filtering method. Stores the result into "subalns" attribute
Returns : True if success
Args :
run_yang()
Title : run_yang
Usage : $obj->run_yang
Function: Launch PAML for each alignment stored at "sub_alns" attribute and
store the results into "paml_res"
Returns : True if success
Args :
run_stats1()
Title : run_stats1
Usage : $obj->run_stats1
Function: Run a Z-Test with the obtained evolutionary data and store the
results into "stats" attribute
Returns : True if success
Args :
run_report()
Title : run_report
Usage : $obj->run_report
Function: Write a HTML report
Returns : [String] HTML report with the results and input data
Args :
AUXILIAR METHODS
cat_aln()
Title : run_report
Usage : $obj->cat_aln(@alns)
Function: Concatenates alignment objects. Sequences are identified by id.
An error will be thrown if the sequence ids are not unique in the
first alignment. If any ids are not present or not unique in any
of the additional alignments then those sequences are omitted from
the concatenated alignment, and a warning is issued. An error will
be thrown if any of the alignments are not flush, since
concatenating such alignments is unlikely to make biological
sense.
Returns : A unique Bio::SimpleAlign object
Args : A list of Bio::SimpleAlign objects
PROCESSED DATA STORAGE
Once each analysis has been performed, the resulting data is stored in other setable attributes:
structdata
[Array] A table with the structural information calculated by Mecom::Contact.pm and DSSP
lists
[Hash] Each item contains a list of number corresponding with each type of residue. The key for a given item is the name for the category.
Contact NonContact ExposedNonContact ContactWith_$specified_chains [...]
subalns
[Hash] Each item contains a sub-alignment for a given category (see above)
pamlres
[Hash] Results for evolutive analysis. Each item contains the results for a given sub-alignment (see above)
stats
[Hash] Statistical results
SECONDARY METHODS (but not less important)
All attributes are accesible and mutable from methods called get_attribute and set_attribute, respectively. For example:
# Set the proximity threshold ("pth") to 3 Angstroms
$obj->set_pth(3);
# Print the current value of the attribute "pth"
print $obj->get_pth;
The processed data is also stored in attributes. Thus, this kind of methods can also be used to access and modify the results.
AUTHOR - Hector Valverde
Hector Valverde, <hvalverde@uma.es>
CONTRIBUTORS
Juan Carlos Aledo, <caledo@uma.es>
BUGS
Please report any bugs or feature requests to bug-Mecom-Complex at rt.cpan.org
, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Mecom-Complex. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
This module is the program core of MECOM Perl program. Further information about this project is available at:
http://mecom.hval.es/
You can find documentation for this module with the UNIX man command.
man Mecom
LICENSE AND COPYRIGHT
Copyright 2013 Hector Valverde and Juan Carlos Aledo.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.