NAME
ClusterRankSmotifs
VERSION
Version 0.01
SYNOPSIS
Script to rank smotifs in the database by their loop signature and chemical shift difference relative to a query smotif. Two parallel clustering and ranking methods are used: (a) clustering on the go, ranked by cluster population; (b) Joe's clusters, ranked by diversity.
INPUT ARGUMENTS
1) $pdbcode : 4-letter name of the folder where the experimental chemical shift data is stored
2) $smotif : smotif number in the pdb
INPUT FILES
In the <pdbcode> folder:
1) shiftcands<pdbcode>_<motnum>_<looplength><smotif type>.csv : Files containing the results of comparing the query smotifs against the database. Each file (one per query smotif) includes the number of residues compared, the chemical shift difference value, the RMSD (if the structure is included), the loop length, the smotif NID, the secondary structure RMSD, the secondary structure lengths, and the loop structural signatures for the query and database motif and their overlap.
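As an illustration, the candidate file name can be unpacked with a regular expression. This is a minimal sketch and not part of the module: parse_shiftcands_name is a hypothetical helper, and the pattern assumes the layout of the sample name shiftcands1aab_01_8HH.csv (4-character pdb code, numeric smotif number, numeric loop length, alphabetic smotif type).

```perl
use strict;
use warnings;

# Hypothetical helper: split a shiftcands file name into its parts.
# Returns a hash reference, or undef if the name does not fit the
# assumed layout.
sub parse_shiftcands_name {
    my ($file) = @_;

    # The parentheses on the left give list context, so each capture
    # group lands in its own variable.
    my ($pdbcode, $motnum, $looplength, $type) =
        $file =~ m!(?:^|/)shiftcands([0-9A-Za-z]{4})_(\d+)_(\d+)([A-Za-z]+)\.csv$!;

    return unless defined $pdbcode;
    return {
        pdbcode    => $pdbcode,
        motnum     => $motnum,
        looplength => $looplength,
        type       => $type,
    };
}

my $info = parse_shiftcands_name('1aab/shiftcands1aab_01_8HH.csv');
print "$info->{pdbcode} $info->{motnum} $info->{looplength} $info->{type}\n"
    if $info;
```

The smotif number is kept as a string so that a leading zero such as 01 is preserved.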
OUTPUT FILES
In the <pdbcode> folder:
1) <pdbcode>_motifs_best_XX.csv : File containing a list of smotif candidates for each query smotif in the unknown protein
2) <pdbcode>_motifs_rmsd_XX.csv : File containing RMSDs of smotif candidates for each query smotif in the unknown protein.
XX = Query Smotif number.
Usage:
use ClusterRankSmotifs;
ClusterRankSmotifs ($pdb,$smotif);
EXPORT
A list of functions that can be exported. You can delete this section if you don't export anything, such as for a purely object-oriented module.
SUBROUTINES
	rank_smotifs
	findranks_by_cs_clustered
	get_cluster_on_the_go
	getseq
	checkseqblosum
	read_clusters
	get_clusters
	read_joe_clusters
rank_smotifs
Subroutine to cluster and rank the smotifs from the library based on the
chemical shift difference and phi/psi signature match between the library Smotif
and the query Smotif.
die "rank_smotifs: no file like $nam*csv was found in $pdbcode"
    unless @found;
# Let's assume that just ONE file like $pdbcode/$nam*csv was found.
# $nam2  = 1aab/shiftcands1aab_01_8HH.csv
my $nam2 = $found[0];
findranks_by_cs_clustered
Subroutine to cluster and rank the smotifs from the library based on the
chemical shift difference and phi/psi signature match between the library Smotif
and the query Smotif.
getseq
Subroutine to get the loop sequence for a given smotif by reading through 
the <pdbcode>.out file
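A minimal sketch of the extraction step, under stated assumptions: each data row of <pdbcode>.out has the eight whitespace-separated columns Name, Chain, Type, Start, Looplength, SS1length, SS2length and Sequence, and the Sequence column holds the first secondary structure, the loop and the second secondary structure concatenated in that order. The sample 1aab rows are consistent with this (15 + 8 + 12 = 35 residues), but the layout is inferred, not confirmed. loop_from_row is a hypothetical helper, not the module's getseq.

```perl
use strict;
use warnings;

# Hypothetical helper: return the loop residues from one data row
# of a <pdbcode>.out file, or nothing if the row is malformed.
sub loop_from_row {
    my ($row) = @_;

    # split ' ' splits on runs of whitespace and skips leading blanks
    my ($name, $chain, $type, $start, $looplen, $ss1len, $ss2len, $seq) =
        split ' ', $row;
    return unless defined $seq;

    # Assumed layout: the loop starts right after the SS1 residues
    return substr($seq, $ss1len, $looplen);
}

my $row = '1aab.pdb A       HH    14      8           15        12          '
        . 'SYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERW';
print loop_from_row($row), "\n";
```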
$filename = 1aab_01_8HH
$pdbcode  = 1aab
more 1aab/1aab.out
Name     Chain   Type  Start   Looplength  SS1length SS2length   Sequence
1aab.pdb A       HH    14      8           15        12          SYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERW
1aab.pdb A       HH    37      4           12        22          FSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREM
checkseqblosum
Subroutine to find the per-residue BLOSUM62 score between two sequences
read_clusters
Subroutine to read smotif clusters obtained from get_cluster_on_the_go
get_clusters
Subroutine to get clusters using Phylip
remove_nid
Reads a two-column file with a format like:
[brinda@everest test_brinda]$ head /tmp/motifclusters
Cluster4: nid_376468
Cluster6: nid_167076
Cluster7: nid_096416 nid_371611
Cluster8: nid_343341
Cluster24: nid_356687 nid_318838 nid_016570 nid_229923 nid_003768 nid_091937
The nid_ prefix will be removed from the second column and the output will be written
to an output file with a format like:
[brinda@everest test_brinda]$ head /tmp/motifclusters0
Cluster4: 376468
Cluster6: 167076
Cluster7: 096416 371611
Cluster8: nid_343341
Cluster24: 356687 318838 016570 229923 003768 091937
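The per-line transformation can be sketched as follows. This is an illustrative sketch only: strip_nid_prefix is a hypothetical helper rather than the module's remove_nid, and it assumes every data line starts with a ClusterN: label followed by whitespace. It strips the nid_ prefix from every ID on the line.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the nid_ prefix from every ID on one
# cluster line. Lines without a "ClusterN:" label are returned as-is.
sub strip_nid_prefix {
    my ($line) = @_;

    # Parentheses on the left force list context, so the variables
    # receive the captured text rather than a match count.
    my ($cluster, $rest) = $line =~ /^(Cluster\d+:)\s+(.*)$/;
    return $line unless defined $cluster;

    # Copy the ID column, then edit the copy in place
    (my $ids = $rest) =~ s/\bnid_//g;
    return "$cluster $ids";
}

print strip_nid_prefix('Cluster7: nid_096416 nid_371611'), "\n";
```

Keeping the IDs as strings preserves leading zeros such as 096416.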
# Cluster8: nid_343341 
# Cluster24: nid_356687 nid_318838 nid_016570 nid_229923 nid_003768 nid_091937 
# my $cluster= ($line =~ /(Cluster\d+:)\s+/)[0];
# If you are after a single match you use a scalar in 
# list context as the L-VALUE i.e
#
# my ($cluster) = $line =~ /(Cluster\d+:)\s+/;
#
# Note if you forget the ( ) around $scalar and you get a match 
# $scalar will contain the integer value 1 so don't forget the ( ). 
# The ( ) gets you list context which you need.
read_joe_clusters
Subroutine to read Joe's Smotif cluster classification (from files)
get_cluster_on_the_go
Subroutine to obtain Smotif clusters from the top 200 Smotifs identified using
chemical shift difference.
	Input:
	1. 4-letter pdb code (directory where all files in the modeling pipeline are saved).
	2. Smotif number of the query Smotif under consideration.
	3. RMSD threshold for clustering Smotifs (default = 2.0 Å).
	4. Array of library Smotifs sorted by chemical shift difference.
	Output:
	Array of up to 200 library Smotifs, ranked by a compound score obtained from
	cluster size and chemical shift difference
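One way such on-the-go clustering could work is a greedy "leader" scheme, sketched below. Everything in the sketch is an assumption made for illustration: the name cluster_on_the_go, the candidate hash keys (nid, cs_diff), the caller-supplied distance routine standing in for a pairwise RMSD calculation, and the ranking rule (larger clusters first, ties broken by the leader's chemical shift difference). The module's actual compound score is not documented here and may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch of greedy leader clustering.
#   $cands     : array ref of hash refs, already sorted by cs_diff (best first)
#   $dist      : code ref returning a distance between two candidates
#                (a stand-in for a pairwise RMSD routine)
#   $threshold : clustering cutoff, defaults to 2.0
#   $max       : number of top candidates to consider, defaults to 200
sub cluster_on_the_go {
    my ($cands, $dist, $threshold, $max) = @_;
    $threshold = 2.0 unless defined $threshold;
    $max       = 200 unless defined $max;

    my @top = @$cands;
    @top = @top[0 .. $max - 1] if @top > $max;

    my @clusters;    # each entry: { leader => $cand, members => [...] }
    CAND: for my $c (@top) {
        for my $cl (@clusters) {
            # Join the first cluster whose leader is close enough
            if ($dist->($c, $cl->{leader}) <= $threshold) {
                push @{ $cl->{members} }, $c;
                next CAND;
            }
        }
        # No cluster was close enough: this candidate starts a new one
        push @clusters, { leader => $c, members => [$c] };
    }

    # Assumed ranking: population first, then the leader's cs_diff
    my @ranked = sort {
        scalar @{ $b->{members} } <=> scalar @{ $a->{members} }
            or $a->{leader}{cs_diff} <=> $b->{leader}{cs_diff}
    } @clusters;
    return \@ranked;
}
```

Because the input is sorted by chemical shift difference, each cluster's leader is the member with the smallest difference, which is why the leader is used for tie-breaking.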
AUTHOR
Fiserlab Members, <andras at fiserlab.org>
BUGS
Please report any bugs or feature requests to bug-. at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=.. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc ClusterRankSmotifs
You can also look for information at:
- RT: CPAN's request tracker (report bugs here) 
- AnnoCPAN: Annotated CPAN documentation 
- CPAN Ratings 
- Search CPAN 
ACKNOWLEDGEMENTS
LICENSE AND COPYRIGHT
Copyright 2015 Fiserlab Members.
This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:
http://www.perlfoundation.org/artistic_license_2_0
Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.
If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.
This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.
This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.
Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.