protein_families_to_co_occurring_families
Because we accumulate data on the co-occurrence (i.e., chromosomal clustering) of genes in prokaryotic genomes, we can note which pairs of genes tend to co-occur. From this data, one can compute which protein families tend to co-occur (i.e., tend to cluster on the chromosome). This allows one to formulate conjectures about unclustered pairs of genes, based on clustered pairs from the same protein families.
Example:
protein_families_to_co_occurring_families [arguments] < input > output
The standard input should be a tab-separated table (i.e., each line is a tab-separated set of fields). Normally, the last field in each line contains the protein family identifier. If another column contains the identifier, use
-c N
where N is the column (counting from 1) that contains the protein family identifier.
This is a pipe command. The input is taken from the standard input, and the output is to the standard output.
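The column-selection rule described above can be illustrated with a short Python sketch. This is not the script's actual implementation (the tool itself wraps a Perl API call); the function name `extract_identifier` and the sample field values are hypothetical and serve only to show the "last field by default, 1-based column with -c" behavior.

```python
def extract_identifier(line, column=None):
    """Return the identifier field from one tab-separated input line.

    column is 1-based, mirroring the -c option; None means "use the last field".
    (Hypothetical helper, not part of the real script.)
    """
    fields = line.rstrip("\n").split("\t")
    if column is None:
        return fields[-1]
    return fields[column - 1]


# Hypothetical input line; "family_A" is a placeholder identifier.
line = "genome_1\tfamily_A\tsome note\n"
print(extract_identifier(line))            # default: last field
print(extract_identifier(line, column=2))  # equivalent of "-c 2"
```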
Documentation for underlying call
This script is a wrapper for the CDMI-API call protein_families_to_co_occurring_families. It is documented as follows:
$return = $obj->protein_families_to_co_occurring_families($protein_families)
- Parameter and return types
$protein_families is a protein_families
$return is a reference to a hash where the key is a protein_family and the value is a fc_protein_families
protein_families is a reference to a list where each element is a protein_family
protein_family is a string
fc_protein_families is a reference to a list where each element is a fc_protein_family
fc_protein_family is a reference to a list containing 3 items:
    0: a protein_family
    1: a score
    2: a function
score is a float
function is a string
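To make the shape of the return value concrete, here is a rough Python analogue of the documented types: a mapping from each protein family to a list of 3-item entries (protein_family, score, function). The real call returns Perl hash and list references; every identifier, score, and function string below is made up for illustration, and `flatten` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Analogue of fc_protein_family: (protein_family, score, function)
FcProteinFamily = Tuple[str, float, str]

# Hypothetical return value; all identifiers, scores, and functions are invented.
result: Dict[str, List[FcProteinFamily]] = {
    "family_A": [
        ("family_B", 12.0, "hypothetical function 1"),
        ("family_C", 3.5, "hypothetical function 2"),
    ],
    "family_D": [],  # a family with no co-occurring families
}


def flatten(result):
    """Turn the nested structure into (family, partner, score, function) rows."""
    rows = []
    for family, partners in result.items():
        for partner, score, function in partners:
            rows.append((family, partner, score, function))
    return rows


for row in flatten(result):
    print("\t".join(str(value) for value in row))
```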
Command-Line Options
- -c Column [ use only if the column containing the protein family identifier is not the last column ]
- -i InputFile [ use InputFile, rather than stdin ]
Output Format
The standard output is a tab-delimited file. It consists of the input file with extra columns added.
Input lines that cannot be extended are written to stderr.
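The output behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's real code: it assumes the identifier is in the last column, that one output line is written per co-occurring family, and that the added columns are the co-occurring family, the score, and the function in that order. The documentation above does not specify the added columns or their order, and `extend_lines` is a hypothetical name.

```python
import io
import sys


def extend_lines(lines, result, out=sys.stdout, err=sys.stderr):
    """Append co-occurrence columns to each input line (hypothetical sketch).

    result maps a protein family to a list of (family, score, function) entries.
    Lines whose family has no entries are echoed to err instead of out.
    """
    for line in lines:
        line = line.rstrip("\n")
        family = line.split("\t")[-1]  # assumption: identifier is the last field
        partners = result.get(family, [])
        if not partners:
            print(line, file=err)  # line could not be extended
            continue
        for partner, score, function in partners:
            # Assumed column order: partner family, score, function
            print("\t".join([line, partner, str(score), function]), file=out)


# Usage with in-memory streams and made-up data
out, err = io.StringIO(), io.StringIO()
extend_lines(
    ["x\tfamily_A\n", "y\tfamily_Z\n"],
    {"family_A": [("family_B", 2.0, "hypothetical function")]},
    out,
    err,
)
print(out.getvalue(), end="")
```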