proteins_to_sequences
proteins_to_sequences looks up the amino acid sequence for each of a set of proteins (represented as MD5 hash values). The sequences can be returned as a formatted FASTA file.
Example:
proteins_to_sequences [arguments] < input > output
The standard input should be a tab-separated table (i.e., each line is a tab-separated set of fields). Normally, the last field in each line contains the identifier (the protein's MD5 hash). If another column contains the identifier, use
-c N
where N is the column (from 1) that contains the identifier.
This is a pipe command. The input is read from the standard input, and the output is written to the standard output.
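The column rule above can be sketched in Python. This is only an illustration of the convention (last field by default, 1-based column otherwise), not the command's actual implementation; the function name pick_identifier and the sample row are made up for the example.

```python
def pick_identifier(line, column=None):
    """Return the identifier field from one tab-separated input line.

    column is 1-based, mirroring -c N; None means "use the last field".
    (Hypothetical helper, not part of proteins_to_sequences itself.)
    """
    fields = line.rstrip("\n").split("\t")
    if column is None:
        return fields[-1]
    return fields[column - 1]


# Invented sample row: a label, a placeholder MD5-like string, and a note.
row = "rowA\t0123456789abcdef0123456789abcdef\tsome note\n"

print(pick_identifier(row))            # last field: "some note"
print(pick_identifier(row, column=2))  # second field: the placeholder hash
```

In this sample the identifier is not in the last column, so the equivalent command line would need -c 2.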
Command-Line Options
- -c Column
This is used only if the column containing the identifier is not the last column.
- -i InputFile [ use InputFile, rather than stdin ]
- -fasta
This requests FASTA output, dropping all of the other columns in the input lines. FASTA output is the default; use -fasta=0 to get tab-delimited output instead.
- -fc Columns [ construct comment for FASTA from these columns ]
This is used to ask for "fasta comments" formed from one or more columns (comma-separated).
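As a rough sketch of what a comment built with -fc might look like, the Python below assembles one from selected columns. Several details are assumptions rather than documented behavior: that the argument is a comma-separated list of 1-based column numbers, that the chosen fields are joined with a single space, and that the comment follows the identifier on the header line. The name fasta_comment and the sample fields are hypothetical.

```python
def fasta_comment(fields, fc_spec):
    """Build a comment string from a comma-separated list of column numbers.

    Assumptions (not confirmed by the documentation): columns are 1-based,
    and the selected fields are joined with a single space.
    """
    columns = [int(c) for c in fc_spec.split(",")]
    return " ".join(fields[c - 1] for c in columns)


# Invented sample fields: a label, a placeholder identifier, and a note.
fields = ["rowA", "0123456789abcdef0123456789abcdef", "some note"]

# Assumed header layout: ">" + identifier + " " + comment.
header = ">" + fields[1] + " " + fasta_comment(fields, "1,3")
print(header)
```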
Output Format
The standard output is just a FASTA file with the sequences. You can also get a tab-delimited file by using -fasta=0. The tab-delimited format consists of the input lines, each with an extra column containing the sequence.
Input lines that cannot be extended are written to stderr.
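The two output modes and the stderr behavior can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's code: the dictionary stands in for whatever sequence source the command really queries, the identifier is assumed to be in the last column and to serve as the FASTA header, and sequences are written on a single line without wrapping. The function name emit and all sample data are made up.

```python
import io
import sys


def emit(lines, sequences, fasta=True, out=sys.stdout, err=sys.stderr):
    """Write FASTA or tab-delimited output for tab-separated input lines.

    sequences is a hypothetical identifier -> sequence mapping.
    Lines whose identifier has no sequence are copied to err unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        identifier = line.split("\t")[-1]  # assume identifier is the last field
        seq = sequences.get(identifier)
        if seq is None:
            err.write(line + "\n")         # cannot be extended
        elif fasta:
            out.write(">" + identifier + "\n" + seq + "\n")
        else:
            out.write(line + "\t" + seq + "\n")  # input line plus sequence column


# Invented demo data: only key1 has a known sequence.
table = ["rowA\tkey1\n", "rowB\tkey2\n"]
lookup = {"key1": "MKVLA"}

out, err = io.StringIO(), io.StringIO()
emit(table, lookup, fasta=False, out=out, err=err)
print(out.getvalue(), end="")  # the extended line for rowA
print(err.getvalue(), end="")  # the unextended line for rowB
```

Passing the output streams as parameters keeps the sketch easy to check in isolation; a real pipe command would simply use the standard streams.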