proteins_to_sequences

proteins_to_sequences allows the user to look up the amino acid sequence of each protein in a set of proteins (represented as MD5 hash values). By default, the results are returned as a formatted FASTA file.

Example:

proteins_to_sequences [arguments] < input > output

The standard input should be a tab-separated table (i.e., each line is a tab-separated set of fields). Normally, the last field in each line contains the identifier. If another column contains the identifier, use

-c N

where N is the column (counting from 1) that contains the protein identifier.
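As a rough illustration of how the identifier column is chosen, here is a minimal Python sketch. It assumes plain tab splitting with no quoting; the function name get_identifier and the sample line are hypothetical and not part of the tool.

```python
from typing import Optional


def get_identifier(line: str, column: Optional[int] = None) -> str:
    """Return the identifier field from one tab-separated input line.

    column is 1-based, mirroring -c N; None means "use the last field",
    which is the documented default.
    """
    fields = line.rstrip("\n").split("\t")
    if column is None:
        return fields[-1]
    return fields[column - 1]  # convert the 1-based column to a 0-based index


# Hypothetical input line: the identifier is in column 2, not the last column.
row = "peg.1\tprot_md5_1\thypothetical protein\n"
print(get_identifier(row))            # prints: hypothetical protein
print(get_identifier(row, column=2))  # prints: prot_md5_1
```

Without -c, the last field is used, which is why the sample line above needs -c 2.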

This is a pipe command. The input is taken from the standard input, and the output is to the standard output.

Command-Line Options

-c Column

This is used only if the column containing the protein identifier is not the last column.

-i InputFile [ use InputFile, rather than stdin ]

-fasta

This requests FASTA output, which drops all of the other columns in the input lines. FASTA output is the default; use -fasta=0 to get the tab-delimited format described under Output Format instead.
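A FASTA entry is a header line beginning with ">" followed by the sequence. The sketch below shows that layout under stated assumptions: the header is the identifier plus an optional comment, and sequence lines are wrapped at 60 characters. The wrap width and the helper name to_fasta are assumptions, not documented behavior of proteins_to_sequences.

```python
def to_fasta(identifier: str, sequence: str, comment: str = "", width: int = 60) -> str:
    """Format one FASTA entry: a '>' header line, then the sequence wrapped at width.

    The 60-character default width is an assumption for illustration only.
    """
    header = ">" + identifier
    if comment:
        header += " " + comment  # optional free-text comment after the identifier
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + chunks) + "\n"


# Hypothetical identifier and sequence, wrapped at 4 characters to show the layout.
print(to_fasta("prot_md5_1", "MKVLAAGIVG", width=4), end="")
# >prot_md5_1
# MKVL
# AAGI
# VG
```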

-fc Columns [ construct comment for fasta from these columns ]

This builds "fasta comments" (the text that follows the identifier on each FASTA header line) from one or more input columns, given as a comma-separated list.
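A minimal sketch of how -fc might assemble a comment. It assumes the column numbers count from 1, as with -c, and that the selected fields are joined with single spaces; the real separator is not documented here. fasta_comment and the sample line are hypothetical.

```python
def fasta_comment(line: str, columns: str) -> str:
    """Join the fields named by a comma-separated list of 1-based columns (e.g. "1,3")."""
    fields = line.rstrip("\n").split("\t")
    wanted = [int(c) for c in columns.split(",") if c.strip()]
    # Assumption: selected fields are joined with single spaces.
    return " ".join(fields[c - 1] for c in wanted)


# Hypothetical input line; the identifier is in the last column.
row = "peg.1\thypothetical protein\tprot_md5_1\n"
print(fasta_comment(row, "1,2"))  # prints: peg.1 hypothetical protein
```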

Output Format

By default, the standard output is just a FASTA file containing the sequences. You can get a tab-delimited file instead by using -fasta=0. The tab-delimited format consists of the input lines, each with an extra column containing the sequence.

Input lines that cannot be extended are written to stderr.
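To make the tab-delimited mode (-fasta=0) and the stderr behavior concrete, here is a sketch that appends a sequence column to each line and collects the lines it cannot extend. The real command looks sequences up itself; here a plain dictionary stands in for that lookup as an assumption, and "cannot be extended" is taken to mean that no sequence was found (or that the requested column does not exist). extend_lines is a hypothetical name.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def extend_lines(lines: Iterable[str], sequences: Dict[str, str],
                 column: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Append a sequence column to each tab-separated line.

    Returns (extended, rejected). `sequences` is a hypothetical stand-in for
    the real lookup; a line is rejected when its identifier column does not
    exist or no sequence is known for the identifier.
    """
    extended: List[str] = []
    rejected: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        fields = line.split("\t")
        if column is not None and not 1 <= column <= len(fields):
            rejected.append(line)  # requested -c column is missing
            continue
        ident = fields[-1] if column is None else fields[column - 1]
        seq = sequences.get(ident)
        if seq is None:
            rejected.append(line)  # the tool would write this line to stderr
        else:
            extended.append(line + "\t" + seq)  # the tool would write this to stdout
    return extended, rejected


lookup = {"prot_md5_1": "MKVLAAGIVG"}  # hypothetical identifier -> sequence map
good, bad = extend_lines(["peg.1\tprot_md5_1\n", "peg.2\tprot_md5_2\n"], lookup)
print(good)  # ['peg.1\tprot_md5_1\tMKVLAAGIVG']
print(bad)   # ['peg.2\tprot_md5_2']
```

In a pipe, the lines would come from sys.stdin, the extended lines would be printed to standard output, and the rejected lines to sys.stderr.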