NAME
fafind-eq-seq - find equal sequences
SYNOPSIS
./fafind-eq-seq [--help] [--eval 'perlcode'] <file1> [<file2> ... <fileN>] >file_with_results.txt
./fafind-eq-seq [--help] [--eval 'perlcode'] --filter <file1> [<file2> ... <fileN>] >file_with_only_unique_seqs.fasta
DESCRIPTION
Find identical / equal sequences in a given set of fasta files. Info messages go to standard error (stderr), results to standard output (stdout).
The result output (file_with_results.txt in the synopsis) consists of lines following the pattern
<ID> <DESCRIPTION><TAB><FILE>
<TAB><ID> <DESCRIPTION><TAB><FILE>
<TAB><ID> <DESCRIPTION><TAB><FILE>
<ID> <DESCRIPTION><TAB><FILE>
<TAB><ID> <DESCRIPTION><TAB><FILE>
<TAB><ID> <DESCRIPTION><TAB><FILE>
<ID> <DESCRIPTION><TAB><FILE>
<TAB><ID> <DESCRIPTION><TAB><FILE>
<TAB><ID> <DESCRIPTION><TAB><FILE>
where each unindented line, together with the <TAB>-indented lines that follow it, marks one group of identical sequences.
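The grouped output described above can be read back into a script. The following is a minimal Python sketch, not part of fafind-eq-seq itself: the names `Record` and `parse_groups` are hypothetical, and it assumes that the file name is the text after the last <TAB> and that the ID ends at the first space.

```python
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    seq_id: str
    description: str
    file: str


def parse_groups(lines: Iterable[str]) -> List[List[Record]]:
    """Collect result lines into groups of identical sequences.

    An unindented line opens a new group; <TAB>-indented lines are
    appended to the group opened most recently.
    """
    groups: List[List[Record]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        indented = line.startswith("\t")
        body = line.lstrip("\t")
        # Assumption: the file name follows the last tab of the line.
        if "\t" in body:
            header, file = body.rsplit("\t", 1)
        else:
            header, file = body, ""
        # Assumption: the ID is everything up to the first space.
        seq_id, _, description = header.partition(" ")
        record = Record(seq_id, description, file)
        if indented and groups:
            groups[-1].append(record)
        else:
            groups.append([record])
    return groups
```

Calling `parse_groups(open("file_with_results.txt"))` would then return one list of records per group, with the unindented entry first.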
OPTIONS
- --filter
Do not print the groups; print the sequences in fasta format instead, omitting duplicated sequences. The resulting fasta output is not checked for identical IDs, etc.
Synonyms: -f
- --help
Display this message.
Synonyms: -?, -h
- --eval
Manipulate input sequences on the fly with a snippet of Perl code. The current sequence string is available in $_. This does not change the actual output sequence, e.g. when filtering.
This can be very handy for comparing amino acid sequences from two different files where one file uses * as the stop codon and the other does not:
./fafind-eq-seq --eval 's/\*$//' <file1> <file2> >file_with_results.txt
Synonyms: -e
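To illustrate how --filter and --eval interact, here is a rough Python sketch of the idea, not the script's actual Perl implementation. The helpers `read_fasta`, `strip_stop` and `filter_unique` are hypothetical names. The sketch assumes that sequences are compared as exact strings and that the first occurrence of a duplicate is the one kept; the tool may behave differently in both respects.

```python
import re
from typing import Callable, Iterable, Iterator, List, Tuple

FastaRecord = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Yield (header, sequence) pairs from the lines of a fasta file."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def strip_stop(seq: str) -> str:
    """Python counterpart of the Perl snippet s/\\*$// from the example."""
    return re.sub(r"\*$", "", seq)


def filter_unique(
    records: Iterable[FastaRecord],
    key: Callable[[str], str] = lambda seq: seq,
) -> List[FastaRecord]:
    """Keep one record per distinct sequence.

    `key` plays the role of --eval: it only changes the string used for
    the comparison, the stored sequence is returned unmodified.
    Assumption: the first record seen for each key is the one kept.
    """
    seen = set()
    kept: List[FastaRecord] = []
    for header, seq in records:
        k = key(seq)
        if k in seen:
            continue
        seen.add(k)
        kept.append((header, seq))
    return kept
```

With `key=strip_stop`, the sequences `MKVL*` and `MKVL` count as equal, yet the record that is kept still carries its original `*`, mirroring the statement that the output sequence is not changed.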
AUTHOR
jw bargsten, <joachim.bargsten at wur.nl>