NAME
fasta-decoy.pl - build decoy databanks from input databanks using several methods
DESCRIPTION
Reads an input fasta file and produces a decoy databank using one of several methods:
- reverse: simply reverse each sequence
- shuffle: shuffle AA in each sequence
- shuffle & avoid known cleaved peptides: shuffle sequence but avoid producing known tryptic peptides
- Markov model: learn the distribution of a Markov chain of a given level from the input, then produce entries following this distribution
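As an illustration of the two simplest methods above, here is a minimal Python sketch. It is not the script's own Perl code; the function names reverse_decoy and shuffle_decoy are hypothetical, and only the sequence string is handled (fasta headers and AC prefixes are left out).

```python
import random

def reverse_decoy(seq: str) -> str:
    """Return the sequence read backwards (the 'reverse' method)."""
    return seq[::-1]

def shuffle_decoy(seq: str, rng: random.Random) -> str:
    """Return a random permutation of the residues (the 'shuffle' method)."""
    residues = list(seq)
    rng.shuffle(residues)  # in-place Fisher-Yates shuffle
    return "".join(residues)

# Both decoys keep the length and amino acid composition of the original entry.
original = "MKWVTFISLLLLFSSAYSR"
reversed_seq = reverse_decoy(original)
shuffled_seq = shuffle_decoy(original, random.Random(0))
```

Passing an explicit random.Random instance makes a shuffled databank reproducible, which is an assumption of this sketch rather than a documented feature of the script.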
SYNOPSIS
#reverse sequences for a local (optionally compressed) file
fasta-decoy.pl --in=/tmp/uniprot_sprot.fasta.gz --method=reverse
#download a databank from the web, uncompress it and shuffle the sequences
wget --quiet -O - ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz | zcat | fasta-decoy.pl --method=shuffle
#use a .dat file (with splice forms) as an input
uniprotdat2fasta.pl --in=uniprot_sprot_human.dat | fasta-decoy.pl --method=markovmodel
#reversing each sequence
fasta-decoy.pl --ac-prefix=DECOY_ --in=mitoch.fasta --method=reverse --out=mitoch-reverse.fasta
#drawing amino acids following their distribution in the original fasta (the end of a sequence is treated as a learned random event)
fasta-decoy.pl --ac-prefix=DECOY_ --in=mitoch.fasta --method=markovmodel --markovmodel-level=0 --out=mitoch-markovmodel_0.fasta
#drawing amino acid with a markov model (here of length 3)
fasta-decoy.pl --ac-prefix=DECOY_ --in=mitoch.fasta --method=markovmodel --markovmodel-level=3 --out=mitoch-markovmodel_3.fasta
#each sequence is randomly shuffled
fasta-decoy.pl --ac-prefix=DECOY_ --in=mitoch.fasta --method=shuffle --out=mitoch-shuffle.fasta
#same as above, but no tryptic peptide (of length>=6) from the original bank may be found in the random one;
#crc is the number of bits used to build the index, to avoid memory overallocation
#(crc=33 means 1GB of RAM will be used to store the index, 32 means 512MB, 34 means 2GB, etc.)
fasta-decoy.pl --ac-prefix=DECOY_ --in=mitoch.fasta --method=shuffle --shuffle-reshufflecleavedpeptides-crc=33 --shuffle-reshufflecleavedpeptides --out=mitoch-shuffleplus.fasta
ARGUMENTS
--in=infile.fasta
An input fasta file (will be uncompressed if ending with gz)
--out=outfile.fasta
A .fasta file [default is stdout]
--method=(reverse|shuffle|markovmodel)
Set the decoying method
OPTIONS
--ac-prefix=string
Set a key to be prepended to the AC in the randomized bank. By default, the key depends on the chosen method.
--method=shuffle options
--shuffle-reshufflecleavedpeptides
Re-shuffle peptides of size >=6 that were detected as cleaved peptides in the original databank
--shuffle-reshufflecleavedpeptides-minlength [default 6]
Set the minimum length of the peptides to be reshuffled if they already exist in the original databank
--shuffle-reshufflecleavedpeptides-crc=int
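Building a hash of known cleaved peptides can be quite demanding for memory (uniprot_trembl => ~4GB). The solution is therefore to build an array of flags, each one stating whether or not a peptide with the corresponding crc code was found. The argument is the number of bits used for the crc code, and thus sets the size of the array.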
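The idea can be sketched in Python as a bit array indexed by a checksum of each peptide. This is only an illustration under stated assumptions: the class name CrcBitIndex and the helper index_memory_bytes are hypothetical, and zlib.crc32 stands in for whatever crc function the script really uses (it yields 32 bits only, so this sketch is meaningful for bits <= 32).

```python
import zlib

class CrcBitIndex:
    """One flag bit per possible crc value: set if a known peptide hashed there."""

    def __init__(self, bits: int):
        self.mask = (1 << bits) - 1                      # keep the low `bits` bits
        self.table = bytearray(((1 << bits) + 7) // 8)   # 2**bits flags, 8 per byte

    def _slot(self, peptide: str) -> int:
        # Assumption: zlib.crc32 as a stand-in checksum.
        return zlib.crc32(peptide.encode("ascii")) & self.mask

    def add(self, peptide: str) -> None:
        slot = self._slot(peptide)
        self.table[slot >> 3] |= 1 << (slot & 7)

    def __contains__(self, peptide: str) -> bool:
        slot = self._slot(peptide)
        return bool(self.table[slot >> 3] & (1 << (slot & 7)))

def index_memory_bytes(bits: int) -> int:
    """Memory taken by 2**bits one-bit flags."""
    return (1 << bits) // 8

index = CrcBitIndex(16)
index.add("LVNELTEFAK")
known = "LVNELTEFAK" in index
```

With this layout, 33 bits give 2**33 flags, i.e. 2**30 bytes, which matches the 1GB figure quoted in the synopsis. Two different peptides can share a crc code, so a lookup may report a peptide as known when it is not; in such a scheme the cost would presumably be an unnecessary reshuffle rather than a missed peptide.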
--shuffle-cleaveenzyme=regexp
Set a regular expression for the enzyme [default is trypsin: '.*?[KR](?=[^P])']
--shuffle-testenzyme
Just digest the entries with the set enzyme and produce space-separated peptides (to check the enzyme)
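To see how the default trypsin expression cuts a sequence, the following Python sketch applies the same regular expression with the standard re module. The digest function is a hypothetical helper, not the script's code; appending the unmatched tail as a last peptide is an assumption of this sketch.

```python
import re

# Default enzyme from the option above: cut after K or R unless followed by P.
TRYPSIN = r".*?[KR](?=[^P])"

def digest(seq: str, enzyme: str = TRYPSIN) -> list:
    """Split a sequence into consecutive peptides using the enzyme regexp."""
    peptides = []
    pos = 0
    for match in re.finditer(enzyme, seq):
        peptides.append(match.group(0))
        pos = match.end()
    if pos < len(seq):
        # The C-terminal remainder has no cleavage site and is kept as is.
        peptides.append(seq[pos:])
    return peptides

# K at position 3 is a cleavage site, R is followed by P so it is skipped,
# the next K cuts again and "DD" is the remaining tail.
print(" ".join(digest("AAKGGRPCCKDD")))  # prints "AAK GGRPCCK DD"
```

Since the lookahead needs one more residue, a K or R at the very end of the sequence never matches on its own and ends up in the tail peptide.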
--method=markovmodel options
--markovmodel-level=int [default 3]
Set the length of the model (0 means only the AA distribution will be respected, 3 means the distribution of chains of length 3, etc.). Setting a length >3 can lead to memory exhaustion.
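A minimal Python sketch of such a model follows. It assumes that level n means the next residue is drawn conditionally on the n previous ones, and that the end of a sequence is learned as an ordinary symbol, as the synopsis comments suggest. The names learn, generate and the END marker are hypothetical, and the script's actual Perl implementation may differ.

```python
import random
from collections import Counter, defaultdict

END = "$"  # hypothetical end-of-sequence symbol, learned like any residue

def learn(seqs, level):
    """Count which symbol follows each context of up to `level` residues."""
    model = defaultdict(Counter)
    for seq in seqs:
        padded = seq + END
        for i, symbol in enumerate(padded):
            context = padded[max(0, i - level):i]
            model[context][symbol] += 1
    return model

def generate(model, level, rng, max_len=100000):
    """Draw one decoy sequence; needs a model learned from at least one entry."""
    out = []
    while len(out) < max_len:
        context = "".join(out[-level:]) if level else ""
        counts = model[context]
        symbols = list(counts)
        weights = [counts[s] for s in symbols]
        symbol = rng.choices(symbols, weights=weights)[0]
        if symbol == END:
            break
        out.append(symbol)
    return "".join(out)

model = learn(["MKWVTFISLLLLFSSAYSR", "MKAILVVLLYTFATA"], 2)
decoy = generate(model, 2, random.Random(0))
```

The number of possible contexts grows roughly as 20 to the power of the level, which is one plausible reason why higher levels become costly in memory.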
misc
--noprogressbar
do not display terminal progress bar (if possible)
--help
--man
--verbose
Setting the environment variable DO_NOT_DELETE_TEMP=1 will keep the temporary files after the script exits
EXAMPLE
COPYRIGHT
Copyright (C) 2004-2006 Geneva Bioinformatics www.genebio.com
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
AUTHORS
Alexandre Masselot, www.genebio.com