NAME
ompa-pa.pl - Extract seqs from BLAST/HMMER interactively or in batch mode
VERSION
version 0.252040
USAGE
ompa-pa.pl <infiles> --database=<file> [optional arguments]
REQUIRED ARGUMENTS
OPTIONAL ARGUMENTS
- --report-type=<str>
-
Type of the reports used as infiles [default: blastxml]. Currently, the following types are available:
- blastxml (XML BLAST reports generated with -outfmt 5)
- hmmertbl (tabular HMMER reports generated with -domtblout)
- --database=<file>
-
Path to the sequence database used to generate the reports. For efficiency, this argument must always be the basename of a BLAST database, even when the reports were obtained using hmmsearch on a FASTA file. To build such a database, use one of the following commands:
$ makeblastdb -in database.fasta -out database -dbtype prot -parse_seqids
$ makeblastdb -in database.fasta -out database -dbtype nucl -parse_seqids
This argument is required when the option --extract-seqs is enabled.
- --skip-config=<file>
-
Path to an optional configuration file specifying the reports to skip based on their raw taxonomic content [default: none]. The assessment is made before any filtering other than --max-hits. The configuration file follows the classifier format (often YAML) of <classify-ali.pl>. This requires enabling taxonomic annotation and thus a local mirror of the NCBI Taxonomy database.
- --colorize=<scheme>
-
When specified, sequence points are colored according to their taxon using the specified CLS file. As above, this requires enabling taxonomic annotation and thus a local mirror of the NCBI Taxonomy database.
- --taxdir=<dir>
-
Path to local mirror of the NCBI Taxonomy database.
To build such a directory, use the following command:
$ setup-taxdir.pl --taxdir=taxdir
- --max-hits=<n>
-
Maximum number of hits to read from the report [default: 200000]. This limit is implemented for efficiency. It applies before any other filter.
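The ordering described above can be illustrated with a minimal Python sketch. It is not the script's actual (Perl) code: the hit records and the `keep` predicate are hypothetical. It only shows that a cap applied first hides later hits from every subsequent filter.

```python
from itertools import islice

def read_then_filter(hits, max_hits, keep):
    """Cap the number of hits read, then apply a later filter.

    Hits beyond the cap are never examined, whether or not they
    would have passed the filter.
    """
    capped = islice(hits, max_hits)  # the cap is applied first
    return [hit for hit in capped if keep(hit)]

# Hypothetical hits as (identifier, coverage) pairs, in report order
hits = [("seq1", 0.9), ("seq2", 0.4), ("seq3", 0.8)]
selected = read_then_filter(hits, max_hits=2, keep=lambda h: h[1] >= 0.7)
print(selected)
```

With a cap of 2, the third hit is never read even though it would satisfy the filter, so a cap that is too low can silently exclude valid hits.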
- --min-cov=<n>
-
Minimum BLAST query or HMMER model coverage for selected hits [default: 0.7].
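As a rough illustration of what such a threshold means, the Python sketch below computes coverage as the fraction of the query (or HMM model) spanned by a single aligned segment. This is an assumption made for the example: how the script combines several segments of the same hit is not described here, and the function names are hypothetical.

```python
def coverage(start, end, length):
    """Fraction of a query (or HMM model) spanned by one aligned segment.

    Assumes 1-based inclusive coordinates, as used in BLAST and
    HMMER reports.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if not 1 <= start <= end <= length:
        raise ValueError("expected 1 <= start <= end <= length")
    return (end - start + 1) / length

def passes_min_cov(start, end, length, min_cov=0.7):
    """True when the segment reaches the coverage threshold."""
    return coverage(start, end, length) >= min_cov

# A segment covering positions 1..70 of a 100-residue query
print(passes_min_cov(1, 70, 100))
```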
- --max-copy=<n>
-
Maximum gene copy number per organism for selected hits [default: 3].
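The copy number of an organism can be thought of as the number of selected hits assigned to it. The Python sketch below only counts copies and reports the organisms above the limit. What the script then does with those organisms is not stated here, so the sketch makes no attempt to reproduce it. The input pairs and function names are hypothetical.

```python
from collections import Counter

def copies_per_organism(hits):
    """Count selected hits per organism.

    `hits` is an iterable of (sequence_id, organism) pairs.
    """
    return Counter(organism for _, organism in hits)

def organisms_over_limit(hits, max_copy=3):
    """Return, sorted, the organisms with more than `max_copy` hits."""
    counts = copies_per_organism(hits)
    return sorted(org for org, n in counts.items() if n > max_copy)

# Hypothetical selection: organism A has 4 copies, organism B has 1
hits = [("a1", "A"), ("a2", "A"), ("a3", "A"), ("a4", "A"), ("b1", "B")]
print(organisms_over_limit(hits))
```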
- --extract-seqs
-
Sequence extraction switch [default: no]. When specified, selected sequences are stored into a FASTA file using the same basename as other output files. This requires a BLAST database (see option --database above).
- --extract-tax
-
Taxonomy extraction switch [default: no]. When specified, NCBI taxons of selected sequences are stored into a file using the same basename as other output files. This requires a local mirror of the NCBI Taxonomy database.
- --restore-params-from=<file>
-
Batch-mode switch [default: no]. When specified, parameters are restored from the user-specified JSON file. This option takes precedence over any option specified on the command line, such as --max-hits, --min-cov and --max-copy.
- --restore-last-params
-
Batch-mode switch [default: no]. When specified, parameters are restored from the last saved JSON file for each report. This option takes precedence over all other command-line options.
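The precedence rule shared by both batch-mode switches can be sketched in Python as a three-level merge: documented defaults, then command-line options, then restored parameters. The key names and the layout of the JSON content are hypothetical, since the format of the saved file is not described here.

```python
import json

# Default values as documented above; the key names are hypothetical
DEFAULTS = {"max_hits": 200000, "min_cov": 0.7, "max_copy": 3}

def resolve_params(cli_options, restored_json=None):
    """Merge parameters so that restored values win over command-line ones."""
    params = dict(DEFAULTS)       # lowest priority
    params.update(cli_options)    # command line overrides defaults
    if restored_json is not None:
        # restored parameters override everything else
        params.update(json.loads(restored_json))
    return params

# The command line asks for min_cov=0.5, but the restored file says 0.9
params = resolve_params({"min_cov": 0.5, "max_copy": 1}, '{"min_cov": 0.9}')
print(params)
```

In this sketch, a command-line value survives only for parameters that are absent from the restored file.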
- --print-plots
-
When specified, plots are printed in PDF format [default: no].
- --gnuplot-term=<str>
-
gnuplot terminal to use for the interactive mode [default: x11]. Other possible choices include qt, but the option is open to experiment. On macOS, to avoid the font warning, use --gnuplot-term='qt font "Arial"'. If needed, the gnuplot executable can be specified through the environment variable OUM_GNUPLOT_EXEC.
- --version
- --usage
- --help
- --man
-
Print the usual program information.
AUTHOR
Denis BAURAIN <denis.baurain@uliege.be>
CONTRIBUTOR
Amandine BERTRAND <amandine.bertrand@doct.uliege.be>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by University of Liege / Unit of Eukaryotic Phylogenomics / Denis BAURAIN.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.