NAME

ompa-pa.pl - Extract seqs from BLAST/HMMER interactively or in batch mode

VERSION

version 0.252040

USAGE

ompa-pa.pl <infiles> --database=<file> [optional arguments]

REQUIRED ARGUMENTS

<infiles>

Path to input BLAST/HMMER report files [repeatable argument].

OPTIONAL ARGUMENTS

--report-type=<str>

Type of the reports used as infiles [default: blastxml]. Currently, the following types are available:

- blastxml (XML BLAST reports generated with -outfmt 5)
- hmmertbl (tabular HMMER reports generated with --domtblout)

--database=<file>

Path to the sequence database used to generate the reports. For efficiency, this argument must always be the basename of a BLAST database, even when the reports were obtained using hmmsearch on a FASTA file.

To build such a database, use one of the following commands:

$ makeblastdb -in database.fasta -out database -dbtype prot -parse_seqids
$ makeblastdb -in database.fasta -out database -dbtype nucl -parse_seqids

This argument is required when the option --extract-seqs is enabled.
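
Because the argument is a basename rather than a file, it can be handy to check that the database is in place before launching an extraction. The helper below is only a sketch: the function name has_blast_db is hypothetical, and it assumes makeblastdb wrote the usual index files (.pin for protein, .nin for nucleotide) or, for multi-volume databases, an alias file (.pal or .nal).

```shell
# Return success if a BLAST database with the given basename appears to exist.
# Assumption: makeblastdb wrote an index file (.pin for protein, .nin for
# nucleotide) or, for multi-volume databases, an alias file (.pal / .nal).
has_blast_db() {
    for ext in pin nin pal nal; do
        [ -e "$1.$ext" ] && return 0
    done
    return 1
}

# Possible usage (file names are placeholders):
# has_blast_db database ||
#     makeblastdb -in database.fasta -out database -dbtype prot -parse_seqids
```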

--skip-config=<file>

Path to an optional configuration file specifying the reports to skip based on their raw taxonomic content [default: none]. The assessment is made before any filtering other than --max-hits.

The configuration file follows the classifier format (often YAML) of <classify-ali.pl>. This requires enabling taxonomic annotation and thus a local mirror of the NCBI Taxonomy database.

--colorize=<scheme>

When specified, sequence points are colored according to their taxon using the specified CLS file. As above, this requires enabling taxonomic annotation and thus a local mirror of the NCBI Taxonomy database.

--taxdir=<dir>

Path to local mirror of the NCBI Taxonomy database.

To build such a directory, use the following command:

$ setup-taxdir.pl --taxdir=taxdir

--max-hits=<n>

Maximum number of hits to read from the report [default: 200000]. This limit is implemented for efficiency. It applies before any other filter.

--min-cov=<n>

Minimum BLAST query or HMMER model coverage for selected hits [default: 0.7].
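
To get a feel for what a coverage threshold does to a tabular HMMER report, the sketch below computes model coverage per domain hit with awk. It assumes a hmmsearch domain table, where column 6 (qlen) is the model length and columns 16 and 17 are the hit's start and end on the model, and it defines coverage as (hmm_to - hmm_from + 1) / qlen. The function name model_coverage is hypothetical, and the formula is one common definition that may differ from what ompa-pa.pl computes internally.

```shell
# Print target name, model name and model coverage for each domain hit in a
# HMMER domain table, keeping only hits at or above a threshold.
# Assumption: coverage = (hmm_to - hmm_from + 1) / model length, which may
# differ from what ompa-pa.pl computes internally.
model_coverage() {
    # $1: domain table file, $2: minimum coverage (defaults to 0.7)
    awk -v min="${2:-0.7}" '
        /^#/ { next }                      # skip comment lines
        {
            cov = ($17 - $16 + 1) / $6     # 6 = qlen, 16 and 17 = hmm from/to
            if (cov >= min)
                printf "%s\t%s\t%.2f\n", $1, $4, cov
        }
    ' "$1"
}

# Possible usage (file name is a placeholder):
# model_coverage report.domtbl 0.7
```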

--max-copy=<n>

Maximum gene copy number per organism for selected hits [default: 3].

--extract-seqs

Sequence extraction switch [default: no]. When specified, selected sequences are stored into a FASTA file using the same basename as other output files. This requires a BLAST database (see option --database above).

--extract-tax

Taxonomy extraction switch [default: no]. When specified, NCBI taxa of selected sequences are stored into a file using the same basename as other output files. This requires a local mirror of the NCBI Taxonomy database.

--restore-params-from=<file>

Batch-mode switch [default: no]. When specified, parameters are restored from the user-specified JSON file. This option takes precedence over any option specified on the command line, such as --max-hits, --min-cov and --max-copy.

--restore-last-params

Batch-mode switch [default: no]. When specified, parameters are restored from the last saved JSON file for each report. This option takes precedence over all other command-line options.
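
A batch run over several reports could be wrapped as in the sketch below. It assumes that parameters were already saved for each report during an earlier interactive session. The function name batch_extract and the OMPA_PA variable are hypothetical conveniences introduced here (OMPA_PA is not read by ompa-pa.pl); only the options themselves come from this manual.

```shell
# Re-apply previously saved parameters to a set of reports, non-interactively.
# OMPA_PA is a hypothetical variable introduced here so the command can be
# swapped (e.g. OMPA_PA=echo for a dry run); it is not read by ompa-pa.pl.
batch_extract() {
    # $1: BLAST database basename; remaining arguments: report files
    db=$1
    shift
    "${OMPA_PA:-ompa-pa.pl}" "$@" --database="$db" \
        --restore-last-params --extract-seqs
}

# Possible usage (file names are placeholders):
# batch_extract database report1.xml report2.xml
```

Setting OMPA_PA=echo prints the command line instead of running it, which is a cheap way to check the arguments before committing to a long run.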

--print-plots

When specified, plots are printed in PDF format [default: no].

--gnuplot-term=<str>

gnuplot terminal to use for the interactive mode [default: x11]. Other terminals, such as qt, can also be used; feel free to experiment. On macOS, to avoid the font warning, use --gnuplot-term='qt font "Arial"'.

If needed, the gnuplot executable can be specified through the environment variable OUM_GNUPLOT_EXEC.

--version
--usage
--help
--man

Print the usual program information.

AUTHOR

Denis BAURAIN <denis.baurain@uliege.be>

CONTRIBUTOR

Amandine BERTRAND <amandine.bertrand@doct.uliege.be>

COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by University of Liege / Unit of Eukaryotic Phylogenomics / Denis BAURAIN.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.