NAME

abbr-ids-fas.pl - Abbreviate (standardize) seq ids in FASTA files

VERSION

version 0.210170

USAGE

    abbr-ids-fas.pl <infiles> [optional arguments]

REQUIRED ARGUMENTS

<infiles>

Path to input FASTA files [repeatable argument].

OPTIONAL ARGUMENTS

--out[-suffix]=<suffix>

Suffix to append to infile basenames for deriving outfile names [default: none]. When not specified, outfiles take the names of the infiles, and the original infiles are preserved by appending a .bak suffix to their names.

--store-id-mapper

Store the IDM file corresponding to each output file [default: no].

--id-prefix-mapper=<file>

Path to an optional IDM file explicitly listing the infile => prefix pairs. Useful when processing multiple input files. This argument and the next one (--id-prefix) can be specified together; in that case, a single pipe char is appended to the combined prefix.

--id-prefix=<str>

String to use as the seq id prefix (e.g., NCBI taxon id, 4-letter code) [default: none].

--id-regex=<str>

Regular expression for capturing the original seq id [default: none]. When both this argument and the next one (--ids-from-acc) are specified, this one takes precedence.

The argument value can be either a predefined regex or a custom regex given on the command line (in the latter case, remember to escape special chars). The following predefined regexes are available (assuming a leading '>'):

    - :DEF (first stretch of non-whitespace chars)
    - :GI  (number nnn in  gi|nnn|...)
    - :GNL (string xxx in gnl|yyy|xxx)
    - :JGI (number nnn in jgi|xxx|nnn or jgi|xxx|nnn|yyy)
    - :PAC (number nnn in xxx|PACid:nnn)
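
To make the capture behavior of the predefined regexes more concrete, here is a minimal Python sketch. It is not the tool's Perl implementation: the patterns in PREDEFINED are approximations inferred from the descriptions above, capture_id is a hypothetical helper, and the sample headers are made up for illustration.

```python
import re

# Hypothetical approximations of the predefined regexes, written from the
# descriptions in the list above; the patterns the tool actually uses may differ.
PREDEFINED = {
    ":DEF": r"^>(\S+)",               # first stretch of non-whitespace chars
    ":GI":  r"^>gi\|(\d+)\|",         # number nnn in gi|nnn|...
    ":GNL": r"^>gnl\|[^|]*\|(\S+)",   # string xxx in gnl|yyy|xxx
    ":JGI": r"^>jgi\|[^|]*\|(\d+)",   # number nnn in jgi|xxx|nnn[|yyy]
    ":PAC": r"^>[^|]*\|PACid:(\d+)",  # number nnn in xxx|PACid:nnn
}


def capture_id(header, name):
    """Return the id captured from a FASTA header line, or None if no match."""
    match = re.search(PREDEFINED[name], header)
    return match.group(1) if match else None


# Made-up headers, one per predefined regex.
for name, header in [
    (":DEF", ">seqA some description"),
    (":GI",  ">gi|12345|ref|NP_000001.1| protein"),
    (":GNL", ">gnl|db|abc123 description"),
    (":JGI", ">jgi|Phatr2|54321|estExt"),
    (":PAC", ">gene1.t1|PACid:987654"),
]:
    print(name, capture_id(header, name))
```

Each pattern keeps a single capture group, which mirrors the idea of "capturing the original seq id"; a custom regex passed to --id-regex would presumably need a capture group as well, but that is an assumption.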

--ids-from-acc

Use MUST accessions or gi numbers (after the @ char) as abbr seq ids [default: no]. When neither this argument nor the preceding one (--id-regex) is specified, abbr seq ids will be of the form seq1, seq2, etc.
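
The precedence between --id-regex, --ids-from-acc and the seqN fallback can be sketched as follows. This is an illustrative Python sketch, not the tool's Perl code: abbr_id and its parameters are hypothetical, the sample header is made up, and returning None when nothing can be captured is an assumption, since the behavior in that case is not documented here. Prefix handling (--id-prefix, --id-prefix-mapper) is deliberately left out.

```python
import re


def abbr_id(header, index, id_regex=None, ids_from_acc=False):
    """Pick an abbreviated seq id following the documented precedence.

    header       -- FASTA header line, including the leading '>'
    index        -- 1-based position of the sequence in the file
    id_regex     -- regex with one capture group (stands in for --id-regex)
    ids_from_acc -- True to mimic --ids-from-acc
    """
    # --id-regex takes precedence over --ids-from-acc when both are given.
    if id_regex is not None:
        match = re.search(id_regex, header)
        return match.group(1) if match else None  # assumption: no fallback

    # --ids-from-acc: use whatever follows the '@' char.
    if ids_from_acc:
        _, sep, acc = header.partition("@")
        return acc.strip() if sep else None  # assumption: no fallback

    # Neither option given: ids of the form seq1, seq2, etc.
    return f"seq{index}"


header = ">Homo sapiens_9606@NP_000001"  # made-up MUST-style id
print(abbr_id(header, 1, ids_from_acc=True))
print(abbr_id(header, 1, id_regex=r"^>(\S+)", ids_from_acc=True))
print(abbr_id(header, 2))
```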

--version
--usage
--help
--man

Print the usual program information.

AUTHOR

Denis BAURAIN <denis.baurain@uliege.be>

COPYRIGHT AND LICENSE

This software is copyright (c) 2021 by University of Liege / Unit of Eukaryotic Phylogenomics / Denis BAURAIN.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.