NAME

split-rates-ali.pl - Split ALI files into subsets of sites based on site-wise statistics

VERSION

version 0.242020

USAGE

split-rates-ali.pl <infiles> [optional arguments]

REQUIRED ARGUMENTS

<infiles>

Path to input RATE(S) (or SITEFREQ) files [repeatable argument].

OPTIONAL ARGUMENTS

--in[-strip]=<str>

Substring(s) to strip from infile basenames before deriving the names of other infiles (e.g., ALI files) and of outfiles [default: none].

--out[-suffix]=<suffix>

Suffix to append to (possibly stripped) infile basenames when deriving outfile names [default: none]. When not specified, outfile names are taken from the infiles, but the original infiles are preserved by appending a .bak suffix to them.
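The sketch below shows one way --in-strip and --out-suffix could combine to derive file names. derive_names is a hypothetical helper, not code from split-rates-ali.pl; dropping the infile extension and using a .ali extension are assumptions, and the .bak backup is not reproduced.

```python
import os


def derive_names(infile, in_strip=(), out_suffix=None, ali_ext=".ali"):
    """Hypothetical helper: derive the ALI infile and outfile names.

    Assumptions: the directory and the last extension of the infile
    are dropped, and both derived files use the .ali extension.
    """
    base = os.path.splitext(os.path.basename(infile))[0]
    # --in-strip: remove each listed substring from the basename.
    for s in in_strip:
        base = base.replace(s, "")
    ali_infile = base + ali_ext
    # --out-suffix: append the suffix; without it, the outfile reuses the
    # infile name (the real tool then keeps a .bak copy of the original).
    outfile = base + (out_suffix or "") + ali_ext
    return ali_infile, outfile
```

For example, derive_names("run1/aln42-iqtree.rate", ["-iqtree"], "-split") returns ("aln42.ali", "aln42-split.ali").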

--from-scafos

Consider the input ALI file as generated by SCaFoS [default: no]. Currently, specifying this option converts all ambiguous and missing character states into gaps.
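A minimal sketch of that conversion, assuming a protein alignment in which any character outside the 20 unambiguous amino acids (e.g., X, B, Z or ?) counts as ambiguous or missing. The helper name scafos_to_gaps and the alphabet are assumptions, not taken from the program.

```python
# Assumed unambiguous protein alphabet; anything else becomes a gap.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def scafos_to_gaps(seq, alphabet=AMINO_ACIDS, gap="-"):
    """Hypothetical helper: replace ambiguous/missing states with gaps."""
    return "".join(c if c.upper() in alphabet else gap for c in seq)
```

For DNA, passing alphabet=set("ACGT") would turn N, R, Y and similar codes into gaps instead.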

--del-const

Delete constant sites in the same way as the -dc option of PhyloBayes [default: no].
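A minimal sketch of constant-site removal. It assumes that gaps and missing states (here -, ? and X, a hypothetical default) are ignored when deciding whether a site is constant; the exact rule applied by PhyloBayes may differ.

```python
def delete_constant_sites(seqs, ignore="-?X"):
    """Hypothetical helper: drop columns with at most one distinct state.

    Characters in `ignore` (gaps, missing data) are skipped when counting
    states. Returns the filtered sequences and the kept column indices.
    """
    width = len(seqs[0])
    keep = [
        j for j in range(width)
        if len({s[j] for s in seqs if s[j] not in ignore}) > 1
    ]
    return ["".join(s[j] for j in keep) for s in seqs], keep
```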

--phylip

Assume infiles and outfiles are in PHYLIP format (instead of ALI format) [default: no].

--sitefreq

Assume infiles are IQ-TREE SITEFREQ files instead of RATE(S) files [default: no].

--other-rates=<file>

Optional additional RATE(S) (or SITEFREQ) file of the same length as the main infile(s) that will be used to compute rate deltas [default: none]. Currently, rate deltas are defined as the absolute difference between the two rates at each site. This could be improved by, e.g., computing relative deltas.

When SITEFREQ files are provided instead of RATE(S) files, deltas correspond to chi-square test statistics computed between the two SITEFREQ files at each site.
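The sketch below computes both kinds of site-wise deltas. The absolute rate difference follows the description above. For SITEFREQ files, the exact statistic is not specified here; the sketch assumes the Pearson chi-square statistic of a 2 x k contingency table whose two rows are the site's frequency vectors (each summing to 1), which simplifies to sum((a - b)^2 / (a + b)) over states. Function names are hypothetical.

```python
def rate_deltas(rates_a, rates_b):
    """Absolute difference between the two rates at each site."""
    if len(rates_a) != len(rates_b):
        raise ValueError("rate files must have the same length")
    return [abs(a - b) for a, b in zip(rates_a, rates_b)]


def chi_square(freqs_a, freqs_b):
    """Assumed statistic: 2 x k contingency chi-square with unit row totals.

    States absent from both vectors (a + b == 0) contribute nothing.
    """
    return sum(
        (a - b) ** 2 / (a + b)
        for a, b in zip(freqs_a, freqs_b)
        if a + b > 0
    )
```

Applying chi_square to each pair of matching site rows gives one statistic per site, as --dump-stats would report.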

--dump-stats

Output site-wise stats resulting from the comparison of a pair of infiles of the same length (see --other-rates option) [default: no]. The values are either rate deltas or chi-square statistics, depending on the infile type (see --sitefreq option).

--bin-number=<n>

Number of bins to define [default: 10].

--percentile

Define bins containing an equal number of sites rather than bins of equal width in terms of rates [default: no].
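The two binning modes can be sketched as follows. assign_bins is a hypothetical helper; how the real program places bin edges and breaks ties between equal values is an assumption here.

```python
def assign_bins(values, n_bins=10, percentile=False):
    """Return a bin index (0 .. n_bins - 1) for each site."""
    n = len(values)
    if percentile:
        # Equal-count bins: rank sites by value, then cut the ranking into
        # n_bins slices of (nearly) equal size. Ties may be split.
        order = sorted(range(n), key=lambda i: values[i])
        bins = [0] * n
        for rank, i in enumerate(order):
            bins[i] = rank * n_bins // n
        return bins
    # Equal-width bins over [min, max]; the maximum goes in the last bin.
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

With values [0.0, 0.1, 0.2, 0.3, 1.0] and two bins, equal-width binning yields [0, 0, 0, 0, 1], whereas percentile binning yields [0, 0, 0, 1, 1].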

--cumulative

Define bins that each include all previous bins [default: no]. This leads to ALI outfiles of increasing width and only makes sense when slower sites are in lower bins. If higher "rates" mean slower sites, use the --descending option.

--descending

Reverse bin order to accommodate "rates" files where higher values actually correspond to slower sites, such as those produced by Carla Cummins' TIGER [default: no].
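The effect of --cumulative and --descending on the sites assigned to each bin can be sketched as below. group_sites is a hypothetical helper; each resulting list of site indices would then select the columns of one ALI outfile.

```python
def group_sites(bins, n_bins, cumulative=False, descending=False):
    """Return, for each output bin, the sorted site indices it contains."""
    if descending:
        # Higher "rates" mean slower sites (e.g., TIGER): flip the bin
        # order so that bin 0 still holds the slowest sites.
        bins = [n_bins - 1 - b for b in bins]
    groups = [[] for _ in range(n_bins)]
    for site, b in enumerate(bins):
        groups[b].append(site)
    if cumulative:
        # Each bin also includes every previous (slower) bin.
        acc = []
        for k in range(n_bins):
            acc = sorted(acc + groups[k])
            groups[k] = acc
    return groups
```

For instance, with per-site bins [0, 2, 1, 0] and three bins, the cumulative groups are [[0, 3], [0, 2, 3], [0, 1, 2, 3]], i.e., outfiles of increasing width.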

--version
--usage
--help
--man

Print the usual program information

AUTHOR

Denis BAURAIN <denis.baurain@uliege.be>

COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by University of Liege / Unit of Eukaryotic Phylogenomics / Denis BAURAIN.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.