NAME
RNAalisplit - Split and decompose RNA multiple sequence alignments
SYNOPSIS
RNAalisplit.pl [--aln|-a FILE] [--method|-m OPTION] [options]
DESCRIPTION
This tool splits multiple sequence alignments horizontally, thereby extracting sets of sequences that group together according to a decision value. The most natural decision value is the RNAz SVM RNA-class probability.
A neighbour-joining (NJ) tree is reconstructed from pairwise distances of sequences in the input alignment. Subsets of the alignment are derived by splitting at each edge of the NJ tree as well as by performing a split decomposition of the matrix of pairwise distances. These subsets/subalignments are then evaluated according to the same decision value, and a decision is made whether a subalignment performs better than the original alignment. This can be used to discriminate sequences that do not 'fit' in the input alignment.
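The idea of deriving subsets by splitting at each edge of a tree can be illustrated with a minimal Python sketch. It is not taken from RNAalisplit itself: the nested-tuple tree representation and the function names `leaves` and `edge_splits` are assumptions made for illustration. Removing any edge of a tree separates its leaves (here: sequence names) into two groups, and each group is a candidate subalignment.

```python
# Illustrative sketch only; the tree encoding and function names are hypothetical.
# A tree is a nested tuple whose leaves are sequence names (strings).

def leaves(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def edge_splits(tree):
    """Return every bipartition of the leaves induced by removing one edge."""
    all_leaves = leaves(tree)
    splits = set()

    def visit(node):
        below = leaves(node)        # leaves on the far side of the edge above `node`
        rest = all_leaves - below   # leaves on the near side
        if below and rest:
            # An unordered pair of sets, so the same split is stored only once.
            splits.add(frozenset([below, rest]))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    for child in tree:
        visit(child)
    return splits

# An unrooted tree on five sequences, written with a trifurcation at the top.
example_tree = (("A", "B"), ("C", "D"), "E")
for split in edge_splits(example_tree):
    print([sorted(side) for side in split])
```

An unrooted binary tree with n leaves has 2n-3 edges, so the five-sequence example yields seven splits, five of which merely separate a single sequence from the rest.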
Output is written to STDOUT and a directory containing all temporary RNAalifold / RNAz / R-scape output files is created. Inside this directory, the 'phylip.dst' file contains the distance matrix computed from pairwise distances. It can be visualized e.g. with SplitsTree.
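For readers unfamiliar with PHYLIP distance files, the following Python sketch writes a small square distance matrix in the classic PHYLIP layout (number of taxa on the first line, then one row per taxon with a name padded to ten characters). The exact layout of the 'phylip.dst' file produced by RNAalisplit may differ; the function name `format_phylip_matrix` and the six-decimal formatting are assumptions made for illustration.

```python
# Illustrative sketch only; not the code RNAalisplit uses to write 'phylip.dst'.

def format_phylip_matrix(names, dist):
    """Format a square distance matrix in classic PHYLIP layout."""
    n = len(names)
    if len(dist) != n or any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be square and match the names")
    lines = [f"{n:5d}"]                      # first line: number of taxa
    for name, row in zip(names, dist):
        label = name[:10].ljust(10)          # classic PHYLIP: 10-character names
        values = " ".join(f"{d:.6f}" for d in row)
        lines.append(f"{label} {values}")
    return "\n".join(lines) + "\n"

names = ["seq1", "seq2", "seq3"]
dist = [
    [0.0, 2.0, 5.0],
    [2.0, 0.0, 4.0],
    [5.0, 4.0, 0.0],
]
print(format_phylip_matrix(names, dist), end="")
```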
OPTIONS
- --aln|-a
-
A multiple sequence alignment in ClustalW format
- --constraint|-c
-
Constraint structure, overriding the consensus structure of the underlying alignment in case --method|-m dBc is selected.
- --method|-m
-
Method to compute pairwise distances. Available options are 'dHn', 'dHx', 'dBp', 'dBc', 'dHB', and 'SCI'. 'dHn' and 'dHx' compute pairwise Hamming distances of sequences: 'dHn' replaces gaps with 'N', whereas 'dHx' removes all gap columns (not yet implemented). 'dBp' folds RNA sequences into their MFE structures and computes pairwise base pair distances. 'dBc' computes base pair distances on constraint-folded RNA sequences; by default, the consensus structure of the underlying alignment is used as the constraint, but an alternative constraint structure can be provided via the --constraint option. 'SCI' computes the distance as 1-log(SCI), based on a truncated structure conservation index of two sequences. This measure, however, is not a metric and therefore often results in negative branch lengths in neighbor-joining trees. Use with caution. [default: 'dHn']
- --noribosum
-
Turn off ribosum scoring for RNAalifold computation. Default: ribosum scoring on
- --rscapestat
-
R-scape covariation statistic. Allowed values are: 'GT', 'MI', 'MIr', 'MIg', 'CHI', 'OMES', 'RAF', 'RAFS'. Appending either 'p' or 'a' to any of them calculates its average product correction or average sum correction, respectively (e.g. GTp or GTa). See the R-scape manual for details.
- --out|-o
-
Output base directory. Temporary data and results will be written to this directory.
- --version
-
Show RNAalisplit version and exit
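As an illustration of the 'dHn' distance described under --method, the following Python sketch computes a Hamming distance between two aligned sequences after replacing gap characters with 'N'. It is not the RNAalisplit implementation. The function name `hamming_gaps_as_n`, the treatment of both '-' and '.' as gaps, and the assumption that 'N' against a nucleotide counts as a mismatch while 'N' against 'N' counts as a match are all choices made for this example.

```python
# Illustrative sketch only; gap handling details are assumptions, see above.

def hamming_gaps_as_n(a, b):
    """Hamming distance of two aligned sequences with gaps replaced by 'N'."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    a = a.upper().replace("-", "N").replace(".", "N")
    b = b.upper().replace("-", "N").replace(".", "N")
    # Count alignment columns in which the two rows differ.
    return sum(x != y for x, y in zip(a, b))

def pairwise_distances(seqs):
    """Symmetric matrix of pairwise distances for a list of aligned sequences."""
    return [[hamming_gaps_as_n(s, t) for t in seqs] for s in seqs]

alignment = ["GGA-CU", "GGAUC-", "GGA-CU"]
for row in pairwise_distances(alignment):
    print(row)
```

In this example the first and third rows are identical and therefore have distance 0, while the first and second rows differ in the two columns where one of them carries a gap.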
AUTHOR
Michael T. Wolfinger <michael@wolfinger.eu> and <michael.wolfinger@univie.ac.at>