NAME
Statistics::RankCorrelation - Compute the rank correlation between two vectors
SYNOPSIS
use Statistics::RankCorrelation;
$c = Statistics::RankCorrelation->new(\@u, \@v);
$n = $c->spearman;
$n = $c->csim;
DESCRIPTION
This module computes the rank correlation coefficient between two sample vectors.
A few definitions are in order:
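Statistical rank: The ordinal number of a value's position in a list sorted in a specified order. The _rank examples below sort in increasing order, so the smallest value has rank 1.
Tied ranks: Ranks of identical values in the data. Each tied value is assigned the average of the ranks that the group of ties would otherwise occupy.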
PUBLIC METHODS
new VECTOR1, VECTOR2
$c = Statistics::RankCorrelation->new(\@u, \@v);
This method constructs a new Statistics::RankCorrelation object, "co-normalizes" the vectors if they are not the same size (i.e. pads the shorter one with trailing zero values), and finds their statistical ranks.
x_data, y_data
$x = $c->x_data;
$y = $c->y_data;
Return, as array references, the original data samples that were provided to the constructor.
x_rank, y_rank
$x = $c->x_rank;
$y = $c->y_rank;
Return, as array references, the statistical ranks of the data samples that were provided to the constructor.
spearman
$n = $c->spearman;
Spearman's rho rank-order correlation is a nonparametric measure of association based on the ranks of the data values. The formula is:
$$\rho = 1 - \frac{6 \sum_{i=1}^{N} (R_i - S_i)^2}{N (N^2 - 1)}$$
Here Ri and Si are the ranks of the i-th values in the two data vectors, and N is the number of samples in each vector.
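The Spearman correlation is the Pearson product-moment correlation computed on the ranks. The formula above gives exactly that value only when there are no tied ranks; with ties it is an approximation.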
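The formula can be sketched in Perl as follows. This is a minimal illustration, not the module's own spearman method: the function name spearman_rho is hypothetical, and it assumes the two inputs are already rank vectors of equal length with no ties.

```perl
use strict;
use warnings;

# Hypothetical helper: Spearman's rho from two rank vectors using
#   rho = 1 - 6 * sum((R_i - S_i)^2) / (N * (N^2 - 1))
# ASSUMPTION: inputs are already ranks, equal length, and tie-free.
sub spearman_rho {
    my ($r, $s) = @_;
    my $n = @$r;
    die "vectors must have the same length\n" unless $n == @$s;
    die "need at least two samples\n" if $n < 2;

    # Sum of squared rank differences.
    my $sum_d2 = 0;
    for my $i (0 .. $n - 1) {
        my $d = $r->[$i] - $s->[$i];
        $sum_d2 += $d * $d;
    }
    return 1 - 6 * $sum_d2 / ($n * ($n * $n - 1));
}

my @r = (1, 2, 3, 4, 5);
my @s = (2, 1, 4, 3, 5);
printf "%.2f\n", spearman_rho(\@r, \@s);    # prints 0.80
```

Here the squared differences sum to 4, so rho = 1 - 24 / 120 = 0.8.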
csim
$n = $c->csim;
Return the "contour similarity index measure", which is a one-dimensional measure of the similarity between two vectors.
This returns a measure in the range [-1..1] and is computed using matrices of binary data representing "higher or lower" values in the original vectors.
For details, please consult the csim item in the SEE ALSO section.
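The idea of comparing binary "higher or lower" matrices can be illustrated with the sketch below. It is hypothetical and is not the module's csim implementation: the helper names, the "lower than" convention, skipping the diagonal cells, and mapping the agreement rate p onto [-1..1] as 2p - 1 are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: binary "is lower than" matrix for one vector.
# Cell [i][j] is 1 when $u->[i] < $u->[j], else 0.
sub lower_than_matrix {
    my ($u) = @_;
    my @m;
    for my $i (0 .. $#$u) {
        for my $j (0 .. $#$u) {
            $m[$i][$j] = $u->[$i] < $u->[$j] ? 1 : 0;
        }
    }
    return \@m;
}

# Hypothetical contour similarity: compare the two matrices cell by cell
# (off-diagonal cells only) and map the agreement rate p in [0, 1] onto
# [-1, 1] as 2p - 1. ASSUMPTION: not necessarily the module's formula.
sub contour_similarity {
    my ($u, $v) = @_;
    die "vectors must have the same length\n" unless @$u == @$v;
    my $n = @$u;
    die "need at least two samples\n" if $n < 2;

    my ($mu, $mv) = (lower_than_matrix($u), lower_than_matrix($v));
    my $agree = 0;
    for my $i (0 .. $n - 1) {
        for my $j (0 .. $n - 1) {
            next if $i == $j;
            $agree++ if $mu->[$i][$j] == $mv->[$i][$j];
        }
    }
    my $p = $agree / ($n * ($n - 1));
    return 2 * $p - 1;
}

printf "%.2f\n", contour_similarity([1, 2, 3], [1, 3, 2]);    # prints 0.33
```

Vectors with the same up/down shape score 1 and exactly reversed shapes score -1, regardless of scale.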
PRIVATE FUNCTIONS
_rank
$u_ranks = _rank(\@u);
Return an array reference of the ordinal ranks of the given data.
When the data contain ties (identical values), the tied values receive the average of their rank numbers. An example will elucidate:
sorted data: [ 1.0, 2.1, 3.2, 3.2, 3.2, 4.3 ]
ranks: [ 1, 2, 3, 4, 5, 6 ]
tied ranks: 3, 4, and 5
tied average: (3 + 4 + 5) / 3 == 12 / 3 == 4
averaged ranks: [ 1, 2, 4, 4, 4, 6 ]
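The tie-averaging rule can be sketched in Perl as follows. The name average_ranks is hypothetical and this is not the module's _rank source; it assumes numeric data and ranks in increasing order, as in the example above.

```perl
use strict;
use warnings;

# Hypothetical sketch: 1-based ranks in increasing order, where tied
# values share the mean of the positions they occupy in the sorted list.
sub average_ranks {
    my ($data) = @_;
    # Indices of @$data sorted by value, smallest first.
    my @order = sort { $data->[$a] <=> $data->[$b] } 0 .. $#$data;
    my @ranks;
    my $i = 0;
    while ($i < @order) {
        # Extend $j over the run of values equal to the one at $order[$i].
        my $j = $i;
        $j++ while $j + 1 < @order
            && $data->[ $order[$j + 1] ] == $data->[ $order[$i] ];
        # Sorted positions $i..$j hold ranks $i+1..$j+1; use their mean.
        my $avg = (($i + 1) + ($j + 1)) / 2;
        $ranks[ $order[$_] ] = $avg for $i .. $j;
        $i = $j + 1;
    }
    return \@ranks;
}

my $r = average_ranks([1.0, 2.1, 3.2, 3.2, 3.2, 4.3]);
print "@$r\n";    # prints 1 2 4 4 4 6
```

Because each rank is stored at the value's original index, unsorted input works too: [30, 10, 20] yields ranks 3, 1, 2.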
_pad_vectors
($u, $v) = _pad_vectors($u, $v);
Append zeros to the shorter input vector so that every value in the longer vector has a corresponding value in the other. That is, "pad" the tail of the shorter vector with zero values.
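A minimal sketch of the padding step, assuming array references of numbers. pad_vectors is a hypothetical stand-in for _pad_vectors; as a design choice it returns new copies and leaves its arguments untouched, which may differ from the module's behavior.

```perl
use strict;
use warnings;

# Hypothetical sketch: pad the shorter of two array refs with trailing
# zeros so both have the same length. Inputs are copied, not modified.
sub pad_vectors {
    my ($u, $v) = @_;
    my @u = @$u;
    my @v = @$v;
    push @u, (0) x (@v - @u) if @u < @v;
    push @v, (0) x (@u - @v) if @v < @u;
    return (\@u, \@v);
}

my ($u, $v) = pad_vectors([1, 2, 3, 4], [5, 6]);
print "@$u | @$v\n";    # prints 1 2 3 4 | 5 6 0 0
```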
_correlation_matrix
$matrix = _correlation_matrix($u);
Return the correlation matrix for a single vector.
This function builds a square, binary matrix whose cells record, for each pair of positions in the vector, whether one value is "higher or lower" than the other.
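One plausible construction of such a matrix is sketched below. The function name binary_contour_matrix and the convention that cell [i][j] is 1 when element i is lower than element j are assumptions, not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical sketch: square binary matrix for one vector.
# ASSUMPTION: cell [i][j] is 1 when element i is lower than element j.
sub binary_contour_matrix {
    my ($u) = @_;
    my @m;
    for my $i (0 .. $#$u) {
        for my $j (0 .. $#$u) {
            $m[$i][$j] = $u->[$i] < $u->[$j] ? 1 : 0;
        }
    }
    return \@m;
}

my $m = binary_contour_matrix([2, 1, 3]);
print join(' ', @$_), "\n" for @$m;
# prints:
# 0 0 1
# 1 0 1
# 0 0 0
```

Under this convention the diagonal is always 0, and for distinct values exactly one of [i][j] and [j][i] is 1.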
SEE ALSO
For the csim method:
http://www2.mdanderson.org/app/ilya/Publications/JNMRcontour.pdf
For the spearman method:
http://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html
http://faculty.vassar.edu/lowry/ch3b.html
http://www.pinkmonkey.com/studyguides/subjects/stats/chap6/s0606801.asp
http://fonsg3.let.uva.nl/Service/Statistics/RankCorrelation_coefficient.html
http://www.statsoftinc.com/textbook/stnonpar.html#correlations
http://software.biostat.washington.edu/~rossini/courses/intro-nonpar/text/Tied_Data.html#SECTION00427000000000000000
AUTHOR
Gene Boggs <gene@cpan.org>
COPYRIGHT AND LICENSE
Copyright 2003, Gene Boggs
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.