NAME
Statistics::RankCorrelation - Compute the rank correlation between two vectors
SYNOPSIS
use Statistics::RankCorrelation;
$x = [ 8, 7, 6, 5, 4, 3, 2, 1 ];
$y = [ 2, 1, 5, 3, 4, 7, 8, 6 ];
$c = Statistics::RankCorrelation->new( $x, $y );
$n = $c->spearman;
$m = $c->csim;
DESCRIPTION
This module computes rank correlation coefficient measures between two sample vectors.
Working examples may be found in the distribution's eg directory and in the module test file.
The HANDY FUNCTIONS section below also describes helper functions that are useful when computing sorted rank coefficients by hand.
PUBLIC METHODS
new VECTOR1, VECTOR2
$c = Statistics::RankCorrelation->new( \@u, \@v );
This method constructs a new Statistics::RankCorrelation object from two vectors.
The object is initialized by computing the statistical ranks of the vectors. If the vectors differ in length, the shorter one is first padded with trailing zeros.
x_data, y_data
$x = $c->x_data;
$y = $c->y_data;
Return the original data samples that were provided to the constructor as array references.
x_rank, y_rank
$x = $c->x_rank;
$y = $c->y_rank;
Return the statistically ranked data samples as array references.
spearman
$n = $c->spearman;
Spearman's rho rank-order correlation is a nonparametric measure of association based on the rank of the data values and is a special case of the Pearson product-moment correlation.
The formula is:
rho = 1 - ( 6 * sum( ( Xi - Yi ) ^ 2 ) ) / ( N * ( N ^ 2 - 1 ) )
Where X and Y are the two rank vectors and i is an index from one to N, the number of samples.
In other words, Xi is the statistical rank of the value in the ith position of the original X data vector.
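As an illustration, the formula above can be applied directly to two rank vectors. This is a minimal sketch, not the module's own code: spearman_from_ranks is a hypothetical helper name, and it assumes two equal-length array references that already hold ranks. Note that this shortcut formula is exact only when the ranks contain no ties.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module's API): apply the formula
# above to two equal-length array references of ranks.
sub spearman_from_ranks {
    my ( $x_rank, $y_rank ) = @_;
    my $n = @$x_rank;
    die "rank vectors must have the same length\n" unless $n == @$y_rank;
    die "need at least two samples\n" if $n < 2;

    # Sum of the squared rank differences.
    my $sum_d2 = 0;
    for my $i ( 0 .. $n - 1 ) {
        $sum_d2 += ( $x_rank->[$i] - $y_rank->[$i] ) ** 2;
    }

    return 1 - ( 6 * $sum_d2 ) / ( $n * ( $n ** 2 - 1 ) );
}

# The SYNOPSIS vectors are permutations of 1..8 without ties,
# so every value equals its own rank.
my $rho = spearman_from_ranks(
    [ 8, 7, 6, 5, 4, 3, 2, 1 ],
    [ 2, 1, 5, 3, 4, 7, 8, 6 ],
);
printf "%.4f\n", $rho;    # -0.8333
```

Here the squared differences sum to 154, giving 1 - 924 / 504.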
csim
$n = $c->csim;
Return the contour similarity index measure, a one-dimensional measure of the similarity between two vectors.
The measure lies in the range [-1..1] and is computed using matrices of binary data representing "higher or lower" values in the original vectors.
This measure has been studied in musical contour analysis.
HANDY FUNCTIONS
rank
$ranks = rank( [ 1.0, 2.1, 3.2, 3.2, 3.2, 4.3 ] );
# [1, 2, 4, 4, 4, 6]
Return an array reference of the ordinal ranks of the given data.
Note that the data must be sorted as measurement pairs prior to computing the statistical rank. This is done automatically by the object initialization method.
In the case of a tie in the data (identical values) the rank numbers are averaged. An example will elucidate:
sorted data: [ 1.0, 2.1, 3.2, 3.2, 3.2, 4.3 ]
ranks: [ 1, 2, 3, 4, 5, 6 ]
tied ranks: 3, 4, and 5
tied average: (3 + 4 + 5) / 3 == 4
averaged ranks: [ 1, 2, 4, 4, 4, 6 ]
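The tie-averaging steps shown above can be sketched in plain Perl. This is an illustrative stand-in rather than the module's implementation: averaged_ranks is a hypothetical name, and unlike the description above it sorts indices internally, so the input need not be pre-sorted and each rank is returned at the position of its original value.

```perl
use strict;
use warnings;

# Hypothetical stand-in for rank(): average the ordinal ranks of tied values.
sub averaged_ranks {
    my ($data) = @_;

    # Indices of the data in ascending order of value.
    my @order = sort { $data->[$a] <=> $data->[$b] } 0 .. $#$data;

    my @ranks;
    my $i = 0;
    while ( $i < @order ) {
        # Extend $j to the end of the run of values equal to the one at $i.
        my $j = $i;
        $j++ while $j < $#order
            && $data->[ $order[ $j + 1 ] ] == $data->[ $order[$i] ];

        # Ordinal ranks are 1-based: positions $i..$j hold ranks $i+1..$j+1,
        # and the mean of a consecutive run is the mean of its endpoints.
        my $average = ( ( $i + 1 ) + ( $j + 1 ) ) / 2;
        $ranks[ $order[$_] ] = $average for $i .. $j;

        $i = $j + 1;
    }
    return \@ranks;
}

my $ranks = averaged_ranks( [ 1.0, 2.1, 3.2, 3.2, 3.2, 4.3 ] );
print "@$ranks\n";    # 1 2 4 4 4 6
```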
pair_sort
( $x, $y ) = pair_sort( [ 3, 5, 1, 1, 4 ], [ 9, 6, 3, 0, 9 ] );
# [1, 1, 3, 4, 5], [0, 3, 9, 9, 6]
Sort two given vectors as measurement pairs in numerical ascending order by the first (x) array.
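One way to sort measurement pairs is to sort the index list and then slice both vectors with it. The sketch below is an assumption about how this could be done, not the module's code: sort_pairs is a hypothetical name, and breaking ties in x by the y value is inferred only from the example output above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for pair_sort(); the tie-break on y is an assumption.
sub sort_pairs {
    my ( $x, $y ) = @_;
    die "vectors must have the same length\n" unless @$x == @$y;

    # Order the shared indices by x, then by y when the x values are equal.
    my @order = sort {
        $x->[$a] <=> $x->[$b] or $y->[$a] <=> $y->[$b]
    } 0 .. $#$x;

    # Slice both vectors with the same index order so pairs stay together.
    return ( [ @{$x}[@order] ], [ @{$y}[@order] ] );
}

my ( $sx, $sy ) = sort_pairs( [ 3, 5, 1, 1, 4 ], [ 9, 6, 3, 0, 9 ] );
print "@$sx | @$sy\n";    # 1 1 3 4 5 | 0 3 9 9 6
```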
pad_vectors
( $u, $v ) = pad_vectors( [ 1, 2, 3, 4 ], [ 9, 8 ] );
# [1, 2, 3, 4], [9, 8, 0, 0]
Append zeros to either input vector for all values in the other that do not have a corresponding value. That is, "pad" the tail of the shorter vector with zero values.
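The padding step can be sketched as follows. pad_with_zeros is a hypothetical name used for illustration; this version copies its inputs so that the caller's arrays are left unmodified, which may or may not match the module's behaviour.

```perl
use strict;
use warnings;

# Hypothetical stand-in for pad_vectors(): pad the shorter vector with zeros.
sub pad_with_zeros {
    my ( $u, $v ) = @_;

    # Work on copies so the caller's arrays are not changed.
    my @u = @$u;
    my @v = @$v;

    # Only one of these can apply; afterwards both have equal length.
    push @u, (0) x ( @v - @u ) if @u < @v;
    push @v, (0) x ( @u - @v ) if @v < @u;

    return ( \@u, \@v );
}

my ( $pu, $pv ) = pad_with_zeros( [ 1, 2, 3, 4 ], [ 9, 8 ] );
print "@$pu | @$pv\n";    # 1 2 3 4 | 9 8 0 0
```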
correlation_matrix
$matrix = correlation_matrix( $u );
Return the correlation matrix for a single vector.
This function builds a square, binary matrix that represents "higher or lower" value within the vector itself.
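The text above does not say which comparison sets a cell to one, so the following is only a guess at the general shape of such a matrix. higher_lower_matrix is a hypothetical name, and the rule that cell [i][j] is 1 when element i is lower than element j (and 0 otherwise) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: a square 0/1 matrix of pairwise comparisons.
# Assumption: cell [i][j] is 1 when element i is lower than element j.
sub higher_lower_matrix {
    my ($u) = @_;
    my @matrix;
    for my $i ( 0 .. $#$u ) {
        for my $j ( 0 .. $#$u ) {
            $matrix[$i][$j] = $u->[$i] < $u->[$j] ? 1 : 0;
        }
    }
    return \@matrix;
}

my $m = higher_lower_matrix( [ 1, 3, 2 ] );
print "@$_\n" for @$m;
# 0 1 1
# 0 0 0
# 0 1 0
```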
TO DO
Implement other rank correlation measures that are out there.
SEE ALSO
For the csim method:
http://www2.mdanderson.org/app/ilya/Publications/JNMRcontour.pdf
For the spearman method:
http://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html
http://faculty.vassar.edu/lowry/ch3b.html
http://www.pinkmonkey.com/studyguides/subjects/stats/chap6/s0606801.asp
http://fonsg3.let.uva.nl/Service/Statistics/RankCorrelation_coefficient.html
http://www.statsoftinc.com/textbook/stnonpar.html#correlations
http://www.analytics.washington.edu/~rossini/courses/intro-nonpar/text/Tied_Data.html
http://www.analytics.washington.edu/~rossini/courses/intro-nonpar/text/Spearman_s_tex2html_image_mark_tex2html_wrap_inline4049_.html
AUTHOR
Gene Boggs <gene@cpan.org>
COPYRIGHT
Copyright 2003, Gene Boggs
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.