NAME

Statistics::Krippendorff - Calculate Krippendorff's alpha

VERSION

Version 0.01

SYNOPSIS

use experimental qw( signatures );
use Statistics::Krippendorff ();

my @units = ({coder1 => 1, coder2 => 1},
             {coder1 => 2, coder2 => 2, coder3 => 1},
             {coder2 => 3, coder3 => 2});
my $sk = 'Statistics::Krippendorff'->new(units => \@units);
my $alpha1 = $sk->alpha;
$sk->delta(\&Statistics::Krippendorff::delta_nominal);  # Same as default.
my $alpha2 = $sk->alpha;

my $ski = 'Statistics::Krippendorff'->new(
              units => [[1, 1], [2, 2, 1], [undef, 3, 2]],
              delta => sub ($, $v0, $v1) { ($v0 - $v1) ** 2 });
my $alpha_interval = $ski->alpha;

METHODS

new

my $sk = 'Statistics::Krippendorff'->new(
             units => \@units,
             delta => \&Statistics::Krippendorff::delta_nominal);

The constructor. It accepts the following named arguments:

units

An array reference of units of analysis. All units must use the same representation, which is one of the following:

  1. Each unit is a hash reference of the form

    { coder1 => 'value1', coder3 => 'value2', ... }
  2. Each unit is an array reference of the form

    ['value1', undef, 'value2']

    where the position in the array identifies the coder and missing data are indicated by undef.

In both cases, each unit must contain at least two values. To validate this precondition, call is_valid.
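To make the correspondence between the two representations concrete, here is a minimal sketch that rewrites a positional unit as the equivalent hash. The helper unit_to_hash and the generated coder names (coder1, coder2, ...) are hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Krippendorff): turn a
# positional unit into the equivalent hash form, dropping undef entries.
sub unit_to_hash {
    my ($unit) = @_;
    return { %$unit } if ref $unit eq 'HASH';    # already keyed by coder
    my %by_coder;
    for my $i (0 .. $#$unit) {
        next unless defined $unit->[$i];         # missing data
        $by_coder{ 'coder' . ($i + 1) } = $unit->[$i];
    }
    return \%by_coder;
}

my @positional = ([1, 1], [2, 2, 1], [undef, 3, 2]);
my @keyed      = map { unit_to_hash($_) } @positional;
# $keyed[2] is now { coder2 => 3, coder3 => 2 }
```

Under these assumptions, the positional data from the second SYNOPSIS example map onto the hash-based units of the first one.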

delta

An optional argument defaulting to delta_nominal. You can specify any function f($self, $v1, $v2) that compares the two values $v1 and $v2 and returns their distance (a number between 0 and 1). Several common methods are predefined:

delta_nominal

Used for nominal data, i.e. labels with no ordering.

delta_ordinal

Used for numeric values that are ordered, but can't be used in mathematical operations, for example the number of stars in a movie rating system (we don't say that the distance from one star to two stars is the same as the distance from three stars to four stars). See the implementation for why $self is needed as a parameter to delta.

delta_interval

Used for numeric values that can be used in mathematical operations.

delta_ratio

Used for non-negative numeric values measured from a true zero (think temperature in kelvins).

delta_jaccard

Use this when a coder can assign more than one value to a unit. Join the values with commas; the Jaccard index is then computed as intersection_size / union_size. If you sort the values before joining them, the expected coincidence matrix is smaller and the algorithm runs faster, but the resulting coefficient should be the same.
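For orientation, the metrics listed above can be sketched as standalone functions using their textbook definitions. This is only an illustration under stated assumptions, not the module's implementation: the sketch_* names are hypothetical, the results are not normalised, the Jaccard variant is assumed to return 1 minus the index, and the ordinal variant takes a plain hash of value frequencies where a real callback would receive $self.

```perl
use strict;
use warnings;

# The first argument (the object in a real callback) is ignored here,
# because none of these four metrics needs the whole data set.
sub sketch_nominal  { my (undef, $v1, $v2) = @_; return $v1 eq $v2 ? 0 : 1 }
sub sketch_interval { my (undef, $v1, $v2) = @_; return ($v1 - $v2) ** 2 }

sub sketch_ratio {
    my (undef, $v1, $v2) = @_;
    return 0 if $v1 == $v2;                  # also avoids 0/0 for two zeros
    return (($v1 - $v2) / ($v1 + $v2)) ** 2;
}

sub sketch_jaccard {
    my (undef, $v1, $v2) = @_;
    my %first  = map { $_ => 1 } split /,/, $v1;
    my %second = map { $_ => 1 } split /,/, $v2;
    my $common = grep { $first{$_} } keys %second;   # intersection size
    my %union  = (%first, %second);
    my $total  = keys %union;                        # union size
    return 0 unless $total;
    return 1 - $common / $total;   # assumption: distance = 1 - Jaccard index
}

# Textbook ordinal distance depends on how often each value ranked
# between $v1 and $v2 occurs, so it needs access to the frequencies.
sub sketch_ordinal {
    my ($freq, $v1, $v2) = @_;
    ($v1, $v2) = ($v2, $v1) if $v1 > $v2;
    my $sum = 0;
    $sum += $freq->{$_} for grep { $_ >= $v1 && $_ <= $v2 } keys %$freq;
    return ($sum - ($freq->{$v1} + $freq->{$v2}) / 2) ** 2;
}
```

Because sketch_jaccard compares sets, 'a,b' and 'b,a' are at distance 0 whether or not the values were sorted before joining, which is consistent with the coefficient being unaffected by sorting.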

alpha

my $alpha = $sk->alpha;

Returns Krippendorff's alpha.
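As a rough illustration of what the coefficient measures, the following standalone sketch computes alpha = 1 - D_o / D_e from a coincidence matrix, following the usual textbook formulation. It is not the module's code: the function sketch_alpha is hypothetical, it only handles array-reference units, and its delta callback takes just the two values rather than ($self, $v1, $v2).

```perl
use strict;
use warnings;

# Hypothetical standalone function; $delta is called as $delta->($v1, $v2).
# Dies with a division error if fewer than two values are pairable or if
# all values are identical (expected disagreement is then zero).
sub sketch_alpha {
    my ($units, $delta) = @_;
    my (%coincidence, %freq);
    for my $unit (@$units) {
        my @values = grep { defined } @$unit;
        next if @values < 2;                  # unit cannot be paired
        my $weight = 1 / (@values - 1);
        for my $i (0 .. $#values) {
            for my $j (0 .. $#values) {
                next if $i == $j;             # pair values of different coders
                $coincidence{ $values[$i] }{ $values[$j] } += $weight;
            }
        }
        $freq{$_}++ for @values;
    }
    my $n = 0;
    $n += $_ for values %freq;                # total number of pairable values

    my ($observed, $expected) = (0, 0);
    for my $c (keys %freq) {
        for my $k (keys %freq) {
            my $d = $delta->($c, $k);
            $observed += ($coincidence{$c}{$k} // 0) * $d;
            $expected += $freq{$c} * $freq{$k} * $d;
        }
    }
    $observed /= $n;                          # observed disagreement D_o
    $expected /= $n * ($n - 1);               # expected disagreement D_e
    return 1 - $observed / $expected;
}

my @units    = ([1, 1], [2, 2, 1], [undef, 3, 2]);
my $nominal  = sketch_alpha(\@units, sub { $_[0] eq $_[1] ? 0 : 1 });
my $interval = sketch_alpha(\@units, sub { ($_[0] - $_[1]) ** 2 });
printf "%.3f %.3f\n", $nominal, $interval;    # prints "0.200 0.500"
```

The sample data are the positional units from the SYNOPSIS; the two callbacks correspond to the nominal and interval distances.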

delta

$sk->delta(sub($self, $v1, $v2) {});

The difference function used to calculate the alpha. You can specify it in the constructor (see above), but you can also change it to something else later.

is_valid

print "OK" if $sk->is_valid;

Checks that each unit has at least two responses. If you use the hash representation of a unit, the values must always be defined.
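The rule described above can be sketched as a plain function. This is a hypothetical reimplementation for illustration (sketch_is_valid is not part of the module), assuming that undef marks missing data in array units and is not allowed at all in hash units.

```perl
use strict;
use warnings;

# Hypothetical check, not the module's code: true when every unit holds
# at least two values and no hash-form unit stores an undefined value.
sub sketch_is_valid {
    my ($units) = @_;
    for my $unit (@$units) {
        if (ref $unit eq 'HASH') {
            my $undefined = grep { ! defined } values %$unit;
            return 0 if $undefined;
            return 0 if scalar(keys %$unit) < 2;
        }
        else {
            my $present = grep { defined } @$unit;   # count of responses
            return 0 if $present < 2;
        }
    }
    return 1;
}

print "OK\n" if sketch_is_valid([[1, 1], [undef, 3, 2]]);
```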

frequency

my $freq = $sk->frequency('val1');

Returns the frequency of the given value.

pairable_values

Returns the total number of all pairable values (i.e. the sum of all frequencies).

vals

Returns a sorted list of all the possible values.
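To show how the three accessors above relate to each other, here is a small standalone sketch that derives the same kind of figures directly from the SYNOPSIS data. It assumes every unit is valid (at least two values) and uses a plain string sort, which may differ from the order the module uses.

```perl
use strict;
use warnings;

my @units = ([1, 1], [2, 2, 1], [undef, 3, 2]);

# Count how often each value occurs across all units, skipping missing data.
my %frequency;
for my $unit (@units) {
    $frequency{$_}++ for grep { defined } @$unit;
}

# The number of pairable values is the sum of all frequencies.
my $pairable = 0;
$pairable += $_ for values %frequency;

# Plain string sort; an assumption, the module's ordering may differ.
my @values = sort keys %frequency;

printf "%s occurs %d times\n", $_, $frequency{$_} for @values;
print "pairable values: $pairable\n";    # 7 for this data
```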

AUTHOR

E. Choroba, <choroba at cpan.org>

BUGS

Please report any bugs or feature requests to https://github.com/choroba/statistics-krippendorff/issues, via e-mail to bug-statistics-krippendorff at rt.cpan.org, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-Krippendorff. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Statistics::Krippendorff

ACKNOWLEDGEMENTS

The implementation was inspired by Wikipedia; additional tests were taken from https://www.infoamerica.org/documentos_pdf/kripen.pdf.

LICENSE AND COPYRIGHT

This software is Copyright (c) 2025 by E. Choroba.

This is free software, licensed under:

The Artistic License 2.0 (GPL Compatible)