NAME
Lingua::Diversity::Internals - utility subroutines for classes derived from Lingua::Diversity
VERSION
This documentation refers to Lingua::Diversity::Internals version 0.03.
SYNOPSIS
use Lingua::Diversity::Internals qw( :all );
# NB: the following subroutine calls are meant to illustrate the various
# possibilities of this module -- the order in which they appear here
# is not meaningful. Furthermore, it is assumed here that a number of
# variables ($array_ref, $unit_array_ref, etc.) have been defined.
# Get a random subsample of 20 items taken from an array...
my $sampled_indices_ref = _sample_indices(
    scalar( @original_array ),
    20,
);
my @subsample = @original_array[@$sampled_indices_ref];
# Get the average, variance and count of a list of numbers...
my ( $average, $variance, $count ) = _get_average( \@numbers );
# Get the weighted average, variance and count of a list of numbers...
my ( $weighted_average, $weighted_variance, $weighted_count )
    = _get_average( \@numbers, \@weights );
# Get the number of types (distinct items) in an array...
my $number_of_types = _count_types( $array_ref );
# Get the frequency of types in an array...
my $freq_hash_ref = _count_frequency( $array_ref );
foreach my $item ( sort keys %$freq_hash_ref ) {
    print $item, "\t", $freq_hash_ref->{$item}, "\n";
}
# Get the list of unit types associated with each category type...
my $units_in_category_hash_ref = _get_units_per_category(
    $unit_array_ref,
    $category_array_ref,
);
foreach my $category ( sort keys %$units_in_category_hash_ref ) {
    # Each value is an array reference, so dereference it before joining.
    print $category, "\t",
        join( q{,}, @{ $units_in_category_hash_ref->{$category} } ), "\n";
}
# Get the perplexity of items in an array...
my $perplexity = _perplexity( $array_ref );
# Get the Shannon entropy of items in an array...
my $shannon_entropy = _shannon_entropy( $array_ref );
# Get the Renyi entropy of items in an array...
my $renyi_entropy = _renyi_entropy(
    'array_ref' => $array_ref,
    'exponent'  => 0.7,
);
DESCRIPTION
This module provides utility subroutines that are or could be used by various classes derived from Lingua::Diversity. These subroutines are marked as internal (i.e. their name starts with an underscore) because they are meant to be used by developers creating classes derived from Lingua::Diversity (as opposed to being used by clients of such classes).
No subroutine is exported by default. All subroutines are exportable, and tag ':all' results in the export of all subroutines.
SUBROUTINES
- _sample_indices()
-
Return a reference to an array of random array indices. The subroutine takes two arguments, namely the size of the array (i.e. 1 plus the maximum possible index) and the number of indices to be sampled. An exception is thrown if the latter exceeds the former.
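The behaviour described above can be approximated as follows. This is a minimal sketch, not the module's own code: the name sample_indices_sketch() is hypothetical, and it assumes that indices are drawn without replacement (which the size check suggests but the documentation does not state).

```perl
use strict;
use warnings;
use List::Util qw( shuffle );

# Hypothetical stand-in for _sample_indices(): draw $sample_size distinct
# indices from 0 .. $array_size - 1 (sampling without replacement is assumed).
sub sample_indices_sketch {
    my ( $array_size, $sample_size ) = @_;

    # Mirror the documented behaviour: refuse to oversample.
    die "Sample size cannot be larger than array size\n"
        if $sample_size > $array_size;

    # Shuffle all indices, then keep the first $sample_size of them.
    my @shuffled = shuffle( 0 .. $array_size - 1 );
    return [ @shuffled[ 0 .. $sample_size - 1 ] ];
}

my @original_array = qw( a b c d e f g h );
my $indices_ref    = sample_indices_sketch( scalar @original_array, 3 );
my @subsample      = @original_array[ @$indices_ref ];
print "@subsample\n";
```

Shuffling the whole index range keeps the sketch short; for very large arrays a partial shuffle would avoid the extra work.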
- _get_average()
-
Compute the (possibly weighted) average and variance of a list of numbers. Return the average, variance, and count (number of observations).
The subroutine requires a reference to an array of numbers as argument. Passing an empty array throws an exception.
Optionally, a reference to an array of counts may be passed as a second argument. An exception is thrown if this array's size does not match the first one. Counts may be real instead of integers, in which case the number of observations returned may not be an integer. In all cases, reported results are weighted according to the counts.
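To make the weighting concrete, here is a minimal sketch of such a computation. It is not the module's implementation: weighted_average_sketch() is a hypothetical name, and the variance is assumed to be the population variance (sum of weighted squared deviations divided by the total weight), which the documentation does not specify.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _get_average(); assumes the population variance
# (division by the total weight), which is an assumption of this sketch.
sub weighted_average_sketch {
    my ( $numbers_ref, $weights_ref ) = @_;

    die "Cannot average an empty array\n" if !@$numbers_ref;

    # Without explicit counts, every observation counts once.
    $weights_ref = [ (1) x scalar @$numbers_ref ] if !defined $weights_ref;
    die "Numbers and counts must have the same size\n"
        if @$weights_ref != @$numbers_ref;

    my ( $count, $sum, $sum_of_squares ) = ( 0, 0, 0 );
    for my $i ( 0 .. $#{$numbers_ref} ) {
        my ( $x, $w ) = ( $numbers_ref->[$i], $weights_ref->[$i] );
        $count          += $w;              # total weight = number of observations
        $sum            += $w * $x;
        $sum_of_squares += $w * $x * $x;
    }

    my $average  = $sum / $count;
    my $variance = $sum_of_squares / $count - $average**2;
    return ( $average, $variance, $count );
}

# The value 2 observed once and the value 4 observed three times.
my ( $average, $variance, $count )
    = weighted_average_sketch( [ 2, 4 ], [ 1, 3 ] );
printf "%.2f\t%.2f\t%s\n", $average, $variance, $count;
```

With real-valued counts the same code applies unchanged; only the returned count stops being an integer.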
- _count_types()
-
Count the number of distinct items in an array. Takes an array reference as argument.
- _count_frequency()
-
Count the number of occurrences of each distinct item in an array. Takes an array reference as argument. The result is a reference to a hash where each key corresponds to a distinct item and each value to the number of occurrences of this item in the array.
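Both counting subroutines can be pictured with the usual Perl hash-counting idiom. The following is a minimal sketch under that assumption; count_frequency_sketch() and count_types_sketch() are hypothetical names, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _count_frequency(): map each distinct item
# to its number of occurrences.
sub count_frequency_sketch {
    my ($array_ref) = @_;
    my %frequency;
    $frequency{$_}++ for @$array_ref;
    return \%frequency;
}

# Hypothetical stand-in for _count_types(): the number of types is the
# number of keys in the frequency hash.
sub count_types_sketch {
    my ($array_ref) = @_;
    return scalar keys %{ count_frequency_sketch($array_ref) };
}

my @tokens   = qw( the cat saw the dog );
my $freq_ref = count_frequency_sketch( \@tokens );
print "$_\t$freq_ref->{$_}\n" for sort keys %$freq_ref;
print count_types_sketch( \@tokens ), " types\n";
```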
- _get_units_per_category()
-
Take a reference to an array of units and an array of categories, and build a hash where each key is a category and the corresponding value is a reference to the list of units that are associated with this category.
NB: It is assumed that two non-empty arrays of identical size are passed as arguments.
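The grouping can be sketched as below. This is an illustration rather than the module's code: units_per_category_sketch() is a hypothetical name, and the sketch assumes that each unit is listed only once per category (i.e. that the lists contain unit types), in order of first appearance.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _get_units_per_category(). Assumes the two
# arrays are parallel: element $i of one goes with element $i of the other.
sub units_per_category_sketch {
    my ( $unit_array_ref, $category_array_ref ) = @_;

    my ( %seen, %units_in_category );
    for my $i ( 0 .. $#{$unit_array_ref} ) {
        my $unit     = $unit_array_ref->[$i];
        my $category = $category_array_ref->[$i];

        # Keep only the first occurrence of a unit within a category
        # (an assumption of this sketch).
        next if $seen{$category}{$unit}++;
        push @{ $units_in_category{$category} }, $unit;
    }
    return \%units_in_category;
}

my @units      = qw( walks walked walk dogs dog walks );
my @categories = qw( walk  walk   walk dog  dog walk  );
my $result_ref = units_per_category_sketch( \@units, \@categories );
foreach my $category ( sort keys %$result_ref ) {
    print $category, "\t", join( q{,}, @{ $result_ref->{$category} } ), "\n";
}
```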
- _perplexity()
-
Compute the perplexity of items in an array, i.e. the exponential of the Shannon entropy of items in base e (see below). Takes a reference to an array as argument.
- _shannon_entropy()
-
Compute the Shannon entropy of items in an array. Takes a reference to an array as first argument, and optionally the requested log base for the computation (default is e, i.e. exp(1)).
NB: It is assumed that a non-empty array is passed as argument.
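The relation between the two measures can be sketched as follows, using the standard definition of Shannon entropy over relative frequencies (minus the sum of p * log p). The names shannon_entropy_sketch() and perplexity_sketch() are hypothetical; this is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _shannon_entropy(): entropy of the relative
# frequencies of items, in the requested log base (natural log by default).
sub shannon_entropy_sketch {
    my ( $array_ref, $base ) = @_;

    my %frequency;
    $frequency{$_}++ for @$array_ref;
    my $total = @$array_ref;

    my $entropy = 0;
    for my $count ( values %frequency ) {
        my $p = $count / $total;
        $entropy -= $p * log($p);
    }

    # Convert from nats to the requested base, if one was given.
    $entropy /= log($base) if defined $base;
    return $entropy;
}

# Hypothetical stand-in for _perplexity(): exponential of the entropy
# computed in base e.
sub perplexity_sketch {
    my ($array_ref) = @_;
    return exp( shannon_entropy_sketch($array_ref) );
}

my @items = qw( a b c d );
printf "%.3f\n", shannon_entropy_sketch( \@items, 2 );
printf "%.3f\n", perplexity_sketch( \@items );
```

For equiprobable items the perplexity equals the number of types, which is a convenient sanity check.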
- _renyi_entropy()
-
Compute the Renyi entropy of items in an array. Takes one required and two optional named parameters:
- array_ref
-
A reference to a non-empty array.
- exponent
-
The numeric parameter involved in the computation of Renyi's entropy (a number between 0 and 1 inclusive). Note that 0 amounts to computing the log of the number of types, and 1 amounts to computing Shannon's entropy. Default is 0.5.
- base
-
A positive number to be used as the log base in the computation. Default is e (i.e. exp(1)).
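The parameters above can be illustrated with the standard definition of Renyi's entropy, log( sum of p ** exponent ) / ( 1 - exponent ), where exponent 1 is handled as the Shannon limit. This is a minimal sketch under that assumption; renyi_entropy_sketch() is a hypothetical name and not the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _renyi_entropy(), using named parameters and
# the defaults stated in the documentation (exponent 0.5, base e).
sub renyi_entropy_sketch {
    my (%parameter) = @_;
    my $array_ref = $parameter{'array_ref'};
    my $exponent  = defined $parameter{'exponent'} ? $parameter{'exponent'} : 0.5;
    my $base      = defined $parameter{'base'}     ? $parameter{'base'}     : exp(1);

    die "Parameter 'exponent' must be between 0 and 1 inclusive\n"
        if $exponent < 0 || $exponent > 1;

    # Relative frequency of each type.
    my %frequency;
    $frequency{$_}++ for @$array_ref;
    my $total         = @$array_ref;
    my @probabilities = map { $_ / $total } values %frequency;

    my $entropy = 0;
    if ( $exponent == 1 ) {
        # The formula is undefined at 1; its limit is Shannon's entropy.
        $entropy -= $_ * log($_) for @probabilities;
    }
    else {
        my $sum = 0;
        $sum += $_**$exponent for @probabilities;
        $entropy = log($sum) / ( 1 - $exponent );
    }

    # Convert from natural log to the requested base.
    return $entropy / log($base);
}

my @items = qw( a a a b b c );
for my $exponent ( 0, 0.5, 1 ) {
    printf "%s\t%.4f\n", $exponent,
        renyi_entropy_sketch(
            'array_ref' => \@items,
            'exponent'  => $exponent,
            'base'      => 2,
        );
}
```

With exponent 0 every type contributes 1 to the sum, so the result reduces to the log of the number of types, as noted above.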
DIAGNOSTICS
- The second argument of subroutine sampled_indices() cannot be larger than the first
-
This exception is raised when the second argument of subroutine _sample_indices() is larger than the first, i.e. when the requested sample size exceeds the array size.
- Parameter 'exponent' of subroutine _renyi_entropy() must be between 0 and 1 inclusive
-
This exception is raised when the parameter exponent of subroutine _renyi_entropy() is set to a value less than 0 or greater than 1.
DEPENDENCIES
This module is part of the Lingua::Diversity distribution.
BUGS AND LIMITATIONS
There are no known bugs in this module.
Please report problems to Aris Xanthos (aris.xanthos@unil.ch).
Patches are welcome.
AUTHOR
Aris Xanthos (aris.xanthos@unil.ch)
LICENSE AND COPYRIGHT
Copyright (c) 2011 Aris Xanthos (aris.xanthos@unil.ch).
This program is released under the GPL license (see http://www.gnu.org/licenses/gpl.html).
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.