NAME
Data::Pareto - Computing Pareto sets in Perl
VERSION
Version 0.02
SYNOPSIS
use Data::Pareto;
# only first and third columns are used in comparison
# the others are simply descriptive
my $set = new Data::Pareto(columns => [0, 2]);
$set->add(
[ 5, "pareto", 10, 11 ],
[ 5, "dominated", 11, 9 ],
[ 4, "pareto2", 12, 12 ]
);
# this returns [ [ 5, "pareto", 10, 11 ], [ 4, "pareto2", 12, 12 ] ],
# the other one is dominated on selected columns
$set->get_pareto_ref;
DESCRIPTION
This module computes Pareto sets. Given a set of vectors (i.e. arrays of simple scalars), the Pareto set consists of all the vectors from the given set which are not dominated by any other vector of the set. A vector X is said to be dominated by Y iff X[i] >= Y[i] for all i and X[i] > Y[i] for at least one i. Under this definition, smaller values are better.
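The dominance definition above can be sketched in a few lines of Perl. This is only an illustration, not the module's own code: the helper name dominated_by is hypothetical, and numeric column values are assumed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Pareto): returns 1 if $x is
# dominated by $y on the given column indexes, 0 otherwise.
# Assumes numeric values in the compared columns.
sub dominated_by {
    my ($x, $y, $columns) = @_;
    my $strictly_worse = 0;
    for my $i (@$columns) {
        # X is better than Y in some column, so Y cannot dominate X.
        return 0 if $x->[$i] < $y->[$i];
        $strictly_worse = 1 if $x->[$i] > $y->[$i];
    }
    # Dominated only if X is strictly worse in at least one column.
    return $strictly_worse;
}

my @columns = (0, 2);
my $p = [ 5, "pareto",    10, 11 ];
my $q = [ 5, "dominated", 11,  9 ];
my $r = [ 4, "pareto2",   12, 12 ];

print dominated_by($q, $p, \@columns) ? "yes" : "no", "\n";   # yes
print dominated_by($r, $p, \@columns) ? "yes" : "no", "\n";   # no
```

Two vectors that are equal on all compared columns do not dominate each other here, because neither is strictly worse anywhere.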
Pareto sets play an important role in multiobjective optimization, where each non-dominated (i.e. Pareto) vector describes the objective values of an "optimal" solution to the given problem.
This module allows duplicates in the set. Strictly speaking this makes it a bag rather than a set, but it is useful in practice (e.g. when we want to preserve two solutions that give the same objective values but are structurally different). This assumption influences the dominance definition given above: two duplicates never dominate each other and hence can both be present in the Pareto set. This is controlled by the duplicates option passed to new(): if set to a true value, duplicates are allowed in the Pareto set; otherwise, only the first element found in a group of duplicated vectors is preserved in the Pareto set.
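As a rough sketch of what discarding duplicates means, the following self-contained Perl keeps only the first vector of each group of duplicates. The helper names is_duplicate and drop_duplicates are hypothetical and are not part of the module's interface. It assumes numeric column values and treats two vectors as duplicates when they agree on every compared column.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $x and $y agree on every compared column.
sub is_duplicate {
    my ($x, $y, $columns) = @_;
    for my $i (@$columns) {
        return 0 if $x->[$i] != $y->[$i];
    }
    return 1;
}

# Hypothetical helper: keep only the first vector found in each group
# of duplicates, preserving the input order.
sub drop_duplicates {
    my ($vectors, $columns) = @_;
    my @kept;
    VECTOR: for my $v (@$vectors) {
        for my $k (@kept) {
            next VECTOR if is_duplicate($v, $k, $columns);
        }
        push @kept, $v;
    }
    return \@kept;
}

my $unique = drop_duplicates(
    [
        [ 5, "first",  10 ],
        [ 5, "second", 10 ],   # duplicate of "first" on columns 0 and 2
        [ 4, "third",  12 ],
    ],
    [ 0, 2 ],
);
print join(", ", map { $_->[1] } @$unique), "\n";   # first, third
```

Note that the descriptive column (index 1) differs between the two duplicates, which is exactly the case in which keeping both can be useful.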
Values are allowed to be invalid. 'Invalid' means 'the worst possible'. This is a different concept from 'unknown', which would make the definition of domination less clear.
FUNCTIONS
By default, a vector is passed around as a ref to an array of consecutive column values. This means you shouldn't modify it after passing it to the add method.
new
Creates a new object for calculating Pareto set.
The argument passed is a hashref with options; the recognized options are:
columns
Arrayref containing column numbers which should be used for determining domination and duplication. Column numbers are 0-based array indexes into the data vectors. Only values at those positions will ever be compared between vectors. Any other data in the vectors may be present and is not used in any way.
At least one column number should be passed, for obvious reasons.
duplicates
If set to a true value, duplicated vectors are all put in the Pareto set (if they are Pareto, of course). If set to a false value, duplicates of vectors already in the Pareto set are discarded.
invalid
The value considered invalid in the Pareto set. Such a value is dominated by any value and dominates only an invalid value.
However, computing domination in the presence of invalid values can be considerably slower, by as much as a factor of 5. It will probably be faster to preprocess the data first and replace invalid markers with some huge-and-surely-dominated values.
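The preprocessing step suggested above might look like the sketch below. The marker value -1, the replacement value 1e300 and the helper name replace_invalid are all assumptions made for illustration. The input vectors are copied rather than modified in place.

```perl
use strict;
use warnings;

my $INVALID = -1;      # assumed marker for an invalid value
my $WORST   = 1e300;   # assumed to be larger than any real value in the data

# Hypothetical helper: returns copies of the vectors in which every
# invalid marker in a compared column is replaced by $worst.
sub replace_invalid {
    my ($vectors, $columns, $invalid, $worst) = @_;
    return [
        map {
            my @copy = @$_;
            for my $i (@$columns) {
                $copy[$i] = $worst if $copy[$i] == $invalid;
            }
            \@copy;
        } @$vectors
    ];
}

my $raw = [
    [ 5,  "ok",      10 ],
    [ -1, "invalid", 11 ],
];
my $clean = replace_invalid($raw, [ 0, 2 ], $INVALID, $WORST);
print $clean->[1][0] == $WORST ? "replaced" : "unchanged", "\n";   # replaced
```

After such a replacement, the set can be built without the invalid option. One caveat: two replaced values compare as equal, so this only approximates the invalid-value semantics described above.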
add
Tests the vectors passed as arguments and adds the non-dominated ones to the Pareto set.
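One common way to maintain a Pareto set incrementally is sketched below. It is not taken from the module's source. The helper names dominated_by and add_vector are hypothetical, numeric column values are assumed, and duplicates are kept (as with a true duplicates option).

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $x is dominated by $y on the given columns.
sub dominated_by {
    my ($x, $y, $columns) = @_;
    my $strictly_worse = 0;
    for my $i (@$columns) {
        return 0 if $x->[$i] < $y->[$i];
        $strictly_worse = 1 if $x->[$i] > $y->[$i];
    }
    return $strictly_worse;
}

# Hypothetical helper: returns the new front after trying to add $v.
sub add_vector {
    my ($front, $v, $columns) = @_;
    # A candidate dominated by any current member is rejected.
    for my $member (@$front) {
        return $front if dominated_by($v, $member, $columns);
    }
    # Otherwise drop the members the candidate dominates, then add it.
    my @kept = grep { !dominated_by($_, $v, $columns) } @$front;
    push @kept, $v;
    return \@kept;
}

my $front = [];
for my $v (
    [ 5, "pareto",    10, 11 ],
    [ 5, "dominated", 11,  9 ],
    [ 4, "pareto2",   12, 12 ],
) {
    $front = add_vector($front, $v, [ 0, 2 ]);
}
print join(", ", map { $_->[1] } @$front), "\n";   # pareto, pareto2
```

The data is the same as in the SYNOPSIS, and the sketch arrives at the same two vectors.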
get_pareto
Returns the current content of the Pareto set as a list of vectors.
get_pareto_ref
Returns the current content of the Pareto set as a ref to an array of vectors. The return value references the original array, so treat it as read-only!
is_dominated
Checks if the first vector passed is dominated by the second one. The comparison is based on the values in the vectors' columns that were passed to new().
The vectors passed are never duplicates of each other when this method is called from inside this module.
Returns true when the first vector in the argument list is dominated by the other, and false otherwise.
is_invalid
Checks if the given value is considered invalid for the current object. Every value is valid by default.
TODO
For large data sets the calculations become time-intensive. A couple of techniques might be applied to improve performance:
defer the phase of removing vectors dominated by newly added vectors until the get_pareto() call; this results in fewer array rewrites.
split the set of vectors being added into smaller subsets, calculate Pareto sets for those subsets, and then insert the resulting Pareto subsets into the main set; this results in fewer useless attempts to add dominated vectors to the set.
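The second idea, splitting the input into subsets, could be prototyped along the following lines. This is only a sketch of the technique, not a planned implementation. The function names and the chunk size are hypothetical, numeric column values are assumed, and the chunk size is expected to be a positive integer.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $x is dominated by $y on the given columns.
sub dominated_by {
    my ($x, $y, $columns) = @_;
    my $strictly_worse = 0;
    for my $i (@$columns) {
        return 0 if $x->[$i] < $y->[$i];
        $strictly_worse = 1 if $x->[$i] > $y->[$i];
    }
    return $strictly_worse;
}

# Plain incremental computation of the Pareto set of a list of vectors.
sub pareto_front {
    my ($vectors, $columns) = @_;
    my @front;
    for my $v (@$vectors) {
        next if grep { dominated_by($v, $_, $columns) } @front;
        @front = grep { !dominated_by($_, $v, $columns) } @front;
        push @front, $v;
    }
    return \@front;
}

# Compute the Pareto set of each chunk first, then of the survivors.
sub pareto_front_chunked {
    my ($vectors, $columns, $chunk_size) = @_;
    my @all = @$vectors;    # work on a copy, splice() is destructive
    my @candidates;
    while (my @chunk = splice(@all, 0, $chunk_size)) {
        push @candidates, @{ pareto_front(\@chunk, $columns) };
    }
    return pareto_front(\@candidates, $columns);
}

my @data = ([1, 5], [2, 4], [3, 3], [2, 6], [4, 4], [0, 9]);
my $front = pareto_front_chunked(\@data, [ 0, 1 ], 2);
print scalar(@$front), "\n";   # 4
```

Both functions return the same Pareto set, because a vector dominated within its chunk is also dominated in the full set. The chunked variant simply filters out some dominated vectors before they reach the final pass.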
AUTHOR
Przemyslaw Wesolek, <jest at go.art.pl>
BUGS
Please report any bugs or feature requests to bug-data-pareto at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Pareto. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Data::Pareto
You can also look for information at:
RT: CPAN's request tracker
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
COPYRIGHT & LICENSE
Copyright 2009 Przemyslaw Wesolek, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.