NAME

Data::Pareto - Computing Pareto sets in Perl

VERSION

Version 0.01

SYNOPSIS

use Data::Pareto;

# only first and third columns are used in comparison
# the others are simply descriptive
my $set = Data::Pareto->new({ columns => [0, 2] });
$set->add(
    [ 5, "pareto", 10, 11 ],
    [ 5, "dominated", 11, 9 ],
    [ 4, "pareto2", 12, 12 ]
);

# this returns [ [ 5, "pareto", 10, 11 ], [ 4, "pareto2", 12, 12 ] ],
# the other one is dominated on selected columns
$set->get_pareto_ref;

DESCRIPTION

This module computes Pareto sets. Given a set of vectors (i.e. arrays of simple scalars), the Pareto set consists of all vectors from that set which are not dominated by any other vector in it. A vector X is said to be dominated by Y iff X[i] >= Y[i] for all i and X[i] > Y[i] for at least one i. In other words, smaller values are considered better in every column.
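The dominance test defined above can be sketched in a few lines of plain Perl. This is only an illustration of the definition, not the module's own code: the helper name dominated_by and its argument order are assumptions made for this example, and the compared columns are assumed to hold numbers.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Pareto's documented API.
# Returns true if vector $x is dominated by vector $y on the given columns:
# $x is nowhere smaller than $y and strictly greater in at least one column.
sub dominated_by {
    my ($x, $y, $columns) = @_;
    my $strictly_worse = 0;
    for my $i (@$columns) {
        return 0 if $x->[$i] < $y->[$i];    # $x is better here, so not dominated
        $strictly_worse = 1 if $x->[$i] > $y->[$i];
    }
    return $strictly_worse;                 # false for duplicates
}

# Same data as in the SYNOPSIS; only columns 0 and 2 are compared.
my @columns   = (0, 2);
my $pareto    = [ 5, "pareto",    10, 11 ];
my $dominated = [ 5, "dominated", 11,  9 ];

print dominated_by($dominated, $pareto, \@columns) ? "dominated\n" : "not dominated\n";  # dominated
print dominated_by($pareto, $dominated, \@columns) ? "dominated\n" : "not dominated\n";  # not dominated
```

Note that two vectors with equal values in all compared columns never dominate each other under this test, because neither is strictly greater anywhere.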

Pareto sets play an important role in multiobjective optimization, where each non-dominated (i.e. Pareto) vector describes the objective values of an "optimal" solution to the given problem.

This module allows duplicates in the set. Strictly speaking, this makes it a bag rather than a set, but it is useful in practice (e.g. when we want to preserve two solutions that give the same objective values but are structurally different). This follows from the dominance definition given above: two duplicates never dominate each other, and hence both can be present in the Pareto set. The behaviour is controlled by the duplicates option passed to new(): if it is set to a true value, duplicates are allowed in the Pareto set; otherwise, only the first element found from each group of duplicated vectors is preserved in the Pareto set.

FUNCTIONS

By default, a vector is passed around as a reference to an array of consecutive column values. This means you should not modify a vector after passing it to the add method.

new

Creates a new object for calculating a Pareto set.

The argument passed is a hashref with options; the recognized options are:

  • columns

    Arrayref containing the column numbers which should be used for determining domination and duplication. Column numbers are 0-based array indexes into the data vectors.

    Only values at those positions are ever compared between vectors. Any other data may be present in the vectors and is not used in any way.

    At least one column number should be passed; otherwise there is nothing to compare.

  • duplicates

    If set to a true value, duplicated vectors are all put in the Pareto set (if they are Pareto, of course). If set to a false value, duplicates of vectors already in the Pareto set are discarded.

add

Tests the vectors passed as arguments and adds the non-dominated ones to the Pareto set.
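One way such an insertion step could work is sketched below, including the effect of the duplicates option. This is a minimal sketch under stated assumptions, not the module's actual implementation: the helpers dominated_by, same_on_columns and insert_vector are hypothetical names, and the compared columns are assumed to be numeric.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; not Data::Pareto's documented API.

# True if $x is dominated by $y on the given columns (smaller is better).
sub dominated_by {
    my ($x, $y, $columns) = @_;
    my $strictly_worse = 0;
    for my $i (@$columns) {
        return 0 if $x->[$i] < $y->[$i];
        $strictly_worse = 1 if $x->[$i] > $y->[$i];
    }
    return $strictly_worse;
}

# True if $x and $y have equal values in all compared columns.
sub same_on_columns {
    my ($x, $y, $columns) = @_;
    for my $i (@$columns) {
        return 0 if $x->[$i] != $y->[$i];
    }
    return 1;
}

# Tries to insert $new into the Pareto set @$set, which is modified in place.
# Returns true if the vector was added.
sub insert_vector {
    my ($set, $new, $columns, $allow_duplicates) = @_;
    for my $old (@$set) {
        return 0 if dominated_by($new, $old, $columns);
        return 0 if !$allow_duplicates && same_on_columns($new, $old, $columns);
    }
    # $new survives: drop every vector it dominates, then append it.
    @$set = grep { !dominated_by($_, $new, $columns) } @$set;
    push @$set, $new;
    return 1;
}

my @vectors = (
    [ 5, "pareto",    10, 11 ],
    [ 5, "dominated", 11,  9 ],
    [ 4, "pareto2",   12, 12 ],
);
my @set;
insert_vector(\@set, $_, [ 0, 2 ], 0) for @vectors;
print join(", ", map { $_->[1] } @set), "\n";    # pareto, pareto2
```

With the last argument set to a true value, a second vector that matches an existing Pareto vector on all compared columns is kept as well; with a false value only the first one survives.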

get_pareto

Returns the current content of the Pareto set as a list of vectors.

get_pareto_ref

Returns the current content of the Pareto set as a reference to the array of vectors. The return value references the original array, so treat it as read-only!
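If the result needs to be sorted or edited, one option is to copy it first. The sketch below uses a literal array reference as a stand-in for the value returned by get_pareto_ref, so that it runs without the module; the variable names are assumptions made for this example.

```perl
use strict;
use warnings;

# Stand-in for the value returned by $set->get_pareto_ref in the SYNOPSIS.
my $pareto_ref = [ [ 5, "pareto", 10, 11 ], [ 4, "pareto2", 12, 12 ] ];

# Copy the outer array and every vector, so that edits to the copy
# cannot reach the data referenced by $pareto_ref.
my @copy = map { [ @$_ ] } @$pareto_ref;

# Sorting and editing the copy leaves the original untouched.
@copy = sort { $a->[0] <=> $b->[0] } @copy;
$copy[0][1] = "renamed";

print $pareto_ref->[0][1], "\n";    # pareto
print $copy[0][1], "\n";            # renamed
```

The inner copy made by `[ @$_ ]` is one level deep, which is enough here because the vectors hold simple scalars.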

is_dominated

Returns true if the first vector passed is dominated by the second one. The comparison is based on the values in the vectors' columns that were passed to new().

The vectors passed are never duplicates of each other when this method is called from inside this module.

TODO

For large data sets, calculations become time-intensive. A couple of techniques might be applied to improve performance:

  • defer the removal of vectors dominated by newly added vectors until the get_pareto() call; this results in fewer array rewrites.

  • split the set of vectors being added into smaller subsets, calculate the Pareto set of each subset, and then insert the resulting Pareto subsets into the main set; this results in fewer useless attempts to add dominated vectors to the set.
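The second technique can be sketched as follows. This is a hedged illustration of the idea only, not something the module does: dominated_by, pareto_of and pareto_by_chunks are hypothetical names, the chunk size is an arbitrary parameter assumed to be at least 1, duplicates are always kept, and no performance gain is claimed or measured here.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; not Data::Pareto's documented API.

# True if $x is dominated by $y on the given columns (smaller is better).
sub dominated_by {
    my ($x, $y, $columns) = @_;
    my $strictly_worse = 0;
    for my $i (@$columns) {
        return 0 if $x->[$i] < $y->[$i];
        $strictly_worse = 1 if $x->[$i] > $y->[$i];
    }
    return $strictly_worse;
}

# Naive filter: returns a reference to the vectors not dominated by any other.
sub pareto_of {
    my ($vectors, $columns) = @_;
    my @front;
    for my $v (@$vectors) {
        next if grep { dominated_by($v, $_, $columns) } @front;
        @front = grep { !dominated_by($_, $v, $columns) } @front;
        push @front, $v;
    }
    return \@front;
}

# Splits the input into chunks, reduces each chunk to its own Pareto set,
# then runs the same filter over the concatenated survivors.
# Assumes $chunk_size >= 1.
sub pareto_by_chunks {
    my ($vectors, $columns, $chunk_size) = @_;
    my @pending = @$vectors;
    my @survivors;
    while (my @chunk = splice(@pending, 0, $chunk_size)) {
        push @survivors, @{ pareto_of(\@chunk, $columns) };
    }
    return pareto_of(\@survivors, $columns);
}

my @data = (
    [ 5, "a", 10 ], [ 5, "b", 11 ], [ 4, "c", 12 ],
    [ 6, "d",  9 ], [ 6, "e", 13 ], [ 3, "f", 14 ],
);
my $front = pareto_by_chunks(\@data, [ 0, 2 ], 2);
print join(" ", map { $_->[1] } @$front), "\n";    # a c d f
```

The result matches a single pass over all vectors because any vector dominated within its chunk is also dominated in the full set, so discarding it early cannot change the final Pareto set.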

AUTHOR

Przemyslaw Wesolek, <jest at go.art.pl>

BUGS

Please report any bugs or feature requests to bug-data-pareto at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Pareto. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Data::Pareto

COPYRIGHT & LICENSE

Copyright 2009 Przemyslaw Wesolek, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.