NAME

Sort::DataTypes - Sort a list of data using methods relevant to the type of data

SYNOPSIS

use Sort::DataTypes qw(:all);

DESCRIPTION

This module lets you sort a list of data elements using methods appropriate to the type of data. It does not attempt to be the fastest sorter on the block. If you are sorting thousands of elements and need a lot of speed, you should use a module specializing in the specific type of sort you need. However, for smaller sorts of different types of data, this is the module to use.

ROUTINES

All sort routines are named sort_METHOD, where METHOD is the name of the sort method. Every sort_METHOD routine has both a forward and a reverse form:

sort_METHOD(\@list,@args);
sort_rev_METHOD(\@list,@args);

where @args are any additional arguments needed for that sort method.

Corresponding to every sort_METHOD routine is a cmp_METHOD routine, which takes two elements (and possibly additional arguments required by the method) and returns -1, 0, or 1 (similar to the cmp or <=> operators).

$flag = cmp_METHOD($x,$y,@args);
$flag = cmp_rev_METHOD($x,$y,@args);

All sort_METHOD functions can also be used to sort a list using a hash:

sort_METHOD(\@list, [@args,] \%hash);
sort_rev_METHOD(\@list, [@args,] \%hash);

In this case, elements of @list are used as keys in %hash. The values of the hash are compared using the cmp_METHOD function to sort the keys in @list.

For example, if %hash contains the key/value pairs:

foo => 3
bar => 5
ick => 1

and @list contains (foo,bar,ick), then sorting:

sort_numerical(\@list,\%hash)
  => @list = (ick,foo,bar)

since "ick" corresponds to a numerical value of 1, "foo" to 3, and "bar" to 5.
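The effect of a hash-based sort can be sketched in core Perl, without the module. This is only an illustration of the ordering described above, not the module's implementation; the data simply mirrors the example.

```perl
use strict;
use warnings;

# Data mirroring the example above.
my %hash = (foo => 3, bar => 5, ick => 1);
my @list = qw(foo bar ick);

# Order the keys by comparing their hash values numerically,
# then write the result back into @list.
@list = sort { $hash{$a} <=> $hash{$b} } @list;

print "@list\n";    # prints "ick foo bar"
```

The list holds only keys; the values used for comparison are looked up in the hash each time two keys are compared.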

sort_valid_method, cmp_valid_method
use Sort::DataTypes qw(:all);

$flag = sort_valid_method($string);
$flag = cmp_valid_method($string);

These are identical and return 1 if there is a valid sort method named $string in the module, and 0 otherwise. For example, there is a function "sort_numerical" defined in this module, but there is no function "sort_foobar", so the following would occur:

sort_valid_method("numerical")
   => 1

sort_valid_method("foobar")
   => 0

Note that the methods must NOT include the "sort_" or "cmp_" prefix.

sort_by_method, cmp_by_method
use Sort::DataTypes qw(:all);

sort_by_method($method,\@list [,@args]);
cmp_by_method ($method,$ele1,$ele2 [,@args]);

These sort a list, or compare two elements, using the given method (any string that returns 1 when passed to sort_valid_method). @args are arguments to pass to the sort.

If the method is not valid, the list is left untouched.
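Name-based dispatch of this kind can be sketched with a table of comparators. The names %cmp_for, valid_method, and sort_with below are hypothetical and are not part of the module; the sketch only shows the idea of validating a method name and leaving the list alone when it is unknown.

```perl
use strict;
use warnings;

# Hypothetical dispatch table mapping method names to comparators.
my %cmp_for = (
    numerical  => sub { $_[0] <=> $_[1] },
    alphabetic => sub { $_[0] cmp $_[1] },
);

# Returns 1 if a comparator is registered under $method, 0 otherwise.
sub valid_method {
    my ($method) = @_;
    return exists $cmp_for{$method} ? 1 : 0;
}

# Sorts @$list in place with the named method; the list is left
# untouched when the method is unknown.
sub sort_with {
    my ($method, $list) = @_;
    return unless valid_method($method);
    my $cmp = $cmp_for{$method};
    @$list = sort { $cmp->($a, $b) } @$list;
    return;
}

my @nums = (10, 9, 100);
sort_with('numerical', \@nums);    # @nums is now (9, 10, 100)

my @same = (10, 9, 100);
sort_with('foobar', \@same);       # unknown method: @same is unchanged
```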

sort_numerical, sort_rev_numerical, cmp_numerical, cmp_rev_numerical
use Sort::DataTypes qw(:all);

sort_numerical(\@list);
sort_rev_numerical(\@list);

sort_numerical(\@list,\%hash);
sort_rev_numerical(\@list,\%hash);

$flag = cmp_numerical($x,$y);
$flag = cmp_rev_numerical($x,$y);

These sort a list numerically in forward or reverse order, or compare two elements numerically. There is little reason to use these routines, since it is more efficient to simply call sort as:

sort { $a <=> $b } @list

but they are included for the sake of completeness (and for use by the sort_by_method/cmp_by_method routines). Also, if the code is being automatically generated, numerical sorts won't have to be a special case.

sort_alphabetic, sort_rev_alphabetic, cmp_alphabetic, cmp_rev_alphabetic
use Sort::DataTypes qw(:all);

sort_alphabetic(\@list);
sort_rev_alphabetic(\@list);

sort_alphabetic(\@list,\%hash);
sort_rev_alphabetic(\@list,\%hash);

$flag = cmp_alphabetic($x,$y);
$flag = cmp_rev_alphabetic($x,$y);

These do alphabetic sorts. As with numerical sorts, there is little reason to call these, and they are included for the sake of completeness.

sort_length, sort_rev_length, cmp_length, cmp_rev_length
use Sort::DataTypes qw(:all);

sort_length(\@list);
sort_rev_length(\@list);

sort_length(\@list,\%hash);
sort_rev_length(\@list,\%hash);

$flag = cmp_length($x,$y);
$flag = cmp_rev_length($x,$y);

These take strings and compare them by length, then alphabetically if the lengths are equal.
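A comparator with this behavior might look like the following sketch. The name by_length is hypothetical; the module's own routine is cmp_length, and its internals may differ.

```perl
use strict;
use warnings;

# Compare by length first; fall back to an alphabetic comparison
# when both strings have the same length.
sub by_length {
    my ($x, $y) = @_;
    return length($x) <=> length($y) || $x cmp $y;
}

my @words = sort { by_length($a, $b) } qw(pear fig apple kiwi);
print "@words\n";    # prints "fig kiwi pear apple"
```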

sort_ip, sort_rev_ip, cmp_ip, cmp_rev_ip
use Sort::DataTypes qw(:all);

sort_ip(\@list);
sort_rev_ip(\@list);

sort_ip(\@list,\%hash);
sort_rev_ip(\@list,\%hash);

$flag = cmp_ip($x,$y);
$flag = cmp_rev_ip($x,$y);

These sort or compare IP addresses of the form A.B.C.D.
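The sketch below shows why a dedicated comparison helps: a plain string sort would place 10.0.0.10 before 10.0.0.2. It assumes well-formed dotted-quad input and an octet-by-octet numeric comparison; by_ip is a hypothetical name, and the module's cmp_ip may be implemented differently.

```perl
use strict;
use warnings;

# Compare two dotted-quad addresses one octet at a time, numerically.
# Assumes both arguments have exactly four numeric parts.
sub by_ip {
    my ($x, $y) = @_;
    my @x_parts = split /\./, $x;
    my @y_parts = split /\./, $y;
    for my $i (0 .. 3) {
        my $c = $x_parts[$i] <=> $y_parts[$i];
        return $c if $c;
    }
    return 0;
}

my @ips = sort { by_ip($a, $b) } qw(10.0.0.2 9.255.0.1 10.0.0.10);
print "@ips\n";    # prints "9.255.0.1 10.0.0.2 10.0.0.10"
```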

sort_domain, sort_rev_domain, cmp_domain, cmp_rev_domain
use Sort::DataTypes qw(:all);

sort_domain(\@list [,$sep]);
sort_rev_domain(\@list [,$sep]);

sort_domain(\@list, [$sep,] \%hash);
sort_rev_domain(\@list, [$sep,] \%hash);

$flag = cmp_domain($x,$y [,$sep]);
$flag = cmp_rev_domain($x,$y [,$sep]);

These sort domain names (A.B.C...) or anything else consisting of a class, subclass, subsubclass, etc., with the most significant class at the right.

Elements in the domain are separated from each other by a period (.) unless $sep is passed in. If $sep is passed in, it is a regular expression to split the elements in a domain.

Since the most significant element in the domain is at the right, any domain ending with ".com" would come before any domain ending in ".edu".

a.b < z.b < a.bb < z.bb < a.c
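The ordering above can be reproduced by splitting each name, reversing the parts, and comparing them pairwise. The following is a sketch under that assumption; by_domain is a hypothetical name, and treating the shorter name as smaller when all shared parts are equal is also an assumption.

```perl
use strict;
use warnings;

# Compare two domain-like strings, most significant part (rightmost) first.
sub by_domain {
    my ($x, $y, $sep) = @_;
    $sep = qr/\./ unless defined $sep;
    my @x_parts = reverse split $sep, $x;
    my @y_parts = reverse split $sep, $y;
    while (@x_parts && @y_parts) {
        my $c = shift(@x_parts) cmp shift(@y_parts);
        return $c if $c;
    }
    # All shared parts are equal: the name with fewer parts sorts first.
    return @x_parts <=> @y_parts;
}

my @domains = sort { by_domain($a, $b) } qw(a.c z.bb a.b a.bb z.b);
print "@domains\n";    # prints "a.b z.b a.bb z.bb a.c"
```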

sort_numdomain, sort_rev_numdomain, cmp_numdomain, cmp_rev_numdomain
use Sort::DataTypes qw(:all);

sort_numdomain(\@list [,$sep]);
sort_rev_numdomain(\@list [,$sep]);

sort_numdomain(\@list, [$sep,] \%hash);
sort_rev_numdomain(\@list, [$sep,] \%hash);

$flag = cmp_numdomain($x,$y [,$sep]);
$flag = cmp_rev_numdomain($x,$y [,$sep]);

A related type of sorting is numdomain sorting. This is identical to domain sorting except that when two corresponding elements in the domains are both integers, they are compared numerically. So:

a.2.c < a.11.c
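A sketch of the element-level rule follows: compare numerically only when both parts are integers, and alphabetically otherwise. The names cmp_element and by_numdomain are hypothetical, and the integer test here only recognizes non-negative integers, which is an assumption.

```perl
use strict;
use warnings;

# Numeric comparison when both parts are non-negative integers,
# alphabetic comparison otherwise.
sub cmp_element {
    my ($p, $q) = @_;
    return $p <=> $q if $p =~ /^\d+$/ && $q =~ /^\d+$/;
    return $p cmp $q;
}

# Domain-style comparison (rightmost part first) using cmp_element.
sub by_numdomain {
    my ($x, $y, $sep) = @_;
    $sep = qr/\./ unless defined $sep;
    my @x_parts = reverse split $sep, $x;
    my @y_parts = reverse split $sep, $y;
    while (@x_parts && @y_parts) {
        my $c = cmp_element(shift(@x_parts), shift(@y_parts));
        return $c if $c;
    }
    return @x_parts <=> @y_parts;
}

print by_numdomain('a.2.c', 'a.11.c'), "\n";    # prints "-1"
```

A purely alphabetic comparison would order these two the other way round, because "11" sorts before "2" as a string.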

sort_path, sort_rev_path, cmp_path, cmp_rev_path
use Sort::DataTypes qw(:all);

sort_path(\@list [,$sep]);
sort_rev_path(\@list [,$sep]);

sort_path(\@list, [$sep,] \%hash);
sort_rev_path(\@list, [$sep,] \%hash);

$flag = cmp_path($x,$y [,$sep]);
$flag = cmp_rev_path($x,$y [,$sep]);

This sorts paths (/A/B/C...) or anything else consisting of a class, subclass, subsubclass, etc., with the most significant class at the left.

Elements in a path (or classes, subclasses, etc.) are separated from each other by a slash (/) unless $sep is passed in. If $sep is passed in, it is a regular expression to split the elements in a path.

Since the most significant element in the path is at the left, you get the following behavior:

a/b < a/z < aa/b < aa/z < b/b

When sorting lists that mix relative and absolute paths, the absolute paths (those beginning with the separator) come first. So:

/b/c < a/b
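Both orderings above fall out of a left-to-right comparison of the split parts, as the sketch below shows. A leading separator produces an empty first part, which sorts before any non-empty part. The name by_path is hypothetical, and this is only one way the behavior could arise; the module may handle absolute paths differently.

```perl
use strict;
use warnings;

# Compare two path-like strings, most significant part (leftmost) first.
sub by_path {
    my ($x, $y, $sep) = @_;
    $sep = qr{/} unless defined $sep;
    my @x_parts = split $sep, $x;    # "/b/c" yields ("", "b", "c")
    my @y_parts = split $sep, $y;
    while (@x_parts && @y_parts) {
        my $c = shift(@x_parts) cmp shift(@y_parts);
        return $c if $c;
    }
    # All shared parts are equal: the shorter path sorts first.
    return @x_parts <=> @y_parts;
}

my @paths = sort { by_path($a, $b) } qw(b/b aa/z a/z /b/c a/b aa/b);
print "@paths\n";    # prints "/b/c a/b a/z aa/b aa/z b/b"
```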

sort_numpath, sort_rev_numpath, cmp_numpath, cmp_rev_numpath
use Sort::DataTypes qw(:all);

sort_numpath(\@list [,$sep]);
sort_rev_numpath(\@list [,$sep]);

sort_numpath(\@list, [$sep,] \%hash);
sort_rev_numpath(\@list, [$sep,] \%hash);

$flag = cmp_numpath($x,$y [,$sep]);
$flag = cmp_rev_numpath($x,$y [,$sep]);

A related type of sorting is numpath sorting. This is identical to path sorting except that when two corresponding elements in the paths are both integers, they are compared numerically. So:

a/2/c < a/11/c

sort_random, sort_rev_random, cmp_random, cmp_rev_random
use Sort::DataTypes qw(:all);

sort_random(\@list);
sort_rev_random(\@list);

sort_random(\@list,\%hash);
sort_rev_random(\@list,\%hash);

$flag = cmp_random($x,$y);
$flag = cmp_rev_random($x,$y);

This uses the Fisher-Yates algorithm to randomly shuffle an array in place. This routine was taken from the book

The Perl Cookbook
Tom Christiansen and Nathan Torkington

sort_rev_random is identical to sort_random, and is included simply for the situation where the sort routines are being called in some automatically generated code that may add the 'rev_' prefix.

The cmp_random simply returns a random -1, 0, or 1.
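For reference, a Fisher-Yates shuffle can be written as below. This is a generic sketch of the algorithm, not a copy of the module's code; shuffle_in_place is a hypothetical name.

```perl
use strict;
use warnings;

# Fisher-Yates shuffle: walk from the last index down to 1, swapping
# each element with a randomly chosen element at or before it.
sub shuffle_in_place {
    my ($list) = @_;
    for (my $i = $#$list; $i > 0; $i--) {
        my $j = int rand($i + 1);    # random index in 0 .. $i
        @$list[$i, $j] = @$list[$j, $i];
    }
    return;
}

my @deck = (1 .. 10);
shuffle_in_place(\@deck);
print "@deck\n";    # some permutation of 1 .. 10
```

Each element is swapped at most once per pass, so the shuffle runs in linear time and needs no extra array.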

sort_version, sort_rev_version, cmp_version, cmp_rev_version
use Sort::DataTypes qw(:all);

sort_version(\@list);
sort_rev_version(\@list);

sort_version(\@list,\%hash);
sort_rev_version(\@list,\%hash);

$flag = cmp_version($x,$y);
$flag = cmp_rev_version($x,$y);

These sort a list of version numbers of the form MAJOR.MINOR.SUBMINOR ... (any number of levels is allowed). The following examples should illustrate the ordering:

1.1.x < 1.2 < 1.2.x  Numerical versions are compared first at
                     the highest level, then at the next highest,
                     etc. The first non-equal compare sets the
                     order.
1.a < 1.b            Alphanumeric levels that start with a letter
                     are compared alphabetically.
1.2a < 1.2 < 1.03a   Alphanumeric levels that start with a number
                     are first compared numerically with only the
                     numeric part. If they are equal, alphanumeric
                     levels come before purely numerical levels.
                     Otherwise, they are compared alphabetically.
1.a < 1.2a           An alphanumeric level that starts with a letter
                     comes before one that starts with a number.
1.01a < 1.1a         Two alphanumeric levels that are numerically
                     equal in the number part and equal in the
                     remaining part are compared alphabetically.

sort_date, sort_rev_date, cmp_date, cmp_rev_date
use Sort::DataTypes qw(:all);

sort_date(\@list);
sort_rev_date(\@list);

sort_date(\@list,\%hash);
sort_rev_date(\@list,\%hash);

$flag = cmp_date($x,$y);
$flag = cmp_rev_date($x,$y);

These sort a list of dates. Dates are anything that can be parsed with Date::Manip.

sort_line, sort_rev_line, cmp_line, cmp_rev_line
use Sort::DataTypes qw(:all);

sort_line(\@list,$n [,$sep]);
sort_rev_line(\@list,$n [,$sep]);

sort_line(\@list,$n, [$sep,] \%hash);
sort_rev_line(\@list,$n, [$sep,] \%hash);

$flag = cmp_line($x,$y,$n [,$sep]);
$flag = cmp_rev_line($x,$y,$n [,$sep]);

These take a list of lines and sort on the Nth field using $sep as the regular expression splitting the lines into fields. Fields are numbered starting at 0. If no $sep is given, it defaults to white space.
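Sorting on the Nth field can be sketched as follows. The name by_field is hypothetical, the comparison is alphabetic, and the sketch assumes every line has at least N+1 fields and no leading whitespace.

```perl
use strict;
use warnings;

# Compare two lines on their Nth field (numbered from 0), splitting
# on $sep, which defaults to runs of whitespace.
sub by_field {
    my ($x, $y, $n, $sep) = @_;
    $sep = qr/\s+/ unless defined $sep;
    my $x_field = (split $sep, $x)[$n];
    my $y_field = (split $sep, $y)[$n];
    return $x_field cmp $y_field;
}

my @lines = ('carol 30 nyc', 'alice 25 sfo', 'bob 41 bos');
my @by_city = sort { by_field($a, $b, 2) } @lines;
print join(' | ', @by_city), "\n";
# prints "bob 41 bos | carol 30 nyc | alice 25 sfo"
```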

sort_numline, sort_rev_numline, cmp_numline, cmp_rev_numline
use Sort::DataTypes qw(:all);

sort_numline(\@list,$n [,$sep]);
sort_rev_numline(\@list,$n [,$sep]);

sort_numline(\@list,$n, [$sep,] \%hash);
sort_rev_numline(\@list,$n, [$sep,] \%hash);

$flag = cmp_numline($x,$y,$n [,$sep]);
$flag = cmp_rev_numline($x,$y,$n [,$sep]);

These are similar to the sort_line routines, but sort numerically if the Nth field is an integer, and alphabetically otherwise.

sort_function, sort_rev_function, cmp_function, cmp_rev_function
use Sort::DataTypes qw(:all);

sort_function(\@list,\&func);
sort_rev_function(\@list,\&func);

sort_function(\@list,\&func,\%hash);
sort_rev_function(\@list,\&func,\%hash);

$flag = cmp_function($x,$y,\&func);
$flag = cmp_rev_function($x,$y,\&func);

This is a catch-all sort function. It takes a reference to a function that compares two elements and returns -1, 0, or 1 depending on their order.
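As an illustration, a comparator suitable for this kind of sort might look like the sketch below. It assumes the comparator receives the two elements as arguments, which the text above does not state explicitly; sort_using and cmp_nocase are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a function-driven sort: sorts @$list in
# place, calling $func with two elements and using its -1/0/1 result.
sub sort_using {
    my ($list, $func) = @_;
    @$list = sort { $func->($a, $b) } @$list;
    return;
}

# Example comparator: case-insensitive alphabetic order.
sub cmp_nocase {
    my ($x, $y) = @_;
    return lc($x) cmp lc($y);
}

my @names = qw(bob Alice carol Dave);
sort_using(\@names, \&cmp_nocase);
print "@names\n";    # prints "Alice bob carol Dave"
```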

BACKWARDS INCOMPATIBILITIES

The following is a list of backwards incompatibilities.

Version 2.00 handling of hashes

In version 1.xx, when sorting by hash, the hash was passed in directly as a hash. As of 2.00, it is passed in by reference to avoid any confusion with optional arguments.

KNOWN PROBLEMS

None at this point.

LICENSE

This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Sullivan Beck (sbeck@cpan.org)