NAME
Hash::Type - restricted, ordered hashes as arrays tied to a "type" (shared list of keys)
SYNOPSIS
use Hash::Type;
# create a Hash::Type
my $person_type = Hash::Type->new(qw/firstname lastname city/);
# create and populate some hashes tied to $person_type
tie my(%wolfgang), $person_type, "wolfgang amadeus", "mozart", "salzburg";
my $ludwig = $person_type->new("ludwig", "van beethoven", "vienna");
my $jsb = $person_type->new;
$jsb->{city} = "leipzig";
@{$jsb}{qw/firstname lastname/} = ("johann sebastian", "bach");
# add fields dynamically
$person_type->add("birth", "death") or die "fields not added";
$wolfgang{birth} = 1750;
# get back ordered names or values
my @fields = $person_type->names; # same as: keys %wolfgang
my @vals = $person_type->values(\%wolfgang); # same as: values %wolfgang
# More complete example : read a flat file with headers on first line
my ($headerline, @datalines) = map {chomp; $_} <F>;
my $ht = Hash::Type->new(split /\t/, $headerline);
foreach my $line (@datalines) {
my $data = $ht->new(split /\t/, $line);
work_with($data->{some_field}, $data->{some_other_field});
}
# get many lines from a DBI database
my $sth = $dbh->prepare($sql);
$sth->execute;
my $ht = Hash::Type->new(@{$sth->{NAME}});
while (my $r = $sth->fetchrow_arrayref) {
my $row = $ht->new(@$r);
work_with($row);
}
# an alternative to Time::gmtime and Time::localtime
my $time_type = Hash::Type->new(qw/sec min hour mday mon year wday yday/);
my $localtime = $time_type->new(localtime);
my $gmtime = $time_type->new(gmtime);
print $localtime->{hour} - $gmtime->{hour}, " hours difference to GMT";
# an alternative to File::Stat or File::stat or File::Stat::OO
my $stat_type = Hash::Type->new(qw/dev ino mode nlink uid gid rdev
size atime mtime ctime blksize blocks/);
my $stat = $stat_type->new(stat $my_file);
print "$my_file has $stat->{size} bytes and $stat->{blocks} blocks";
# comparison functions
my $by_age = $person_type->cmp("birth : -num, lastname, firstname");
my $by_name_length = $person_type->cmp(
lastname => {length($b) <=> length($a)},
lastname => 'alpha',
firstname => 'alpha',
);
show_person($_) foreach (sort $by_age @people);
show_person($_) foreach (sort $by_name_length @people);
# special comparisons : dates
my $US_date_cmp = $my_hash_type->cmp("some_date_field : m/d/y");
my $FR_inverse_date_cmp = $my_hash_type->cmp("some_date_field : -d.m.y");
DESCRIPTION
An instance of Hash::Type
encapsulates a collection of field names, and is used to generate tied hashes, implemented internally as arrayrefs, and sharing the common list of fields.
The original motivation for this design was to spare memory, since the field names are shared. As it turns out, benchmarks show that this goal is not attained : memory usage is about 35% higher than Perl native hashes. However, this module also implements restricted hashes (hashes with a fixed set of keys that cannot be expanded) and of ordered hashes (the list of keys or list values are returned in a fixed order); and for those two functionalities, the performances of Hash::Type
are very competitive with respect to those of other similar modules, both in terms of CPU and memory usage (see the "BENCHMARKS" section at the end of the documentation). In addition, Hash::Type
offers an API for convenient and very efficient sorting of lists of tied hashes, and alternative methods for keys
, values
and each
, faster than the "perltie" API.
In conclusion, this module is well suited for any need of restricted and/or ordered hashes, and for situations dealing with large collections of homogeneous hashes, like for example data rows coming from spreadsheets or databases.
METHODS
new
The new()
method is polymorphic : it can be used both as a class and as an instance method.
new()
as a class method
my $h_type = Hash::Type->new(@names);
Creates a new instance which holds a collection of names and associated indices (technically, this is a hash reference blessed in package Hash::Type
). This instance can then be used to generate tied hashes. The list of @names
is optional ; names can be added later through the add
method.
new()
as an instance method
my $tied_hashref = $h_type->new(@vals);
Creates a new tied hash associated to the Hash::Type
class and containing a reference to the $h_type
instance. Internally, the tied hash is implemented as an array reference.
TIE interface
Tied hash creation
The tie
syntax is an alternative to the new()
method seen above :
tie my(%hash), $h_type, @vals;
This is a bit unusual since the official syntax for the tie function is tie VARIABLE, CLASSNAME, LIST
; here the second argument is not a classname, but an object. It works well, however, because the TIEHASH
call merely passes the second argument to the implementation, without any check that this is a real scalar classname.
Accessing names in tied hashes
Access to $hash{name}
works like for a regular Perl hash. It is equivalent to writing
tied(%hash)->[$h_type->{name}]
where $h_type
is the Hash::Type
instance. That instance can be retrieved through the special, reserved name $hash{'Hash::Type'}
; you may need it for example to generate a comparison function, as explained below. The reserved name $hash{'Hash::Type'}
can be read but it does not belong to keys %hash
.
The operation delete $hash{name}
is forbidden since it would break the consistency of the internal arrayref. Any attempt to delete a name will generate an exception. Of course it is always possible to set a field to undef
.
Iterating on keys
Standard Perl operations keys
, values
and each
on tied hashes preserve the order in which names were added to the Hash::Type
instance.
The same behavior can be obtained through object-oriented method calls, described below.
add
$h_type->add(@new_names);
Adds @new_names
in $h_type
and gives them new indices. Does nothing for names that were already present. Returns the number of names actually added.
Existing hashes already tied to $h_type
are not touched; their internal arrayrefs are not expanded, but the new fields will spring into existence as soon as they are assigned a value, thanks to Perl's auto-vivification mechanism.
names
my @headers = $h_type->names;
Returns the list of defined names, in index order. This is the same list as keys %hash
, but computed faster because it directly returns a copy of the underlying name array instead of having to iterate through the keys.
values
my @vals = $h_type->values(\%hash);
Returns the list of values in %hash
, in index order. This is the same list as values %hash
, but computed faster.
each
my $iterator = $h_type->each(\%hash);
while (my ($key, $value) = $iterator->()) {
# ...
}
Returns an iterator function that yields pairs of ($key, $value) at each call, until reaching an empty list at the end of the tied hash. This is the same as calling each %hash
, but works faster.
cmp
first syntax : one single argument
my $cmp = $h_type->cmp("f1 : cmp1, f2 : cmp2 , ...")
Returns a reference to an anonymous sub which successively compares the given field names, applying the given operators, and returns a positive, negative or zero value. This sub can then be fed to sort
. 'f1', 'f2', etc are field names, 'cmp1', 'cmp2' are comparison operators written as :
[+|-] [alpha|num|cmp|<=>|d.m.y|d/m/y|y-m-d|...]
The sign is '+' for ascending order, '-' for descending; default is '+'. Operator 'alpha' is synonym to 'cmp' and 'num' is synonym to '<=>'; operators 'd.m.y', 'd/m/y', etc. are for dates in various formats; default is 'alpha'. So for example
my $cmp = $h_type->cmp("foo : -alpha, bar : num");
will sort alphabetically on foo
in descending order; in case of identical values for the foo
field, it will sort numerically on the bar
field.
If all you want is alphabetic ascending order, just write the field names :
my $cmp = $person_type->cmp('lastname, firstname');
Note : sort
will not accept something like
sort $person_type->cmp('lastname, firstname') @people;
so you have to store it in a variable first :
my $cmp = $person_type->cmp('lastname', 'firstname');
sort $cmp @people;
For date comparisons, values are parsed into day/month/year, according to the shape specified (for example 'd.m.y') will take '.' as a separator. Day, month or year need not be several digits, so '1.1.1' will be interpreted as '01.01.2001'. Years of 2 or 1 digits are mapped to 2000 or 1900, with pivot at 33 (so 32 becomes 2032 and 33 becomes 1933).
second syntax : pairs of (field_name => comparison_specification)
my $cmp = $h_type->cmp(f1 => cmp1, f2 => cmp2, ...);
This second syntax, with pairs of field names and operators, is a bit more verbose but gives you more flexibility, as you can write your own comparison functions using $a
and $b
:
my $by_name_length = $person_type->cmp(
lastname => {length($b) <=> length($a)},
lastname => 'alpha',
firstname => 'alpha',
);
Note : the resulting closure is bound to the special variables $a
and $b
. Since those are different in each package, you cannot pass the comparison function to another package : the call to sort
has to be done in the package where the comparison function was compiled.
INTERNALS
A Hash::Type
instance is a blessed hashref, in which each declared name is associated with a corresponding index (starting at 1).
In addition, the hashref contains a private key \0HTkeys\0
holding an arrayref to the ordered list of names, in order to provide a fast implementation for the NEXTKEY
operation. In the unlikely case that this private key would cause a conflict, it can be changed by setting $Hash::Type::reserved_keys_field
to a different value.
Tied hashes are implemented as arrayrefs, in which slot 0 points to the Hash::Type
instance, and slots 1 to n are the values for the named fields.
The particular thing about this module is that it has two different families of instances blessed into the same class :
blessed hashrefs are "types", holding a collection of field names, answering to object-oriented methods of the public interface (i.e. methods "add", "names", etc.)
blessed arrayrefs are implementations of tied hashes, answering to implicit method calls of the "perltie" API (i.e methods
FIRSTKEY
,NEXTKEY
,EXISTS
, etc.).
SEE ALSO
The 'pseudo-hashes' documented in perlref were very similar, but were deprecated since Perl 5.8.0.
For other ways to restrict the keys of a hash to a fixed set, see "lock_keys" in Hash::Util, Tie::Hash::FixedKeys, Tie::StrictHash.
For other ways to implement ordered hashes, see Tie::IxHash, Hash::Ordered, Tie::Hash::Indexed, and many others.
The Sort::Fields module in CPAN uses techniques similar to the present "cmp" method for dynamically building sorting criterias according to field positions; but it is intended for numbered fields, not for named fields, and has no support for caller-supplied comparison operators. The design is also a bit different : fieldsort
does everything at once (splitting, comparing and sorting), whereas Hash::Type::cmp
only compares, and leaves it to the caller to do the rest.
Hash::Type
was primarily designed as a core element for implementing rows of data in File::Tabular.
BENCHMARKS
A benchmark program is supplied within the distribution to compare performances of Hash::Type
to some other solutions. Compared operations were hash creation, data access, data update, sorting, and deletion, varying the number of tied hashes, the number of keys, and also the length of hash keys (surprisingly, this can make a difference!).
To give an idea, here are just a few results :
200000 records with 10 keys of 1 chars
create update access sort delete memory
======= ======= ======= ======= ======= ========
4.181 2.246 0.812 0.062 0.016 132.5MB (perl core hashes)
2.683 5.366 1.357 0.110 0.078 227.4MB (Hash::Type v2.00)
3.386 5.740 1.358 5.194 0.110 524.0MB (Hash::Ordered v0.010)
5.756 7.005 1.451 5.818 0.109 539.8MB (Tie::IxHash v1.23)
3.167 4.196 1.279 5.258 0.171 508.2MB (Tie::Hash::Indexed v0.05)
200000 records with 5 keys of 20 chars
create update access sort delete memory
======= ======= ======= ======= ======= ========
1.482 1.201 0.811 0.078 0.031 132.5MB (perl core hashes)
1.669 2.933 1.326 0.125 0.063 180.0MB (Hash::Type v2.00)
2.200 3.120 1.341 5.148 0.109 524.0MB (Hash::Ordered v0.010)
3.541 3.744 1.498 5.257 0.141 539.8MB (Tie::IxHash v1.23)
1.826 2.215 1.279 5.164 0.172 506.2MB (Tie::Hash::Indexed v0.05)
100000 records with 50 keys of 2 chars
create update access sort delete memory
======= ======= ======= ======= ======= ========
6.349 5.382 0.406 0.062 0.171 384.5MB (perl core hashes)
5.429 13.384 0.702 0.063 0.110 432.0MB (Hash::Type v2.00)
9.048 14.211 0.687 1.232 0.577 1180.7MB (Hash::Ordered v0.010)
14.508 17.113 0.765 1.513 0.702 1354.6MB (Tie::IxHash v1.23)
7.893 11.997 0.670 1.358 1.107 1542.4MB (Tie::Hash::Indexed v0.05)
The conclusions (confirmed by other results not displayed here) are that Hash::Type
is quite reasonable in CPU and memory usage, with respect to other modules implementing ordered hashes. It even sometimes outperforms perl core hashes on creation time. It is especially good at sorting, but this due to its API allowing to compile a sorting function before applying it to a list of hashes.
Among the competitors Tie::Hash::Indexed is the fastest, which is not suprising as it is implemented in XS, while all other are implemented in pure Perl; notice however that this module is quite costly in terms of memory.
Tie::IxHash, which was among the first modules on CPAN for ordered hashes, is definitely the slowest and is quite greedy for memory.
Hash::Ordered, a more recent proposal, sits in the middle; however, it should be noted that the benchmark was based on its "perltie" API, which is probably not the most efficient way to use this module.
AUTHOR
Laurent Dami, <laurent.dami AT cpan dot org>
COPYRIGHT AND LICENSE
Copyright 2005, 2016 by Laurent Dami.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.