NAME
Memoize - Make your functions faster by trading space for time
SYNOPSIS
use Memoize;
memoize('slow_function');
slow_function(arguments); # Is faster than it was before
DESCRIPTION
`Memoizing' a function makes it faster by trading space for time. Here is an example. Consider the Fibonacci sequence, defined by the following function:
# Compute Fibonacci numbers
sub fib {
my $n = shift;
return $n if $n < 2;
fib($n-1) + fib($n-2);
}
This function is very slow. Why? To compute fib(14), it first wants to compute fib(13) and fib(12), and add the results. But to compute fib(13), it first has to compute fib(12) and fib(11), and then it comes back and computes fib(12) all over again even though the answer is the same. And both of the times that it wants to compute fib(12), it has to compute fib(11) from scratch, and then it has to do it again each time it wants to compute fib(13). This function does so much recomputing of old results that it takes a really long time to run---fib(14) makes 1,200 extra recursive calls to itself, to compute and recompute things that it already computed.
This function is a good candidate for memoization. Whenever a memoized function computes a result, it saves the result in a table. Then, if you ask the function to do the same work later, it just gives you the answer that was in the table, instead of computing it all over again.
This module will automatically memoize functions for you. For example, if you memoize the `fib' function above, it will compute fib(14) exactly once, the first time it needs to, and then save the result in a table. Then if you ask for fib(14) again, it gives you the result out of the table. While computing fib(14), instead of computing fib(12) twice, it does it once; the second time it needs the value it gets it from the table. It doesn't compute fib(11) four times; it computes it once, getting it from the table the next three times. Instead of making 1,200 recursive calls to `fib', it makes 15. This makes the function about 150 times faster.
You could do the memoization yourself, by rewriting the function, like this:
# Compute Fibonacci numbers, memoized version
{ my @fib;
sub fib {
my $n = shift;
return $fib[$n] if defined $fib[$n];
return $fib[$n] = $n if $n < 2;
$fib[$n] = fib($n-1) + fib($n-2);
}
}
Or you could use this module, like this:
use Memoize;
memoize('fib');
# Rest of the fib function just like the original version.
This makes it easy to turn memoizing on and off.
DETAILS
This module exports exactly one function, memoize
. The rest of the functions in this package are None of Your Business.
You should say
memoize(function)
where function
is the name of the function you want to memoize, or a reference to it. memoize
returns a reference to the new, memoized version of the function.
If function
was the name of a function, then memoize
hides the old version and installs the new memoized version under the old name, so that &function(...)
actually invokes the memoized version.
OPTIONS
There are some optional options you can pass to memoize
to change the way it behaves a little. To supply options, invokdle memoize
like this:
memoize(function, { TODISK => filename,
NORMALIZER => function,
INSTALL => newname
});
Each of these three options is optional; you can include some, all, or none of them.
INSTALL
If you supply a function name with INSTALL
, memoize will install the new, memoized version of the function under the name you give. For example,
memoize('fib', INSTALL => 'fastfib')
installs the memoized version of fib
as fastfib
; without the INSTALL
option it would have replaced the old fib
with the memoized version.
NORMALIZER
Suppose your function looks like this:
# Typical call: f('aha!', A => 11, B => 12);
sub f {
my $a = shift;
my %hash = @_;
$hash{B} ||= 2; # B defaults to 2
$hash{C} ||= 7; # C defaults to 7
# Do something with $a, %hash
}
Now, the following calls to your function are all completely equivalent:
f(OUCH);
f(OUCH, B => 2);
f(OUCH, C => 7);
f(OUCH, B => 2, C => 7);
f(OUCH, C => 7, B => 2);
(etc.)
However, unless you tell Memoize
that these calls are equivalent, it will not know that, and it will compute the values for these invocations of your function separately, and store them separately.
To prevent this, supply a NORMALIZER
function that turns the program arguments into a string in a way that equivalent arguments turn into the same string. A NORMALIZER
function for f
above might look like this:
sub normalize_f {
my $a = shift;
my %hash = @_;
$hash{B} ||= 2;
$hash{C} ||= 7;
join($;, $a, map ($_ => $hash{$_}) sort keys %hash);
}
Each of the argument lists above comes out of the normalize_f
function looking exactly the same, like this:
OUCH^\B^\2^\C^\7
memoize
knows that if the normalized version of the arguments is the same for two argument lists, then it can safely look up the value that it computed for one argument list and return it as the result of calling the function with the other argmuent list, even if the argument lists look different.
TODISK
TODISK
means that the memo table should be saved to disk so that it will persist between invokations of your program. If you use this option, future runs of your program will get immediate benefit from the results computed by earlier runs. A useful use of this feature: You can construct a batch program that runs in the background and populates the memo table, and then when you come to run your real program the memoized function will be screamingly fast because al lits results have been precomputed. Or you would be able to do this, if TODISK were implemented, which it presently isn't. But it will be. Some day.
CAVEATS
Memoization is not a cure-all:
- Do not memoize a function whose behavior depends on program state other than its own arguments, such as global variables, the time of day, or file input. These functions whill not produce correct results when memoized. For a particularly easy example:
-
sub f { my $i = <STDIN>; chomp $i; $i; }
This function takes no arguments, and as far as
Memoize
is concerned, it always returns the same result.Memoize
is wrong, of course, and the memoized version of this function will read STDIN once to get a string from the user, and it will return that same string every time you call it after that. - Do not memoize a function with side effects.
-
sub f { my ($a, $b) = @_; my $s = $a + $b; print "$a + $b = $s.\n"; }
This function accepts two arguments, adds them, and prints their sum. Its return value is the numuber of characters it printed, but you probably didn't care abuot that. But
Memoize
doesn't understand that. If you memoize this function, you will get the result you expect the first time you ask it to prnit the sum of 2 and 3, but subsequent calls will return the number 11 (the return value ofprint
) without actually printing anything. - Do not memoize a function that returns a data structure that is modified by its caller.
-
Consider these functions:
getusers
returns a list of users somehow, and thenmain
throws away the first user on the list and prints the rest:sub main { my $userlist = getusers(); shift @$userlist; foreach $u (@$userlist) { print "User $u\n"; } } sub getusers { my @users; # Do something to get a list of users; \@users; # Return reference to list. }
If you memoize
getusers
here, it will work right exactly once. The reference to the users list will be stored in the memo table.main
will discard the first element from the referenced list. The next time you invokemain
,Memoize
will not callgetusers
; it will just return the same reference to the same list it got last time. But this time the list has already had its head removed;main
will erroneously remove another element from it. The list will get shorter and shorter every time you callmain
.
TO DO
- There should be an
unmemoize
function. - We should extend the benchmarking module to allow
-
timethis(main, MEMOIZED => [ suba, subb ])
What would this do? It would time
main
three times, once withsuba
andsubb
unmemoized, twice with them memoized.Why would you want to do this? By the third set of runs, the memo tables would be fully populated, so all calls by
main
tosuba
andsubb
wuold return immediately. You would be able to see how much ofmain
's running time was due to time spent computing insuba
andsubb
. If that was just a little time, you would know that optimizing or improvingsuba
andsubb
would not have a large effect on the performance ofmain
. But if there was a big difference, you would know thatsuba
orsubb
was a good candidate for optimization if you needed to makemain
go faster. - There was some other stuff, but I forget.
- Maybe a tied-hash interface to the memo-table, which a hook to automatically populate an entry if no value is there yet?
2 POD Errors
The following errors were encountered while parsing the POD:
- Around line 201:
'=item' outside of any '=over'
- Around line 264:
You forgot a '=back' before '=head1'