NAME
Memoize - Make your functions faster by trading space for time
SYNOPSIS
use Memoize;
memoize('slow_function');
slow_function(arguments); # Is faster than it was before
DESCRIPTION
`Memoizing' a function makes it faster by trading space for time. Here is an example. Consider the Fibonacci sequence, defined by the following function:
# Compute Fibonacci numbers
sub fib {
my $n = shift;
return $n if $n < 2;
fib($n-1) + fib($n-2);
}
This function is very slow. Why? To compute fib(14), it first wants to compute fib(13) and fib(12), and add the results. But to compute fib(13), it first has to compute fib(12) and fib(11), and then it comes back and computes fib(12) all over again even though the answer is the same. And both of the times that it wants to compute fib(12), it has to compute fib(11) from scratch, and then it has to do it again each time it wants to compute fib(13). This function does so much recomputing of old results that it takes a really long time to run---fib(14) makes 1,200 extra recursive calls to itself, to compute and recompute things that it already computed.
This function is a good candidate for memoization. Whenever a memoized function computes a result, it saves the result in a table. Then, if you ask the function to do the same work later, it just gives you the answer that was in the table, instead of computing it all over again.
This module will automatically memoize functions for you. For example, if you memoize the `fib' function above, it will compute fib(14) exactly once, the first time it needs to, and then save the result in a table. Then if you ask for fib(14) again, it gives you the result out of the table. While computing fib(14), instead of computing fib(12) twice, it does it once; the second time it needs the value it gets it from the table. It doesn't compute fib(11) four times; it computes it once, getting it from the table the next three times. Instead of making 1,200 recursive calls to `fib', it makes 15. This makes the function about 150 times faster.
You could do the memoization yourself, by rewriting the function, like this:
# Compute Fibonacci numbers, memoized version
{ my @fib;
sub fib {
my $n = shift;
return $fib[$n] if defined $fib[$n];
return $fib[$n] = $n if $n < 2;
$fib[$n] = fib($n-1) + fib($n-2);
}
}
Or you could use this module, like this:
use Memoize;
memoize('fib');
# Rest of the fib function just like the original version.
This makes it easy to turn memoizing on and off.
Here's an even simpler example: I wrote a simple ray tracer; the program would look in a certain direction, figure out what it was looking at, and then convert the `color' value (typically a string like `red') of that object to a red, green, and blue pixel value, like this:
for ($direction = 0; $direction < 300; $direction++) {
# Figure out which object is in direction $direction
$color = $object->{color};
($r, $g, $b) = @{&ColorToRGB($color)};
...
}
Since there are relatively few objects in a picture, there are only a few colors, which get looked up over and over again. Memoizing ColorToRGB
speeded up the program by several percent.
DETAILS
This module exports exactly one function, memoize
. The rest of the functions in this package are None of Your Business.
You should say
memoize(function)
where function
is the name of the function you want to memoize, or a reference to it. memoize
returns a reference to the new, memoized version of the function, or undef
on a non-fatal error. At present, there are no non-fatal errors, but there might be some in the future.
If function
was the name of a function, then memoize
hides the old version and installs the new memoized version under the old name, so that &function(...)
actually invokes the memoized version.
OPTIONS
There are some optional options you can pass to memoize
to change the way it behaves a little. To supply options, invoke memoize
like this:
memoize(function, TODISK => filename,
NORMALIZER => function,
INSTALL => newname
);
Each of these three options is optional; you can include some, all, or none of them.
INSTALL
If you supply a function name with INSTALL
, memoize will install the new, memoized version of the function under the name you give. For example,
memoize('fib', INSTALL => 'fastfib')
installs the memoized version of fib
as fastfib
; without the INSTALL
option it would have replaced the old fib
with the memoized version.
NORMALIZER
Suppose your function looks like this:
# Typical call: f('aha!', A => 11, B => 12);
sub f {
my $a = shift;
my %hash = @_;
$hash{B} ||= 2; # B defaults to 2
$hash{C} ||= 7; # C defaults to 7
# Do something with $a, %hash
}
Now, the following calls to your function are all completely equivalent:
f(OUCH);
f(OUCH, B => 2);
f(OUCH, C => 7);
f(OUCH, B => 2, C => 7);
f(OUCH, C => 7, B => 2);
(etc.)
However, unless you tell Memoize
that these calls are equivalent, it will not know that, and it will compute the values for these invocations of your function separately, and store them separately.
To prevent this, supply a NORMALIZER
function that turns the program arguments into a string in a way that equivalent arguments turn into the same string. A NORMALIZER
function for f
above might look like this:
sub normalize_f {
my $a = shift;
my %hash = @_;
$hash{B} ||= 2;
$hash{C} ||= 7;
join($;, $a, map ($_ => $hash{$_}) sort keys %hash);
}
Each of the argument lists above comes out of the normalize_f
function looking exactly the same, like this:
OUCH^\B^\2^\C^\7
You would tell Memoize
to use this normalizer this way:
memoize('f', NORMALIZER => 'normalize_f');
memoize
knows that if the normalized version of the arguments is the same for two argument lists, then it can safely look up the value that it computed for one argument list and return it as the result of calling the function with the other argmuent list, even if the argument lists look different.
The default normalizer just concatenates the arguments with $;
in between. This always works correctly for functions with only one argument, and also when the arguments never contain $;
(which is normally character #28, control-\. ) However, it can confuse certain argument lists:
normalizer("a\034", "b")
normalizer("a", "\034b")
normalizer("a\034\034b")
for example.
The calling context of the function (scalar or list context) is propagated to the normalizer. This means that if the memoized function will treat its arguments differently in list context than it would in scalar context, you can have the normalizer function select its behavior based on the results of wantarray
. Even if called in a list context, a normalizer should still return a single string.
TODISK
TODISK
means that the memo table should be saved to disk so that it will persist between invocations of your program. If you use this option, future runs of your program will get immediate benefit from the results computed by earlier runs. A useful use of this feature: You can construct a batch program that runs in the background and populates the memo table, and then when you come to run your real program the memoized function will be screamingly fast because al lits results have been precomputed. Or you would be able to do this, if TODISK were implemented, which it presently isn't. But it will be. Some day.
OTHER FUNCTION
There's an unmemoize
function that you can import if you want to. If you use it, please let me know what it was good for, since I can only think of very limited uses for it and was considering leaving it out altogether.
It accepts a reference to, or the name of a previously memoized function, and undoes whatever it did to provide the memoized version in the first place, including making the name refer to the unmemoized version if appropriate. It returns a reference to the unmemoized version of the function.
If you ask it to unmemoize a function that was never memoized, it croaks.
CAVEATS
Memoization is not a cure-all:
Do not memoize a function whose behavior depends on program state other than its own arguments, such as global variables, the time of day, or file input. These functions will not produce correct results when memoized. For a particularly easy example:
sub f { time; }
This function takes no arguments, and as far as
Memoize
is concerned, it always returns the same result.Memoize
is wrong, of course, and the memoized version of this function will calltime
once to get the current time, and it will return that same time every time you call it after that.Do not memoize a function with side effects.
sub f { my ($a, $b) = @_; my $s = $a + $b; print "$a + $b = $s.\n"; }
This function accepts two arguments, adds them, and prints their sum. Its return value is the numuber of characters it printed, but you probably didn't care about that. But
Memoize
doesn't understand that. If you memoize this function, you will get the result you expect the first time you ask it to print the sum of 2 and 3, but subsequent calls will return the number 11 (the return value ofprint
) without actually printing anything.Do not memoize a function that returns a data structure that is modified by its caller.
Consider these functions:
getusers
returns a list of users somehow, and thenmain
throws away the first user on the list and prints the rest:sub main { my $userlist = getusers(); shift @$userlist; foreach $u (@$userlist) { print "User $u\n"; } } sub getusers { my @users; # Do something to get a list of users; \@users; # Return reference to list. }
If you memoize
getusers
here, it will work right exactly once. The reference to the users list will be stored in the memo table.main
will discard the first element from the referenced list. The next time you invokemain
,Memoize
will not callgetusers
; it will just return the same reference to the same list it got last time. But this time the list has already had its head removed;main
will erroneously remove another element from it. The list will get shorter and shorter every time you callmain
.