NAME
IP::Unique - Store and count unique IPv4 addresses, optimized for large numbers of IPs
SYNOPSIS
use IP::Unique;
my $ipun = IP::Unique->new();
$ipun->add_ip("127.0.0.1");
$ipun->add_ip("127.0.0.1");
$ipun->add_ip("12.34.56.78");
$ipun->unique(); #In this example, 2
$ipun->total(); #In this example, 3
$ipun->compact();
DESCRIPTION
IP::Unique solves the problem of counting the unique addresses in a very large set of IPs. Because the module is written in C and takes advantage of fast integer handling, it runs (in my experience) several times faster than similar Perl solutions and uses, on average, about one fifth of the memory.
A traditional way to count the unique items in a list in Perl is to use a hash, like so:
my $ips = {};
for (@iplist)
{
    # each distinct address becomes one hash key
    $ips->{$_} = 1;
}
my $unique = scalar(keys %$ips);
The problem that quickly arises is that Perl hashes use a great deal of memory once they hold millions of keys. Databases are also cumbersome to work with: 30 million rows are hard to keep distinct, and the lookups and inserts are a needless waste of time. This is where IP::Unique (hopefully) shines.
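The "fast integer handling" mentioned above is possible because an IPv4 address fits in 32 bits. As a rough illustration only (this is pure Perl, not the module's own code, and ip_to_int is a hypothetical helper name), a dotted quad can be turned into a single integer like this:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of IP::Unique: convert a dotted-quad
# string into an unsigned 32-bit integer. Assumes the input is already
# known to be well formed (four octets, each 0..255).
sub ip_to_int {
    my ($ip) = @_;
    my @octets = split /\./, $ip;
    # pack the four octets as bytes, then read them back as one
    # big-endian unsigned 32-bit integer
    return unpack("N", pack("C4", @octets));
}

printf "%s -> %u\n", $_, ip_to_int($_) for ("127.0.0.1", "12.34.56.78");
# prints:
# 127.0.0.1 -> 2130706433
# 12.34.56.78 -> 203569230
```

Storing four bytes per address instead of a string key plus hash overhead is one plausible source of the memory savings described here, though the exact internal layout of the module is not documented.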
METHODS
IP::Unique is an OO module, so nothing is exported. The module provides the following methods:
new()
new() works just as it does in every other module. It takes no parameters, and returns a reference to a new instance of the object.
add_ip("127.0.0.1")
add_ip() takes a string parameter in dotted-quad AAA.BBB.CCC.DDD format. It returns 1 if the IP address was added to the table, or 0 if the address is poorly formatted or invalid. For the purposes of validity, addresses such as "0.0.0.0" and "255.255.255.255" are considered valid, but "256.0.0.0" is not. Adding an IP increments the counter returned by total().
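The validity rule can be sketched in pure Perl. This is only an approximation of the behaviour described above; looks_like_ipv4 is a hypothetical name, and the module's own check may differ in detail (for example in how it treats leading zeros or whitespace).

```perl
use strict;
use warnings;

# Hypothetical pure-Perl check mirroring the documented rules:
# four dot-separated decimal octets, each in the range 0..255.
sub looks_like_ipv4 {
    my ($ip) = @_;
    return 0 unless defined $ip;
    # exactly four groups of one to three ASCII digits
    my @octets = $ip =~ /\A([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\z/
        or return 0;
    # every octet must fit in a byte
    for my $octet (@octets) {
        return 0 if $octet > 255;
    }
    return 1;
}

for my $ip ("0.0.0.0", "255.255.255.255", "256.0.0.0") {
    printf "%-16s %s\n", $ip, looks_like_ipv4($ip) ? "valid" : "invalid";
}
```

In real code the simpler route is to test the return value of add_ip() itself and skip or log any address for which it returns 0.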
compact()
compact() is an internal function that is exposed for the rare situation where you think you can save memory by removing the duplicates stored internally. It is unlikely to save much space unless a large share of the addresses are duplicates. This function is called internally by unique(), so there is no need to call it yourself, except to attempt to save memory mid-run. On slower machines, it may take some time to complete. When parsing a well-used website's logs (for which this module was written), you will probably not need this function. YMMV.
unique()
unique() returns the number of unique IPs stored in the counter. It has to run compact() first to remove duplicates, so the same caveat applies here: this may take some time.
total()
total() returns the total number of IPs added to the counter so far, duplicates included. There is no way to remove an IP.
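How the three counting methods relate to one another can be shown with a toy model. The ToyCounter package below is purely illustrative: it is not IP::Unique's code, its names are made up, and it only assumes what the method descriptions above state (duplicates are kept until compact() runs, unique() compacts first, total() counts every addition).

```perl
use strict;
use warnings;

# Toy pure-Perl model of the documented counting semantics.
# Illustrative only; IP::Unique's real storage is not shown here.
package ToyCounter;

sub new { return bless { seen => [], total => 0 }, shift }

sub add {
    my ($self, $ip) = @_;
    push @{ $self->{seen} }, $ip;   # duplicates are kept for now
    $self->{total}++;
    return 1;
}

sub compact {
    my ($self) = @_;
    # sort, then drop entries equal to their predecessor
    my @kept;
    for my $ip (sort @{ $self->{seen} }) {
        push @kept, $ip unless @kept && $kept[-1] eq $ip;
    }
    $self->{seen} = \@kept;
    return;
}

sub unique {
    my ($self) = @_;
    $self->compact();               # duplicates must go before counting
    return scalar @{ $self->{seen} };
}

sub total { return $_[0]{total} }

package main;

my $c = ToyCounter->new();
$c->add($_) for ("127.0.0.1", "127.0.0.1", "12.34.56.78");
printf "unique=%d total=%d\n", $c->unique(), $c->total();
# prints: unique=2 total=3
```

The model also shows why total() is unaffected by compact(): it is a separate running count, not the size of the stored list.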
CHANGES
Version 0.01 - Feb 19th, 2003
Initial release of the module, so everything is new
BUGS
There are several items that can be considered bugs in the module:
It is hardcoded to use g++. If you know a better way to tell Makefile.PL to use "any C++ compiler", please let me know.
I've had difficulty getting this module to work under Cygwin.
This module should work under versions of Perl other than 5.8.3, but it hasn't been tested with them. If you can get it to run under an earlier version of Perl, please contact me.
AUTHOR
Jay Bonci, <jaybonci@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2004 by Jay Bonci, Open Source Development Network
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.3 or, at your option, any later version of Perl 5 you may have available.