NAME

IP::Unique - Store and count unique IPv4 addresses, optimized for large numbers of IPs

SYNOPSIS

use IP::Unique;
my $ipun = IP::Unique->new();

$ipun->add_ip("127.0.0.1");
$ipun->add_ip("127.0.0.1");
$ipun->add_ip("12.34.56.78");

$ipun->unique(); # In this example, 2
$ipun->total();  # In this example, 3

$ipun->compact();

DESCRIPTION

IP::Unique solves the problem of counting the unique addresses in a large set of IP addresses. Because the module is written in C to take advantage of fast integer handling, it runs (in my experience) several times faster than comparable pure-Perl solutions and uses, on average, about one fifth of the memory.

A traditional way to count the unique items in a list in Perl is to use a hash, like so:

my $ips = {};
for (@iplist)
{
	$ips->{$_} = 1;    # one hash key per distinct address
}
my $unique = scalar(keys %$ips);    # number of distinct keys

The problem is that Perl hashes become memory-hungry once they hold millions of keys. Databases are also cumbersome for this job: keeping 30 million rows distinct is awkward, and the lookups and inserts are a needless waste of time. This is where IP::Unique (hopefully) shines.
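
To illustrate why integer handling helps, here is a minimal pure-Perl sketch. It is not the module's actual C code; it only shows that any IPv4 address fits in a single 32-bit unsigned integer, so a packed address needs 4 bytes rather than a hash key plus its bookkeeping. The helper names ip_to_int and int_to_ip are hypothetical and exist only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a dotted-quad string to a 32-bit unsigned integer.
# Assumes the input has already been validated.
sub ip_to_int {
    my ($ip) = @_;
    my @octets = split /\./, $ip;
    return unpack 'N', pack 'C4', @octets;
}

# Hypothetical helper: convert the integer back to dotted-quad form.
sub int_to_ip {
    my ($n) = @_;
    return join '.', unpack 'C4', pack 'N', $n;
}

# Three addresses packed back to back occupy 4 bytes each.
my $packed = join '', map { pack 'N', ip_to_int($_) }
    qw(127.0.0.1 127.0.0.1 12.34.56.78);

printf "%s -> %u\n", $_, ip_to_int($_) for qw(127.0.0.1 12.34.56.78);
printf "packed length: %d bytes\n", length $packed;
```

Comparing and sorting such integers is cheap, which is presumably the kind of saving the description above refers to.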

METHODS

IP::Unique is an object-oriented module, so nothing is exported. It provides the following methods:

  • new()

    new() is the constructor. It takes no parameters and returns a reference to a new IP::Unique object.
  • add_ip("127.0.0.1")

    add_ip() takes a string parameter in dotted-quad format (AAA.BBB.CCC.DDD). It returns 1 if the IP address was added to the table, or 0 if the address is malformed or invalid. For the purposes of validity, addresses such as "0.0.0.0" and "255.255.255.255" are considered valid, but "256.0.0.0" is not. Adding an IP increments the counter returned by total().
  • compact()

    compact() is an internal function that is exposed for the rare situation where you think you can save memory by removing the duplicates stored internally. You are unlikely to save much space unless your data is heavily saturated with duplicates. unique() calls this function internally, so there is no need to call it yourself except to try to reclaim memory mid-run. On slower machines it may take some time to complete. When parsing a busy website's logs (the task this module was written for), you will probably not need this function. YMMV.
  • unique()

    unique() returns the number of unique IPs stored in the counter. It has to run compact() first to remove duplicates, so the same caveat applies here: this may take some time.
  • total()

    total() returns the total number of IPs added to the counter so far, duplicates included. There is no way to remove an IP.
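
The documented behaviour of the methods above can be modelled in a few lines of pure Perl. This is a sketch for illustration only: the package name Sketch::IPCounter and its internals (an append-only list of integers that compact() sorts and de-duplicates) are assumptions, not a description of how IP::Unique is implemented. Only the method names and return values come from the documentation above. The sketch also assumes that rejected addresses do not count towards total().

```perl
use strict;
use warnings;

package Sketch::IPCounter;    # hypothetical name, not part of IP::Unique

sub new {
    my ($class) = @_;
    return bless { ips => [], total => 0 }, $class;
}

# Returns 1 if the address is a valid dotted quad and was stored, 0 otherwise.
sub add_ip {
    my ($self, $ip) = @_;
    return 0 unless defined $ip
        && $ip =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/;
    my @octets = ($1, $2, $3, $4);
    return 0 if grep { $_ > 255 } @octets;
    # Store the address as a 32-bit integer; duplicates are kept for now.
    push @{ $self->{ips} }, unpack 'N', pack 'C4', @octets;
    $self->{total}++;
    return 1;
}

# Sort the stored integers and drop adjacent duplicates.
sub compact {
    my ($self) = @_;
    my @sorted = sort { $a <=> $b } @{ $self->{ips} };
    my @unique;
    for my $n (@sorted) {
        push @unique, $n unless @unique && $unique[-1] == $n;
    }
    $self->{ips} = \@unique;
    return;
}

# Compacts first, then reports how many distinct addresses remain.
sub unique {
    my ($self) = @_;
    $self->compact();
    return scalar @{ $self->{ips} };
}

sub total { return $_[0]{total} }

package main;

my $counter = Sketch::IPCounter->new();
$counter->add_ip($_) for qw(127.0.0.1 127.0.0.1 12.34.56.78);
print "unique: ", $counter->unique(), "\n";
print "total:  ", $counter->total(), "\n";
print "256.0.0.0 accepted? ", $counter->add_ip("256.0.0.0"), "\n";
```

With the three addresses from the SYNOPSIS, the model reports 2 unique and 3 total, and rejects "256.0.0.0" with a return value of 0.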

CHANGES

Version 0.01 - Feb 19th, 2003

  • Initial release of the module, so everything is new

BUGS

There are several items that can be considered bugs in the module:

  • It is hardcoded to use g++. If you know a better way to tell Makefile.PL to use "any C++ compiler", please let me know.

  • I've had difficulty getting this module to work under Cygwin.

  • This module should work under versions of Perl other than 5.8.3, but it hasn't been tested with them. If you get it to run under an older version of Perl, please contact me.

AUTHOR

Jay Bonci, <jaybonci@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2004 by Jay Bonci, Open Source Development Network

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.3 or, at your option, any later version of Perl 5 you may have available.