NAME
Data::Favorites - tally a data stream to find recently dominant items
SYNOPSIS
use Data::Favorites;
my $faves = new Data::Favorites();
$faves->tally($_) foreach (@history);
$faves->decay( 2 ); # everyone loses two points
$faves->clamp( time() - 24*60*60 ); # cull everyone older than a day
print join("\n", $faves->favorites( 5 )), "\n";
ABSTRACT
A Favorites structure tracks the disposition of various keys. A key's disposition is a measurement of its relative predominance and freshness when tallied. This is a good way to infer favorites or other leadership-oriented facts from a historical data stream.
More specifically, this structure measures how often and when various keys are triggered by application-defined events. Keys that are mentioned often accumulate more tally points. Keys that have been mentioned recently have newer "freshness" stamps. Both factors are tracked, and both affect each key's position in a ranking of the keys.
At any time, keys can be culled by freshness or by their current ranking, or both. With these approaches, dispositions can be weighed over the whole historical record, rather than providing a simplistic "top events in the last N events" rolling count. Thus, highly popular event keys may remain in the set of favorites for some time, even when the key hasn't been seen very often recently. Popular items can be decayed gradually rather than cut out of a simple census window.
METHODS
new()
$faves = new Data::Favorites( );
$faves = new Data::Favorites( \&stamper );
Create a new favorites counter object. The counter object can tally given elements, and also stamp the "freshness" of each element with the numerical return from the given stamper sub. If no sub code reference is given, then the time() built-in function is assumed by default. It is assumed that the sub returns a number which generally increases in value for fresher stamps.
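As a minimal sketch of what a stamper could look like, the following uses a logical clock instead of wall time. The name logical_stamper and the counter $tick are hypothetical; the two arguments follow the description given under tally(). The check runs without the module installed.

```perl
use strict;
use warnings;

# Hypothetical stamper: a logical clock that ignores wall time.
# A stamper is documented to receive the counter object and the key.
my $tick = 0;
sub logical_stamper {
    my ($faves, $key) = @_;   # both arguments are unused in this sketch
    return ++$tick;           # strictly increasing, so later means fresher
}

# Stand-alone check of the stamper's contract (no module needed here).
my @stamps = map { logical_stamper(undef, $_) } qw(alpha beta alpha);
print "@stamps\n";            # prints "1 2 3"

# With the module installed, the sub would be passed as:
#   my $faves = new Data::Favorites( \&logical_stamper );
```

A logical clock like this may suit replayed or batch-loaded histories, where wall-clock time says nothing about the order of events.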
tally()
$times = $faves->tally( $scalar );
$times = $faves->tally( $scalar, $times );
Return the current number of times the given $scalar has been seen, or increment that count by a given number of times. The first form returns undef if the $scalar has never been tallied.
Items are tracked by their string form, so if the scalars are Perl references, take note that the whole favorites counter will not persist well. A future version may use Tie::RefHash to allow for persistable tracking of object data.
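The string-form caveat is ordinary Perl hash behaviour rather than anything specific to this module. The short sketch below uses a plain hash (the name %seen is illustrative only) to show that a reference used as a key becomes a string that can no longer be dereferenced.

```perl
use strict;
use warnings;

my $ref = [ 1, 2, 3 ];
my %seen;
$seen{$ref}++;                    # the key is the string form of the reference

my ($key) = keys %seen;
print "key looks like: $key\n";   # something like ARRAY(0x...); address varies
print "still a reference? ", (ref($key) ? "yes" : "no"), "\n";   # prints "no"
```

Because the address in that string differs from run to run, a saved counter keyed this way would not match the same objects after a restart.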
Each key in the favorites counter is marked with a timestamp via the time() function, or the stamper sub reference given during creation of the favorites counter object. In the case of an application-supplied stamper function, it will receive two arguments: this favorites counter itself, and the given scalar being tallied.
fresh()
$stamp = $faves->fresh( $scalar );
Return the current freshness stamp for the given $scalar. Returns undef if the $scalar has never been tallied.
Each key in the favorites counter is marked with a timestamp via the time() function, or the stamper sub reference given during creation of the favorites counter object. In the case of an application-supplied stamper callback, it will receive two arguments: this favorites counter itself, and the given scalar being tallied.
decay()
$count = $faves->decay( );
$count = $faves->decay( $times );
$times = $faves->decay( $scalar );
$times = $faves->decay( $scalar, $times );
In the first pair of forms, all present keys have their tally counts reduced by one, or by the given number of times. In these forms, the returned value is the remaining count of tracked favorite keys.
In the latter pair of forms, an individual key $scalar has its tally reduced by one, or by the given number of times. These forms return the remaining tally count for the given $scalar key.
The favorites counter will automatically remove any key whose tally count drops to zero or below.
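The whole-set form of decay can be pictured with a plain hash. This is a sketch of the documented behaviour, not the module's internals; sketch_decay_all and %count are hypothetical names.

```perl
use strict;
use warnings;

# Stand-in data; the real module keeps its own internal state.
my %count = ( apple => 5, pear => 2, plum => 1 );

# Reduce every tally by $times (default 1) and drop exhausted keys.
sub sketch_decay_all {
    my ($counts, $times) = @_;
    $times = 1 unless defined $times;
    for my $key (keys %$counts) {
        $counts->{$key} -= $times;
        delete $counts->{$key} if $counts->{$key} <= 0;   # zero or below
    }
    return scalar keys %$counts;    # remaining number of tracked keys
}

my $left = sketch_decay_all(\%count, 2);
print "$left key(s) left: ", join(", ", sort keys %count), "\n";
# prints "1 key(s) left: apple"
```

Here pear reaches exactly zero and plum goes negative, so both are removed, while apple survives with a tally of 3.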
clamp()
$count = $faves->clamp( $stamp );
Clamps the set of favorites to only the freshest tallied elements. This method automatically removes any key whose most recent tally is staler than the given timestamp value. Timestamps are assumed to be numerical; lesser values represent stamps which are more stale, while higher values are considered more fresh.
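The culling rule can be sketched with a plain hash of stamps. The names sketch_clamp and %stamp are hypothetical, and two details are assumptions rather than documented facts: a key stamped exactly at the cutoff is kept, and the return value is the number of keys remaining.

```perl
use strict;
use warnings;

# Stand-in freshness stamps; larger numbers are fresher.
my %stamp = ( old => 100, mid => 200, new => 300 );

# Remove every key whose stamp is strictly older than $cutoff.
# Assumption: a stamp equal to the cutoff is not "more stale" and is kept.
sub sketch_clamp {
    my ($stamps, $cutoff) = @_;
    for my $key (keys %$stamps) {
        delete $stamps->{$key} if $stamps->{$key} < $cutoff;
    }
    return scalar keys %$stamps;    # assumed: count of surviving keys
}

my $left = sketch_clamp(\%stamp, 200);
print "$left key(s) left: ", join(", ", sort keys %stamp), "\n";
# prints "2 key(s) left: mid, new"
```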
favorites()
@topfaves = $faves->favorites( );
@topfaves = $faves->favorites( $limit );
$count = scalar $faves->favorites( );
Returns the keys sorted by the strength of their tally counts. Those which have equal tally counts are compared by their most recent tally time; the most freshly stamped is favored. If a limit is given, the list returned will not exceed the given length.
In a scalar context, returns the current count of the tallied keys in the favorites counter. If no limit argument is given, then no internal sorting work needs to be performed to return the count.
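The ordering rule (tally count first, freshness as the tie-breaker) can be written as a two-level sort. This is a sketch over plain hashes with hypothetical names (sketch_favorites, %count, %stamp), not the module's own code.

```perl
use strict;
use warnings;

# Stand-in tallies and stamps; red and blue tie on count.
my %count = ( red => 3,  green => 5, blue => 3  );
my %stamp = ( red => 10, green => 5, blue => 20 );

sub sketch_favorites {
    my ($limit) = @_;
    # Scalar context: just the number of keys, no sorting needed.
    return scalar keys %count unless wantarray;

    my @ranked = sort {
        $count{$b} <=> $count{$a}       # higher tally first
            or
        $stamp{$b} <=> $stamp{$a}       # then fresher stamp first
    } keys %count;

    # Trim to the limit, if one was given.
    splice(@ranked, $limit) if defined $limit && $limit < @ranked;
    return @ranked;
}

print join(" ", sketch_favorites()), "\n";    # prints "green blue red"
print join(" ", sketch_favorites(2)), "\n";   # prints "green blue"
print scalar(sketch_favorites()), "\n";       # prints "3"
```

Note that green ranks first despite having the oldest stamp: freshness only matters between keys whose tallies are equal.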
DESCRIPTION
After creating a Data::Favorites object, the caller should tally the identifying characteristics of an ongoing historical data stream. This could be error codes or connecting hostnames in a network log, usernames in a chat conversation, or any other key-worthy feature of an ongoing stream of events. At any time, the most predominantly occurring keys can be determined and ranked.
With Data::Favorites, a process can discover in real-time which objects have been selected by a user the most often, or which objects have been most responsible for event traffic. The map can be culled occasionally, keeping only the most fresh objects, or only the highest counted objects. The data can be naturally decayed, leaving only the objects with overall strongest dispositions.
For example, this structure can track the top ten favorite visited websites, chat partners, document files, or network connections. These can be inferred by looking at the entities with the strongest dispositions, according to the way they are tallied over time. As a historically favorite entity is triggered more or less often, its ranking rises or drops in the list, making room for or pushing out other entities.
A Data::Favorites object can employ an application-defined stamp function (a coderef) to mark the tallying process. If no function is given, then timestamps are applied with the usual time() built-in function.
AUTHOR
Ed Halley, <ed@halley.cc>
COPYRIGHT AND LICENSE
Copyright 2001-2003 by Ed Halley
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.