NAME

CGI::EventCountFile - Perl module that interfaces to a tab-delimited text file for storing date-bounded counts of occurrences for multiple events, such as web page views.

DEPENDENCIES

Perl Version

5.004

Standard Modules

Fcntl

Nonstandard Modules

I<none>

SYNOPSIS

use CGI::EventCountFile;

MAIN: {
	# The subs below are plain package subs, so they are called on 'main'.
	main->mail_me_and_reset_counts_if_new_day( "counts.txt" );

	main->update_one_count_file( "counts.txt",
		(map { "\$ENV{$_} = \"$ENV{$_}\"" } qw(
		REQUEST_METHOD SERVER_NAME SCRIPT_FILENAME
		HTTP_HOST SCRIPT_NAME SERVER_SOFTWARE HTTP_REFERER )
	) );
}

sub update_one_count_file {
	my ($self, $file_path, @keys_to_inc) = @_;

	push( @keys_to_inc, '__total__' );

	my $count_file = CGI::EventCountFile->new( $file_path, 1 );
	$count_file->open_and_lock( 1 ) or return( 0 );
	$count_file->read_all_records();

	foreach my $key (@keys_to_inc) {
		$key eq '' and $key = '__nil__';
		$count_file->key_increment( $key );
	}

	$count_file->write_all_records();
	$count_file->unlock_and_close();
}

sub mail_me_and_reset_counts_if_new_day {
	my ($self, $file_path) = @_;

	my $dcm_file = CGI::EventCountFile->new( $file_path, 1 );
	$dcm_file->open_and_lock( 1 ) or do {
		print "<!-- ".$dcm_file->is_error()." -->\n";
		return( undef );
	};
	$dcm_file->read_all_records();
	if( $dcm_file->key_was_incremented_today( '__total__' ) ) {
		$dcm_file->unlock_and_close();
		return( 1 );
	}
	$dcm_file->key_increment( '__total__' );
	$dcm_file->set_all_day_counts_to_zero();
	$dcm_file->write_all_records();
	$dcm_file->unlock_and_close();
	
	my @mail_body = ();
	push( @mail_body, "\n\ncontent of '$file_path':\n\n" );
	push( @mail_body, $dcm_file->get_sorted_file_content() );
	
	open(MAIL, "|/usr/lib/sendmail -t") or do {
		print "<!-- sendmail can't send daily usage info -->\n";
		return( undef );
	};
	print MAIL "To: site_owner\@their_host\n";
	print MAIL "From: spying anonymous <spy\@anonymous>\n";
	print MAIL "Subject: daily hit count update\n\n";
	print MAIL "@mail_body\n\n";
	close (MAIL);
}

DESCRIPTION

This Perl 5 object class provides an easy-to-use interface for a plain text file format that is capable of storing an unordered list of events. Each event is identified by a string and has 4 attributes: the dates of its first and last occurrences, the count of all occurrences between first and last, and the count of only today's occurrences.

A common use for this class is to track web site usage. Usage events that can be counted include: which site pages were viewed, which external URLs we redirect visitors to, which external URLs have a link to us (that were used), which internal site pages had links that were clicked on to go to other pages, which web browsers the visitors are using, where the visitors are from, and miscellaneous environment details like GET vs POST vs HEAD requests. However, events can be anything at all that we want to keep counts of.

This class is designed to make it easy to compile and sort count information so that it can be e-mailed to the site owner once per day for backup/report purposes.

All event names have control characters (ASCII 0 through 31) removed prior to storage, so they don't interfere with file parsing; no escaping is done to preserve binary values, as it is assumed they won't be used. Event names can be any length.
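As a rough illustration of the cleanup described above, a key could be sanitized as follows. The helper name clean_event_key is hypothetical and is not part of this module; the module's own code may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CGI::EventCountFile): delete ASCII
# control characters 0 through 31 so that a key cannot contain a tab or a
# line break, which are the field and record delimiters in the file.
sub clean_event_key {
	my ($key) = @_;
	$key = '' unless defined $key;
	$key =~ tr/\x00-\x1f//d;
	return $key;
}

print clean_event_key( "/guest\tbook\n" ), "\n";   # prints "/guestbook"
```

Characters are deleted rather than escaped, so two keys that differ only in control characters end up as the same record.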

Dates are all stored in ISO 8601 format ("1994-02-03 14:15:29") with precision to the second, and all are in Coordinated Universal Time (UTC), also known as Greenwich Mean Time (GMT). It is assumed that any dates provided using key_store() are in UTC and formatted as ISO 8601 (six numbers in descending order of significance). That format allows dates to be string-sorted without parsing. If you want to display dates in another time zone, you must do the conversion externally.
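The string-sorting property can be seen in a small sketch. The function name earliest_and_latest is made up for this example and is not a method of this class.

```perl
use strict;
use warnings;

# Timestamps in this format put the most significant field first and
# zero-pad every field, so plain string order is also chronological order.
sub earliest_and_latest {
	my @sorted = sort { $a cmp $b } @_;
	return @sorted[0, -1];
}

my ($earliest, $latest) = earliest_and_latest(
	'2000-05-30 11:36:55 UTC',
	'2000-05-16 12:31:41 UTC',
	'2000-05-16 09:16:22 UTC',
);
print "earliest: $earliest\nlatest: $latest\n";
```

This only holds when every date uses the same zero-padded layout and the same time zone, which is why the class expects UTC throughout.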

FILE FORMAT EXAMPLE

/guestbook	2000-05-16 12:31:41 UTC	2000-05-30 11:36:55 UTC	16	0
/guestbook/sign	2000-05-16 20:37:25 UTC	2000-05-30 11:36:32 UTC	7	0
/links	2000-05-16 14:18:48 UTC	2000-05-30 18:02:12 UTC	14	0
/mailme	2000-05-16 14:17:57 UTC	2000-05-30 16:54:39 UTC	17	0
/myperl	2000-05-16 09:16:22 UTC	2000-05-31 17:54:12 UTC	103	3
/myperl/base/1	2000-05-29 08:07:51 UTC	2000-05-29 08:07:51 UTC	1	0
/myperl/eventcountfile/1	2000-05-29 23:54:38 UTC	2000-05-29 23:54:38 UTC	1	0
/myperl/guestbook/1	2000-05-17 13:17:59 UTC	2000-05-29 08:40:39 UTC	3	0
/myperl/hashofarrays/1	2000-05-16 11:35:40 UTC	2000-05-30 20:58:32 UTC	6	0
/myperl/htmlformmaker/1	2000-05-17 06:41:04 UTC	2000-05-17 06:49:05 UTC	2	0
/myperl/htmltagmaker/1	2000-05-16 18:18:54 UTC	2000-05-31 17:05:23 UTC	4	1
/myperl/mailme/1	2000-05-16 11:36:08 UTC	2000-05-29 08:35:09 UTC	2	0
/myperl/methodparamparser/1	2000-05-16 15:31:58 UTC	2000-05-18 04:47:10 UTC	2	0
/myperl/segtextdoc/1	2000-05-18 03:11:53 UTC	2000-05-18 03:11:53 UTC	1	0
/myperl/sequentialfile/1	2000-05-16 15:30:54 UTC	2000-05-29 08:08:29 UTC	3	0
/myperl/static/1	2000-05-16 12:31:07 UTC	2000-05-16 15:47:29 UTC	2	0
/myperl/webpagecontent/1	2000-05-29 22:48:30 UTC	2000-05-30 11:11:16 UTC	2	0
/myperl/websiteglobals/1	2000-05-16 15:33:02 UTC	2000-05-29 18:57:29 UTC	5	0
/myperl/websitemanager/1	2000-05-16 17:37:05 UTC	2000-05-29 22:46:04 UTC	7	0
/mysites	2000-05-15 22:58:30 UTC	2000-05-31 01:40:52 UTC	78	1
/resume	2000-05-15 23:26:23 UTC	2000-05-30 16:52:11 UTC	57	0
__nil__	2000-05-15 07:57:37 UTC	2000-05-31 17:59:02 UTC	201	5
__total__	2000-05-15 07:57:37 UTC	2000-05-31 17:59:02 UTC	720	11
external	2000-05-15 22:59:16 UTC	2000-05-31 01:41:03 UTC	186	1
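For readers who want to process such a file without this class, one line can be split into its five fields as sketched here. The function parse_record_line and the hash keys it returns are illustrative names only, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical parser for one line of the format shown above: five
# tab-delimited fields (key, first date, last date, total count, count today).
sub parse_record_line {
	my ($line) = @_;
	chomp $line;
	my ($key, $first, $last, $count, $today) = split /\t/, $line, 5;
	my %rec = (
		key   => $key,
		first => $first,
		last  => $last,
		count => $count + 0,
		today => $today + 0,
	);
	return \%rec;
}

my $rec = parse_record_line(
	"/myperl\t2000-05-16 09:16:22 UTC\t2000-05-31 17:54:12 UTC\t103\t3\n" );
print "$rec->{key}: $rec->{count} total, $rec->{today} today\n";
```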

SYNTAX

This class does not export any functions or methods, so you need to call them using object notation. This means using Class->function() for functions and $object->method() for methods.

Objects of this class always store the filehandle they are working with as an internal property. However, you have a choice as to whether the object creates the filehandle or whether you pass it an existing one. Likewise, you can retrieve the filehandle in question for your own manipulation, regardless of how the object got it in the first place.

Objects of this class always read the entire file into memory at once and do any manipulations of it there, then write it all back at once if we want to save updates. This approach makes fewer system calls than record-by-record access and should be much faster. The objects store the file data internally, so once the file is read in we use the object's accessor methods to retrieve or manipulate data.

When saving changes, this class always truncates the file, so that if the new data is shorter than what was there before (such as after deleting records), none of the old data survives to cause corruption on the next read. While under ideal circumstances the truncation could be done either before or after the write, this class always does it before, so that in the event of a failure part way through we don't have old data mixed with the new. I may add the ability to change this behaviour in later revisions of the class.
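The truncate-then-write order described above can be sketched on an ordinary filehandle. This is an illustration under stated assumptions rather than the module's own code: rewrite_file is a hypothetical name, and the handle is assumed to be already open for reading and writing (and locked, in real use).

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempfile);

# Hypothetical helper: replace the whole content of an open read/write
# filehandle, discarding the old data before any new data is written.
sub rewrite_file {
	my ($fh, $content) = @_;
	seek( $fh, 0, SEEK_SET ) or return;   # back to byte zero
	truncate( $fh, 0 ) or return;         # drop all of the old data first
	print {$fh} $content or return;       # then write the new data
	return 1;
}

# Usage: the old content is longer than the new content.
my ($fh, $path) = tempfile( UNLINK => 1 );
print {$fh} "a long line of old data\n";
rewrite_file( $fh, "new\n" ) or die "rewrite failed: $!";
close( $fh ) or die "close failed: $!";
my $size = -s $path;
print "file is now $size bytes\n";
```

If the write fails after the truncation, the file is left short or empty rather than holding a mix of old and new records.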

FUNCTIONS AND METHODS

new([ FILE[, CREAT[, PERMS]] ])

This function creates a new CGI::EventCountFile object and returns it. The first optional parameter, FILE, can be either a filehandle (GLOB ref) or a scalar. If it is a filehandle, then the "file handle" property is set to it, and all other parameters are ignored. If it is a scalar, then the "file path" property is set to it. The second optional parameter sets the "create if nonexistent" property, and the third optional parameter sets the "access permissions" property. See the accessors for these properties to see what they do.

initialize([ FILE[, CREAT[, PERMS]] ])

This method is used by new() to set the initial properties of an object, except when the new object is a clone. Calling it yourself will clear the existing properties and set new ones according to the optional parameters, which are the same as those to new(). Nothing is returned.

clone([ CLONE ])

This method initializes a new object to have all of the same properties of the current object and returns it. This new object can be provided in the optional argument CLONE (if CLONE is an object of the same class as the current object); otherwise, a brand new object of the current class is used. Only object properties recognized by CGI::EventCountFile are set in the clone; other properties are not changed.

Note that the internally stored filehandle (glob ref) is duplicated using an ordinary scalar copy. Copying a reference does not copy the thing it refers to, so the clone refers to the same underlying filehandle as the original.

filehandle([ VALUE ])

This method is an accessor for the "filehandle" property, which it returns. If VALUE is defined, this property is set to it. This filehandle is what this class is providing an interface to. Filehandles are expected to be passed as a GLOB reference, such as "\*FH".

file_path([ VALUE ])

This method is an accessor for the "file path" scalar property, which it returns. If VALUE is defined, this property is set to it. If this module is opening a file itself, it will use this property to determine where the file is located. This module is file-system agnostic, and will pass this "file path" to the open() function as-is. This means that if you provide only a file name and not a full path, the file must be in the current working directory. Do not provide any meta characters like "<" or ">>" in the file name, as we don't use them. This property is "" by default.

create_if_nonex([ VALUE ])

This method is an accessor for the "create if nonexistent" boolean/scalar property, which it returns. If VALUE is defined, this property is set to it. When this module has to open a file, and the file doesn't exist, then it will create the file if this property is true, and return a fatal error otherwise. This property is false by default.

access_perms([ VALUE ])

This method is an accessor for the "access permissions" octal/scalar property, which it returns. If VALUE is defined, this property is set to it. If this module creates a new file due to the "create if nonexistant" property being true, then this property determines which access permissions the new file has. The property is "0666" (everyone can read and write) by default.

is_error()

This method returns a string specifying the file-system error that just occurred, if any, and the undefined value if the last file-system operation succeeded. This string includes the operation attempted, which is one of ['open', 'close', 'lock', 'unlock', 'seek start', 'seek end', 'read from', 'write to'], as well as the file-system name of our file (if we opened it) and the system error string from $!, but has no linebreaks. The property is undefined by default.

open_and_lock([ RDWR[, PATH[, CREAT[, PERMS]]] ])

This method opens a file which is associated with the object's "file handle" property, and gains an access lock on it. The first optional argument, RDWR, is a boolean/scalar which specifies how we will be using the file. If it is true then we open the file in read-and-write mode and use an exclusive lock. If it is false then we open the file in read-only mode and use a shared lock. The second optional parameter, PATH, will override the "file path" property if defined, but the property isn't changed. Likewise the parameters CREAT and PERMS will override the properties "create if nonexistent" and "access permissions" if defined. This method returns 1 on success and undef on failure. Presumably the file pointer is at byte zero now, but we don't do any seeking to make sure.
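The mapping from RDWR and CREAT to open mode and lock type can be pictured with core Perl alone. The sketch below is an assumption about how such behaviour could be written with sysopen and flock; open_and_lock_sketch is a hypothetical function, not the module's implementation, and it returns the filehandle instead of 1.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);

# Hypothetical stand-in for the behaviour described above: choose the
# open flags and the lock type from the RDWR and CREAT arguments.
sub open_and_lock_sketch {
	my ($path, $rdwr, $creat, $perms) = @_;
	$perms = 0666 unless defined $perms;
	my $flags = $rdwr ? O_RDWR : O_RDONLY;
	$flags |= O_CREAT if $creat;
	sysopen( my $fh, $path, $flags, $perms ) or return;   # undef on failure
	flock( $fh, $rdwr ? LOCK_EX : LOCK_SH ) or return;    # exclusive or shared
	return $fh;
}
```

A caller would test the return value, for example `my $fh = open_and_lock_sketch( "counts.txt", 1, 1 ) or die "cannot open: $!";`, and later release the lock and close the handle.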

unlock_and_close()

This method releases the access lock on the file that is associated with the object's "file handle" property, and closes it. This method returns 1 on success and undef on failure. As of Perl 5.004, which this module requires, the flock function will flush buffered output prior to unlocking.

read_all_records()

This method reads all of the records from this object's "file handle", and stores them internally. This method returns 1 on success, even if the end-of-file is reached before we find any records. It returns undef on a file-system error, even if some records were read first.

write_all_records()

This method writes all of this object's internally stored records to its "file handle". The file is truncated at zero prior to writing them. This method returns 1 on success, even if there are no records to write. It returns undef on a file-system error, even if some of the records were written first.

key_exists( KEY )

This method returns true if KEY matches an existing internally stored record.

key_fetch( KEY )

This method returns a list of the attributes that the internally stored record matched by KEY has, or an empty list if KEY doesn't match anything. The attributes are: 1. the event key string; 2. date and time of the event's first occurrence; 3. date and time of the event's last occurrence; 4. total count of occurrences between first and last; 5. count of only today's occurrences.

key_store( KEY, FIRST, LAST, COUNT, TODAY )

This method adds a new internally stored event record to this object which KEY matches, and if a matching record already exists then it is overwritten. The remaining method parameters are assigned to this record as properties: FIRST is the date and time of the event's first occurrence, LAST is the date and time of the event's last occurrence, COUNT is the total count of occurrences between FIRST and LAST, and TODAY is the count of only today's occurrences. FIRST and LAST are cleaned up to conform with ISO 8601 format before insertion, and either is given today's date if it is undefined. COUNT and TODAY are set to zero if undefined. This method returns the updated attribute list for KEY.

key_increment( KEY )

This method increments the counters by 1 for the internally stored event record that is matched by KEY, and if a record didn't previously exist, a new one is created with counts of 1. The record's "last occurrence" date is also set to today. If the record was just created or its previous total count property was zero, then the record's "first occurrence" date is also set to today. This method returns the updated attribute list for KEY.
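The increment rules above can be modelled on a plain hash. This is only a sketch: increment_record and the array layout [ key, first, last, count, today ] are assumptions made for the example, not the module's real internal storage.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of the rules above. %$records maps a key
# to [ key, first, last, count, today ]; $now is a UTC timestamp string.
sub increment_record {
	my ($records, $key, $now) = @_;
	my $rec = $records->{$key} ||= [ $key, $now, $now, 0, 0 ];
	$rec->[1] = $now if $rec->[3] == 0;   # new or zero-count: reset "first"
	$rec->[2] = $now;                     # "last" always becomes now
	$rec->[3]++;                          # total count
	$rec->[4]++;                          # today's count
	return @$rec;
}

my %records;
increment_record( \%records, '/links', '2000-05-16 14:18:48 UTC' );
my @rec = increment_record( \%records, '/links', '2000-05-30 18:02:12 UTC' );
print join( "\t", @rec ), "\n";
```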

key_delete( KEY )

This method deletes any existing internally stored event record that is matched by KEY, and returns its attribute list.

key_accumulate( DEST_KEY, SOURCE_KEYS )

This method is a utility designed to combine the counts from two or more event records at any time after they were started. One use for it is when several keys were used where one was intended, such as upper- and lower-cased versions of the same key. This method combines the attributes of the related keys, taking the earliest and latest dates among them and accumulating the counts as appropriate. The first parameter, DEST_KEY, is the identifier for the internally stored record that will be the accumulator; if it already has values then they are included in the total. The second parameter, SOURCE_KEYS, is a list of identifiers for other internally stored records that will be added to the accumulating record. The source records are not deleted by this method, so you must delete them yourself if desired. Another use for this method is to create "summary records" which show a total count for a group of more detailed records; in this case, the accumulating record is unlikely to exist already.
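One way to picture the accumulation is on a plain hash of records. Everything here is hypothetical: the function accumulate_records, the array layout [ key, first, last, count, today ], and the choice to sum both the total and the today counts are assumptions for illustration, since the exact rule the module applies is not spelled out above.

```perl
use strict;
use warnings;

# Hypothetical model: fold the source records (and any existing
# destination record) into one record under $dest_key.
sub accumulate_records {
	my ($records, $dest_key, @source_keys) = @_;
	my @recs = grep { defined } map { $records->{$_} } ($dest_key, @source_keys);
	return unless @recs;
	my ($first, $last, $count, $today) = (undef, undef, 0, 0);
	foreach my $rec (@recs) {
		$first = $rec->[1] if !defined( $first ) || $rec->[1] lt $first;
		$last  = $rec->[2] if !defined( $last )  || $rec->[2] gt $last;
		$count += $rec->[3];   # assumed: totals are summed
		$today += $rec->[4];   # assumed: today's counts are summed
	}
	$records->{$dest_key} = [ $dest_key, $first, $last, $count, $today ];
	return @{ $records->{$dest_key} };   # source records are left in place
}
```

The date comparisons use lt and gt, which works because the timestamps sort correctly as strings.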

delete_all_keys()

This method deletes all the internally stored event records. A subsequent call to write_all_records() would then clear the file.

key_was_incremented_today( KEY )

This method inspects the internally stored event record that KEY matches and compares its "last occurrence" date to today's date. If the record exists and its day is the same then this method returns true; otherwise it returns false. This method is intended to be used as a timer which rings once every 24 hours, that is, at the first count file update performed after midnight (UTC) on any given day. For it to work properly, KEY must be incremented right afterwards.
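Because the stored dates begin with a fixed-width "YYYY-MM-DD" part, the same-day test can be approximated by comparing the first ten characters of two timestamps. The helper same_utc_day is a hypothetical name used only for this sketch; the module may compare dates differently.

```perl
use strict;
use warnings;

# Hypothetical helper: two timestamps fall on the same UTC day when their
# first ten characters ("YYYY-MM-DD") are equal.
sub same_utc_day {
	my ($last, $now) = @_;
	return 0 unless defined $last and defined $now;
	return substr( $last, 0, 10 ) eq substr( $now, 0, 10 ) ? 1 : 0;
}

print same_utc_day( '2000-05-31 17:59:02 UTC', '2000-05-31 23:10:00 UTC' ), "\n";   # prints 1
print same_utc_day( '2000-05-31 17:59:02 UTC', '2000-06-01 00:00:05 UTC' ), "\n";   # prints 0
```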

set_all_day_counts_to_zero()

This method iterates through all of the internally stored records and sets their "count of only today's occurrences" properties to zero. It doesn't do anything else. This method is intended to be called immediately after key_was_incremented_today() returns false, and prior to any keys being incremented during this update.

get_file_content()

This method returns a scalar containing all of the internally stored file records, formatted as they would be stored in the file. The records are delimited by line breaks and record fields are delimited by tabs.

get_sorted_file_content()

This method returns the same thing as get_file_content() except that the records are sorted asciibetically (by key).

today_date_utc()

This method returns an ISO 8601 formatted string containing the current Coordinated Universal Time with precision to the second. The returned string has 'UTC' at the end to make its time zone easy to identify when it is embedded in other text.
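A string in this layout can be produced from Perl's built-in gmtime. The function format_utc below is a hypothetical stand-alone equivalent written for illustration; it takes an explicit epoch time so that the output is reproducible.

```perl
use strict;
use warnings;

# Hypothetical helper: format an epoch time as "YYYY-MM-DD HH:MM:SS UTC".
sub format_utc {
	my ($epoch) = @_;
	my ($sec, $min, $hour, $mday, $mon, $year) = gmtime( $epoch );
	return sprintf( "%04d-%02d-%02d %02d:%02d:%02d UTC",
		$year + 1900, $mon + 1, $mday, $hour, $min, $sec );
}

print format_utc( 0 ), "\n";        # prints "1970-01-01 00:00:00 UTC"
print format_utc( time() ), "\n";   # the current time
```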

AUTHOR

Copyright (c) 1999-2000, Darren R. Duncan. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. However, I do request that this copyright information remain attached to the file. If you modify this module and redistribute a changed version then please attach a note listing the modifications.

I am always interested in knowing how my work helps others, so if you put this module to use in any of your own code then please send me the URL. Also, if you make modifications to the module because it doesn't work the way you need, please send me a copy so that I can roll desirable changes into the main release.

Address comments, suggestions, and bug reports to perl@DarrenDuncan.net.

BUGS

I have tested this module on Digital UNIX and Linux with no problems.

Neither Windows 95/98 nor Mac OS 7-9 implements the flock function, which this module calls automatically when opening and closing files.

Perl for Mac OS 9 and earlier seems to have problems with sysread, which manifest themselves later as a "bad file descriptor" error when writing to an open file. Using plain "open" seems to fix the problem, but that doesn't give me the flexibility to create nonexistent files on demand.

SEE ALSO

perl(1), Fcntl.