=head1 NAME

CGI::SequentialFile - Perl module that interfaces to a common text file format
which stores records as named and url-escaped key=value pairs.

=cut

######################################################################

package CGI::SequentialFile;
require 5.004;

# Copyright (c) 1999-2000, Darren R. Duncan. All rights reserved. This module is
# free software; you can redistribute it and/or modify it under the same terms as
# Perl itself.  However, I do request that this copyright information remain
# attached to the file.  If you modify this module and redistribute a changed
# version then please attach a note listing the modifications.

use strict;
use vars qw($VERSION);
$VERSION = '1.0';

######################################################################

=head1 DEPENDENCIES

=head2 Perl Version

	5.004

=head2 Standard Modules

	Fcntl

=head2 Nonstandard Modules

	CGI::HashOfArrays

=cut

######################################################################

use Fcntl qw(:DEFAULT :flock);
use CGI::HashOfArrays;

######################################################################

=head1 SYNOPSIS

	use CGI::SequentialFile;
	
	my $create_nonexistent = 1;
	my $case_insensitive = 1;
	
	my $field_defin_file = CGI::SequentialFile->new( "GB_Questions.txt" );
	my $message_file = CGI::SequentialFile->new( 
		"GB_Messages.txt", $create_nonexistent );
	
	my $query_string = '';
	if( $ENV{'REQUEST_METHOD'} =~ /^(GET|HEAD)$/ ) {
		$query_string = $ENV{'QUERY_STRING'};
	} else {
		read( STDIN, $query_string, $ENV{'CONTENT_LENGTH'} );
	}
	my $user_input = CGI::HashOfArrays->new( $case_insensitive, $query_string );
	$message_file->append_new_records( $user_input ) or 
		die "Error saving new GuestBook message: ".$message_file->is_error()."\n";
	
	my @field_list = $field_defin_file->fetch_all_records( $case_insensitive );
	if( my $err_msg = $field_defin_file->is_error() ) {
		die "Error determining GuestBook questions: $err_msg\n";
	}
	my @message_list = $message_file->fetch_all_records( $case_insensitive );
	if( my $err_msg = $message_file->is_error() ) {
		die "Error reading existing GuestBook messages: $err_msg\n";
	}
	
	print "All GuestBook Messages:\n";
	foreach my $message (@message_list) {
		print "\n";
		foreach my $field (@field_list) {
			my $field_name = $field->fetch_value( 'name' );
			my $title = $field->fetch_value( 'title' );
			my @inputs = $message->fetch( $field_name );
			print "Question: '$title'\n";
			print "Answers: '".join( "','", @inputs )."'\n";
		}
	}

=head1 DESCRIPTION

This Perl 5 object class provides an easy-to-use interface for a plain text file
format that is capable of storing an ordered list of variable-length records
where the fields of each record are stored in name=value pairs, one field value
per line.

Each record can have different fields from the others, and each field can have
either one or several values.  In the latter case, the field name is repeated for
each value.  Records are delimited by lines that contain only a "=" and are
otherwise empty.  The order of individual fields in the file doesn't matter, but
the order of parts of multivalued fields does; this order is preserved.  

All field names and values are url-escaped, so we are capable of storing binary
data without corrupting it.

=head1 FILE FORMAT EXAMPLE

	=
	name=name
	type=textfield
	visible_title=What%27s+your+name%3f
	=
	default=eenie
	default=minie
	name=words
	type=checkbox_group
	values=eenie
	values=meenie
	values=minie
	values=moe
	visible_title=What%27s+the+combination%3f
	=
	name=color
	type=popup_menu
	values=red
	values=green
	values=blue
	values=chartreuse
	visible_title=What%27s+your+favorite+colour%3f
	=
	type=submit

=cut

######################################################################

# Names of properties for objects of this class are declared here:
my $KEY_FILEHANDLE = 'filehandle';  # stores the filehandle
my $KEY_FILE_PATH  = 'file_path';   # external name of this file
my $KEY_CREAT_NNX  = 'creat_nnx';   # create file if nonexistant
my $KEY_ACC_PERMS  = 'acc_perms';   # new files have these permissions
my $KEY_CASE_INSE  = 'case_inse';   # are HoA keys case insensitive?
my $KEY_USE_EMPTY  = 'use_empty';   # do we process empty records?
my $KEY_IS_ERROR   = 'is_error';    # holds error string, if any

# Constant values used in this class go here:
my $DELIM_RECORDS = "\n=\n";     # this is standard
my $DELIM_FIELDS = "\n";  # this is a standard

######################################################################

=head1 SYNTAX

This class does not export any functions or methods, so you need to call them
using indirect notation.  This means using B<Class-E<gt>function()> for functions and
B<$object-E<gt>method()> for methods.

Record data taken from a file is returned as a list of CGI::HashOfArrays
(HoA) objects, one object for each record.  The keys in the HoA are the field
names, and the list of values associated with each HoA key are the values of the
field.  Record data to be stored in a file must likewise be provided as a list of
HoA objects, or a list of HASH refs.  HoAs are used because they simplify the
manipulation of hashes whose keys may have one or several values (see the HoA
documentation for details of their use).

Objects of this class always store the filehandle they are working with as an
internal property.  However, you have a choice as to whether it creates the
filehandle or whether you pass it an existing one.  Likewise, you can retrieve
the filehandle in question for your own manipulation, irregardless of how this
class object got it in the first place.

=head1 FUNCTIONS AND METHODS

=head2 new([ FILE[, CREAT[, PERMS]] ])

This function creates a new CGI::SequentialFile object and returns it.  The
first optional parameter, FILE, can be either a filehandle (GLOB ref) or a
scalar.  If it is a filehandle, then the "file handle" property is set to it, and
all other parameters are ignored.  If it is a scalar, then the "file path"
property is set to it.  The second optional parameter sets the "create if
nonexistant" property, and the third optional parameter sets the "access
permissions" property.  See the accessors for these properties to see what they
do.

=cut

######################################################################

sub new {
	my $class = shift( @_ );
	my $self = {};
	bless( $self, ref($class) || $class );
	$self->initialize( @_ );
	return( $self );
}

######################################################################

=head2 initialize([ FILE[, CREAT[, PERMS]] ])

This method is used by B<new()> to set the initial properties of an object,
except when the new object is a clone.  Calling it yourself will clear the
existing properties and set new ones according to the optional parameters, which
are the same as those to new().  Nothing is returned.

=cut

######################################################################

sub initialize {
	my $self = shift( @_ );
	%{$self} = ();
	if( ref( $_[0] ) eq 'GLOB' ) {
		$self->{$KEY_FILEHANDLE} = shift( @_ );
	} else {
		$self->{$KEY_FILEHANDLE} = \*FH;
		$self->{$KEY_FILE_PATH} = shift( @_ );
		$self->{$KEY_CREAT_NNX} = shift( @_ );
		$self->{$KEY_ACC_PERMS} = shift( @_ );
	}
}

######################################################################

=head2 clone()

This method creates a new CGI::SequentialFile object, which is a duplicate of
this one in every respect, and returns it.  But the filehandle itself isn't
duplicated, rather we now might have two references to the same one.  (I'm not
yet sure when or not this is the case.)

=cut

######################################################################

sub clone {
	my $self = shift( @_ );
	my $clone = {};
	bless( $clone, ref($self) );
	%{$clone} = %{$self};  # only does single-level copy
	return( $clone );
}

######################################################################

=head2 filehandle([ VALUE ])

This method is an accessor for the "filehandle" property, which it returns.  If
VALUE is defined, this property is set to it.  This filehandle is what this class
is providing an interface to.  Filehandles are expected to be passed as a GLOB
reference, such as "\*FH".

=cut

######################################################################

sub filehandle {
	my $self = shift( @_ );
	if( ref( my $new_value = shift( @_ ) ) eq 'GLOB' ) {
		$self->{$KEY_FILEHANDLE} = $new_value;
	}
	return( $self->{$KEY_FILEHANDLE} );
}

######################################################################

=head2 file_path([ VALUE ])

This method is an accessor for the "file path" scalar property, which it returns.
 If VALUE is defined, this property is set to it.  If this module is opening a
file itself, it will use this property to determine where the file is located. 
This module is file-system agnostic, and will pass this "file path" to the open()
function as-is.  This means that if you provide only a file name and not a full
path, the file must be in the current working directory.  Do not provide any meta
characters like "<" or ">>" in the file name, as we don't use them.  This
property is "" by default.

=cut

######################################################################

sub file_path {
	my $self = shift( @_ );
	if( my $new_value = shift( @_ ) ) {
		$self->{$KEY_FILE_PATH} = $new_value;
	}
	return( $self->{$KEY_FILE_PATH} );
}

######################################################################

=head2 create_if_nonex([ VALUE ])

This method is an accessor for the "create if nonexistant" boolean/scalar
property, which it returns.  If VALUE is defined, this property is set to it. 
When this module has to open a file, and the file doesn't exist, then it will
create the file if this property is true, and return a fatal error otherwise. 
This property is false by default.

=cut

######################################################################

sub create_if_nonex {
	my $self = shift( @_ );
	if( my $new_value = shift( @_ ) ) {
		$self->{$KEY_CREAT_NNX} = $new_value;
	}
	return( $self->{$KEY_CREAT_NNX} );
}

######################################################################

=head2 access_perms([ VALUE ])

This method is an accessor for the "access permissions" octal/scalar property,
which it returns.  If VALUE is defined, this property is set to it.  If this
module creates a new file due to the "create if nonexistant" property being true,
then this property determines which access permissions the new file has.  The
property is "0666" (everyone can read and write) by default.

=cut

######################################################################

sub access_perms {
	my $self = shift( @_ );
	if( my $new_value = shift( @_ ) ) {
		$self->{$KEY_ACC_PERMS} = $new_value;
	}
	return( $self->{$KEY_ACC_PERMS} );
}

######################################################################

=head2 ignores_case([ VALUE ])

This method is an accessor for the "ignores case" boolean/scalar property, which
it returns.  If VALUE is defined, this property is set to it.  This property is
only used when reading records from a file, and is used during initialization of
the HoA objects that read records are returned in.  Any HoAs with this property
set to true will lowercase any keys inserted into them, and they stay that way on
output.  This means that if a record read from a file has fields with names that
differ only by their case, they are treated as the same field.  The property is
false by default.

=cut

######################################################################

sub ignores_case {
	my $self = shift( @_ );
	if( my $new_value = shift( @_ ) ) {
		$self->{$KEY_CASE_INSE} = $new_value;
	}
	return( $self->{$KEY_CASE_INSE} );
}

######################################################################

=head2 uses_empty_records([ VALUE ])

This method is an accessor for the "use empty" boolean/scalar property, which it
returns.  If VALUE is defined, this property is set to it.  If this property is
true, this module will return a record for every record delimiter read,
irregardless of whether the record contained any fields.  If this property is
false, then consecutive record delimiters are disregarded until a record that has
fields is encountered.  On writing, a false value for this property means that we
disregard any records that don't have any fields, and a true value means we write
them anyway, resulting in multiple consecutive record delimiters.  This property
is false by default, and in that state we are guaranteed that reads only return
records with fields in them, and writes are likewise.

=cut

######################################################################

sub uses_empty_records {
	my $self = shift( @_ );
	if( my $new_value = shift( @_ ) ) {
		$self->{$KEY_USE_EMPTY} = $new_value;
	}
	return( $self->{$KEY_USE_EMPTY} );
}

######################################################################

=head2 is_error()

This method returns a string specifying the file-system error that just occurred,
if any, and the undefined value if the last file-system operation succeeded. 
This string includes the operation attempted, which is one of ['open', 'close',
'lock', 'unlock', 'seek start', 'seek end', 'read from', 'write to'], as well as
the file-system name of our file (if we opened it) and the system error string
from $!, but has no linebreaks.  The property is undefined by default.

=cut

######################################################################

sub is_error {
	my $self = shift( @_ );
	return( $self->{$KEY_IS_ERROR} );
}

######################################################################

=head2 open_and_lock([ RDWR[, PATH[, CREAT[, PERMS]]] ])

This method opens a file which is associated with the objects "file handle"
property, and gains an access lock on it.  The first optional argument, RDWR, is
a boolean/scalar which specifies how we will be using the file.  If it is true
then we are opening the file in read-and-write mode and use an exclusive lock. 
If it is false then we are opening the file in read-only mode and use a shared
lock.  The second optional parameter, PATH, will override the "file path"
property if defined, but the property isn't changed.  Likewise the properties
CREAT and PERMS will override the properties "create if nonexistant" and "access
permissions" if defined.  This method returns 1 on success and undef on failure. 
Presumably the file pointer is at byte zero now, but we don't do any seeking to
make sure.

=cut

######################################################################

sub open_and_lock {
	my $self = shift( @_ );
	my $fh = $self->{$KEY_FILEHANDLE};
	my $read_and_write = shift( @_ );	
	my $file_path = shift( @_ );
	my $creat_nnx = shift( @_ );
	my $perms = shift( @_ );
	
	defined( $file_path ) or $file_path = $self->{$KEY_FILE_PATH};
	defined( $creat_nnx ) or $creat_nnx = $self->{$KEY_CREAT_NNX};
	defined( $perms ) or $perms = $self->{$KEY_ACC_PERMS};

	my $flags = 
		$read_and_write && $creat_nnx ? O_RDWR|O_CREAT :
		$read_and_write ? O_RDWR : 
		$creat_nnx ? O_RDONLY|O_CREAT : O_RDONLY;
	defined( $perms ) or $perms = 0666;

	$self->{$KEY_IS_ERROR} = undef;

	sysopen( $fh, $file_path, $flags, $perms ) or do {
		$self->_make_filesystem_error( "open" );
		return( undef );
	};

	flock( $fh, $read_and_write ? LOCK_EX : LOCK_SH ) or do {
		$self->_make_filesystem_error( "lock" );
		return( undef );
	};

	return( 1 );
}

######################################################################

=head2 unlock_and_close()

This method releases the access lock on the file that is associated with the
objects "file handle" property, and closes it.  This method returns 1 on success
and undef on failure.  As of Perl 5.004, which this module requires, the flock
function will flush buffered output prior to unlocking.

=cut

######################################################################

sub unlock_and_close {
	my $self = shift( @_ );
	my $fh = $self->{$KEY_FILEHANDLE};
	
	$self->{$KEY_IS_ERROR} = undef;

	flock( $fh, LOCK_UN ) or do {
		$self->_make_filesystem_error( "unlock" );
		return( undef );
	};

	close( $fh ) or do {
		$self->_make_filesystem_error( "close" );
		return( undef );
	};

	return( 1 );
}

######################################################################

=head2 read_records([ CASE[, MAX[, EMPTY]] ])

This method reads records from this object's "file handle", and returns them. 
The second optional scalar argument specifies the maximum number of records to
read.  If that argument is undefined or less than 1, then all records are read
until the end-of-file is reached.  The first and third optional arguments, CASE
and EMPTY, will override the object properties "ignores case" and "use empty" if
defined.  This method returns an ARRAY ref containing the new records (as HoAs)
on success, even if the end-of-file is reached before we find any records.  It
returns undef on a file-system error, even if some records were read first.

=cut

######################################################################

sub read_records {
	my $self = shift( @_ );
	my $fh = $self->{$KEY_FILEHANDLE};
	my $case_inse = shift( @_ );
	my $max_rec_num = shift( @_ );  # if <= 0, read all records
	my $use_empty = shift( @_ );
	
	defined( $case_inse ) or $case_inse = $self->{$KEY_CASE_INSE};
	defined( $use_empty ) or $use_empty = $self->{$KEY_USE_EMPTY};
	
	$self->{$KEY_IS_ERROR} = undef;

	my @record_list = ();
	my $remaining_rec_count = ($max_rec_num <= 0) ? -1 : $max_rec_num;

	local $/ = $DELIM_RECORDS;

	GET_ANOTHER_REC: {
		eof( $fh ) and return( \@record_list );
	
		defined( my $record_str = <$fh> ) or do {
			$self->_make_filesystem_error( "read record from" );
			return( undef );
		};
	
		my $record = CGI::HashOfArrays->new( 
			$case_inse, $record_str, $DELIM_FIELDS );
		
		$record->keys_count() or $use_empty or redo GET_ANOTHER_REC;

		push( @record_list, $record );

		--$remaining_rec_count != 0 and redo GET_ANOTHER_REC;

		return( \@record_list );
	}
}

######################################################################

=head2 write_records( LIST[, EMPTY] )

This method writes records to this object's "file handle".  The first argument,
LIST, is an ARRAY ref containing the records (as HoAs or HASH refs) to be
written, or it is a single record to be written.  If any array elements aren't
HoAs or HASH refs, they are disregarded.  The second, optional argument, EMPTY,
will override the object's "use empty" property if defined.  This method returns
1 on success, even if there are no records to write.  It returns undef on a
file-system error, even if some of the records were written first.

=cut

######################################################################

sub write_records {
	my $self = shift( @_ );
	my $fh = $self->{$KEY_FILEHANDLE};
	my $ra_record_list = shift( @_ );
	my $use_empty = shift( @_ );
	
	ref( $ra_record_list ) eq 'ARRAY' or $ra_record_list = [];
	defined( $use_empty ) or $use_empty = $self->{$KEY_USE_EMPTY};
	
	$self->{$KEY_IS_ERROR} = undef;

	local $\ = undef;

	foreach my $record (@{$ra_record_list}) {
		ref( $record ) eq 'HASH' and $record = 
			CGI::HashOfArrays->new( 0, $record );
		ref( $record ) eq "CGI::HashOfArrays" or next;
		
		!$use_empty and !$record->keys_count() and next;

		my $record_str = $record->to_url_encoded_string( $DELIM_FIELDS );

		print $fh "$DELIM_RECORDS$record_str" or do {
			$self->_make_filesystem_error( "write record to" );
			return( undef );
		};
	}
	
	return( 1 );
}

######################################################################

=head2 fetch_all_records([ CASE ])

This method will return a list containing all the records from a file, which may
be empty if the file is empty.  The list is returned as a single ARRAY ref if
this method is called in scalar context.  This method returns undef on failure. 
It assumes that the file is not already open.

=cut

######################################################################

sub fetch_all_records {
	my $self = shift( @_ );
	my $fh = $self->{$KEY_FILEHANDLE};
	my $case_inse = shift( @_ );
	
	$self->{$KEY_IS_ERROR} = undef;

	$self->open_and_lock( 0 ) or return( undef );

	seek( $fh, 0, 0 ) or do {
		$self->_make_filesystem_error( "seek start of" );
		return( undef );
	};

	my $ra_record_list = $self->read_records( $case_inse, -1 ) 
		or return( undef );

	$self->unlock_and_close() or return( undef );

	return( wantarray ? @{$ra_record_list} : $ra_record_list );
}

######################################################################

=head2 append_new_records( LIST )

This method will take a list of records, and append them to a file.  The argument
LIST can either be an ARRAY ref or an actual list.  This method returns 1 on
success and undef on failure.  It assumes that the file is not already open.

=cut

######################################################################

sub append_new_records {
	my $self = shift( @_ );
	my $fh = $self->{$KEY_FILEHANDLE};
	my $ra_record_list = (ref( $_[0] ) eq 'ARRAY') ? $_[0] : [@_];
	
	$self->{$KEY_IS_ERROR} = undef;

	$self->open_and_lock( 1 ) or return( undef );

	seek( $fh, 0, 2 ) or do {
		$self->_make_filesystem_error( "seek end of" );
		return( undef );
	};

	$self->write_records( $ra_record_list ) or return( undef );

	$self->unlock_and_close() or return( undef );

	return( 1 );
}

######################################################################

=head2 replace_all_records( LIST )

This method will take a list of records, and overwrite a file with them.  The
argument LIST can either be an ARRAY ref or an actual list.  The file is
truncated before writing the new records.  An easy way to simply delete all
records in a file is to call this method with an empty list.  This method returns
1 on success and undef on failure.  It assumes that the file is not already open.

=cut

######################################################################

sub replace_all_records {
	my $self = shift( @_ );
	my $fh = $self->{$KEY_FILEHANDLE};
	my $ra_record_list = (ref( $_[0] ) eq 'ARRAY') ? $_[0] : [@_];
	
	$self->{$KEY_IS_ERROR} = undef;

	$self->open_and_lock( 1 ) or return( undef );

	seek( $fh, 0, 0 ) or do {
		$self->_make_filesystem_error( "seek start of" );
		return( undef );
	};

	truncate( $fh, 0 ) or do {
		$self->_make_filesystem_error( 'truncate to start of' );
		return( undef );
	};

	$self->write_records( $ra_record_list ) or return( undef );

	$self->unlock_and_close() or return( undef );

	return( 1 );
}

######################################################################

sub _make_filesystem_error {
	my $self = shift( @_ );
	my $unique_part = shift( @_ );
	return( $self->{$KEY_IS_ERROR} = 
		"can't $unique_part data file '$self->{$KEY_FILE_PATH}': $!" );
}

######################################################################

1;
__END__

=head1 DEVELOPMENT HISTORY

The file format that this module handles became known to me during a programming
exercise where I was given an example file containing usernames and passwords and
had to parse it.  I was informed at the time that this file format was common.  

This module was created for my own use, as I stored html form descriptions and
user input from my CGI scripts in the file format.  Through independent
development, my module gained the ability to store binary data safely through
url-encoding (preserving white-space formatting among other benefits), and could
store everything from multi-valued fields.

When I decided to take my modules public, and develop them further before doing
so, I first looked upon CPAN to see if someone else had already done what this
module does, and none had, surprisingly enough.  Maybe the format was too simple
to make a module for, but I thought it worthwhile.

=head1 COMPATABILITY WITH OTHER MODULES

It turns out that this file format is identical to that used by the Whitehead
Genome Center's data exchange format, and can be manipulated and even databased
using Boulderio utilities.  See
"http://www.genome.wi.mit.edu/genome_software/other/boulder.html" for further
details.  

Boulderio didn't turn up in any CPAN search, but I found out about it from
Lincoln D. Stein's documentation for CGI.pm, which itself uses a file format
identical to this module, when saving its state.

=head1 AUTHOR

Copyright (c) 1999-2000, Darren R. Duncan. All rights reserved. This module is
free software; you can redistribute it and/or modify it under the same terms as
Perl itself.  However, I do request that this copyright information remain
attached to the file.  If you modify this module and redistribute a changed
version then please attach a note listing the modifications.

I am always interested in knowing how my work helps others, so if you put this
module to use in any of your own code then please send me the URL.  Also, if you
make modifications to the module because it doesn't work the way you need, please
send me a copy so that I can roll desirable changes into the main release.

Address comments, suggestions, and bug reports to B<perl@DarrenDuncan.net>.

=head1 BUGS

I have tested this module on Digital UNIX and Linux with no problems.

However, MacPerl seems to have problems with sysread, which manifest themselves
later as a "bad file descriptor" error when writing to an open file.  Using plain
"open" seems to fix the problem, but that doesn't give me the flexability to
create nonexistant files on demand.

Also, the Mac OS currently doesn't implement the flock function, which this
module uses automatically during opening and closing.  Mac OS X will change this,
but in the meantime the only ways to use this module on a Mac is to either
comment out the flock call or just call read_records() and write_records()
directly while opening and closing the file yourself.

I will note that MacPerl comes with a set of shared libraries that may correct 
these difficulties, or maybe they don't.  But I never installed them to find out.

=head1 SEE ALSO

perl(1), Boulder, CGI, CGI::HashOfArrays.

=cut