NAME

File::GetLineMaxLength - Get lines from a file, up to a maximum line length

SYNOPSIS

use File::GetLineMaxLength;

my $FML = File::GetLineMaxLength->new(\*STDIN);

# Read lines of up to 1024 chars; $Excess is set if a line was truncated
my $Excess;
while (my $Line = $FML->getline(1024, $Excess)) {
# process $Line here
}

DESCRIPTION

While reading lines of data is generally easy in Perl (e.g. <$Fh>), there is apparently no easy way to limit a line read to a maximum length, as the C call fgets(char *s, int size, FILE *stream) does. This opens your code to denial-of-service (DoS) attacks, where an attacker sends an arbitrarily long line and uses up all your memory. Of course you can use something like BSD::Resource to cap your program's memory use, but that just kills the process and gives you no information about what caused the problem.

This question was raised on PerlMonks, and the general response seemed to be "roll your own using the read() call": http://www.perlmonks.org/index.pl?node_id=238980

This module does essentially that, but in reusable form, so you can wrap any handle and get line-length-limited I/O.

IMPLEMENTATION

The module keeps an internal buffer and uses read() to fetch up to 4096 bytes at a time, searching the buffer for the appropriate EOL marker. When the marker is found, it returns the line and leaves the remaining data in the internal buffer for the next call.

Because of this internal buffering, you should NOT mix calls to getline() via this class with any other standard I/O calls on the file handle you passed to new(). Data already read into the buffer is invisible to those calls, so you'll get surprising results.
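
The buffering approach described above can be sketched in a few lines of plain Perl. This is only an illustration of the technique, not the module's actual code: the name make_line_reader, the closure interface, and the choice to hand back the rest of an over-long line on later calls are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::GetLineMaxLength): returns a closure
# that reads length-limited lines from $fh through a private buffer.
sub make_line_reader {
    my ($fh) = @_;
    my $eol    = $/;    # capture the EOL marker once, at construction time
    my $buffer = '';
    my $at_eof = 0;

    return sub {
        my ($max_length) = @_;      # assumed to be undef or greater than 0
        $_[1] = 0 if @_ > 1;        # reset the caller's "was long" flag (aliased via @_)

        while (1) {
            my $pos = index($buffer, $eol);

            # A complete line within the limit: return it together with its EOL.
            if ($pos >= 0 && (!defined $max_length || $pos <= $max_length)) {
                return substr($buffer, 0, $pos + length($eol), '');
            }

            if (defined $max_length) {
                # The line is known to be too long if the EOL lies past the
                # limit, or if enough data is buffered to rule out an EOL
                # starting at or before the limit.
                my $too_long = $pos >= 0 ? 1
                             : $at_eof   ? length($buffer) > $max_length
                             :             length($buffer) >= $max_length + length($eol);
                if ($too_long) {
                    $_[1] = 1 if @_ > 1;
                    return substr($buffer, 0, $max_length, '');
                }
            }

            # At end of file: return any unterminated final line, then undef.
            if ($at_eof) {
                return length($buffer) ? substr($buffer, 0, length($buffer), '') : undef;
            }

            # Append up to 4096 more bytes to the buffer.
            my $n = read($fh, $buffer, 4096, length($buffer));
            die "read failed: $!" unless defined $n;
            $at_eof = 1 if $n == 0;
        }
    };
}

# Demo on an in-memory file: the second line is longer than the 8-char limit.
my $data = "short\n" . ('x' x 10) . "\nend";
open my $in, '<', \$data or die "cannot open in-memory file: $!";
my $next_line = make_line_reader($in);
my $was_long;
while (defined(my $line = $next_line->(8, $was_long))) {
    printf "%2d chars, truncated=%d\n", length($line), $was_long;
}
```

In this sketch the tail of an over-long line stays in the buffer and comes back on the following calls. That is a design choice of the example; the module itself may handle the excess differently.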

PERFORMANCE

The code tries to be careful performance-wise (single buffer, no copying, index() to find the EOL), but because it is written in Perl, a tight read loop is still roughly an order of magnitude slower than a plain read.

For instance, a loop that just reads a file of 10,000 lines of about 50 chars each, repeated 100 times:

read: 0.588507
glml read: 4.654946

However, if you do any work at all in the loop, the relative difference shrinks considerably.

The same test as above, but with @_ = split / / in the loop:

read: 8.688189
glml read: 12.529909

So any "work" you do in the loop will probably swamp the read time.

METHODS

new($Handle)

Wraps $Handle and returns an object on which you can call getline($max_len).

Note: See above about not calling any other IO calls on the passed handle after you pass it to this new() call.

getline([ $max_length, $was_long_line ])

Get a line of data from the file handle, up to $max_length bytes long. If no $max_length is passed, it works just like the standard Perl <$fh>. If the $was_long_line variable is passed, it is set to 1 if the line exceeded $max_length and was truncated, and to 0 otherwise.

Note: This may actually return up to $max_length + length(EOL) chars, as the EOL chars are not counted as part of the line length. The EOL chars for the file handle are taken from $/ at the time new() was called.

SEE ALSO

PerlIO::via, IO::Handle

Latest news/details can also be found at:

http://cpan.robm.fastmail.fm/filegetlinemaxlength/

AUTHOR

Rob Mueller <cpan@robm.fastmail.fm>.

COPYRIGHT AND LICENSE

Copyright (C) 2004-2007 by FastMail IP Partners

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.