NAME
Tie::PagedArray - A tieable module for handling large arrays by paging
VERSION
Version 0.01
SYNOPSIS
tie my(@large_array), 'Tie::PagedArray';
tie my(@large_array), 'Tie::PagedArray', page_size => 100, paging_dir => '/tmp';
DESCRIPTION
When processing large volumes of data, a program may run out of memory. The operating system may impose a limit on the amount of memory a process can consume, or the machine may simply lack the required amount of memory.
Tie::PagedArray supports large arrays by paging them so that the program does not run out of memory. The array is broken into pages, and every page except the one in use is kept on disk. Performance depends on the device chosen to store the pages.
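As a rough illustration of how an array index could map onto pages, the sketch below assumes fixed-size, zero-based pages laid out contiguously. The locate helper is hypothetical and is not part of Tie::PagedArray, whose internal layout may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: map an array index to (page number, offset in page),
# assuming every page holds exactly $page_size elements.
sub locate {
    my ($index, $page_size) = @_;
    return (int($index / $page_size), $index % $page_size);
}

my ($page, $offset) = locate(4321, 2000);
print "element 4321 -> page $page, offset $offset\n";
```

Under these assumptions, only the page that contains the requested index needs to be in memory at any time.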
This module uses Storable as its backend for serialization and deserialization, so the elements of the paged array can be any value or object that Storable can serialize. See the documentation for the Storable module to work with code refs.
When switching pages, the data in the currently active page is written out to its page file if the page is marked dirty. The page file of the page being switched to is then deserialized into memory.
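The page switch described above can be sketched with Storable's store and retrieve functions. This is a minimal illustration under stated assumptions, not the module's actual implementation: the %state hash, the page_file and switch_page helpers, and the page file naming are all hypothetical.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Hypothetical bookkeeping: which page is active, whether it is dirty,
# and the in-memory contents of that page.
my %state = (active => 0, dirty => 0, page => [ 'a', 'b' ]);

sub page_file {
    my ($n) = @_;
    return "$dir/page_$n.store";    # assumed naming scheme
}

sub switch_page {
    my ($to) = @_;
    return if $to == $state{active};
    # Write the active page out only if it was modified.
    store($state{page}, page_file($state{active})) if $state{dirty};
    # Load the target page, or start an empty one if it has no file yet.
    my $file = page_file($to);
    $state{page}   = -e $file ? retrieve($file) : [];
    $state{active} = $to;
    $state{dirty}  = 0;
}

store([ 'c', 'd' ], page_file(1));   # pretend page 1 is already on disk

$state{page}[0] = 'A';               # modify page 0 ...
$state{dirty}   = 1;                 # ... and mark it dirty
switch_page(1);                      # page 0 is written, page 1 is loaded
my $on_page_1 = $state{page}[0];
switch_page(0);                      # page 1 is clean, so it is not rewritten
my $back_on_0 = $state{page}[0];
print "page 1 first element: $on_page_1, page 0 first element: $back_on_0\n";
```

Skipping the write for clean pages is what makes the dirty flag worthwhile: pages that were only read cost one deserialization per switch instead of a write plus a read.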
An active page is marked dirty when a value is assigned to any element in the page. To forcibly mark a page dirty, assign an element in the page to itself:
$large_array[2000] = $large_array[2000];
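Why self-assignment works can be seen with any tied array: an assignment to an element goes through the STORE method of the tie class, while a change made inside a referenced data structure only goes through FETCH. The sketch below uses a hypothetical CountingArray class (not part of Tie::PagedArray) that counts STORE calls; a paging class could set its dirty flag at the same point.

```perl
use strict;
use warnings;

package CountingArray;   # hypothetical demo class

sub TIEARRAY  { my ($class) = @_; return bless { data => [], stores => 0 }, $class }
sub FETCH     { my ($self, $i) = @_; return $self->{data}[$i] }
sub STORE     { my ($self, $i, $v) = @_; $self->{stores}++; $self->{data}[$i] = $v }
sub FETCHSIZE { my ($self) = @_; return scalar @{ $self->{data} } }
sub STORESIZE { my ($self, $n) = @_; $#{ $self->{data} } = $n - 1 }

package main;

tie my @car_parts, 'CountingArray';
$car_parts[0] = { name => "wheel", count => 4 };
$car_parts[1] = { name => "lamp",  count => 8 };

my $counter = tied @car_parts;
my $before  = $counter->{stores};        # two element assignments so far

$car_parts[1]->{count} = 6;              # nested update: FETCH only
my $after_nested = $counter->{stores};

$car_parts[1] = $car_parts[1];           # self-assignment: FETCH then STORE
my $after_self = $counter->{stores};

print "STORE calls: $before, $after_nested, $after_self\n";
```

The nested update leaves the STORE count unchanged, which is why a tie class cannot see it without extra help from the caller.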
The defaults are page_size => 2000, paging_dir => ".".
METHODS
tie
The tie call lets you create a new Tie::PagedArray object.
tie my(@large_array), 'Tie::PagedArray';
tie my(@large_array), 'Tie::PagedArray', page_size => 100;
tie my(@large_array), 'Tie::PagedArray', page_size => 100, paging_dir => '/tmp';
Ties the array @large_array to the Tie::PagedArray class.
page_size is the size of a page in elements. If page_size is omitted, it defaults to 2000 elements. The default page size can be changed by setting the package variable ELEMS_PER_PAGE. The change in default only affects future ties.
$Tie::PagedArray::ELEMS_PER_PAGE = 2000;
paging_dir is a directory to store the page files. Choose a directory on a fast storage device. If omitted, it defaults to the current working directory.
page_files
The page_files method, available on the tied object, returns the names of the page files belonging to the array. This can be used to freeze the array and archive it along with its page files.
LIMITATIONS
1) A foreach loop must not be used on a Tie::PagedArray because the array in foreach expands into an in-memory list. Instead, use an iterative loop.
while(my($i) = each(@large_array)) {
# Do something with $large_array[$i]
}
OR
for(my $i = 0; $i < scalar(@large_array); $i++) {
# Do something with $large_array[$i]
}
2) When an update is made to an element's nested data structure, the corresponding page is not marked dirty, because such updates are difficult to track.
Suppose page_size => 1 and hash refs are stored as elements in the array.
@car_parts = ({name => "wheel", count => 4}, {name => "lamp", count => 8});
Then an update to count will not mark the page dirty. When the page is later switched out, the modification is lost!
$car_parts[1]->{count} = 6;
The workaround is to assign the element to itself.
$car_parts[1] = $car_parts[1];
3) When an object is assigned to two elements in different pages, the elements end up pointing to two independent objects.
Suppose page_size => 2, then
my $wheel = {name => "wheel", count => 4};
@car_parts = ($wheel, $wheel, $wheel);
print($car_parts[0] == $car_parts[1] ? "Same object\n" : "Independent objects\n");
Same object
print($car_parts[0] == $car_parts[2] ? "Same object\n" : "Independent objects\n");
Independent objects
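The third limitation follows from how Storable handles shared references: references that point to the same object stay shared within one serialized structure, but not across structures serialized separately. The sketch below imitates two pages with page_size => 2 by round-tripping each page through freeze and thaw on its own; the @pages layout is an assumption made for illustration and is not the module's internal representation.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my $wheel = { name => "wheel", count => 4 };

# Assumed layout: elements 0 and 1 share a page, element 2 is on the next page.
my @pages = ([ $wheel, $wheel ], [ $wheel ]);

# Serialize and deserialize each page independently, as paging to disk would.
my @restored = map { thaw(freeze($_)) } @pages;

my $same_page  = $restored[0][0] == $restored[0][1];
my $cross_page = $restored[0][0] == $restored[1][0];

print "Elements 0 and 1: ", ($same_page  ? "Same object" : "Independent objects"), "\n";
print "Elements 0 and 2: ", ($cross_page ? "Same object" : "Independent objects"), "\n";
```

If object identity across pages matters, one option is to store a key into a separate lookup table in the array instead of the object itself.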
BUGS
None known.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Tie::PagedArray
AUTHOR
Kartik Bherin
LICENSE AND COPYRIGHT
Copyright (C) 2013 Kartik Bherin.