NAME
FastDB::Load - Load data into a FastDB database
SYNOPSIS
use FastDB::Load;
my $load = Load->new(
    'Name'                   => 'Database name',
    'Data store location'    => 'Directory to store the data',
    'Field separator string' => '|',
    'Original field names'   => [ Columns ],
    'Indexed fields'         => [ Indexed columns ],
    'Extra virtual fields'   => [ Extra columns ],
    'Transform'              => [
        'Some column 1' => 'my $a = lc DATA; $a = reverse $a; return $a;',
        'Some column 2' => 'lc DATA',
        'Some column 3' => 'uc DATA;',
    ]
);
$load->load( 'a1', 'a2', 'a3', ... );
$load->load( 'b1', 'b2', 'b3', ... );
...
$load->Write_statistics_at_the_end();
EXAMPLE
#!/usr/bin/perl
#
# Example of FastDB::Load. Extra virtual fields you can optionally use:
#
# EXTRA_DAY 01 - 31
# EXTRA_DAY_NAME Sun - Sat
# EXTRA_MONTH_NAME Jan - Dec
# EXTRA_MONTH 01 - 12
# EXTRA_YEAR 1453
# EXTRA_HOUR 00 - 23
# EXTRA_MINUTE 00 - 59
# EXTRA_SECOND 00 - 59
# EXTRA_TIMESTAMP 20101201123957 ( YYYYMMDDhhmmss )
use FastDB::Load;
my $load = Load->new(
    'Name'                   => 'Export cargo',
    'Data store location'    => '/work/FastDB test/db',
    'Field separator string' => ',',
    'Original field names'   => [ 'COLOR', 'HEIGHT', 'WEIGHT', 'TYPE', 'ID', 'COUNTRY' ],
    'Indexed fields'         => [ 'WEIGHT', 'EXTRA_YEAR' ],
    'Extra virtual fields'   => [ 'EXTRA_TIMESTAMP', 'EXTRA_YEAR', 'EXTRA_DAY_NAME' ],
    'Transform'              => [
        'COLOR'   => 'my $a = lc DATA; $a',
        'TYPE'    => 'uc DATA;',
        'ID'      => '"<id>DATA</id>"',
        'COUNTRY' => 'uc DATA',
    ]
);
$load->load( 'Green' , 10, 1500, 'mech22', 'A100', 'New Zealand' );
$load->load( 'Brown' , 10, 1500, 'mech22', 'A100', 'India' );
$load->load( 'Green' , 11, 3500, 'mech23', 'B100', 'Australia' );
$load->load( 'Yellow', 7, 2500, 'mech21', 'C100', 'South Africa' );
$load->load( 'Red' , 14, 2500, 'mech21', 'D001', 'U.S. Montana' );
$load->load( 'Red' , 17, 5500, 'mech32', 'D101', 'U.S. Montana' );
$load->load( 'White' , 21, 700, 'snow02', 'E002', 'North Pole' );
$load->load( 'White' , 21, 700, 'snow02', 'E002', 'South Pole' );
# Optionally write some short information about your load
# $load->Write_statistics_at_the_end();
# $load->Write_statistics_at_the_end( $SomeFile );
# $load->Write_statistics_at_the_end( "$load->{'Data store location'}/$load->{'Name'}.log" );
$load->Write_statistics_at_the_end();
DESCRIPTION
FastDB is a file-based database. It uses directories to store the indexed columns, and it deduplicates data to avoid storing the same values more than once where possible. Your database and its schema are created at the first data load. After the first data load it is not possible to add or remove columns.
It is written in pure Perl, so it can run on any operating system where Perl runs. It is designed to give answers as fast as your disk and operating system allow.
This module loads your data into a FastDB database. At loading time you can edit the data of every column using generic Perl code defined in the 'Transform' property. Each column should have its own code. You can define transforms for one or more columns. The special string DATA (or data) is replaced with the current column value at loading time. 'Transform' is optional; omit it if you do not need it.
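To make the DATA placeholder concrete, here is a minimal sketch of how a transform string could be applied to a single value. It is an illustration only and not FastDB::Load's actual implementation: the helper name apply_transform is hypothetical, and replacing DATA with a variable reference before a string eval is an assumption about one workable approach.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of FastDB::Load: applies one Transform
# code string to one column value.
sub apply_transform {
    my ($code, $value) = @_;

    # Assumption: swap the DATA (or data) token for a variable reference
    # instead of pasting in the raw value, so quotes inside the data
    # cannot break the generated Perl code.
    (my $source = $code) =~ s/\b(?:DATA|data)\b/\$value/g;

    my $result = eval $source;    # string eval sees the lexical $value
    die "Transform '$code' failed: $@" if $@;
    return $result;
}

print apply_transform('uc DATA',             'mech22'), "\n";   # MECH22
print apply_transform('my $a = lc DATA; $a', 'Green'),  "\n";   # green
print apply_transform('"<id>DATA</id>"',     'A100'),   "\n";   # <id>A100</id>
```

Because the code string is evaluated as Perl, transform strings should only come from a trusted source, never from the data being loaded.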
At the end of loading it is suggested that you call the optional function 'Write_statistics_at_the_end' to write some short info to a file.
Functions
- my $load = Load->new( %hash );
Creates a new FastDB::Load object. %hash takes the following keys ('Extra virtual fields' and 'Transform' are optional)
Data store location
The root directory that will hold your data
Name
The name of your database. This will also become a subdirectory of the Data store location
Field separator string
This is used internally to separate columns from each other. It can be more than one character. You must select a string that never occurs in your data.
Extra virtual fields
At loading time you can optionally load the following fields, which do not exist in your data. Their values are calculated at loading time, so they may change if your load runs for a long time. The names of these fields and some sample values are
EXTRA_DAY         01 - 31
EXTRA_DAY_NAME    Sun - Sat
EXTRA_MONTH_NAME  Jan - Dec
EXTRA_MONTH       01 - 12
EXTRA_YEAR        2012
EXTRA_HOUR        00 - 23
EXTRA_MINUTE      00 - 59
EXTRA_SECOND      00 - 59
EXTRA_TIMESTAMP   20121201123957 ( YYYYMMDDhhmmss )
Original field names
An array reference of your column names. Do not repeat the Extra virtual fields here. Names are case-sensitive. Field names must not contain the character |
Indexed fields
An array reference of the columns you want to index. You can specify any original or extra field. Do not index more fields than you really need. Each indexed field becomes a subdirectory.
Transform
An array reference with the data transformations. You can use this to transform your data at loading time. You define the column name and some Perl code. The Perl code is applied to the column data, and FastDB stores its returned value. The special string DATA is replaced at loading time with the current value. Every transformation is applied only to its own column. You cannot use column names inside the Perl code. The order of the 'Transform' entries is not important. Its syntax is
'SOME COLUMN 1' => 'Perl code do something with the "DATA"',
'SOME COLUMN 2' => 'ucfirst DATA',
'SOME COLUMN 3' => 'my $var = DATA ; blah blah blah ; $var',
and so on
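As an illustration of the 'Extra virtual fields' key described in the list above, the following sketch computes values in the documented formats with POSIX::strftime. The helper name extra_fields is hypothetical, and the use of local time rather than UTC is an assumption; the module may derive these values differently.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper, not part of FastDB::Load: builds the nine extra
# field values for a given epoch time (default: now).
sub extra_fields {
    my ($epoch) = @_;
    $epoch = time unless defined $epoch;
    my @t = localtime($epoch);    # assumption: local time, not UTC

    return (
        EXTRA_DAY        => strftime('%d', @t),    # 01 - 31
        EXTRA_DAY_NAME   => strftime('%a', @t),    # Sun - Sat (locale dependent)
        EXTRA_MONTH_NAME => strftime('%b', @t),    # Jan - Dec (locale dependent)
        EXTRA_MONTH      => strftime('%m', @t),    # 01 - 12
        EXTRA_YEAR       => strftime('%Y', @t),
        EXTRA_HOUR       => strftime('%H', @t),    # 00 - 23
        EXTRA_MINUTE     => strftime('%M', @t),    # 00 - 59
        EXTRA_SECOND     => strftime('%S', @t),    # 00 - 59
        EXTRA_TIMESTAMP  => strftime('%Y%m%d%H%M%S', @t),    # YYYYMMDDhhmmss
    );
}

my %extra = extra_fields();
printf "%-17s %s\n", $_, $extra{$_} for sort keys %extra;
```

Calling such a helper once per row would explain why the values can drift during a long load, as noted for this key.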
- $load->load( col1, col2, ... );
A list of data you want to store as a row. The field order must be the same as the column names in 'Original field names'.
Normally you will put this inside a loop that reads and splits lines from a file, socket or whatever.
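A minimal sketch of such a loop follows. So that it runs without FastDB installed, it uses a hypothetical stand-in class, RowRecorder, that only records the rows passed to its load method; in real use you would pass the object returned by Load->new instead. The helper name load_lines and the comma-separated input format are also assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a FastDB::Load object, so the sketch runs
# without the module installed; it only records the rows it receives.
package RowRecorder;
sub new  { return bless { rows => [] }, shift }
sub load { my $self = shift; push @{ $self->{rows} }, [@_]; return 1 }

package main;

# Read comma-separated lines from a filehandle and hand each one to
# $loader->load, keeping the order of 'Original field names'.
sub load_lines {
    my ($fh, $loader) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;             # skip blank lines
        my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
        $loader->load(@fields);
        $count++;
    }
    return $count;
}

# In-memory input for the demo; normally this would be a real file.
my $input = "Green,10,1500,mech22,A100,New Zealand\n"
          . "Red,14,2500,mech21,D001,U.S. Montana\n";
open my $fh, '<', \$input or die "Cannot open in-memory input: $!";

my $loader = RowRecorder->new;
my $rows   = load_lines($fh, $loader);
close $fh;

print "Loaded $rows rows\n";    # Loaded 2 rows
```

A plain split is enough only when fields never contain the delimiter; input with quoted or escaped delimiters needs a real CSV parser.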
- $load->Write_statistics_at_the_end( [SomeFile] );
Optional method. Writes to a file how many rows were loaded and how long it took. It takes as an optional argument the file to write this info to. If you do not specify a file it will use the string "$load->{'Data store location'}/$load->{'Name'}.log"
$load->Write_statistics_at_the_end();
$load->Write_statistics_at_the_end( $SomeFile );
$load->Write_statistics_at_the_end( "$load->{'Data store location'}/$load->{'Name'}.log" );
NOTES
On Microsoft Windows you may run into problems when you have multiple indexes with long values, because of the 255-character NTFS maximum path limitation.
It is recommended to use a Linux partition (or a mounted file) formatted with the btrfs file system (ext4 is also good but not as fast as btrfs). ext3 and FAT16 are not recommended.
INSTALL
Because this module is implemented in pure Perl, it is enough to copy the FastDB directory somewhere in your @INC or next to your script. For your convenience you can use the following commands to install/uninstall the module
Install: setup_module.pl --install --module=FastDB
Uninstall: setup_module.pl --uninstall --module=FastDB
AUTHORS
Author: gravitalsun@hotmail.com (George Mpouras)
COPYRIGHT
Copyright (c) 2011, George Mpouras, gravitalsun@hotmail.com All rights reserved.
This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.