NAME
DBIx::TableLoader - Easily load a database table from a data set
VERSION
version 1.002
SYNOPSIS
my $dbh = DBI->connect(@connection_args);
DBIx::TableLoader->new(dbh => $dbh, data => $data)->load();
# interact with new database table full of data in $dbh
DESCRIPTION
This module tries to provide a fast and simple (but very configurable) interface for taking a set of data and loading it into a database table.
Common uses would be to take data from a file (like a CSV) and load it into a SQLite table. (For that specific case see DBIx::TableLoader::CSV.)
In most cases simply calling load()
is sufficient, but all methods are documented below for completeness.
METHODS
new
Create a new instance. Accepts a hash or hashref of options.
This module is very configurable but tries to use good defaults in the hopes that you won't need to configure too much in most cases.
Most likely needed options:
dbh
- A database handle as returned byDBI->connect()
data
- An arrayref of arrayrefs of data (which will be the input records)
See "OPTIONS" for the full list.
base_defaults
Returns a hashref of the options defined by the base class and their default values.
defaults
Returns a hashref of additional options defined by a subclass.
columns
my $columns = $loader->columns;
# [ ['column1', 'data type'], ['column two', 'data type'] ]
Returns an arrayref of the columns. Each element is an arrayref of column name and column data type.
column_names
my $column_names = $loader->column_names;
# ['column1', 'column two']
Returns an arrayref of the column names.
create
Executes a CREATE TABLE
SQL statement on the database handle.
create_prefix
Generates the opening of the CREATE TABLE
statement (everything before the column specifications).
Defaults to "CREATE $table_type TABLE $quoted_name ("
.
create_sql
Generates the SQL for the CREATE TABLE
statement by concatenating "create_prefix", the column definitions, and "create_suffix".
Can be overridden in the constructor.
create_suffix
Generates the closing of the CREATE TABLE
statement (everything after the column specifications).
Defaults to ")"
.
default_name
Returns the default (base) name for the table.
This is mostly for subclasses where a useful table name can be determined from the input (like a filename). In this module it defaults to 'data'
.
This gets concatenated together with "name_prefix" and "name_suffix" in "name".
default_column_type
Columns that have not been given an explicit data type will be defined using the default_column_type
.
You can pass a value explicitly to the constructor, or it will try to determine an appropriate (string) type based on the database driver (using "default_sql_data_type").
If all else fails it will default to text
(which works for SQLite, PostgreSQL, MySQL, and some others).
default_sql_data_type
Passed to "type_info" in DBI to query the database driver for an appropriate default column type.
Defaults to DBI::SQL_LONGVARCHAR
.
determine_column_types
This method goes through the columns
and converts any scalar column name to an arrayref of column name and default_column_type
. It modifies itself and returns nothing. It is called automatically from the constructor.
drop
Execute the DROP TABLE
statement on the database handle.
drop_prefix
Returns the portion of the SQL statement before the table name.
Defaults to DROP TABLE
.
drop_sql
Generates the SQL for the DROP TABLE
statement by concatenating "drop_prefix", "quoted_name", and "drop_suffix".
Alternatively drop_sql
can be set in the constructor if you need something more complex.
drop_suffix
Returns the portion of the SQL statement after the table name.
Nothing by default.
get_raw_row
Subclasses will override this method according to the input data format.
This is called from "get_row" to retrieve the next row of raw data.
It should return undef
when there are no more rows.
get_row
my $row = $loader->get_row();
Returns a single row of data at a time (as an arrayref). This method will be called repeatedly until it returns undef
. The returned arrayref will be flattened and passed to "execute" in DBI.
handle_invalid_row
This is called from "get_row" when a row is determined to be invalid (when "validate_row" throws an error).
If handle_invalid_row
was not specified in the constructor this method is a no-op: the original row will be returned (and eventually passed to "execute" in DBI).
Possible values for the handle_invalid_row
option:
die
- Callsdie()
with the error messagewarn
- Callswarn()
with the error message and returns the row unmodifiedcode ref
If it's a subroutine reference it is called as a method, receiving the loader object, the error message, and the row:
$handler->($loader, $error, $row);
The handler should either
die
to cease processing, return false to skip this row and get the next one, or return a (possibly modified) row that will be passed to "execute" in DBI.This allows you to, for example, write to a log when a bad row is found without aborting your transaction:
handle_invalid_row => sub { my ($self, $error, $row) = @_; $logger->log(['Bad row: %s: %s', $error, $row]); return; # return false to skip this row and move to the next one }
insert_sql
Generate the INSERT
SQL statement that will be passed to "prepare" in DBI.
insert_all
Execute an INSERT
statement on the database handle for each row of data. It will call "prepare" in DBI using "insert_sql" and then call "execute" in DBI once for each row returned by "get_row".
load
my $number_of_rows = $loader->load();
Load data into database table. This is a wrapper that does the most commonly needed things in a single method call. If the transaction
setting is true (default) the actions will be wrapped in a transaction.
Returns the number of rows inserted.
name
Returns the full table name (concatenation of name_prefix
, name
, and name_suffix
).
prepare_data
This method is called from "new" after the object is blessed (obviously). Any preparation work specific to the type of data should be done here.
This is mostly a hook for subclasses and does very little in this module.
quoted_name
Returns the full, quoted table name. Passes catalog
, schema
, and name
attributes to "quote_identifier" in DBI.
quoted_column_names
my $quoted_names = $loader->quoted_column_names();
# ['"column1"', '"column two"']
Returns an arrayref of column names quoted by the database driver.
validate_row
Called from "get_row" to check that the provided row is valid.
It may die
for any error which will be caught in "get_row" and the error will be passed to "handle_invalid_row".
The return value works like that of "handle_invalid_row": On success, the valid row (possibly modified) should be returned. If a false value is returned "get_row" will attempt to get another row.
Currently this only checks that the number of fields in the row matches the number of columns expected, however other checks may be added in the future. Subclasses can overwrite this to define their own validations (though calling the original (superclass method) is recommended).
OPTIONS
This module is very [excessively] configurable. In most cases the default values will be sufficient, but you should be able to customize the object to fit your needs.
Frequently Used Options:
columns
- Arrayref of column definitionsEach element can be an arrayref of column name and data type or just a string for the column name and "default_column_type" will be used. If not passed in the first row of
data
will be assumed to be column names.columns => ['first_name', 'last_name', ['last_seen', 'date']]
dbh
- A database handle as returned byDBI->connect()
This module probably isn't useful without one.
data
- An arrayref of arrayrefs of data to populate the table;Subclasses may define more appropriate options and ignore this parameter. If you're using this base class, you'll probably need this (unless you provide your own
get_row
coderef).data => [ ['polar', 'bear', '2010-08-15'], ['blue', 'duck', '2009-07-30'] ]
Less common options that are available when you desire extra tweaking power:
create
- Boolean; Whether or not to perform theCREATE TABLE
statementDefaults to true.
default_column_type
- Default data type for each columnThis will be used for each column that does not explicitly define a data type. The default will be determined from the database driver using
default_sql_data_type
. See "default_column_type".default_column_type => 'CHAR(50)'
drop
- Boolean to execute aDROP TABLE
statement beforeCREATE TABLE
Defaults to false. Set it to true if the named table already exists and you want to recreate it.
get_row
- A sub (coderef) that will override "get_raw_row"You can use this if your input data is in a different format than the module expects (to split a string into an arrayref, for instance). This is called like a method (the object will be
$_[0]
). The return value will be passed tomap_rows
if both are present.# each record is a line from a log file; # use the m// operator in list context to capture desired fields get_row => sub { my $s = <$io>; [ $s =~ m/^(\d+)\s+"([^"]+)"\s+(\S+)$/ ] }
NOTE
: If you useget_row
and don't passdata
you will probably want to passcolumns
(otherwise columns will be taken from the first call toget_row
).grep_rows
- A sub (coderef) to determine if a row should be used or skippedNamed after the built in
grep
function. It will receive the row as an arrayref in$_[0]
. (The row will also be available in$_
for consistency with the built ingrep
.) The object will be passed as$_[1]
in case you want it. If it returns a true value the row will be used. If it returns false the next row will be fetched and the process will repeat (until all rows have been exhausted).grep_rows => sub { $_->[1] =~ /something/ } # accept the row if it matches grep_rows => sub { my ($row, $obj) = @_; do_something(); } # 2 variables
handle_invalid_row
- How to handle invalid rows.Can be
die
,warn
, or a sub (coderef). See "handle_invalid_row" for more details. Default is to ignore (in which case DBI will likely error).map_rows
- A sub (coderef) to filter/mangle a row before useNamed after the built in
map
function. It will receive the row as an arrayref in$_[0]
. (The row will also be available in$_
for consistency with the built inmap
.) The object will be passed as$_[1]
in case you want it. It should return an arrayref (which will be used as the row).map_rows => sub { [ map { uc $_ } @$_ ] } # uppercase all the fields map_rows => sub { my ($row, $obj) = @_; do_something(); } # 2 variables
name
- Table nameDefaults to
'data'
. Subclasses may provide a more useful default.table_type
- String that will go beforeTABLE
in "create_prefix"A useful value might be
TEMPORARY
orTEMP
. This is probably database driver dependent, so use an appropriate value.transaction
- BooleanAll the operations in "load" will be wrapped in a transaction by default. Set this option to false to disable this.
Options that will seldom be necessary but are available for completeness and/or consistency:
catalog
- Table catalogPassed to "quote_identifier" in DBI to get the full, quoted table name. None by default.
create_prefix
- The opening of the SQL statementSee "create_prefix". Overwrite if you need something more complex.
create_sql
- TheCREATE TABLE
statementSee "create_sql". Overwrite if you need something more complex.
create_suffix
- The closing of the SQL statementSee "create_suffix". Overwrite if you need something more complex.
default_sql_data_type
- Default SQL standard data typeIf
default_column_type
is not supplied it will be determined by asking the database driver for a type corresponding toDBI::SQL_LONGVARCHAR
. Alternate values can be passed (DBI::SQL_VARCHAR()
for instance). See "default_sql_data_type".drop_prefix
- The opening of the SQL statementSee "drop_prefix". Overwrite if you need something more complex.
drop_sql
- TheDROP TABLE
statementWill be constructed if not provided. See "drop_sql".
drop_suffix
- The closing of the SQL statementSee "drop_suffix". Overwrite if you need something more complex.
name_prefix
- String prepended to table nameProbably mostly useful in subclasses where
name
is determined automatically.name_suffix
- String appended to table nameProbably mostly useful in subclasses where
name
is determined automatically.quoted_name
- Full table name, properly quotedOnly necessary if you need something more complicated than
$dbh->quote_identifier($catalog, $schema, $table)
(see "quote_identifier" in DBI).schema
- Table schemaPassed to "quote_identifier" in DBI to get the full, quoted table name. None by default.
SUBCLASSING
This module was designed to be subclassed for use with specific data input formats.
DBIx::TableLoader::CSV is a prime example. It is the entire reason this base module was designed.
Subclasses will likely want to override the following methods:
"defaults" - a hashref of additional acceptable options (and default values)
"default_name" - if you can determine a good default name from the input
"get_raw_row" - to return the next row of data
"prepare_data" - to initialize your object/data (open the file, etc.)
Be sure to check out the code for DBIx::TableLoader::CSV. Also see a very simple example in t/subclass.t.
RATIONALE
It seemed frequent that I would find a data set that was difficult to view/analyze (CSV, log file, etc.) and would prefer to load it into a database for its powerful, familiar processing abilities.
I once chose to use MySQL because its built in LOAD DATA
command read the malformed CSV I was given and the .import
command in SQLite did not.
I wrote this module so that I'd never have to make such a choice again. I wanted to be able to use the power of Text::CSV to make sure I could take any CSV I ever got and load it into SQLite easily.
I tried to make this module a base class to be able to handle various formats.
TODO
This is more of a list of ideas than features that are planned.
Allow a custom column name transformation sub to be passed in
Use "decamelize" in String::CamelCase by default?
Allow extra columns (like
id
) to be added and/or generatedOption to scan the data to guess appropriate data types for each column
Make a SQLite function so that you could call this from a
dbish
command line?Allow
UPDATE
statements and specify the key columns (for theWHERE
clause)
SEE ALSO
SUPPORT
Perldoc
You can find documentation for this module with the perldoc command.
perldoc DBIx::TableLoader
Websites
The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.
Search CPAN
The default CPAN search engine, useful to view POD in HTML format.
RT: CPAN's Bug Tracker
The RT ( Request Tracker ) website is the default bug/issue tracking system for CPAN.
CPAN Ratings
The CPAN Ratings is a website that allows community ratings and reviews of Perl modules.
CPAN Testers
The CPAN Testers is a network of smokers who run automated tests on uploaded CPAN distributions.
CPAN Testers Matrix
The CPAN Testers Matrix is a website that provides a visual overview of the test results for a distribution on various Perls/platforms.
CPAN Testers Dependencies
The CPAN Testers Dependencies is a website that shows a chart of the test results of all dependencies for a distribution.
Bugs / Feature Requests
Please report any bugs or feature requests by email to bug-dbix-tableloader at rt.cpan.org
, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DBIx-TableLoader. You will be automatically notified of any progress on the request by the system.
Source Code
https://github.com/rwstauner/DBIx-TableLoader
git clone https://github.com/rwstauner/DBIx-TableLoader.git
AUTHOR
Randy Stauner <rwstauner@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2011 by Randy Stauner.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.