NAME

DBIx::Admin::BackupRestore - Back-up all tables in a db to XML, and restore them

Synopsis

use DBIx::Admin::BackupRestore;

# Backup.

open(my $out, '>', $file_name) || die("Can't open(> $file_name): $!");
print $out DBIx::Admin::BackupRestore -> new(dbh => $dbh) -> backup('db_name');
close $out;

# Restore.

DBIx::Admin::BackupRestore -> new(dbh => $dbh) -> restore($file_name);

Description

DBIx::Admin::BackupRestore is a pure Perl module.

It exports all data in all tables from one database to one or more XML files.

Then these files can be imported into another database, possibly under a different database server.

Warning: It is designed on the assumption that you have a stand-alone script which creates an appropriate set of empty tables on the destination database server. You run that script, and then run this module in 'restore' mode.

This module is used daily to transfer a MySQL database under MS Windows to a Postgres database under Linux.

Similar modules are discussed below.

Distributions

This module is available both as a Unix-style distro (*.tgz) and an ActiveState-style distro (*.ppd). The latter is shipped in a *.zip file.

See http://savage.net.au/Perl-modules.html for details.

See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing each type of distro.

Constructor and initialization

new(...) returns an object of type DBIx::Admin::BackupRestore.

This is the class's constructor.

Usage: DBIx::Admin::BackupRestore -> new().

This method takes a set of parameters. Only the dbh parameter is mandatory.

For each parameter you wish to use, call new as new(param_1 => value_1, ...).

clean

The default value is 0.

If new is called as new(clean => 1), the backup phase deletes any characters outside the range 20 .. 7E (hex).

The restore phase ignores this parameter.

This parameter is optional.
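The documented cleaning is equivalent to this one-liner (a sketch, not the module's actual code):

# Delete every character outside the printable ASCII range 20 .. 7E (hex).

$value =~ tr/\x20-\x7E//cd;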

dbh

This is a database handle.

This parameter is mandatory when calling methods backup(), restore() and restore_in_order(), but is not required when calling method split(), since the latter is just a file-to-file operation.
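For example, you might obtain the handle like this (a sketch only; the DSN, user name and password are placeholders):

use DBI;

my($dbh) = DBI -> connect('dbi:mysql:db_name', 'user', 'password', {RaiseError => 1});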

fiddle_timestamp

This parameter takes one of these values: 0, 1 or 2.

The default value is 1.

If the value of this parameter is 0, then restore() does not fiddle the value of fields whose names match /timestamp/.

If the value of the parameter is 1, then restore() fiddles the value of fields whose names match /timestamp/ in this manner:

All values are assumed to be of the form /^YYYYMMDD/ (fake reg exps are nice!).
Hours, minutes and seconds, if present, are ignored.
Timestamps undergo either 1 or 2 transformations.
Firstly, if the value matches /^0000/, convert it to 19700101.
Then, all values are converted to YYYY-MM-DD 00:00:00.
Eg: This - 00000000 - is converted to 1970-01-01 00:00:00
and today - 20050415 - is converted to 2005-04-15 00:00:00.
You would use this option when transferring data from MySQL's 'timestamp' type
to Postgres' 'timestamp' type, and MySQL output values match /^(\d{8})/.

If the value of the parameter is 2, then restore() fiddles the value of fields whose names match /timestamp/ in this manner:

Timestamps undergo either 0 or 1 transformations.
If the value matches /^0000/, hours, minutes and seconds, if present, are ignored.
If the value matches /^0000/, convert it to 1970-01-01 00:00:00.
Values not matching that pattern are not converted.
Eg: This - 0000-00-00 00:00:00 - is converted to 1970-01-01 00:00:00
and today - 2005-04-15 09:34:00 - is not converted.
You would use this option when transferring data from MySQL's 'datetime' type
to Postgres' 'timestamp' type, and some MySQL output values match /0000-00-00 00:00:00/
and some values are real dates, such as 2005-04-15 09:34:00.

This parameter is optional.
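Here is a minimal sketch, in plain Perl, of the documented transformations. It is not the module's actual code, and the name fiddle_timestamp() below is for illustration only:

sub fiddle_timestamp
{
    my($mode, $value) = @_;

    return $value if ($mode == 0); # No fiddling.

    if ($mode == 1)
    {
        # Firstly, convert all-zero values.

        $value = '19700101' if ($value =~ /^0000/);

        # Then keep YYYYMMDD, ignoring any hours, minutes and seconds.

        return "$1-$2-$3 00:00:00" if ($value =~ /^(\d{4})(\d{2})(\d{2})/);
    }

    # Mode 2: Only all-zero values are converted; real dates pass through.

    return ($value =~ /^0000/) ? '1970-01-01 00:00:00' : $value;
}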

skip_schema

The default value is [].

If new is called as new(skip_schema => ['some_schema_name']), the restore phase does not restore any tables in the named schema.

Here, 'schema' is defined to be the prefix on a table name, and to be separated from the table name by a '.'.

Note: You would normally use these options to port data from Postgres to MySQL: new(skip_schema => ['information_schema', 'pg_catalog'], transform_tablenames => 1).

skip_tables

The default value is [].

If new is called as new(skip_tables => ['some_table_name']), the restore phase does not restore the tables named in the call to new().

This option is designed to work with CGI scripts using the module CGI::Session.

The CGI script can then run with the current CGI::Session data, and stale CGI::Session data is not restored from the XML file.

This parameter is optional.

transform_tablenames

The default value is 0.

The only other value currently recognized by this option is 1.

Calling new(transform_tablenames => 1) chops the schema, up to and including the first '.', off table names. Thus a table exported from Postgres as 'public.service' can be renamed 'service' when being imported into another database, eg MySQL.

Here, 'schema' is defined to be the prefix on a table name, and to be separated from the table name by a '.'.

Note: You would normally use these options to port data from Postgres to MySQL: new(skip_schema => ['information_schema', 'pg_catalog'], transform_tablenames => 1).

This parameter is optional.
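Putting the note above into practice, here is a sketch of a Postgres-to-MySQL restore. It assumes $dbh is a handle on the destination MySQL database, and $file_name names a file written by a previous call to backup():

my($restorer) = DBIx::Admin::BackupRestore -> new
(
    dbh                  => $dbh,
    skip_schema          => ['information_schema', 'pg_catalog'],
    transform_tablenames => 1,
);

$restorer -> restore($file_name);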

verbose

The default value is 0.

If new is called as new(verbose => 1), the backup and restore phases both print the names of the tables to STDERR.

When beginning to use this module, you are strongly encouraged to use the verbose option as a progress monitor.

This parameter is optional.

Method: backup($database_name)

Returns a potentially-huge string of XML.

You would normally write this straight to disk.

The database name is passed in here to help decorate the XML.

As of version 1.06, the XML tags are in lower case.

Method restore() will read a file containing either upper or lower case tags. Method restore_in_order() will not; it expects lower case tags.

Method: restore($file_name)

Returns an array ref of imported table names. They are sorted by name.

Opens and reads the given file, presumably one output by a previous call to backup().

The data read in is used to populate database tables. Use method split() to output to disk files.

Method: restore_in_order($file_name, [array ref of table names])

Returns nothing.

Opens and reads the given file, presumably one output by a previous call to backup().

The data read in is used to populate database tables. Use method split() to output to disk files.

Restores the tables in the order given in the array ref parameter.

This allows you to define a column with a clause such as 'references foreign_table (foreign_column)', and to populate the foreign_table before the dependent table.

And no, mutually-dependent and self-referential tables are still not catered for.

And yes, it does read the file once per table. Luckily, XML::Records is fast.

But if this seems like too much overhead, see method split().
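For instance (a sketch; the table names are hypothetical, and 'customer_order' is assumed to contain a column declared as 'references customer (customer_id)'):

# Restore the parent table before the table that references it.

DBIx::Admin::BackupRestore -> new(dbh => $dbh) -> restore_in_order
(
    $file_name,
    ['customer', 'customer_order'],
);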

Method: split($file_name)

Returns an array ref of imported table names. They are sorted by name.

Opens and reads the given file, presumably one output by a previous call to backup().

Each table not being skipped is output to a separate disk file, with headers and footers the same as output by method backup().

This means each file can be input to methods restore() and restore_in_order().

The tables' schema names and table names are used to construct the file names, together with an extension of '.xml'.

See examples/split-xml.pl and all-tables.xml for a demo.

Lastly, method split() uses lower-case XML tags.
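For instance (a sketch; the input file name matches the demo file mentioned above):

# No dbh is needed, since split() is just a file-to-file operation.

my($splitter)    = DBIx::Admin::BackupRestore -> new();
my($table_names) = $splitter -> split('all-tables.xml');

print "$_\n" for @$table_names;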

Example code

See the examples/ directory in the distro.

There are 2 demo programs:

backup-db.pl
restore-db.pl

FAQ

Why do I get 'duplicate key' errors after restoring?

Most likely because:

You are using Postgres or equivalent
You created a sequence

Eg: create sequence t_seq.

You created a table with the primary key referring to the sequence

Eg: create table t (t_id integer primary key default nextval('t_seq'), ...).

You populated the table

Let's say with 10 records, so the sequence is now at 10.

And the primary key field now contains the values 1 .. 10.

You exported the table with this module

Note: The export file contains the values 1 .. 10 in the primary key field.

You recreated the sequence

So the sequence is now at 1.

You recreated the table
You imported the data with this module

Note: Since the import file contains the values 1 .. 10 in the primary key field, these values are used to populate the table, and the sequence's nextval() is never called.

So the sequence is still at 1.

You tried to insert a record, which triggered a call to nextval()

But this call returns 1 (or perhaps 2), which is already in the table.

Hence the error about 'duplicate key'.
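One common remedy, which is not part of this module, is to resynchronize the sequence with the table's maximum key after restoring. A sketch, using the table and sequence from the example above:

# Make nextval('t_seq') return max(t_id) + 1. Standard Postgres SQL.

$dbh -> do("select setval('t_seq', (select max(t_id) from t))");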

Related Modules

On CPAN I can see 4 modules which obviously offer similar features - there may be others.

DBIx::Copy
DBIx::Dump
DBIx::Migrate
DBIx::XML_RDB

Of these, DBIx::XML_RDB is the only one I have experimented with. My thanks to Matt Sergeant for that module.

I have effectively extended his module to automatically handle all tables, and to handle importing too.

Required Modules

Of the modules listed below, install the 3 XML modules in the order shown.

Carp
File::Spec
XML::Parser
XML::TokeParser
XML::Records

Changes

See Changes.txt.

Author

DBIx::Admin::BackupRestore was written by Ron Savage <ron@savage.net.au> in 2004.

Home page: http://savage.net.au/index.html

Copyright

Australian copyright (c) 2004, Ron Savage. All rights reserved.

All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Artistic License, a copy of which is available at:
http://www.opensource.org/licenses/index.html