The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

Data::Sync - A simple metadirectory/datapump module

SYNOPSIS

 use Data::Sync;

 my $sync = Data::Sync->new(log=>"STDOUT",[configfile=>"config.dds"]);

 $sync->source($dbhandle,{select=>"select * from testtable"});

 or

 $sync->source($ldaphandle,{filter=>"(cn=*)",
                                scope=>"sub",
                                base=>"ou=testcontainer,dc=test,dc=org"});

 $sync->target($dbhandle,{table=>'targettable',
                                index=>'NAME'});

 or

 $sync->target($ldaphandle);

 $sync->mappings(FIRSTNAME=>'givenName',SURNAME=>'sn');

 $sync->buildattributes(dn=>"cn=%NAME%,ou=testcontainer,dc=test,dc=org",
                        objectclass=>"organizationalUnit");

 $sync->transforms(PHONE=>'s/0(\d{4})/\+44 \($1\) /',
                        ADDRESS=>sub{my $address=shift;
                                        $address=~s/\n/\<BR\>/g;
                                        return $address});

 $sync->save("filename");

 $sync->load("filename");

 $sync->run();

 print $sync->error();

 print $sync->lastruntime();

DESCRIPTION

Data::Sync is a simple metadirectory/data pump module. It automates a number of the common tasks required when writing code to migrate/sync information from one datasource to another.

In order to use Data::Sync, you must define a source and a target. The first parameter to the source & target methods is a bound DBI/Net::LDAP handle.

Having defined your datasources, define how attributes map between them with mappings. If an attribute returned from the data source has no entry in the mapping table, it will be assumed that the name is the same in both databases.

Attributes can be built up from multiple other attributes using buildattributes. This uses a simple template, %FIELDNAME% which is replaced at run time with the value of the field from the current record. More complex modifications to data can be made with transforms, which runs after the built attributes are created.

Transforms can be made with the method transforms, which takes a hash of FIELDNAME=>transformation. This transformation can be one of three things: a regular expression in string form (see synopsis), the name of a predefined transformation supplied in Data::Sync, or a code ref.

Finally, if you are confident your data is all in the right format, use run. That will read the data from the source, modify it as you have specified, and write it to the target.

WARNING! There is no implied or real warranty associated with the use of this software. That's fairly obvious, but worth repeating here. Metadirectory applications have the potential to destroy data in a very big way - they must be constructed carefully, and thoroughly tested before use on a live system.

CONSTRUCTOR

 my $sync = Data::Sync->new(log=>"STDOUT");
 my $sync = Data::Sync->new(log=>$fh);
 my $sync = Data::Sync->new(configfile=>"config.dds");

The constructor returns a Data::Sync object. To use logging, pass the string STDOUT as the log parameter to print logging to STDOUT, or a lexical filehandle. Optionally, you can specify a config file, although you'll still need pass the db/ldap handles (only) to source & target.

METHODS

source

 $sync->source($dbhandle,{select=>"select * from testtable"});

 or

 $sync->source($ldaphandle,{filter=>"(cn=*)",
                                scope=>"sub",
                                base=>"ou=testcontainer,dc=test,dc=org"});

 or

 $sync->source($dbhandle); # only if loading config file

Requires a valid, bound (i.e. logged in) Net::LDAP or DBI handle, and a hash of parameters for the data source (assuming you aren't loading the config from a file). LDAP parameters are: filter scope base attrs controls

(See Net::LDAP for more details of these parameters).

DBI parameters are: select

By default, the source method will define the read operation as 'all in one'. If you want to handle data in batches, specify

 batchsize=>x

in the hash of read criteria. This will read a batch from the handle, perform the operation, read the next batch from the handle, and so on. Note that this will still be working against an entire record set matching your criteria, so the memory advantages are limited.

target

 $sync->target($dbhandle,{table=>'targettable',
                                index=>'NAME'});

 or

 $sync->target($ldaphandle);

 or

 $sync->target($db); # only if loading config from a file

Requires a valid, bound (i.e. logged in) DBI or Net::LDAP handle, and a hash of parameters (unless you are loading the config from a file). No parameters are required for LDAP data targets, but a dn must have been either read from the data source or constructed using buildattributes. Valid DBI parameters are

 table - the table you wish to write to on the data target
 index - the attribute you wish to use as an index

There is no 'pre check' on datatypes or lengths, so if you attempt to write a record with an oversized or mismatched data type, it will fail with an error.

Note: if you are writing from DB to LDAP, you must construct all mandatory attributes using buildattributes, or additions will fail.

mappings

 $sync->mappings(FIRSTNAME=>'givenName',SURNAME=>'sn');

Maps individual field names from the data source, to their corresponding field names in the data target.

buildattributes

 $sync->buildattributes(dn=>"cn=%NAME%,ou=testcontainer,dc=test,dc=org",
                        objectclass=>"organizationalUnit");

Builds new target attributes up from existing source attributes. A simple template form is used - the template should be a string variable, containing the source field name between % delimiters. If no % delimiters are found, the string will be written as a literal.

transforms

 $sync->transforms(     PHONE=>'s/0(\\d{4})/\+44 \(\$1\)'/’,
                        OFFICE=>"stripspaces",
                        ADDRESS=>sub{my $address=shift;
                        $address=~s/\n/\<BR\>/g;
                        return $address});

Converts each field in the source data using the parameters passed. Each parameter pair is the target field name, along with a regex (in a string), a coderef, or a standard function. The following list of transformation functions are supplied in this version:

 stripspaces
 stripnewline
 uppercase
 lowercase

Note: If passing a regex in a string, make sure you use single quotes. Double quotes will invite perl to interpolate the contents, with unexpected results.

save

 $sync->save("filename");

Saves the config to a Data::Dump::Streamer file. Returns 1 on success.

load

 $sync->load("filename");

Loads the config from a Data::Dump::Streamer file previously created with save. You still need to define the source and target db/ldap handles with source & target, but if you've loaded the config from a file you can omit the hash of options.

run

 $sync->run() or die $sync->error."\n";

No parameters. Reads the data from the source, converts and renames it as defined in mappings, buildattributes and transforms, and writes it to the target.

error

 print $sync->error;

Returns the last error encountered by the module. This is set e.g. when a file fails to load correctly, when a sql error is encountered etc. When this occurs, the return value from the called function will be zero, and error() should be called to identify the problem.

lastruntime

 print $sync->lastruntime;

Returns the last time the job was run as YYYYMMDDHHMMSS. This is saved in the config file.

PREREQS

Data::Dump::Streamer

If you are using LDAP datasources, you will need Net::LDAP.

If you are using DBI datasources, you will need DBI and the appropriate DBD drivers.

VERSION

0.03

TODO

Transform to convert multivalued attribs to single valued (choice: use value[index], join with delimiter)

Example using AnyData & XML

Deletion support (somehow, anyhow....)

Delta support/timestamp detection/changelog & persistent search

Multiple sources in a single job?

Multiple targets in a single job?

Caching?

UTF8/ANSI handling.

Perltidy the tests (thanks for spotting the mess Gavin)

Use SQL::Abstract instead of constructing statements?

CHANGES

v0.03

Added uppercase and lowercase transformations

Moved read and write subs out of anonymous blocks

hid raw regex in #!#<regex>#!# inside coderef for regex transformations (can be parsed out for display/edit in gui)

implemented batch updating

V0.02

Implemented load & save functions.

Implemented error function

Modified stripnewlines to replace with whitespace.

COPYRIGHT

Copyright (c) 2004-2005 Charles Colbourn. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Charles Colbourn

charlesc@g0n.net

1 POD Error

The following errors were encountered while parsing the POD:

Around line 1056:

Non-ASCII character seen before =encoding in '\(\$1\)'/’,'. Assuming UTF-8