NAME

Data::Sync - A simple metadirectory/datapump module

SYNOPSIS

 use Data::Sync;

 my $sync = Data::Sync->new(log=>"STDOUT",[configfile=>"config.dds"],[jobname=>"MyJob"]);

 $sync->source($dbhandle,{
				select=>"select * from testtable",
				index=>"NAME",
			});

 or

 $sync->source($ldaphandle,{filter=>"(cn=*)",
				scope=>"sub",
				base=>"ou=testcontainer,dc=test,dc=org"});

 $sync->target($dbhandle,{table=>'targettable',
				index=>'NAME'});

 or

 $sync->target($ldaphandle);

 $sync->mappings(FIRSTNAME=>'givenName',SURNAME=>'sn');

 $sync->buildattributes(dn=>"cn=%NAME%,ou=testcontainer,dc=test,dc=org",
			objectclass=>"organizationalUnit");

 $sync->transforms(	PHONE=>'s/0(\d{4})/\+44 \($1\) /',
			ADDRESS=>sub{my $address=shift;
					$address=~s/\n/\<BR\>/g;
					return $address});

 $sync->validation(	address=>"/street/i",
			name=>"/(Dr|Mr|Mrs|Ms|Miss)/" );

 $sync->save("filename");

 $sync->load("filename");

 $sync->run();

 print $sync->error();

 print $sync->lastruntime();

 print $sync->commit();

(For more complex uses, see Data::Sync::Advanced)

DESCRIPTION

NB: 0.07 is an interim release - the next version will contain significant changes, including changes to the UI.

Data::Sync is a simple metadirectory/data pump module. It automates a number of the common tasks required when writing code to migrate/sync information from one datasource to another.

In order to use Data::Sync, you must define a source and a target. The first parameter to the source & target methods is a bound DBI/Net::LDAP handle.

Having defined your datasources, define how attributes map between them with mappings. If an attribute returned from the data source has no entry in the mapping table, it will be assumed that the name is the same in both databases.

Attributes can be built up from multiple other attributes using buildattributes. This uses a simple template, %FIELDNAME% which is replaced at run time with the value of the field from the current record. More complex modifications to data can be made with transforms, which runs after the built attributes are created.

Transforms can be made with the method transforms, which takes a hash of FIELDNAME=>transformation. This transformation can be one of three things: a regular expression in string form (see synopsis), the name of a predefined transformation supplied in Data::Sync, or a code ref.

Finally, if you are confident your data is all in the right format, use run. That will read the data from the source, modify it as you have specified, validate it against the pattern matches you've specified (if any) and write it to the target.

WARNING! There is no implied or real warranty associated with the use of this software. That's fairly obvious, but worth repeating here. Metadirectory applications have the potential to destroy data in a very big way - they must be constructed carefully, and thoroughly tested before use on a live system.

CONSTRUCTOR

my $sync = Data::Sync->new(log=>"STDOUT");
my $sync = Data::Sync->new(log=>$fh);
my $sync = Data::Sync->new(configfile=>"config.dds");
my $sync = Data::Sync->new(jobname=>"MyJob");

The constructor returns a Data::Sync object. Optionally, to use logging, pass the string STDOUT as the log parameter to print logging to STDOUT, or a lexical filehandle. You can specify a config file to get the configuration from, in which case you don't need to call mappings/transforms etc, although you'll still need pass the db/ldap handles (only) to source & target.

If you are using attribute hashing to minimise unnecessary writes, you should specify a jobname, as this is the name given to the SQLite hash database.

METHODS

source

 $sync->source($dbhandle,{select=>"select * from testtable"});

 or

 $sync->source($ldaphandle,{filter=>"(cn=*)",
				scope=>"sub",
				base=>"ou=testcontainer,dc=test,dc=org"});

 or

 $sync->source($dbhandle); # only if loading config file

Requires a valid, bound (i.e. logged in) Net::LDAP or DBI handle, and a hash of parameters for the data source (assuming you aren't loading the config from a file). LDAP parameters are: filter scope base attrs controls

(See Net::LDAP for more details of these parameters).

DBI parameters are: select

target

 $sync->target($dbhandle,{table=>'targettable',
				index=>'NAME'});

 or

 $sync->target($ldaphandle);

 or

 $sync->target($db); # only if loading config from a file
 
 or
 
 $sync->target($dbhandle,{table=>'targettable',
				index=>'NAME'
  				hashattributes=>["ADDRESS","PHONE"]
					});
					
 or
 
 $sync->target($ldaphandle,{hashattributes=>["ADDRESS","PHONE"]}
				);

 

Requires a valid, bound (i.e. logged in) DBI or Net::LDAP handle, and a hash of parameters (unless you are loading the config from a file). No parameters are required for LDAP data targets, but a dn must have been either read from the data source or constructed using buildattributes. Valid DBI parameters are

table - the table you wish to write to on the data target
index - the attribute you wish to use as an index

There is no 'pre check' on datatypes or lengths, so if you attempt to write a record with an oversized or mismatched data type, it will fail with an error.

Note: if you are writing from DB to LDAP, you must construct all mandatory attributes using buildattributes, or additions will fail.

Attribute hashing can be specified with the keys:

index=>"index/key attribute"
hashattributes=>["attrib","attrib","attrib"]

When running, this will create an MD5 hash of the concatentation of the specified attributes, and store it in a database under the specified index. Next time the job is run, it will hash the value again, and compare it with the last hashed value. If they are the same, the record will not be written to the target. These entries are stored in a SQLite database - if you want to manipulate the database directly, you can do so with a sqlite3 client. The SQLite database takes it's name from the 'jobname' attribute specified in $sync->new. If you didn't specify a jobname, it will default to 'noname' - so if you are running multiple jobs with attribute hashing in the same directory on your disk, it's important to make sure they have names.

mappings

$sync->mappings(FIRSTNAME=>'givenName',SURNAME=>'sn');

Maps individual field names from the data source, to their corresponding field names in the data target.

buildattributes

 $sync->buildattributes(dn=>"cn=%NAME%,ou=testcontainer,dc=test,dc=org",
			objectclass=>"organizationalUnit");

Builds new target attributes up from existing source attributes. A simple template form is used - the template should be a string variable, containing the source field name between % delimiters. If no % delimiters are found, the string will be written as a literal.

transforms

 $sync->transforms(	PHONE=>'s/0(\\d{4})/\+44 \(\$1\)/',
			OFFICE=>"stripspaces",
			ADDRESS=>sub{my $address=shift;
			$address=~s/\n/\<BR\>/g;
			return $address});

Converts each field in the source data using the parameters passed. Each parameter pair is the target field name, along with a regex (in a string), a coderef, or a standard function. The following list of transformation functions are supplied in this version:

stripspaces
stripnewline
uppercase
lowercase
concatenate

concatenate joins together the values of a multi valued attribute with the content of $sync->{mvseparator} - this defaults to | but can be changed with:

$sync->mvseparator("<separator>");

Transformations are recursive, so if you are importing some form of hierarchical data, the transformation will walk the tree until it finds a scalar (or a list, in the case of concatenate) that it can perform the transformation on.

Note: If passing a regex in a string, make sure you use single quotes. Double quotes will invite perl to interpolate the contents, with unexpected results.

validation

 $sync->validation(	address=>"/street/i",
			name=>"/(Dr|Mr|Mrs|Ms|Miss)/" );

Validation defines pattern matches for attributes. The validation methods are the last to be called before writing. If any of the specified fields fail to match the specified pattern match, the whole validation will fail and the write will not happen. Validation is optional, you don't have to specify a validation set, but it's useful to ensure that the constructed record set is what you were expecting before you write it out. Validation is also recursive, so it will handle multi valued attributes and subtrees in LDAP.

deletes

$sync->deletes("delete");

or

$sync->delete(attrib=>value,attrib=>value);

Define the action to be taken against entries that have been deleted. Note that this only works if hashing is defined in 'target'. The parameters can be "delete", in which case the entry is deleted, or attrib=>value pairs, in which case the specified attributes are changed to the string in 'value'. Note that it is impossible to do a 'transform' on the attribute value, as the data is no longer present once a delete has been detected.

save

$sync->save("filename");

Saves the config to a Data::Dump::Streamer file. Returns 1 on success.

load

$sync->load("filename");

Loads the config from a Data::Dump::Streamer file previously created with save. You still need to define the source and target db/ldap handles with source & target, but if you've loaded the config from a file you can omit the hash of options.

run

$sync->run() or die $sync->error."\n";

No parameters. Reads the data from the source, converts and renames it as defined in mappings, buildattributes and transforms, and writes it to the target.

error

print $sync->error;

Returns the last error encountered by the module. This is set e.g. when a file fails to load correctly, when a sql error is encountered etc. When this occurs, the return value from the called function will be zero, and error() should be called to identify the problem.

lastruntime

print $sync->lastruntime;

Returns the last time the job was run as YYYYMMDDHHMMSS. This is saved in the config file.

mvseparator

$sync->mvseparator("<separator>");

print $sync->mvseparator();

Sets or returns the multi valued attribute separator. (defaults to |)

commit

$sync->commit();

Calls the write handle commit method, where the write handle is DBI (there's no rollback/commit available in LDAP). This is provided as a convenience, just in case you have autocommit turned off on your db handle.

SEE ALSO

More detail on extra methods, accessing the internal methods of Data::Sync from your code, and subclassing are detailed in Data::Sync::Advanced

PREREQS

If you want to save and load definition files, you'll need Data::Dump::Streamer

If you are using DBI datasources, you will need DBI & the appropriate DBI drivers.

If you are using LDAP datasources, you will need Net::LDAP.

If you are using attribute hashing, you will also need DBI & DBD::SQLite

VERSION

0.07

BUGS & CAVEATS

Data::Sync handles column/attribute names as case sensitive. Postgresql (at least at time of writing) appears to return column names as lc, whether they're created lc or not. I make no guarantees about this, but using lower case column names in all Data::Sync code seems to work OK. Please ensure that any code you write using this module with postgresql is particularly thoroughly tested.

TODO

Hashing individual attributes so if one attribute changes, only that attribute is written.

UI changes: instead of passing a handle, pass a type=>LDAP/DBI/Psql etc, and the connect parameters. Restructure in subclasses so that DBI is a generic sql, but can be subclassed by specific classes for particular datasources. This is a major change to the UI. write a handle factory to create the db/ldap handle. The AoH for records, and methods for read and write are required. This sits well with the dispatcher class. Retain handle passing UI facility if possible, but deprecate.

Implementing Data::Sync::Datasource as a dispatcher class redirecting to ::dbi & ::ldap would fit the UML model better, might make subclassing easier, and would probably be easier to maintain. For other divisions see Classes.dia

Validation can be extended for deletes by putting a % comparison of recordset vs delete set in 'run'

Multiple sources in a single job? Possibly by naming, and with a default name to support existing syntax.

Separate out the table creation methods into a subclass(?) and use a dispatcher? (table creation WIP)

Friendly CSV/TD source/target methods

Modular datasource/targets for including non dbi/ldap datasources?

Example using AnyData & XML

Multiple targets in a single job?

Caching?

UTF8/ANSI handling.

Perltidy the tests (thanks for spotting the mess Gavin)

CHANGES

v0.08

Bugfix, added package statement to Advanced.pm to ensure correct CPAN indexing

v0.07

Added in 'add if no index' facility on writedbi

Generalised delta code

Removed 'use DDS' and replaced with 'require' in save and load methods, so dependency is only if using those functions

Implemented ::deletes

Generalised sourceToAoH & write methods

Separated complex uses into Data::Sync::Advanced POD

Implemented get & put methods

Implemented hashrecord method

extended testing

v0.06

Implemented a commit method for the writehandle.

Implemented validate function

v0.05

Fixed some 0E0 problems with return values from DBI.

Added postgresql caveat.

Extended developer test suite to include MySQL & Postgresql

v0.04

Implemented basic attribute hashing

Added concatenate function for multivalued ldap attributes

v0.03

Added uppercase and lowercase transformations

Moved read and write subs out of anonymous blocks

hid raw regex in #!#<regex>#!# inside coderef for regex transformations (can be parsed out for display/edit in gui)

implemented batch updating

V0.02

Implemented load & save functions.

Implemented error function

Modified stripnewlines to replace with whitespace.

COPYRIGHT

Copyright (c) 2004-2005 Charles Colbourn. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Charles Colbourn

charlesc@g0n.net