NAME
Sybase::BLK - Simple front end to the Sybase Bulk Libraries (from ctlib/comlib)
SYNOPSIS
use Sybase::BLK;
$bcp = new Sybase::BLK ...;
$bcp->config(...);
$bcp->run;
It's very similar to Sybase::BCP, except that it's based on Sybase::CTlib instead of Sybase::DBlib.
DESCRIPTION
The Sybase::BLK module serves as a simplified front end for Sybase's Bulk Copy library. It is sub-classed from the Sybase::CTlib module, so all the features of the Sybase::CTlib module are available in addition to the specific Sybase::BLK methods.
So how does it work?
Let's say we want to copy the contents of a file name 'foo.bcp' into the table 'mydb.dbo.bar'. The fields in the file are separated by a '|'.
#!/usr/local/bin/perl
use Sybase::BLK;
$bcp = new Sybase::BLK $user, $passwd;
$bcp->config(INPUT => 'foo.bcp',
OUTPUT => 'mydb.dbo.bar',
SEPARATOR => '|');
$bcp->run;
That's it!
Of course, there are several things you can do to cater for non-standard input files (see Configuration Parameters, below).
Features
Allows use of Regular Expressions as separator
Automatic conversions from non-standard date formats.
Handles column reordering and/or skipping of unneeded data.
Row or column based callbacks.
Allows vetoing of rows, or arbitrary processing of data on input.
The following methods are available:
- $bcp=new Sybase::BLK [$user [, $pass [, $server [, $appname [, $attr]]]]]
-
Allocate a new BLK handle. Opens a new connection to Sybase via the Sybase::CTlib module, and enables BLK IN on this handle. The $attr variable is a hash ref that gets passed to Sybase::CTlib, and can be used to set connection properties (see the new/ct_connect entry in the Sybase::CTlib man page).
- $bcp->config([parameters])
-
Sets up the Bulk Copy operation. See Configuration Parameters below for details.
- $bcp->describe($colid, {parameters})
-
Adds a specific configuration element for column $colid. Columns are numbered starting at 1, as is standard in the Sybase APIs.
- $bcp->run
-
Perform the BLK operation, returns the actual number of rows sent to the server.
Configuration Parameters
The general form for configuration is to pass (parameter => value) pairs via the config() or describe() methods. Some parameters take slightly more complex arguments (see REORDER).
Paramaters for config()
- DIRECTION
-
The direction in which the bulkcopy operation is done. Can be 'IN' or 'OUT'. Default: 'IN' (Note: 'OUT' is not implemented yet.)
- INPUT
-
Where BLK should take it's input from. It's a filename for bcp IN, it's a table name for bcp OUT.
For bcp IN INPUT can also be a reference to a perl subroutine that returns the array to be inserted via blk_rowxfer().
- OUTPUT
-
Where BLK should place it's output. It's a table name for bcp IN, a filename for bcp OUT.
- ERRORS
-
The file where invalid rows should be recorded. Default: bcp.err.
- SEPARATOR
-
The pattern that separates fields in the input file, or that should be used to separate fields in the output file. Since this pattern is passed to split, it can be a regular expression. By default regular expression meta-characters are not interpreted as such, unless the RE_USE_META attribute is set. Default: TAB.
- RE_USE_META
-
If this attribute is set then the regular expression used to split rows into columns (defined by SEPARATOR) will interpret regular expression meta-characters normally (i.e. a '|' means alternation - see perldoc perlre for details on regular expression meta-characters). Default: false.
- RECORD_SEPARATOR
-
The pattern that separates records (rows) in the input file. Sybase:BLK will set a local copy of $/ to this value before reading the file. Default: NEWLINE.
- BATCH_SIZE
-
Number of rows to be batched together before committing to the server for bcp IN operations. Defaults to 100. If there is a risk that retries could be requiered due to failed batches (e.g. duplicate rows/keys errors) then you should not use a large batch size: one failed row in a batch requires the entire batch to be resent.
- RETRY_FAILED_BATCHES
-
If this attribute is set then a failed batch will be retried one row at a time so that all the rows that don't fail get loaded. Default: false.
- NULL
-
A pattern to be used to detect NULL values in the input file. Defaults to a zero length string.
- HAS_IDENTITY
-
Boolean determining whether the values in the input file should be used to populate any IDENTITY column in the target table. Leave false if the target table doesn't have any identity columns, or if you want to let the server populate the IDENTITY column (and see the IDENTITY_COL attribute, below).
- IDENTITY_COL
-
If your target table has an identity column, and you want to let the server populate it, then set IDENTITY_COL to the column number of the identity column in the table (starting with 1 for the first column).
- DATE
-
The default format for DATE fields in the input file. The parameter should be a symbolic value representing the format. Currently, the following values are recognized: CTIME (the Unix ctime(3) format), or the numbers 0-12, 100-112, corresponding to the conversion formats defined in table 2-4 of the SQL Server Reference Manual.
BLK detects datetime targets by looking up the target table structure in the Sybase system tables.
- REORDER
-
The ordering of the fields in the input file does not correspond to the order of columns in the table, or there are columns that you wish to skip. The REORDER parameter takes a hash that describes the reordering operation:
$bcp->config(... REORDER => { 1 => 2, 3 => 1, 2 => 'foobar', 12 => 4}, ...);
In this example, field 1 of the input file goes in column 2 of the table, field 3 goes in column 1, field 2 goes in the column named foobar, and field 12 goes in column 4. Fields 4-11, and anything beyond 12 is skipped. As you can see you can use the column name instead of its position. The default is to not do any reordering.
- CALLBACK
-
The callback subroutine is called for each row (after any reordering), and allows the user to do global processing on the row, or vetoing it's processing. Example:
$bcp->config(... CALLBACK => \&row_cb, ...); sub row_cb { my $row_ref = shift; # Skip rows where the first field starts with FOO: return undef if $$row_ref[0] =~ /^FOO/; 1; }
- CONDITION
-
A where clause to be used in bcp OUT operations. Not implemented.
Parameters for describe()
- CALLBACK
-
Specify a callback for this column. The field value is passed as the first parameter, and the callback should return the value that it wants BLK to use. Example:
$dbh->describe(2, {CALLBACK, \&col_cb}); sub col_cb { my $data = shift; # Convert to lower case... $data =~ tr/A-Z/a-z/; }
- SKIP
-
If this is defined then this field is skipped. This is useful if only one or two fields need to be skipped and you don't want to define a big REORDER hash to handle the skipping.
EXAMPLES
#!/usr/local/bin/perl
use Sybase::BLK;
$bcp = new Sybase::BLK sa, undef, TROLL;
$bcp->config(INPUT => '../../Sybperl/xab',
OUTPUT => 'excalibur.dbo.t3',
BATCH_SIZE => 200,
REORDER => {1 => 'account',
3 => 'date',
2 => 'seq_no',
11 => 'broker'},
SEPARATOR => '\|');
$bcp->run;
BUGS
Bulk copy out is not implemented.
This module was copied from Sybase::BCP and so is subject to many of the same issues noted in that module.
The current implementation seems to run about 2.5 to 3 times slower than plain bcp.
AUTHOR
Scott Zetlan <scottzetlan@aol.com> after the original Sybase::BCP by Michael Peppler <mpeppler@peppler.org>. Contact the sybperl mailing list mailto:sybperl-l@peppler.org
if you have any questions.