Usage of DBIx::PgLink

ARCHITECTURE

 -------------              .- - - -  - -.
  |        |                | Views, etc |<- - - - .
  |        |                `- - - -  - -'         .
  |    PostgreSQL                 |                .
  |     database        .---------------------.    .
  |        |            |  PL/Perl functions  |<- -.
  |        |            `---------------------'    .
  |       ---                /             \       .
  |        |         .--------------.  .---------------------.
  |        |         | DBIx::PgLink |  | DBIx::PgLink::Local |
  |        |         `--------------'  `---------------------'
Local      |                |                      .
server     |         .--------------.              .
  |        |         |   Connector  |              .
  |        |         `--------------'              .
  |      Perl          /          \                .
  |     modules  .-----------. .----------.        .
  |        |     |  Adapter  | | Accessor |. . . . .
  |        |     `-----------' `----------'
  |        |           |
  |        |     .===========.
  |        |     | DBI + DBD |
  |        |     `==========='
  |       ---          |
  |        |     .- - - - - - - - - - - - - - -.
  |        |     | Driver  (native, ODBC, etc) |
 -------------   `- - - - - - - - - - - - - - -'
  Network              |
 -------------   .-------------------.
 Remote server   |  Remote database  |
                 `-------------------'
Views, PL/Perl functions

Provides SQL interface for remote data source.

Collection of named Connectors used in session.

"" in DBIx::PgLink::Adapter

Provides interface for DBI, handles driver incompatibilities and adds extra functions.

"" in DBIx::PgLink::Accessor

Store and retrieve metadata for remote objects in PostgreSQL database, builds persistent accessor functions/views/triggers.

"" in DBIx::PgLink::Connector

Initialize connection, join adapter and accessors together. Interface for PL/Perl functions.

"" in DBIx::PgLink::Local

Emulation of DBI for local PostgreSQL database.

Metadata and helper functions stored in PostgreSQL dbix_pglink schema.

CONNECTION SETUP

Main connection record

  • Ensure that DBD driver works properly.

    Simple test perl script provided

    ./util/dbd_check.pl --dsn="dbi:..." --user=... --password=...

    It is OK to fail some catalog information tests, if exists adapter for your database that cover that issues. If not, you cannot use persistent accessor, but (probably) can use ad-hoc queries.

  • Login to PostgreSQL database as PostgreSQL superuser

    Run psql or pgAdmin or another query tool of your choice.

  • Create main connection record:

    SELECT dbix_pglink.create_connection(conn_name, data_source, adapter_class, logon_mode, use_libs, comment);

    or

    SELECT dbix_pglink.create_connection(conn_name, data_source, comment)

    where

    conn_name (TEXT, mandatory)

    Connection name

    data_source (TEXT, mandatory)

    DBI connection string ('dbi:Driver:params...')

    adapter_class (TEXT)

    Perl module name for used database, guessed if not specified.

    logon_mode (TEXT, mandatory)

    Tells what credentials are used when no mapping exists between local PostgreSQL login and remote user

    'empty'

    Connect with empty user name and empty password

    'current'

    Connect as session_user without password

    'default'

    Connect as default user with default password (see below).

    This is default value.

    'deny'

    Connection refused

    use_libs (TEXT[])

    List of directories, which will be appended to Perl library path (@INC global variable). Useful when part of code resides out of default Perl locations (or PERL5LIB environment variable) and restart of main PostgreSQL process is undesirable.

    Main Connector class loaded *before* using this libs. Setting this attribute can affects only class or role loaded at runtime.

  • To modify connection update of dbix_pglink.connections table directly.

  • To delete connection:

    SELECT dbix_pglink.delete_connection(conn_name)

    All metadata and database objects will be deleted as well.

Example:

SELECT dbix_pglink.create_connection(
  'NORTHWIND',
  'dbi:ODBC:Northwind',
  'Sample database'
);

or

SELECT dbix_pglink.create_connection(
  'NORTHWIND',
  'dbi:ODBC:Northwind',
  'DBIx::PgLink::Adapter::MSSQL',
  'empty',           --use integrated security
  NULL,
  'Sample database'  --comment
);

Mapping between local and remote user

SELECT dbix_pglink.set_user(conn_name, local_user, remote_user, remote_password);

SELECT dbix_pglink.delete_user(conn_name, local_user);

This is optional when

- no authentication required (desktop database like DBF, SQLite)

- OS-level authentication of 'postgres' account on remote server
  (MSSQL with integrated security mode, PostgreSQL with sspi)

There can be special <default> user entry for each connection, where local_user = '' (empty string).

WARNING: Password stored as plain text

Roles

Roles requires disconnect from remote database to apply.

SELECT dbix_pglink.set_conn_role(conn_name, local_user, role_kind, role_name, position, replace);

SELECT dbix_pglink.delete_conn_role(conn_name, local_user, role_kind, role_name);

where

role_kind (TEXT)

To which object role will be applied, 'Adapter' or 'Connector'

local_user (TEXT)

PostgreSQL user name or '' (empty string) for global.

role_name (TEXT)

Perl class name.

Can be full name like 'DBIx::PgLink::Adapter::Roles::InitSession' or just 'InitSession'.

See directory lib/DBIx/PgLink/Roles.

position (INTEGER)

For some roles loading order can matter. Use NULL to append role to end of list.

replace (BOOLEAN)

If true replace role in specified position. If false insert role in position and shift list down.

Attributes

There is DBI database handle attibutes (with name in CamelCase) and Adapter object attributes (with name in lower case). Some roles may add extra Adapter attributes.

SELECT dbix_pglink.set_conn_attr(conn_name, local_user, name, value);

SELECT dbix_pglink.delete_conn_attr(conn_name, local_user, name);

See description of DBI attributes in "ATTRIBUTES COMMON TO ALL HANDLES" in DBI:

Environment variables

Environment variables will be set for backend process before connection start. Use it when environment of different connections conflicts with each others or PostgreSQL restart is not desirable.

SELECT dbix_pglink.set_conn_env(conn_name, local_user, name, value);

SELECT dbix_pglink.delete_conn_env(conn_name, local_user, name);

Requires installation of Environment role.

SELECT dbix_pglink.set_role(conn_name, '', 'Connector', 'Environment');

AD-HOC QUERIES

Note: first time in every session the PL/Perl function that initiates connection take a lot of CPU for several seconds.

This is a price for using great object system for Perl 5, Moose.

After initialization all functions works pretty fast, although not so fast as Perl code that use plain DBI.

Set-returning queries

Function family for set-returning ad-hoc queries:

  • dbix_pglink.query(conn_name, query_text)

  • dbix_pglink.query(conn_name, query_text, param_values)

  • dbix_pglink.query(conn_name, query_text, param_values, param_types)

where

query_text (TEXT)

Any text passed to remote database. Can contains positional parameter placeholders (?).

Using of dollar-quoting for literal queries is preferable.

param_values (array of TEXT)

List of values for parameterized query. All values must be converted to text, database driver do reverse conversion.

param_types (array of TEXT)

List of parameter type names. Use it when driver require explicit parameter type or value need conversion. Currently supported only SQL standard types.

Example:

SELECT *
FROM dbix_pglink.query(
  'NORTHWIND',
  $$SELECT OrderID, OrderDate FROM Orders$$
) as s("OrderID" int, "OrderDate" timestamp); -- <-- column definition

SELECT *
FROM dbix_pglink.query(
  'NORTHWIND',
  $$SELECT OrderID, OrderDate FROM Orders
   WHERE CustomerID = ? AND OrderDate >= ?$$,
  ARRAY['VINET', '1997-01-01'],
  ARRAY['VARCHAR', 'DATE']
) as s("OrderID" int, "OrderDate" timestamp);

Important note: column definition list is required

You can write custom function that wraps query() with proper column definition list, or use persistent accessors.

Data modification and DDL queries

Function family for non-returning data ad-hoc queries:

  • dbix_pglink.exec(conn_name, query_text)

  • dbix_pglink.exec(conn_name, query_text, param_values)

  • dbix_pglink.exec(conn_name, query_text, param_values, param_types)

Example:

SELECT dbix_pglink.exec(
  'NORTHWIND',
  $$UPDATE Orders SET OrderDate='2000-01-01' WHERE OrderID=42$$
);

SELECT dbix_pglink.exec(
  'NORTHWIND',
  $$INSERT INTO Orders (CustomerID,EmployeeID,OrderDate) VALUES (?,?,?)$$,
  ARRAY['foo','5','2001-01-01']
);

SELECT dbix_pglink.exec(
  'NORTHWIND',
  $$UPDATE Orders SET OrderDate=? WHERE OrderID=?$$,
  ARRAY['2000-01-01', '42'],
  ARRAY['DATE', 'INTEGER']
);

SELECT dbix_pglink.exec(
  'NORTHWIND',
  $$CREATE VIEW MyOrders AS SELECT * FROM Orders WHERE EmployeeID=5$$
);

TRANSACTIONS

  • dbix_pglink.begin(conn_name)

  • dbix_pglink.rollback(conn_name)

  • dbix_pglink.commit(conn_name)

DISCONNECTION

  • dbix_pglink.disconnect(conn_name)

PERSISTENT ACCESSORS

Accessor is mapping of remote database object to local object.

The benefits of accessors:

  • No need to describe output result-set for every query

  • Provides safe access to remote database

    By default, users have no priveleges to any accessor. You can grant/revoke permission as for usual PostgreSQL object.

Building accessors

SELECT dbix_pglink.build_accessors( conn_name, local_schema, remote_catalog, remote_schema, remote_object, remote_object_types, object_name_mapping );

or

SELECT dbix_pglink.build_accessors(conn_name, local_schema, remote_schema);

where

local_schema (TEXT)

Name of local PostgreSQL schema where accessors will be created.

Schema created automatically if not exists.

remote_catalog, remote_schema, remote_object (TEXT)

Used for enumeration of remote objects. Like pattern allow '%' to list all objects.

(remote_catalog is database name for Sybase/MSSQL)

remote_object_types (array of TEXT)

Mandatory. Specify object types like 'TABLE', 'VIEW', 'FUNCTION', 'PROCEDURE'

object_name_mapping (array of TEXT)

Can be NULL. Each array element must have format 'remote_name=>local_name'.

Creates function-based view for remote table/view and function for remote routine (stored procedure/function).

Old accessors will be dropped unless exists dependent user objects.

If you want to rebuild accessor you must drop dependent object manually or change its definition.

To relax dependency chain in PL/PgSQL use dynamic EXECUTE.

Example:

SELECT dbix_pglink.build_accessors(
  'NORTHWIND', --connection name
  'northwind', --new local schema
  'dbo'        --remote schema (owner)
);

or

SELECT dbix_pglink.build_accessors(
  'NORTHWIND', --connection name
  'northwind', --new local schema
  'northwind', --remote catalog (database)
  'dbo',       --remote schema (owner)
  '%',         --all remote objects
  ARRAY['TABLE','VIEW','PROCEDURE'], --object types
  ARRAY[
    'Alphabetical list of products', 'products_with_category',
    'Current Product List', 'products'
  ]  --name mapping
);

Stored procedures

Notes for Microsoft SQL Server / Sybase ASE users:

  • In general, stored procedure have no fixed resultset and therefore cannot be used as PostgreSQL function. PgLink adapter tries to detect resultset of procedure, calling it with NULL parameters in FMTONLY mode. It works only if procedure output same resultset structure for every input and does not exit before returning resultset (when checks input parameters).

    You can create accessor for procedure with explicit resultset:

    dbix_pglink.build_procedure(conn_name, local_schema, remote_catalog, remote_schema, remote_procedure, local_name, column_info)

    where column_info is array of text containing pairs of 'column_name', 'remote_type'.

    --function installed with Connector role 'SQLServerProc'
    SELECT dbix_pglink.set_role('NORTHWIND','Connector','SQLServerProc');
    --reconnect needed
    SELECT dbix_pglink.disconnect('NORTHWIND');
    
    
    SELECT dbix_pglink.build_procedure('NORTHWIND', 'northwind', 'Northwind', 'dbo', 'CustOrderHist',
      ARRAY[
        'ProductName', 'NVARCHAR',
        'Total', 'INT'
      ]
    );
  • Procedure return code and output parameters are not supported

  • Multiple resultsets are not supported

Using accessors

For each table or view created several functions and one updatetable view.

For each routine created one function.

Accessor functions for table/view

  • table$()

    Fetch all data rows from table or view (but see "Query filter" below).

    table is table or view name, with '$' suffix appended. Suffix prevents name clash of table and function accessors.

  • table$(where, param_values, param_types)

    where is literal WHERE clause, appended to SELECT statement that will be executed on remote database.

    This is effective way to limit number of fetched rows, but require forming of literal SQL with proper value quoting.

    By default, no SQL check performed, so it opens a security hole to remote database. You can add basic checks with optional adapter role CheckWhereClause, which requires SQL::Statement module. CheckWhereClause role pass only basic ANSI SQL statement.

    param_values is array of TEXT, that elements will be binded.

    param_types is array of TEXT.

Example:

--fetch all rows from table
SELECT * FROM northwind."Orders$"();

--fetch all rows (BAD) and apply local condition
SELECT * FROM northwind."Orders$"() WHERE "OrderID" = 1;

--apply condition on remote site and fetch only needed rows
SELECT * FROM northwind."Orders$"('WHERE OrderID = 1', NULL, NULL);

--same
SELECT * FROM northwind."Orders$"('WHERE OrderID = ?', ARRAY['1'], NULL);

--same
SELECT * FROM northwind."Orders$"('WHERE OrderID = ?', ARRAY['1'], ARRAY['integer']);


--stored procedure call
SELECT * FROM northwind."CustOrderHist"('VINET')

Accessor view

Accessor view built on accessor function without arguments and fetch all rows (but see "Query filter" below).

Example:

--fetch all rows
SELECT * FROM northwind."Orders";

--fetch all rows (BAD) and apply local filter
SELECT * FROM northwind."Orders" WHERE "OrderID" = 1;

Modification of table/view

Each table/view accessor view has rules that make it updatable.

UPDATE/DELETE queries first fetch all data rows from accessor view, than apply local WHERE condition, than post changes back row-by-row.

This is a major flaw both in dbi_link project and DBIx::PgLink.

Do not use SELECT, UPDATE, DELETE on large table without query filter!

INSERT queries do not fetch remote data and relatively fast (but 2-3x times slower than plain DBI).

Accessor has statement-level trigger that start remote transaction before local statement and commit it after.

There is unresolved problem with transaction coordination and error handling. Remote transaction rollback only if error occurs for row processing on remote site, but stalls if error occured at local database before local statement end. You need explicitly call SELECT dbix_pglink.rollback() in exception handler of your application. 2PC protocol is not used.

Each SQL statement stored in dbix_pglink.queries table and can be manually tuned (and SELECT statement as well).

Updated/deleted rows must be located by search key. If table has primary key and DBD can recognise it, than primary key columns used as search key. Otherwise search key contains all table columns.

Example:

INSERT INTO northwind."Orders" ("CustomerID", "EmployeeID", "OrderDate")
VALUES ('foo', 5, '2001-01-01');

--fetch all data (BAD), apply local condition
UPDATE northwind."Orders" SET "OrderDate" = '2001-01-01'
WHERE "OrderID" = 1;

--fetch all data (BAD), apply local condition
DELETE northwind."Orders" WHERE "OrderID" = 1;

Query filter

Currently, PostgreSQL don't have parametrized views and PL/Perl can't get text of SQL query that call function.

Query filter is a way to pass WHERE-clause to SELECT accessor (which used in UPDATE and DELETE too). It is not elegant solution, but do the job: update/delete of remote table does not require fetching of all table.

The worse disadvantage is that you need to repeat select criteria twice.

Each table accessor has two functions:

table_set_filter(where, param_values, param_types)

Remember supplied WHERE-clause and parameters. Arguments stored as session global variable and used until disconnect or reset.

table_reset_filter()

Clear stored WHERE-clause and parameters.

Note, that in UPDATE/DELETE operation filter is NOT applied to each modified row. Each row located by search key only. This is feature, not a bug: you can set broad "window" of remote data just to limit traffic and apply precise condition on PostgreSQL side.

Filter stored in global session variable for each accessor and not transactional.

Example:

--set filter (no actual data fetch)
SELECT northwind."Orders_set_filter"('WHERE EmployeeID = 5', null, null);

--now all SELECT/UPDATE/DELETEs work with subset of table with EmployeeID=5
SELECT * FROM northwind."Orders";

--update filtered rows with EmployeeID = 5
UPDATE northwind."Orders" SET "OrderDate"='2001-01-01'
WHERE "OrderID" = 1 and "EmployeeID" = 5; --double-check

--do the same
UPDATE northwind."Orders" SET "OrderDate"='2001-01-01'
WHERE "OrderID" = 1;

--remove filter
SELECT northwind."Orders_reset_filter"();

--now query returns whole table
SELECT * FROM northwind."Orders";

SEE ALSO

DBIx::PgLink::Manual::Install, http://pgfoundry.org/projects/dbi-link/

AUTHOR

Alexey Sharafutdinov <alexey.s.v.br@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2007 by Alexey Sharafutdinov

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.