NAME

SQL::Schema::Versioned - Routine and convention to create/update your application's DB schema

VERSION

This document describes version 0.238 of SQL::Schema::Versioned (from Perl distribution SQL-Schema-Versioned), released on 2021-08-08.

DESCRIPTION

To use this module, you typically run the "create_or_update_db_schema"() routine at the start of your program/script, e.g.:

use DBI;
use SQL::Schema::Versioned qw(create_or_update_db_schema);
my $spec = {...}; # the schema specification
my $dbh = DBI->connect(...);
my $res = create_or_update_db_schema(dbh=>$dbh, spec=>$spec);
die "Cannot run the application: cannot create/upgrade database schema: $res->[1]"
    unless $res->[0] == 200;

This way, your program automatically creates/updates database schema when run. Users need not know anything. See more concrete examples in the t/ directory of this distribution, or some applications that use this module like App::lcpan or SQLite::Counter::Simple.

BEST PRACTICES

It is recommended that after you create the second and subsequent version (upgrade_to_v2, upgrade_to_v3, and so on) you create and keep install_v1 so you can test migration from v1->v2, v2->v3, and so on.

FUNCTIONS

create_or_update_db_schema

Usage:

create_or_update_db_schema(%args) -> [$status_code, $reason, $payload, \%result_meta]

Routine and convention to create/update your application's DB schema.

With this routine (and some convention) you can easily create and update database schema for your application in a simple way using pure SQL.

Version: version is an integer and starts from 1. Each software release with schema change will bump the version number by 1. Version information is stored in a special table called meta (SELECT value FROM meta WHERE name='schema_version').

You supply the SQL statements in spec. spec is a hash which at least must contain the key latest_v (an integer) and install (a series of SQL statements to create the schema from nothing to the latest version).

There should also be zero or more upgrade_to_v$VERSION keys, the value of each is a series of SQL statements to upgrade from ($VERSION-1) to $VERSION. So there could be upgrade_to_v2, upgrade_to_v3, and so on up the latest version. This is used to upgrade an existing database from earlier version to the latest.

For testing purposes, you can also add one or more install_v<VERSION> key, where XXX is an integer, the lowest version number that you still want to support. So, for example, if latest_v is 5 and you still want to support from version 2, you can have an install_v2 key containing a series of SQL statements to create the schema at version 2, and upgrade_to_v3, upgrade_to_v4, upgrade_to_v5 keys. This way migrations from v2 to v3, v3 to v4, and v4 to v5 can be tested.

You can name install_v1 key as upgrade_to_v1 (to upgrade from 'nothing' a.k.a. v0 to v1), which is basically the same thing.

This routine will check the existence of the meta table and the current schema version. If meta table does not exist yet, the SQL statements in install will be executed. The meta table will also be created and a row ('schema_version', 1) is added. The (schema_summary, <SUMMARY>) row will also be added if your spec specifies a summary.

If meta table already exists, schema version will be read from it and one or more series of SQL statements from upgrade_to_v$VERSION will be executed to bring the schema to the latest version.

Aside from SQL statement, the series can also contain coderefs for more complex upgrade process. Each coderef will be called with $dbh as argument and must not die (so to signify failure, you can die from inside the coderef).

Currently only tested on MySQL, Postgres, and SQLite. Postgres is recommended because it can do transactional DDL (a failed upgrade in the middle will not cause the database schema state to be inconsistent, e.g. in-between two versions).

Modular schema (components)

This routine supports so-called modular schema, where you can separate your database schema into several components (sets of tables) and then declare dependencies among them.

For example, say you are writing a stock management application. You divide your application into several components: quote (component that deals with importing stock quotes and querying stock prices), portfolio (component that deals with computing the market value of your portfolio, calculating gains/losses), trade (component that connects to your broker API and perform trading by submitting buy/sell orders).

The quote application component manages these tables: daily_price, spot_price. The portfolio application component manages these tables: account (list of accounts in stock brokerages), balance (list of balances), tx (list of transactions). The trade application component manages these tables: order (list of buy/sell orders).

The portfolio application component requires price information to be able to calculate unrealized gains/losses. The trade component also needs information from the daily_price e.g. to calculate 52-week momentum, and writes to the spot_price to record intraday prices, and reads/writes from the account and balance tables. Here are the specs for each component:

# spec for the price application component
{
    component_name => 'price',
    summary => "Price application component",
    latest_v => 1,
    provides => ['daily_price', 'spot_price'],
    install => [...],
    ...
}

# spec for the portfolio application component
{
    component_name => 'portfolio',
    summary => "Portfolio application component",
    latest_v => 1,
    provides => ['account', 'balance', 'tx'],
    deps => {
        'daily_price' => 1,
        'spot_price'  => 1,
    },
    install => [...],
    ...
}

# spec for the trade application component
{
    component_name => 'trade',
    summary => "Trade application component",
    latest_v => 1,
    provides => ['order'],
    deps => {
        'daily_price' => 1,
        'spot_price'  => 1,
        'account'     => 1,
        'balance'     => 1,
    },
    install => [...],
    ...
}

You'll notice that the three keys new here are the component_name, provides, and deps.

When component_name is set, then instead of the schema_version key in the meta table, your component will use the schema_version.<COMPONENT_NAME> key. When component_name is not set, it is assumed to be main and the schema_version key is used in the meta table. The component summary, if specified, will also be written to schema_summary.<COMPONENT_NAME> key.

provides is an array of tables to help this routine know which table(s) your component create and maintain. If unset, this routine will try to guess from looking at "CREATE TABLE" SQL statements.

This routine will create table.<TABLE_NAME> keys in the meta table to record which components currently maintain which tables. The value of the key is <COMPONENT_NAME>:<VERSION>. When a component no longer maintain a table in the newest version, the corresponding table.<TABLE_NAME> row in the meta will also be removed.

deps is a hash. The keys are table names that your component requires. The values are integers, meaning the minimum version of the required table (= component version). In the future, more complex dependency relationship and version requirement will be supported.

This function is not exported by default, but exportable.

Arguments ('*' denotes required arguments):

  • create_from_version => int

    Instead of the latest, create from this version.

    This can be useful during testing. By default, if given an empty database, this function will use the install key of the spec to create the schema from nothing to the latest version. However, if this option is given, function wil use the corresponding install_v<VERSION> key in the spec (which must exist) and then upgrade using the upgrade_to_v<VERSION> keys to upgrade to the latest version.

  • dbh* => obj

    DBI database handle.

  • spec* => hash

    Schema specification, e.g. SQL statements to create and update the schema.

    Example:

    {
        latest_v => 3,
    
        # will install version 3 (latest)
        install => [
            'CREATE TABLE IF NOT EXISTS t1 (...)',
            'CREATE TABLE IF NOT EXISTS t2 (...)',
            'CREATE TABLE t3 (...)',
        ],
    
        upgrade_to_v2 => [
            'ALTER TABLE t1 ADD COLUMN c5 INT NOT NULL',
            sub {
                # this subroutine sets the values of c5 for the whole table
                my $dbh = shift;
                my $sth_sel = $dbh->prepare("SELECT c1 FROM t1");
                my $sth_upd = $dbh->prepare("UPDATE t1 SET c5=? WHERE c1=?");
                $sth_sel->execute;
                while (my ($c1) = $sth_sel->fetchrow_array) {
                    my $c5 = ...; # calculate c5 value for the row
                    $sth_upd->execute($c5, $c1);
                }
            },
            'CREATE UNIQUE INDEX i1 ON t2(c1)',
        ],
    
        upgrade_to_v3 => [
            'ALTER TABLE t2 DROP COLUMN c2',
            'CREATE TABLE t3 (...)',
        ],
    
        # provided for testing, so we can test migration from v1->v2, v2->v3
        install_v1 => [
            'CREATE TABLE IF NOT EXISTS t1 (...)',
            'CREATE TABLE IF NOT EXISTS t2 (...)',
        ],
    }

Returns an enveloped result (an array).

First element ($status_code) is an integer containing HTTP-like status code (200 means OK, 4xx caller error, 5xx function error). Second element ($reason) is a string containing error message, or something like "OK" if status is 200. Third element ($payload) is the actual result, but usually not present when enveloped result is an error response ($status_code is not 2xx). Fourth element (%result_meta) is called result metadata and is optional, a hash that contains extra information, much like how HTTP response headers provide additional metadata.

Return value: (any)

FAQ

Why use this module instead of other similar solution?

Mainly simplicity. I write simple application which is often self-contained in a single module/script. This module works with embedded SQL statements instead of having to put SQL in separate files/subdirectory.

How do I see each SQL statement as it is being executed?

Try using Log::ger::DBI::Query, e.g.:

% perl -MLog::ger::DBI::Query -MLog::ger::Output=Screen -MLog::ger::Level::trace yourapp.pl ...

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/SQL-Schema-Versioned.

SOURCE

Source repository is at https://github.com/perlancar/perl-SQL-Schema-Versioned.

BUGS

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=SQL-Schema-Versioned

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

SEE ALSO

Some other database migration tools on CPAN:

  • DBIx::Migration

    Pretty much similar to this module, with support for downgrades. OO style, SQL in separate files/subdirectory.

  • Database::Migrator

    Pretty much similar. OO style, SQL in separate files/subdirectory. Perl scripts can also be executed for each version upgrade. Meta table is configurable (default recommended is 'AppliedMigrations').

  • sqitch

    A more proper database change management tool with dependency resolution and VCS awareness. No numbering. Command-line script and Perl library provided. Looks pretty awesome and something which I hope to use for more complex applications.

AUTHOR

perlancar <perlancar@cpan.org>

CONTRIBUTOR

Steven Haryanto <sharyanto@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2021, 2018, 2017, 2015, 2014, 2013 by perlancar@cpan.org.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.