NAME

SQL::Yapp - SQL syntax in Perl with compile-time syntax checks

SYNOPSIS

use SQL::Yapp
    qw(
        dbh
        quote
        quote_identifier
        check_identifier
        xlat_catalog
        xlat_scheme
        xlat_table
        xlat_column
        xlat_charset
        xlat_collate
        xlat_constraint
        xlat_index
        xlat_transcoding
        xlat_transliteration
        ASTERISK
        QUESTION
        NULL
        TRUE
        FALSE
        UNKNOWN
        DEFAULT
    ),
    marker => 'sql';

In the use clause, you usually pass a function returning a DBI handle:

my $dbh;
use SQL::Yapp dbh => sub { $dbh };

The handle must be initialised before any SQL expression can be evaluated. There are other ways to use the module, see the chapter "Initialisation" below.

$dbh= DBI->connect(...);

You can use SQL syntax natively in Perl now, without worrying about quotation or SQL injections. The SQL objects can be used directly in DBI calls:

my $first_name= "Peter";
my $dbq= $dbh->prepare(sql{
    SELECT surname FROM customer WHERE first_name = $first_name
});
$dbq->execute();

The interpolated SQL looks much like a do{...} block. Depending on context, different quotation for interpolated Perl strings is used. Here's an example for a column name:

my $column= 'surname';
my $q= sql{
    SELECT customer.$column FROM customer WHERE first_name = 'John'
};

Perl code can be used everywhere by using {...} inside SQL code:

my $sur= 1;
my $dbq= $dbh->prepare(sql{
    SELECT .{ $sur ? 'surname' : 'first_name' } FROM customer
});

Arrays of values are expanded in lists:

my @val= ( 1, 2, 3 );
my $dbq= $dbh->prepare(sql{
    SELECT @val
});

Arrays of column names are expanded in lists:

my @col= ( 'surname', 'first_name' );
my $dbq= $dbh->prepare(sql{
    SELECT .@col FROM customer
});

Table names, too:

my @tab= ( 'friends', 'enemies' );
my $dbq= $dbh->prepare(sql{
    SELECT @tab.surname FROM @tab
});

Even multiple expansion is possible:

my $dbq= $dbh->prepare(sql{
    SELECT @tab.@col FROM @tab
});

Embedding is fully recursive: you can have SQL in Perl in SQL in Perl...

my $dbq= $dbh->prepare(sql{
    SELECT surname FROM customer
    WHERE
       {$sur ?
           sql{ surname    LIKE '%foo%' }
       :   sql{ first_name LIKE '%bar%' }
       }
});

SQL structures of different kinds can be parsed and stored in Perl handles and used in other SQL structures:

$expr= sqlExpr{         (b * 6) = COALESCE(c, d)        };
$tab=  sqlTable{        bar                             };
$col=  sqlColumn{       $tab.name                       };
$join= sqlJoin{         LEFT JOIN foo ON $col == foo.id };
@ordr= sqlOrder{        a, b DESC                       };

$stmt= sqlStmt{         SELECT $col
                        FROM $tab
                        Join $join
                        WHERE $expr
                        ORDER BY @ordr    };

$type= sqlType{         INT(10) };
$spec= sqlColumnSpec {  $type NOT NULL DEFAULT 17 };
@to=   sqlTableOption{  ENGINE innodb
                        DEFAULT CHARACTER SET utf8
                     };
$stm2= sqlStmt{         CREATE TABLE foo ( col1 $spec ) @to };

Hash interpolation in SET clauses is supported:

my %new_value= (
    first_name => 'John',
    surname    => 'Doe'
);
my $dbq= $dbh->prepare(sql{
    UPDATE customer SET %new_value
    WHERE age >= 18
});

Array interpolation in SET clauses is also supported:

my @new_value= (
    sqlExpr{ first_name = ?      },
    sqlExpr{ surname    = 'Doe'  }
);
my $dbq= $dbh->prepare(sql{
    UPDATE customer SET @new_value
    WHERE age >= 18
});

DESCRIPTION

The purpose of this module is to provide a means to make SQL injections totally impossible, and to provide an easy, native SQL interface in Perl. These two goals go hand in hand: by embedding a full SQL parser in the Perl compiler, forcing proper quotation is easy.

This package also provides basic compile-time syntax checking of SQL.

Currently, the major goals are security and ease of use, rather than completeness or efficiency. We'll add more SQL syntax over time to make this more and more complete. So for some things, you'll still need the raw DBI interface.

Initialisation

This package needs a way to quote identifiers and values for proper SQL output. One way of providing this is by a DBI handle:

my $dbh;
use SQL::Yapp dbh => sub { $dbh };

In libraries, where you only what to parse SQL, you do not need to specify the DBI handle, but you may simply use:

use SQL::Yapp;

This enables the compile-time SQL parser for the given file. You only need to specify the the DBI handle if you want to stringify the parsed SQL objects. The SQL parser works without it.

You can set and change the link later, too:

SQL::Yapp::dbh { $dbh };

By settings a DBI handle, the library auto-implements the two required quotation functions, one for values and one for identifiers. Instead of passing the DBI handle reference, you can alternatively implement your own quotation functions:

use SQL::Yapp;
my $dbh;

SQL::Yapp::quote            { $dbh->quote($_[0])         };
SQL::Yapp::quote_identifier { $dbh->quote_identifier(@_) };

This make this package independent from the DBI module. You can also define these functions directly in the package invocation:

my $dbh;
use SQL::Yapp
   quote            => sub { $dbh->quote($_[0])         },
   quote_identifier => sub { $dbh->quote_identifier(@_) };

A fancy package option is 'marker', which defines to which string the package reacts in your Perl script. The default is 'sql', so sql{...} encloses SQL blocks. You might want to use something different:

use SQL::Yapp marker => 'qqq';

Now, your SQL commands need to be embedded in qqq{...} instead. The prefix is used for other kinds of embedding, too, e.g. qqqExpr{...}.

You cannot dynamically change the marker, but only set it in the package initialisation, because it is needed at compile time.

The following functions are importable from SQL::Yapp:

dbh
quote
quote_identifier
check_identifier
xlat_catalog
xlat_schema
xlat_table
xlat_column
xlat_charset
xlat_collate
xlat_constraint
xlat_index
xlat_transcoding
xlat_transliteration

You may pass these names in the use clause to import them just like in the initialisation of any other module:

use SQL::Yapp qw(quote_identifier);

You may also mix this with initialisation described above:

use SQL::Yapp qw(quote_identifier), marker => 'qqq';

All the exported functions get/set parameters of the library and their values can be set in the use clause already just like in the above examples. The xlat_* function family is described in Section "Identifier Name Translation".

The function check_identifier is described in Section "Identifier Checking".

Another set of initialisation options selects the accepted dialect and the normalisation mode. The options are:

read_dialect  => [ 'dialect' , ... ],
write_dialect => 'dialect',
dialect       => 'dialect'

The dialect option is an abbreviation for using read_dialect and write_dialect with the same value.

The read_dialect defines from which SQL dialect to accept incompatible extensions (compatible extensions are always accepted and normalised).

The write_dialect defines for which dialect to produce output. Currently the following dialects are known:

'mysql'
'postgresql'
'oracle'
'std'
'generic'

'generic' means to try to please everyone while 'std' means to try to please no-one, i.e., to stick to the standard. :-)

The read_dialect option must be given in initialisation, because they must be known at compile time. The write_dialect option may be set before SQL expressions are evaluated (and thus stringified into SQL syntax).

For information, what normalisation is done, please refer to Section "Normalisation".

Finally, there's the debug boolean option, which, when set to 1, will dump the compiled code to STDERR. This is for developers.

To summarise: the following configuration options exist:

marker
dbh
quote
quote_identifier
check_identifier
xlat_catalog
xlat_schema
xlat_table
xlat_column
xlat_charset
xlat_collate
xlat_constraint
xlat_index
xlat_transcoding
xlat_transliteration
catalog_prefix
schema_prefix
table_prefix
column_prefix
constraint_prefix
read_dialect
write_dialect
dialect
debug

For the _prefix options, also see Section "Identifier Name Translation".

For programs that do not know in advance how to connect to SQL, it is also infeasible to set dbh in the use clause. The SQL parser/preprocessor of the library still works, so you can do without problems:

use SQL::Yapp;

sub get_select()
{
    return sql{ SELECT * FROM mydb };
}

Without setting the DB handle, the expressions the preprocessor generates cannot be stringified and executed, because the library does not know how to quote properly.

Basic Syntax and Usage

The embedded SQL syntax is based on normal SQL syntax, with interpolations of Perl values made easy. In Perl, an SQL expression is enclosed in an sql{...} block, like so:

my $query= sql{SELECT foo FROM bar};

The result is a list of blessed references, enclosed in a do{...} block.

The above $query automagically stringifies to SQL syntax when embedded in a string, e.g.:

"$query"

will return a string suitable for DBI interface. So you can use this with DBI as follows:

my $dbq= $dbh->prepare(sql{SELECT foo FROM bar});

Note again: the result of sql{...} is a list. So if you have multiple statements in your sql{...} block, you get multiple results. This way, the structure embeds nicely into Perl using Perl native concepts:

my @query= sql{SELECT foo FROM bar ; SELECT foz FROM baz};

In this example, @query has 2 elements, each an SQL select statement object. It is effectively the same as:

my @query= (sql{SELECT foo FROM bar}, sql{SELECT foz FROM baz});

In scalar context, it is wrong to try to assign multiple values:

my $query= sql{SELECT foo FROM bar ; SELECT foz FROM baz};
    # ERROR: cannot assign multiple results to scalar

The SQL::Yapp syntax is a bit different from standard SQL syntax. The most important thing is that table and column names can only be lower case, unless escaped with ``, and that keywords must be upper case, and that all literal values and comments use Perl syntax instead of SQL syntax.

As already mentioned, an sql{...} block expands to a do{...} block. This is important in some places and was mainly implemented this way because of the similar look of sql{...} and do{...}. It has some consequences:

Firstly, you cannot directly index the result but must put parentheses around the block do to that:

my $second= (sqlExpr{ 1, 2, 3})[1];

Secondly, you can use sql{...} in places where you might otherwise get a surprising effect if we had used parentheses for enclosing, e.g.:

my @q= map sql{ SELECT .$_ FROM tab }, @col;

With parens, map (...), @col would produce a syntax error, but map do{...}, @col is fine.

Tokens

UPPER_CASE

SQL keywords and function names:

SELECT, FROM, MAX, SUM, ...
lower_case

Names: tables, columns, variables, you name it:

client_address, surname, ...
CamelCase

Type names that may precede Perl interpolations to indicate the intended item they store. Usually, such a type is inferred from context, but sometimes they are needed:

SELECT Column $a FROM ...
Other Identifier

Any other identifier with mixed case (or no letters at all) will raise a syntax error.

backquoted string: `...`

Quoted name, no escape characters allowed, may not contain newlines. Needed for names (tables, columns, etc.) that are not all lowercase.

Numbers

Numbers basically use Perl syntax: 99, 0xff, 077, 0b11, 0.9e-9

Strings

Again, these use Perl syntax. Singly and doubly quoted strings are supported. Backquoted strings are not directly supported, because they are used for identifiers already and are needed by SQL directly. You can use {`...`} instead, however.

'blah'

Singly quoted string.

"blah"

Doubly quoted string.

Symbols

These SQL syntax elements act like keywords, but are symbolic:

(, ), {, }, ...
Comments

The syntax is the same as in Perl:

# this is a comment

Deliberate Restrictions

  • Identifiers containing $ or # characters must be quoted with `...`. This is because $ and # interfere with Perl syntax.

  • In an ExprAs object, 'AS' is mandatory, just like in good SQL programming practice:

    Bad here (but works in plain SQL)
    SELECT a b FROM c;
    Good
    SELECT a AS b FROM c;
  • Table and column names may not contain newline characters.

Differences

As mentioned above, strings, numbers and comments follow Perl syntax. This change was done for more elegent embedding of SQL into Perl. It also helps syntax highlighting... E.g., you can naturally use string interpolations, e.g.:

my $x= "'test";  # most be quoted properly to work!
my $y= sql{
    SELECT "difficult: $x"
};

Extensions

  • Numbers may contain _ just like in Perl, e.g. 1_000_000.

  • There are binary numbers: 0b11 == 3.

  • LIMIT clauses are parsed in both MySQL and PostgreSQL format and always generated in PostgreSQL format, i.e.:

    sql{SELECT ... LIMIT 5, 2}

    will stringify as:

    SELECT ... LIMIT 2 OFFSET 5

Missing Error Checking

In some places, this package does not fully check your SQL code at compile time, usually for two reasons: (a) to make the code of the preprocessor easiler, (b) to keep the number of possible syntax structures and object types low for the user.

Column/Row Functions In Expressions

Expressions allow more or less types depending on where they are used. The SQL grammar distinguishes them accordingly, but we do not. E.g. count(*) cannot use in SQL in a WHERE clause, but we don't check that but leave it to your data base server.

I am sure there's more that could be documented here.

Perl Interpolation

The basic construct for embedding Perl code in SQL is with braced code blocks:

sql{
    SELECT foo FROM bar WHERE
        { get_where_clause() }
}

Interpolation of Perl is triggered by $, @, %, "..." and {...} in embedded SQL. The syntax of such expressions is just like in Perl. All but {...} behave just like Perl; {...} is not an anonymous hash, but equivalent to a do{...} block in Perl. Inside "..." strings, you can also use Perl interpolation.

E.g. the following forms are the same:

my $greeting= 'Hello World';
my $s1= sql{ SELECT {$greeting} };    # general {...} interpolation
my $s2= sql{ SELECT $greeting   };    # direct $ interpolation
my $s3= sql{ SELECT "$greeting" };    # direct string interpolation

When parsing SQL expressions (i.e., values), it is unclear whether a string or a column is used. In that case, a string is used. If you mean to interpolate a column name, use a single dot in front of your interpolation (this single dot is special syntax, and the final SQL string will not contain that dot, but be proper SQL syntax):

my $s1= sql{ SELECT blah.$x };        # unambiguous: $x is a column name
my $s2= sql{ SELECT $x.blah };        # unambiguous: $x is a table name
my $s3= sql{ SELECT "$x" };           # unambiguous: "..." is always a string
my $s4= sql{ SELECT $x };             # ambiguous: could be string or column,
                                      #   => we resolve this as a string.
my $s5= sql{ SELECT .$x };            # unambiguous: $x is a column name
                                      #   (the dot is special syntax)
my $s6= sql{ SELECT ."foo$x" };       # unambiguous: "foo$x" is a column name

For the complete description of the syntax, see "<Perl>".

It is impossible to interpolate raw SQL in a string with this module, since everything is parsed and thus syntax-checked. That's the whole point: we want guarantees that SQL injections are impossible, so we won't jeopardise this by letting arbitrary raw strings to be injected.

However, this module allows fully recursive embedding, i.e., it allows the use of sql{ ... } within the embedded Perl code. Like so:

sql{
    SELECT foo FROM bar WHERE
        {$type eq 'a' ?
            sql{foo >= 2}
        :   sql{foo <= 1}
        }
}

All sql{...} blocks inside the embedded Perl code will not parse a statement list, but an expression, because the {...} is inside the 'WHERE' clause. This means that sql{...} is context-dependent.

On top-level, sql{...} it is equivalent to sqlStmt{...}. In the example above, it is equivalent to sqlExpr{...}, because it is inside a WHERE clause, where expressions are expected. You can construct SQL expressions, too, by changing the default:

my $expr1= sqlExpr{ foo >= 2 };
my $expr2= sqlExpr{ foo <= 1 };
sql{
    SELECT foo FROM bar WHERE
        {$type eq 'a' ?
            $expr1
        :   $expr2
        }
}

Note: Type checking of interpolations will be done at run-time. So the following only fails at run-time, not compile time:

In the above example, this module has no way of knowing: (a) that the ?: operator yields inconsistent kinds of SQL things, (b) that the embedded Perl expression may return sqlStmt. So the case that sqlStmt is returned only fails at run-time.

my $q= sql{
    SELECT foo FROM bar WHERE
    {$is_large ?
        sqlStmt{UPDATE foz SET x=5 WHERE name=''}
    :   sqlExpr{test > 5}
    }
};

Actually, this means that you have the same dynamic type checking that Perl has. The above only fails when $is_large is true. And the following is, maybe surprisingly, correct (both for for true and false values of $is_large):

my $q= sql{
    SELECT foo FROM bar WHERE
    {$is_large ?
        sqlStmt{SELECT foz FROM baz}
    :   sqlExpr{test > 5}
    }
};

This is correct, because a select statement can be used as a value, and thus in a where clause.

The default interpretation of sql{...} inside interpolations may be wrong for complex Perl code. The default the local context where the interpolation starts, so after WHERE, the default is Expr:

my $q= sql{
   SELECT foo FROM bar WHERE
   {
       my $subquery= sql{
           SELECT foz FROM baz   # <--- SYNTAX ERROR, because sql{...} is
       };                        #      parsed as sqlExpr{...} here, since
                                 #      the Perl interpolation follows
                                 #      WHERE.  You probably want to use
                                 #      sqlStmt{...} here.
       ...
   }
};

Depending on context, embedded Perl is evaluated in scalar or in list context. Inside SQL lists, the embedded block will be evaluated in list context:

my $q= sql{
    SELECT { code1 }
};

code1 will be evaluated in list context, and each result will be one value of the SELECT statement.

Furthermore, arguments of some binary operators, namely +, *, AND, OR, XOR, ||, and arguments to any function are evaluated in list context. Each element of the list becoming one operand:

my @a= (1,2,3);
my $q= sql{
    SELECT 0 + @a
}

This selects 0+1+2+3.

Finally, in the above positions, many unary operators, namely -, NOT, and any operator starting with IS ..., will 'pass-through' list context, and are evaluated point wise. E.g.:

my $q=sql{
    SELECT 0 AND NOT(@a)
};

This will become 0 AND (NOT 1) AND (NOT 2) AND (NOT 3).

In all other situations, values are evaluated in scalar context. Here's an example of scalar context:

my $q= sql{
    SELECT name AS { code2 }
};

Here, code2 will be evaluated in scalar context, because only one single identifier can be used in the AS clause.

Some list interpolations allow syntactic hashes and then do something special with them. This means that hashes usually behave differently in list context depending on whether you write them as %a or { %a }. The former may have special meaning to embedded SQL (see below), while the latter has Perl meaning, listing the hash as a list, interleaving keys and values.

Note that in contrast to Perl, syntactic arrays are not allowed in scalar context. E.g. the following code is wrong:

my @a= ...
my $q= sql{
    SELECT name FROM customer WHERE @a  # <--- ERROR
};

This is to prevent bugs. In plain Perl, @a would evaluate to the number of elements in @a, but in embedded SQL, this is an error. If you really mean it, use { @a } instead or ${\scalar(@a)}.

In the same way as syntactic arrays, syntactic hashes are not allowed in scalar context. If you think you must use them for some reason, use ${\scalar(%a)}.

Statement Interpolation

In statements or statement lists, embedded Perl code must be a blessed 'sqlStmt' object. E.g.:

my $q= sql{
    SELECT foo FROM bar
};
my $q2= sql{
    $q
};

The above code is effectively the same as:

my $q2= $q;

Less trivially, you can interpolate statements as subqueries, too.

Multiple interpolation works, too:

my @q= sql{
    SELECT foo FROM bar ;
    SELECT foz FROM baz
};
my @q2= sql{
    @q
};

Again, this is effectively the same as:

my @q2= @q;

For the syntax of statements, see "<Stmt>".

Join Interpolation

E.g. in a SELECT statement, you can use a Join clause you keep in a variable:

my $join= sqlJoin{ NATURAL INNER JOIN foo };
sql{ SELECT name FROM bar Join $join WHERE ... };

For the syntax of join clauses, see "<Join>".

Join interpolation only accepts Join objects, nothing else, so they must have been constructed with sqlJoin{...}. You may use lists:

my @join= (
    sqlJoin{ NATURAL INNER JOIN foo },
    sqlJoin{ LEFT JOIN baz USING (a) }
);
sql{ SELECT name FROM bar Join @join WHERE ... };

Be advised to use the typecast Join before Join interpolations, because there is no single keyword to start the block of JOIN clauses in an SQL statement, so you might run into ambiguities. Using Join makes the situation unambiguous to the parser.

Expression Interpolation

For the syntax of expressions, see "<Expr>".

In expressions or expression lists, embedded Perl code looks like the following:

my $q= sql{
    SELECT { code3 }
};

It may may return the following types of objects:

plain number, plain strings

These will be assumed to be constant values in SQL. Therefore, these will be quoted using the quote() function.

sqlExpr objects

Such objects will be interpolated as a complex tree, so you can create and reuse them:

my $expr= sqlExpr{ age + 5 };
my $q= sql{
    SELECT $expr FROM customer
};

In expression lists, the Perl code may return multiple values that will be passed as multiple things in SQL. Each element may be one of the above items and handled accordingly.

Array Interpolation in Expressions

Functions allow interpolation of arrays for their parameters:

CONCAT(@a,@b,'test')

The operators +, *, AND, OR, XOR, and || allow interpolation of arrays:

5 * @a

With @a=(1,2,3) will translate to something like:

5 * 1 * 2 * 3

If you want to multiply nothing else but the values in @a, simply use an empty list to construct the syntactic context needed for the operator:

{} * @a

This expands to:

1 * 2 * 3

This interpolation is especially handy for constructing WHERE clauses with the AND operator, e.g.:

WHERE {} AND %cond

With %cond=( a => 1, b => 2, c => 3 ), this expands to:

WHERE a = 1 AND b = 2 AND c = 3

All of these functions also work with zero parameters, i.e., {}*{} and {}AND{} will expand to 1, while {}+{} and {}OR{} will expand to 0.

Hash Interpolation in Expressions

As alreay indicated, if hash interpolation is used in expression list context, such hashes are turned into lists of equations. The hash keys will be quoted with quote_identifier(), the hash values may be one of the things described above. Each key-value pair will expand to an expression:

`key` = value

This kind of interpolation is especially handy together with operators like AND and + which allow list context expansion, as described in the previous section. For example:

my %cond= ( age => 50, surname => 'Doe' );
my $q= sql{
    SELECT ... WHERE {} AND %cond
};

Will expand to something like:

SELECT ... WHERE `age` = 50 AND `surname` = 'Doe'

Hash interpolations of this kind can also be used in SET clauses.

Interpolation of Unary Operators

A pair of parenthesis (...), the prefix operators except + (i.e., - and NOT) and all suffix operators (e.g. IS NOT NULL) will expand point-wise:

my @col= ( 'name', 'age' );
my $q= sql{
    SELECT ... WHERE {} AND (.@col IS NOT NULL)
};

It will expand to something like:

SELECT .. WHERE `name` IS NOT NULL AND `age` IS NOT NULL

This behaviour is just like that of DESC and ASC in the ORDER BY operations, see "DESC clause"" in "Interpolation in ASC.

Because + is also a list-context infix operator, and because its purpose as prefix operator is very limited, it was felt that it is too confusing to let it operate point-wise. So the following is an error:

my @val= (1,2,3);
my $q= sql{ SELECT +@val };  # <--- currently an ERROR

However, the infix operator

my @val= (1,2,3);
my $q= sql{ SELECT {} + @val };

will expand to:

SELECT 1 + 2 + 3;

Expression List Interpolation

Sometimes, lists of expression occur, e.g. the IN operator or the INSERT ... VALUES statement. In these cases, you may interpolate array references. The following two SQL statements are the same:

my $a= [1,2];
my $q= sql{
    SELECT 5 IN (@$a) ;
    SELECT 5 IN $a
};

In scalar context above, this is not that useful, because for $a you have the equivalent form (@$a), but in list context, it is useful:

my @a= ([1,2], [2,3]);
my $q= sql{
    INSERT INTO t (x,y) VALUES @a
};

This expands to:

INSERT INTO t (`x`, `y`) VALUES (1,2), (2,3)

Please note that Perl reference syntax (i.e., a backslash) does not trigger Perl interpolation, so the following is wrong:

my @a= (1,2);
my $q= sql{
    SELECT 5 IN \@a   # <--- ERROR: \@a is no Perl interpolation
};

Expression Interpolation and AS clause

If an AS clause follows an expression interpolation, the expression will be evaluated in scalar context, i.e, there may be maximally one expression, not a list. E.g. the following is wrong:

my @col= ('x', 'y');
my $q=sql{
    SELECT .@col AS name    # <--- ERROR: @col not allowed with AS
};

It makes no sense to apply AS name to both elements of @col, so it is not allowed. Without AS, the clause does support array interpolation, of course:

my $q=sql{
    SELECT .@col      # <--- OK, will become: SELECT `x`, `y`
};

Type Interpolation

Types can be stored in Perl variables:

my $t1= sqlType{ VARCHAR(50) };

Types can be easily extended:

my $t2= sqlType{ $t1 CHARACTER SET utf8 };

This is equivalent to:

my $t2= sqlType{ VARCHAR(50) CHARACTER SET utf8 };

You can also remove specifiers, i.e., do the opposite of extending them, with a syntax special to this module starting with DROP:

my $t1b= sqlType{ $t2 DROP CHARACTER SET };

This makes $t1b the same type as $t1.

To allow modification of types already constructed and stored as a Perl object, type attributes or base types can be changed by simply listing them after the base type or interpolation. Any new value overrides the old value, e.g. to change the size:

my $t3= sqlType{ $t1 (100) };

This is equivalent to:

my $t3= sqlType{ VARCHAR(100) };

You can even change the base type, keeping all other attributes if they are sensible. Any attributes not appropriate for the new base type will be removed:

my $t4= sqlType{ $t2 DECIMAL };

This is equivalent to:

my $t4= sqlType{ DECIMAL(50) };

The character set attribute has silently been removed. If you change the base type again, it will not reappear magically.

my $t5= sqlType{ $t4 CHAR };

This is equivalent to:

my $t5= sqlType{ CHAR(50) };

Note how the character set was removed.

In list context, modifications made to an array Perl interpolation will affect all the elements:

my @t1= sqlType{ CHAR(50), VARCHAR(60) };
my @t2= sqlType{ @t1 (100) };

This is equivalent to:

my @t2= sqlType{ CHAR(100), VARCHAR(100) };

See also "<Type>".

ColumnSpec Interpolation

ColumnSpec interpolation is very similar to Type Interpolation, i.e., just like types, you can modify ColumnSpec objects by simply suffixing constraints or type attributes or base types.

See also "<ColumnSpec>".

Table Interpolation

Table objects represent fully qualified table specifications and may include catalog, schema and table name information.

For the syntax of table specifications, see "<Table>".

The interpolation of Tables is simple: either it is a Table object generated with sqlTable{...} or it is a simple string. In list context, a list of such values may be used.

my @tab= ( 'foo', 'bar' );
my $q= sql{
    SELECT name, id FROM @tab
};

Schemas and catalogs are supported. For them to work, your data base needs to support them as well. The full input syntax for a table specification is:

[ [ catalog . ] schema . ] table

These three components will always be passed to quote_identifier() together as three parameters, suitable for DBI. As mentioned, and sqlTable object may hold a complete table specification:

my $tabspec= sqlTable{ cata.schem.tab };
my $q= sql{
    SELECT name FROM $tabspec
};

A table specification can be used to qualify a column name, of course:

my $q= sql{
    SELECT $tabspec.name FROM ...
};

The following is wrong, because a table specification cannot be qualified further:

my $q= sql{
    SELECT name FROM $tabspec.other  # <--- ERROR!
};

Column Interpolation

Column objects are used in expressions. They are fully qualified column specifications and may include table, schema, catalog, and column information.

For the syntax of column specifications, see "<Column>".

The are two types of Column interpolations: one element vs. multi element.

For one element Columns, embedded Perl code may return sqlColumn{...} objects or strings:

my @col= ('name', sqlColumn{age});
my $q= sql{
    SELECT .@col
};

The above prefixed . is syntactic sugar. More generally, you can esplicitly request expansiong of @col as Column objects:

my $q= sql{
    SELECT Column @col
};

For multi element Columns, only strings or sqlExpr{*} (in Perl, there is the constant SQL::Yapp::ASTERISK for this) are allowed in an interpolation, be cause a column specification cannot be qualified further:

my $q= sql{
    SELECT mytable.@col   # <-- none of @col may be sqlColumn
};

In a list context, multiple column specifications are allowed, as already shown in the previous examples. Each part of the column specification may be a list and will be expanded multiply:

my $q= sql{
    SELECT @tab.@col
};

This will expand to the following:

SELECT $tab[0].$col[0], ... , $tab[0].$col[n],
       $tab[1].$col[0], ... , $tab[1].$col[n],
       ..., $tab[m].$col[n]

For syntactic hashes in list context, each hash's keys will be used, e.g.:

my %col= ( 'surname' => 1, 'first_name' => 2 );
my $q= sql{
    SELECT .%col
};

This will expand to something like:

SELECT `surname`, `first_name`;

Note that key interpolation order is non-deterministic, so you might also end up with:

SELECT `first_name`, `surname`;

For this reason, it is usually not a good idea to use multi-place hash interpolation, because the result columns will get identical names and due to non-determinism, you don't know the result column order either:

my $q= sql{
    SELECT %tab.%col  # <--- works, but is usually not useful
};

GROUP BY / ORDER BY Interpolation

For the syntax of Order clauses, see "<Order>".

In any place where a list of order clauses can be listed inside a GROUP BY or ORDER BY clause, the whole clause is dropped if the list is empty:

my @a= ();
my $q= sql{
    SELECT foo FROM bar GROUP BY @a;
};

This will expand so something like:

SELECT foo FROM bar

Perl interpolation of strings, except "..." interpolation, generates column names instead of plain strings in order position. To force interpretation as a string, use "..." interpolation. This is different from Expr, which defaults to string interpretation and needs you to use a single dot to force column name interpretation. Compare the following examples:

my $a= 'a';
print sqlOrder{ $a }."\n";      # $a is a column name
print sqlOrder{ .$a }."\n";     # $a is a column name
print sqlOrder{ "$a" }."\n";    # $a is a string
print sqlExpr{ $a }."\n";       # $a is a string
print sqlExpr{ .$a }."\n";      # $a is a column name
print sqlExpr{ "$a" }."\n";     # $a is a string

So this produces the following output (the quotation depends on used DB):

`a`
`a`
'a'
'a'
`a`
'a'

I.e., sqlOrder (just like sqlColumn) produces an identifier, while sqlExpr produces a string literal.

Also note that hash interpolation behaves the same as for sqlColumn, namely on the hash keys. However, due to the non-deterministic nature of hash enumeration, the whole point of 'ordering' becomes absurd:

my %a= ( a => 1, b => 1 );
my $q= sql{
    SELECT a, b FROM t ORDER BY %a
};

Depending on Perl's mood, this becomes either:

SELECT `a`, `b` FROM t ORDER BY `a`, `b`

or it becomes:

SELECT `a`, `b` FROM t ORDER BY `b`, `a`

For GROUP BY, this is still helpful, so this kind of interpolation is supported:

my %a= ( a => 1, b => 1 );
my $q= sql{
    SELECT a, b, c FROM t GROUP BY %a
};

Interpolation In ASC/DESC Clause

If an ASC/DESC keyword follows a list interpolation, it is used for each of the elements of the list. For example:

my @col= ('x', 'y');
my $q=sql{
    SELECT ... ORDER BY @col DESC
};

This is valid (if you fill in valid code for ...) and similar to:

my $q=sql{
    SELECT ... ORDER BY x DESC, y DESC
};

This even works if the elements are themselves Order objects that carried an ASC or DESC modifier: the direction will either be kept (in case of an additional ASC) or swapped (in case of an additional DESC):

my @order= sqlOrder{ a DESC, b ASC };
my $q= sql{
    SELECT ... GROUP BY @order ORDER BY @order DESC
};

This will expand to something like:

SELECT ... GROUP BY a DESC, b ORDER BY a, b DESC

(A suffixed ASC is not printed since it is the default.)

LIMIT Interpolation

Perl code in LIMIT clauses may return a number or undef. Specifying an offset but no count limit is not directly supported, so we will generate a very large count limit in that case, hoping that the data base server can handle that. Example:

sql{ SELECT ... LIMIT 10, {undef} }

This will stringify as:

SELECT ... LIMIT 18446744073709551615 OFFSET 10

Note that LIMIT clauses are not standardized.

MySQL
LIMIT cnt
LIMIT offset, cnt
LIMIT cnt OFFSET offset    "for PostgreSQL compatibility"
PostgreSQL
LIMIT cnt
LIMIT cnt OFFSET offset
LIMIT ALL
LIMIT ALL OFFSET offset
OFFSET offset

We support all of MySQL and all of PostgreSQL as input syntax and will always produce:

LIMIT cnt OFFSET offset

So this is automatically made compatible for the two data base types.

However, this is not enough to support all kinds of data bases. Other syntaxes include:

Oracle
... WHERE rownum >= offset && rownum < (cnt + offset)

This is not automatically generated from LIMIT ..., but you can write it manually yourself, of course, because rownum is a normal identifier.

MS
SELECT TOP cnt ...

This is not yet supported.

Identifier Name Translation

This package allows you to modify all identifiers before they are quoted. This allows you, e.g., to set a common prefix for all table names. The following functions modification handlers:

xlat_catalog
xlat_schema
xlat_table
xlat_column
xlat_charset
xlat_collate
xlat_constraint
xlat_index
xlat_trancoding
xlat_transliteration

For example:

SQL::Yapp::xlat_table { 'foo_'.$_[0] }

Would prefix all table names with foo_. I mean every table name, mind you. Because the library knows about the whole SQL structure and parses everything, the quotation works throughout: for literal as well as all Perl interpolations. For example:

my $q= sql{
    SELECT name FROM customer
};

This would be expanded (depending on how quote_identifier() quotes) similar to:

SELECT `name` from `foo_customer`;

The same is achieved with the following:

my $table= 'customer';
my $q= sql{
    SELECT name FROM $table
};

Note that the package cannot distinguish aliases and real table names, so the following is modified more than you might expect (which usually does not hurt, but you should know):

my $q= sql{
    SELECT c.name FROM customer AS c
};

This results in:

SELECT `foo_c`.`name` FROM `foo_customer` as `foo_c`

You can specify such modifications in the use statement already:

use SQL::Yapp xlat_table => sub { 'foo_'.$_[0] };

Simple prefixing can be achieved by convenience options for columns, tables, schemas, and catalogs, so you don't need to use xlat_ options, but can write more readably:

use SQL::Yapp table_prefix => 'foo_';

The following convenience options exist:

catalog_prefix
schema_prefix
table_prefix
column_prefix
constraint_prefix

These convenience options simply define the corresponding xlat_ function appropriately.

Identifier Checking

You might want to check for typos in column and table names at compile-time. This can be done in a very general way by using the check_identifier callback function. You can set it as follows:

sub my_check_identifier($$$$;$)
{
    my ($kind, $catalog, $schema, $ident1, $ident2)= @_;
    ...
}

use SQL::Yapp
    check_identifier => \&my_check_identifier,
    ...;

It is important to set the check_identifier function as early as in the use statement, because it is invoked at compile-time. Setting it afterwards is possible, but only allows run-time checks (which must be explicitly enabled, see below). You can set the function later by invoking:

SQL::Yapp::check_identifier { ... };

For columns, the function will be invoked with five parameters, namely:

$check_identifier->('Column', $catalog, $schema, $table, $column)

The $catalog and $schema will be undef if unspecified. For unqualified columns (i.e., without explicit table name), the $table parameter will be either undef, if no possible table is known, or $table will be an array reference with all tables that might contain the column.

If the column is *, this function will not be invoked.

For identifiers other than columns, the functions will be invoked with only four parameters, the first being the kind of identifier (in the same syntax as the name after the sql...{...}, e.g., Table, Index, Constraint, CharSet, etc.) followed by the schema-qualified identifier, again using undef for unqualified parts:

$check_identifier->($kind, $catalog, $schema, $identifier);

For example, for a table:

$check_identifier->('Table', $catalog, $schema, $table_name);

By default, only compile-time checks are performed. You can request run-time checks, too, so that all identifiers are checked, including those interpolated from Perl code, which is not seen at compile-time, of course. Run-time checks are enabled by setting the runtime_check flag to one, either early:

use SQL::Yapp
    runtime_check => 1,
    ...;

or later:

SQL::Yapp::runtime_check(1);

Both involved functions can also be imported:

use SQL::Yapp
    ...,
    qw(check_identifier runtime_check ...);

Note that the package does not (yet) understand the SQL syntax good enough to infer possible tables for columns, so we never pass an array ref. But be prepared for it to avoid being surprised in a later version of the library.

Normalisation

We currently don't do very much to normalise the SQL syntax so that it works for multiple data bases no matter how you write your query. What we do is listed in this section.

LIMIT Normalisation

Since LIMIT clauses are non-standard, they are normalised as described in "LIMIT Interpolation".

DELETE Normalisation

MySQL allows you to specify DELETE statements with a different syntax, listing some tables before FROM and some after it. This syntax is rejected. You are forced to write this with a USING clause. This is normalisation by forcing good upon the user.

Unsupported MySQL Extension (from the MySQL documentation):

DELETE t1, t2 FROM t1 INNER JOIN t2 INNER JOIN t3
WHERE t1.id=t2.id AND t2.id=t3.id;

The following is the supported equivalent. Also note the use of CROSS JOIN instead, as INNER JOIN requires an ON clause. Plus, you need to use parentheses:

DELETE FROM t1, t2 USING t1 CROSS JOIN t2 CROSS JOIN t3
WHERE (t1.id=t2.id) AND (t2.id=t3.id);
CASE Normalisation

A CASE expression with zero WHEN clauses will be normalised to its default value.

Also, the default value will always be printed (if missing, ELSE NULL will be generated).

Example:

my @e= sqlExpr{
    CASE a WHEN 1 THEN 0 ELSE 5 END,
    CASE a WHEN 1 THEN 0 END,
    CASE a ELSE 5 END,
    CASE a END
};

Some of these are syntax errors in plain SQL, but we accept them, and generate the following code, resp.:

CASE `a` WHEN 1 THEN 0 ELSE 5 END,
CASE `a` WHEN 1 THEN 0 ELSE NULL END,
5
NULL
INSERT ... SET Normalisation

MySQL has an extension in the INSERT statement that allows the use of SET instead of VALUES. I personally find this much more natural and more easy to read and maintain than the normal syntax where column names are separated from their respective values.

For this reason, this syntax is allowed although there is a portable alternative. If you pass a single hash table to the SET clause, it will be normalised to the standard form, e.g.:

my %a= ( a => 5, b => 6 );
my $q= sql{
    INSERT INTO t SET %a
};

will be normalised to (the column order may be different, depending on Perl's mood of enumerating the hash table):

INSERT INTO `t` (`a`,`b`) VALUES (5,6);

You can do all kinds of fancy things and the transformation will still work:

my @q= sql{
    INSERT INTO t SET a = 5, b = 6 ;
    INSERT INTO t SET %{{ a => 5, b => 6 }} ;
    INSERT INTO t SET %a, c = 7
}

Even more fancy things work:

my $cola=  sqlColumn{ a };
my $colc=  sqlColumn{ c };
my $exprb= sqlExpr{ b = 6 };
my $exprc= sqlExpr{ $colc = 7 };
my $q= sql{
    INSERT INTO t SET $cola = 5, $exprb, $exprc;
}

In short, you can freely use INSERT ... SET even for data base servers that only support INSERT ... VALUES.

Operator/Function Normalisation
  • POW() and ** will be normalised to POWER.

  • || will be normalised to CONCAT if write_dialect == 'mysql'. And vice versa: CONCAT will be translated to || in any dialect but mysql.

  • MySQL and has an extension to parse &, |, and ^ as bit operations like C. In Oracle, on the other hand, there is BITAND, but with a slightly different semantics. To ease porting, the C operators are converted to BITAND, BITOR, BITXOR for Oracle, and vice versa for MySQL. You need a bit of more work (function definitions) for this to work in Oracle, however, but it is a start.

Manual Parsing

Sometimes you may want to parse SQL structures at run time, not at compile time. There is a function parse to do this. Its invocation is straight-forward:

my $perl= SQL::Yapp::parse('ColumnSpec', 'VARCHAR(50) NOT NULL');

The result of this function is a string with Perl code (this is what the compiler needs, so this is what you get here). To create an object, you need to evaluate this:

my $obj= eval($perl);

This $obj behaves exactly like a structure created at compile time, e.g. the following creates the same object:

my $obj2= sqlColumnSpec{VARCHAR(50) NOT NULL};

Neither for parse() nor for eval() you will need the DBI link (the dbh module option). Only if you stringify the object, you will need it.

List of SQL Structures

sqlExpr{...}

An expression. You can use it wherever expressions are used in SQL:

my $test= sqlExpr{a == 5};
sql{... WHERE $test ...}

See also "<Expr>" and "Expression Interpolation".

sqlColumn{...}

A column specification. This may be a complex column name containing a table name.

See also "<Column>" and "Column Interpolation".

sqlTable{...}

A table specification

See also "<Table>" and "Table Interpolation".

sqlCharSet{...}

A character set.

See also "<CharSet>".

sqlCollate{...}

A collation.

See also "<Collate>".

sqlIndex{...}

An index name.

See also "<Index>".

sqlTableOption{...}

An option for CREATE TABLE.

See also "<TableOption>".

sqlTransliteration{...}

A transliteration name.

See also "<Transliteration>".

sqlTranscoding{...}

A transcoding name.

See also "<Transcoding>".

sqlStmt{...}

A complete SQL statement.

See also "<Stmt>" and "Statement Interpolation".

sqlType{...}

A type (for CREATE ... or ALTER ...).

See also "<Type>" and "Type Interpolation".

sqlJoin{...}

A join clause to be used in SELECT statements and many others.

See also "<Join>" and "Join Interpolation".

sqlOrder{...}

An order specification for GROUP BY and ORDER BY clauses. Essentially an expression whose Perl string interpolations default to column names and which are optionally suffixed with ASC or DESC.

See also "<Order>" and " ORDER BY Interpolation"" in "GROUP BY .

SYNTAX

In the following sections, the supported syntax is listed in detail. A BNF variant is used to represent the syntax, and most people will probably find it intuitiv without further explanation. Still, here are some explanations.

The ::= operator is left out. Instead, the previous headline defines what is currently defined.

Upper case identifiers are literal keyword terminals:

SELECT

CamelCase identifiers in quotes are literal typecast keyword terminals:

'Join'

Symbols in quotes are literal symbolic terminals:

'('

CamelCase identifiers (and maybe a little more) in pointed brackets are non-terminals and refer to other rules:

<Join>
<SELECT Stmt>

Plain English text in pointed brackets is a terminal that is informally explained by that text:

<a number in Perl syntax>

Optional parts:

[ ... ]

Optional parts that, depending on other syntax elements or other constraints, may even be forbidden, and need further clarification to fully describe the syntax:

[ ... ]?

Alternatives:

A | B | C

Grouping, for example together with a list of alternatives:

A ( B | C )

Literal parenthesis also form groups:

'(' ... ')'

Sequences that contain one item A or more:

A ...

Sequences with comma separator that contain once item A or more times:

A , ...

Note that this grammar allows redundant commas or other separators in all lists except after the last element. Lists delimited with parentheses even allow redundant commas after the last element before the closing parenthesis. Such lists are rectified and printed in proper SQL syntax. This is such a common typo, especially in CREATE TABLE statements, that it was felt it should be tolerated.

<SELECT Stmt>

SELECT
    [ ALL | DISTINCT | DISTINCTROW ]
    [ <MyPreOption> , ... ]
    ( <ExprAs> , ... )
    [ FROM ( <Table> , ... )
        [ <Join> ... ]
        [ WHERE Expr ]
        [ GROUP BY ( <Order> , ... ) [ WITH ROLLUP ] ]
        [ HAVING Expr ]
        [ ORDER BY ( <Order> , ... ) ]
        [ LIMIT ( <Count> | ALL | <Offset> , <Count> ) ]
        [ OFFSET <Offset> ]?
        [ FOR ( UPDATE | SHARE ) ]
        [ ( <PostgresPostOption> | <MyPostOption> ) , ... ]
    ]
<Offset>
<Count>
<Integer> | '?'

See also "Limit Interpolation".

<ExprAs>
<Expr> [ AS <ColumnName> ]
<MyPreOption>
  HIGH_PRIORITY | STRAIGHT_JOIN
| SQL_SMALL_RESULT | SQL_BIG_RESULT | SQL_BUFFER_RESULT
| SQL_CACHE | SQL_NO_CACHE | SQL_CALC_FOUND_ROWS
<MyPostOption>
LOCK IN SHARE MODE
<PostgresPostOption>
NOWAIT

The OFFSET clause is forbidden if Offset was parsed before in the LIMIT clause.

See also "<ColumnName>", "<Expr>", "<Integer>", "<Join>", "<Order>", "LIMIT Interpolation".

<INSERT Stmt>

INSERT [ ( LOW_PRIORITY | HIGH_PRIORITY | DELAYED | IGNORE ) , ... ]
    [ INTO ] <Table> [ '(' <ColumnName> , ... ')' ]
    (
      DEFAULT VALUES
    | ( VALUES | VALUE ) ( <ExprList> , ... )
    | SET ( <ColumnName> '=' <Expr> , ... )
    | <SELECT Stmt>
    )
    [ ON DUPLICATE KEY UPDATE ( <ColumnName> '=' <Expr> , ... ) ]

This is a blend of MySQL and PostgreSQL syntax in order to support both syntaxes. LOW_PRIORITY, HIGH_PRIORITY, DELAYED, IGNORE, SET and ON DUPLICATE KEY UPDATE are MySQL only.

See also "<ColumnName>", "<Expr>", "<ExprList>", "<SELECT Stmt>", "<Table>", "INSERT ... SET Normalisation", "Expression List Interpolation".

<UPDATE Stmt>

UPDATE
    [ ( LOW_PRIORITY | IGNORE | ONLY ) , ... ]
    ( <TableAs> , ... )
    SET ( <Column> '=' <Expr> , ... )
    [ FROM ( <Table> , ... ) ]
    [ WHERE <Expr> ]
    [ ORDER BY <Order> ]
    [ LIMIT <Count> ]

This is a blend of MySQL and PostgreSQL syntaxes in order to support both. MySQL allows multiple tables to be updated in one statement and does not support the FROM clause, while PostgreSQL allows only one table but supports FROM. No normalisation is provided (any attempt would be messy), so you must use the appropriate syntax for your DB.

ONLY is for PostgreSQL only, while IGNORE, LIMIT, and ORDER BY are MySQL.

See also "<Column>", "<Count>", "<Expr>", "<Order>", "<TableAs>".

<DELETE Stmt>

DELETE [ IGNORE ]
    FROM [ ONLY ] ( <Table> , ... )
    [ USING ( <Table> , ... ) ]
    [ WHERE <Expr> ]
    [ ORDER BY <Order> ]
    [ LIMIT <Count> ]

This is a blend of MySQL and PostgreSQL syntaxes in order to support both syntaxes. ONLY is for PostgreSQL only, while IGNORE, LIMIT, and ORDER BY are MySQL.

See also "<Count>", "<Expr>", "<Order>", "<Table>".

<CREATE TABLE Stmt>

CREATE [ LOCAL | GLOBAL ] [ TEMPORARY ] TABLE <Table>
[
  '(' ( <ColumnName> <ColumnSpec> | <TableConstraint> ) , ... ')'
]
[ <TableOption> ... ]
[ AS <SELECT Stmt> ]

See also "<ColumnSpec>", "<Table>", "<TableConstraint>", "<TableOption>".

<ColumnSpec>

<Type> [ <ColumnAttr> ... ]
<ColumnAttr>
  <TypeAttr>
| <ColumnConstraint>
<ColumnConstraint>
[ CONSTRAINT <Constraint> ]
(
    <References>
|   NULL               | NOT NULL
|   PRIMARY KEY        | DROP PRIMARY KEY
|   KEY                | DROP KEY
|   UNIQUE             | DROP UNIQUE
|   AUTO_INCREMENT     | DROP AUTO_INCREMENT
|   DEFAULT <Expr>     | DROP DEFAULT
|   CHECK '(' Expr ')' | DROP CHECK
|   COMMENT <Expr>     | DROP COMMENT
|   COLUMN_FORMAT ( FIXED | DYNAMIC | DEFAULT )
|   STORAGE       ( DISK  | MEMORY  | DEFAULT )
)

The non-standard negative forms DROP ... can be used in modification of existing ColumnSpec objects to remove the corresponding constraint.

The constraints AUTO_INCREMENT, COMMENT, COLUMN_FORMAT, and STORAGE are MySQL extensions.

<References>
REFERENCES <Table> '(' <ColumnName> , ... ')'
[ MATCH ( SIMPLE | PARTIAL | FULL ) ]
[ ON DELETE <OnAction> ]
[ ON UPDATE <OnAction> ]
<OnAction>
RESTRICT | CASCADE | SET NULL | SET DEFAULT | NO ACTION

See also "<Type>", "<TypeAttr>".

<TableConstraint>

[ CONSTRAINT <Constraint> ]
(
    PRIMARY KEY '(' <ColumnIndex> , ... ')' [ <IndexOption> ... ]
|   UNIQUE      '(' <ColumnIndex> , ... ')' [ <IndexOption> ... ]
|   FULLTEXT    '(' <ColumnIndex> , ... ')' [ <IndexOption> ... ]
|   SPATIAL     '(' <ColumnIndex> , ... ')' [ <IndexOption> ... ]
|   INDEX       '(' <ColumnIndex> , ... ')' [ <IndexOption> ... ]
|   FOREIGN KEY '(' <ColumnName>  , ... ')' <References>
|   CHECK       '(' Expr ')'
)

<ColumnIndex>

<ColumnName> [ '(' <Count> ')' ] [ ASC | DESC ]

<IndexOption>

USING ( BTREE | HASH | RTREE )

<TableOption>

  ENGINE                <Engine>
| DEFAULT CHARACTER SET <CharSet>
| DEFAULT COLLATE       <Collate>
| AUTO_INCREMENT        <Expr>
| COMMENT               <Expr>
| ON COMMIT ( PRESERVE ROWS | DELETE ROWS | DROP )
| [ 'TableOption' ] <Perl>

<ALTER TABLE Stmt>

ALTER [ ONLINE | OFFLINE ] [IGNORE] TABLE [ONLY] <Table>
( <AlterTableOption> , ... )

See also "<Table>".

<AlterTableOption>
  RENAME TO         <Table>
| ADD    <TableConstraint>
| ADD    COLUMN     '(' <ColumnName> <ColumnSpec> , ... ')'
| ADD    COLUMN     <ColumnName>              <ColumnSpec> [ <ColumnPos> ]
| MODIFY COLUMN     <ColumnName>              <ColumnSpec> [ <ColumnPos> ]
| CHANGE COLUMN     <ColumnName> <ColumnName> <ColumnSpec> [ <ColumnPos> ]
| ALTER  COLUMN     <ColumnName> <AlterColumn>
| DROP   COLUMN     <ColumnName> [ RESTRICT | CASCADE ]
| RENAME COLUMN     <ColumnName> TO <ColumnName>
| DROP   CONSTRAINT <Constraint> [ RESTRICT | CASCADE ]
| DROP   PRIMARY KEY

MODIFY and CHANGE are MySQL extensions, which does not know about some of the ALTER COLUMN stuff, which PostgreSQL uses. It is almost impossible to specify something useful that's understood by both DB systems, it seems.

Other MySQL extensions: <ColumnPos>, multi-column syntax.

Other PostgreSQL extensions: TYPE and a few other things this module does not yet support.

<ColumnPos>
FIRST | AFTER <ColumnName>
<AlterColumn>
  SET DEFAULT <Expr> | DROP DEFAULT
| TYPE <Type> [ USING <Expr> ]
| SET NOT NULL | DROP NOT NULL

<DROP TABLE Stmt>

DROP [ TEMPORARY ] TABLE ( <Table> , ... ) [ RESTRICT | CASCADE ]

See also "<Table>".

<Stmt>

The alternatives are described under in the following sections:

"<SELECT Stmt>"
"<INSERT Stmt>"
"<UPDATE Stmt>"
"<DELETE Stmt>"
"<CREATE TABLE Stmt>"
"<ALTOR TABLE Stmt>"
"<DROP TABLE Stmt>"

See also "Statement Interpolation".

<Join>

CROSS                   JOIN ( <TableAs> , ... )
UNION                   JOIN ( <TableAs> , ... )

NATURAL       [ INNER ] JOIN ( <TableAs> , ... )
NATURAL LEFT  [ OUTER ] JOIN ( <TableAs> , ... )
NATURAL RIGHT [ OUTER ] JOIN ( <TableAs> , ... )
NATURAL FULL  [ OUTER ] JOIN ( <TableAs> , ... )

              [ INNER ] JOIN ( <TableAs> , ... ) <LinkCond>
        LEFT  [ OUTER ] JOIN ( <TableAs> , ... ) <LinkCond>
        RIGHT [ OUTER ] JOIN ( <TableAs> , ... ) <LinkCond>
        FULL  [ OUTER ] JOIN ( <TableAs> , ... ) <LinkCond>

                       'Join' <Perl>
<LinkCond>
  ON <Expr>
| USING '(' <Column> , ... ')'
<TableAs>
<Table> [ AS <TableName> ]

See also "<Column>", "<Expr>", "<Perl>", "<Table>", "<TableName>", "Join Interpolation".

Note that <Join> is always parsed in list context.

When producing SQL, this package always qualifies a JOIN with exactly one of CROSS, UNION, INNER, LEFT, RIGHT, and FULL, with the exception than instead of NATURAL INNER JOIN, the specification NATURAL JOIN is printed for further compatibility (e.g. with MySQL).

Note that MySQL, INNER JOIN and CROSS JOIN are not distinguished, but in standard SQL, they are. This module forces you to write more portable SQL: use INNER JOIN if there is an ON clause, and CROSS JOIN if not.

<Order>

( <Column>
| <Expr>
| [ 'Order' ] <Perl> )
[ ASC | DESC ]

Note that a string returned from a Perl interpolation is parsed as a column name in <Order> position, but a plain string is parsed as a plain string.

See also "<Column>", "<Expr>", "<Perl>", " ORDER BY Interpolation"" in "GROUP BY .

<Keyword>

<a sequence of upper case characters, numbers, underscores>

Whether a keyword is a reserved word or not can be looked up on the SQL standard. Basically, if it marks up statements or special syntax like infix operators, it is reserved, e.g. SELECT or ESCAPE.

<Identifier>

  <a sequence of lower case characters, numbers, underscores>
| '`' <a sequence of any characters except newline and `> '`'
| <Perl>

See also "<Perl>", "Identifier Name Translation".

<TableName>

  <Identifier>
| <Perl>

See also "<Identifier>", "<Perl>", "Identifier Name Translation".

<Table>

  [ [ <Identifier> '.' ] <Identifier> '.' ] <TableName>
| [ 'Table' ] <Perl>

Table specifications are constructed by maximally three components, the last of which is the table name, the last-but-first is the schema, the last-but-second is the catalog. All but the table name are optional.

See also "<Identifier>", "<Perl>", "<TableName>", "Table Interpolation", "Identifier Name Translation".

<ColumnName>

  <Identifier>
| <Perl>

See also "<Identifier>", "<Perl>", "Identifier Name Translation".

<Column>

  [ <Table> '.' ] ( <ColumnName> | '*' )
| [ 'Column' ] <Perl>

See also "<Identifier>", "<Perl>", "<Table>", "Column Interpolation", "Identifier Name Translation".

<Perl>

For the exact syntax, check your Perl manual, but to get the idea, the following items are examples for valid Perl interpolations:

<Integer>
<String>
<Variable>
{ ...PerlCode... }
<Integer>
99
0xff
077
0b111
...
<String>
'string'
"string"
"string with $var"
"string with \n escape"
...
<Variable>
$var
$var[1]
$var{1}
$var->{boo}
$var->{boo}[0]('test')
@var
@var[1..2]
%var
...

Text::Balanced::extract_variable() is used for extraction of the sigil tokens, Text::Balanced::extract_delimited() is used for extracting the strings, and Text::Balanced::extract_codeblock() is used for extracting Perl code enclosed in braces.

The package distinguishes the different interpolation forms in context, e.g. might handle hashes differently from arrays, or produce error message for inappropriate literals but not for others. For this reason, Perl casts are handled as well:

${ ...PerlCode... }
@{ ...PerlCode... }
%{ ...PerlCode... }

Decimal numbers are not modified but parsed as strings so that arbitrarily large numbers are supported as literals.

See also "Perl Interpolation".

<Expr>

  '?'
| NULL
| TRUE
| FALSE
| UNKNOWN
| DEFAULT
| <Column>
| [ 'Expr' ] <Perl>
| '(' <Expr> ')'
| <SubQuery>
| <Functor> '(' [ <Expr> , ... ] ')'
| <ExprSpecialFunc>
| <Prefix> <Expr>
| <Expr> <Suffix>
| <Expr> <SuffixSpecial>
| <Expr> <Infix> <Expr>
| <ExprCase>
<Functor>
<an unknown, non-reserved <Keyword> >

See below for more information.

See also "<Column>", "<ExprCase>", "<ExprSpecialFunc>", "<Infix>", "<Keyword>", "<Perl>", "<Prefix>", "<SubQuery>", "<Suffix>", "<SuffixSpecial>", "Expression Interpolation".

As mentioned already, literal constant values have Perl syntax, and Perl interpolations are allowed at any place.

SQL has a few special literals that are always recognised although they may be semantically or even syntactically misplaced. SQL will tell you, this package does not check this.

In order to support all the functions and operators of any SQL dialect that might be used, expression syntax in general does not follow Perl, but SQL syntax. Otherwise, it would be necessary to translate Perl to the SQL dialect in use, but this package is not mainly meant to normalise SQL, but to embed whatever dialect you are using into Perl, making injections impossible, and thus making SQL usage safe. See "Normalisation".

So this package tries to parse SQL expressions with as little knowledge as possible. This means sacrificing early error detection, of course: many syntax errors in expressions will only be found by the SQL server. We only parse as much as to ensure easy and safe Perl code interpolation.

However, this package no precedence to any of the operators, meaning you have to use parenthesis! This was done for two reasons: (1) to find bugs, (2) to handle = uniformly in UPDATE...SET and SELECT statements: in the former, = has very low precedence and is an assignment operator, while in the latter = has medium priority and is an equality operator. We would like to handle the two uniformly so that you can write:

my $x= sqlExpr{ a = 5 };
my $q= sql{
    SELECT $x
};
my $q2= sql{
    UPDATE tab1 SET $x
};

This is especially interesting for handling hash interpolation uniformly in these two cases. It was felt that the exact precedence order of SQL is a mystery to many Perl programmers anyway (as is the precedence of the operators in Perl itself :-P), so using parens wasn't felt too high a price to pay. (There's a hack to enable some precedence parsing for the most common operators, but that's kept a secret until enough people complain.)

Known known associative and commutative operators may be used in sequence without parenthesis.

1  +  2  +  3      # OK: associative and commutative
1  -  2  -  3      # ERROR: not associative
1  +  2  -  3      # ERROR: mixing is never allowed
1 AND 2 AND 3      # OK: associative and commutative
1 AND 2 OR  3      # ERROR: mixing
1  <  2  <  3      # ERROR: not associative

To make life easier for Perl programmers, the == and != operators are recognised as aliasses for = and <>, resp. There are no aliasses for && and ||, because || has a special meaning in standard SQL, namely string concatenation.

Any unrecognised keywords and symbols have a default behaviour when parsing embedded SQL: they are functors:

CONCAT(A,B,@C,$D)

See also "Expression Interpolation"

Missing: PostgreSQL ROW(<Expr> , ...) or simply (<Exp> , ...) values. The former works, because ROW is an unknown keyword and thus is treated like a function call, which actually produces the right result here, although it's a constructor term.

Missing: <Column>.<Field> This is tricky because the package doesn't really understand identifier chains, so it treats the last component as a column name, the second-to-last as a table name, etc. Specifying a field name will disturb and counting and the result will be wrong. (Often, you won't notice, but it will be wrong regardless.)

<Prefix>

  +
| -
| NOT
| ANY | SOME | ALL

ANY, SOME, and ALL must follow a comparison operator and must precede a subquery.

See also "Interpolation of Unary Operators".

<Infix>

  '+' | '-'
| '*' | '/'
| AND | OR | XOR
| '=' | '<>' | '<=' | '>=' | '<' | '>' | '==' | '!='
| OVERLAPS
| IS DISTINCT FROM
| '||'
| '^' | '**'

== and != are translated to = and <>, resp.

The following are extensions of MySQL: ^, XOR.

The following are extensions of Oracle: **.

See also "Array Interpolation in Expressions".

<Suffix>

  IS [NOT] [ NULL | TRUE | FALSE | UNKNOWN | NORMALIZED ]
| IS [NOT] A SET

You can use NORMALISED as an alias for NORMALIZED. It will be normalised. :-)

See also "Interpolation of Unary Operators".

<SuffixSpecial>

  [NOT] BETWEEN [ SYMMETRIC | ASYMMETRIC ] <Expr> AND <Expr>
| [NOT] [ LIKE | CLIKE | SIMILAR TO ] <Expr> [ ESCAPE <Expr> ]
| [NOT] IN <ExprList>
| [NOT] IN <SubQuery>
| IS [NOT] OF '(' <Type> , ... ')'
| IS [NOT] A SET

See also "<Expr>", "<ExprList>", "<SubQuery>", "<Type>".

<ExprSpecialFunc>

  CAST        '(' <Expr> AS <Type> ')'
| TREAT       '(' <Expr> AS <Type> ')'
| TRANSLATE   '(' <Exrp> AS <Transliteration> ')'
| POSITION    '(' <Expr> IN <Expr> [ USING <CharUnit> ] ')'
| SUBSTRING   '(' <Expr> FROM <Expr> [ FOR <Expr> ] [ USING <CharUnit> ] ')'
| CHAR_LENGTH '(' <Expr> [ USING <CharUnit> ] ')'
| OVERLAY     '(' <Expr> PLACING <Expr> FROM <Expr> [ FOR <Expr> |
                  [ USING <CharUnit> ] ')'
| CONVERT     '(' <Expr> USING <Transcoding> ')'
| EXTRACT     '(' <Expr> FROM <Expr> ')'
| UNNEST      '(' <Expr> ')' [ WITH ORDINALITY ]

See also "<CharUnit>", "<Expr>", "<Transcoding>", "<Transliteration>", "<Type>".

Instead of CHAR_LENGTH, you can also use CHARACTER_LENGTH.

<ExprCase>

CASE
  (
    [ <WhenExpr> , ... ]
  | <Expr> [ <WhenCase> , ... ]
  )
  [ ELSE <Expr> ]
  END

Note that in contrast to standard SQL, zero WHEN...THEN... pairs are excepted. If there are indeed zero, the whole CASE...END block is reduced to the ELSE expression.

See also "<Expr>", "<WhenCase>", "<WhenExpr>".

<WhenExpr>

WHEN <Expr> THEN <Expr>

See also "<Expr>".

<WhenCase>

  <WhenExpr>
| WHEN <Suffix> THEN <Expr>
| WHEN <SuffixSpecial> THEN <Expr>
| WHEN <Infix> <Expr> THEN <Expr>

Only those <Suffix>, <SuffixSpecial>, <Infix> are allowed here that return boolean results.

See also "<Expr>", "<Infix>", "<Suffix>", "<SuffixSpecial>".

<ExprList>

  '(' <Expr> , ... ')'
| <Perl>

See also "<Expr>", "<Perl>", "Expression List Interpolation".

<SubQuery>

  '(' <SELECT Stmt> ')'
| '(' [ 'Stmt' ] <Perl> ')'

See also "<SELECT Stmt>", "<Perl>".

<Type>

  <BaseType>        [ <TypeAttr> ... ]
| [ 'Type' ] <Perl> [ <TypeAttr> ... ]
| <Type> ARRAY    [ '[' <Integer> ']' ]
| <Type> MULTISET
<TypeAttr>
  <BaseType>
| <Precision>
| <IntAttr>
| <LargeLength>
| <CharSetAttr>
| <CollateAttr>
| <WithTimeZone>
| <OtherType>
| [ 'Type' ] <Perl>

So in contrast to SQL, attributes and even base types can be mixed and given in any order after an initial base type or Perl interpolation. The order will be normalised when printing SQL, of course. The reason why this change was made is that it allows the modification of types stored in Perl. See "Type Interpolation".

Not all combination of type attributes are accepted. What's accepted depends on the read-dialect used. Check your SQL manual for details.

<BaseType>
  INT | BIGINT | SMALLINT | TINYINT | MEDIUMINT | NUMERIC
| DECIMAL | NUMBER | FLOAT | REAL | DOUBLE PRECISION
| CHAR | VARCHAR | CLOB | TEXT | TINYTEXT | MEDIUMTEXT | LONGTEXT
| NCHAR | NCHAR VARYING | NCLOB
| BIT | BIT VARYING
| BYTE | BINARY | VARBINARY | BLOB | TINYBLOB | MEDIUMBLOB | LONGBLOB
| DATE | DATETIME | YEAR | TIME | TIMESTAMP | MONEY | BYTEA | UUID
| POINT | LINE | LSEG | BOX | PATH | POLYGON | CIRCLE |
| IRDR | INET | MACADDR
| ENUM '(' <Expr> , ... ')'
| SET  '(' <Expr> , ... ')'

Many of these types depend on the selected read-dialect and are not standard SQL.

<Precision>
'(' <Integer> [ ',' <Integer> ] ')'
<LargeLength>
'(' <Integer> [ <IntMul> ] [ <CharUnit> ] ')
<IntMul>
K | M | G
<IntAttr>
  SIGNED | UNSIGNED | DROP SIGN
| ZEROFILL          | DROP ZEROFILL

The DROP ... extensions can be used to remove the corresponding type attribute completely when modifying types.

<WithTimeZone>
  WITH [LOCAL] TIME ZONE
| WITHOUT TIME ZONE
| DROP TIME ZONE

DROP TIME ZONE is an extension so you can remove the type attribute completely when modifying types.

<CharSetAttr>
  CHARACTER SET <CharSet>
| DROP CHARACTER SET

DROP CHARACTER SET is an extension so you can remove the type attribute completely when modifying types.

Aliases:

  ASCII   = CHARACTER SET latin1
| UNICODE = CHARACTER SET ucs2
<CollateAttr>
  COLLATE <Collate>
| NO COLLATE

NO COLLATE is an extension so you can remove the type attribute completely when modifying types.

Aliases (will be normalised to main form):

DEC                 = DECIMAL
FIXED               = DECIMAL
INTEGER             = INT
CHARACTER           = CHAR
CHAR VARYING        = VARCHAR
CHAR LARGE OBJECT   = CLOB
NATIONAL CHAR       = NCHAR
VARNCHAR            = NCHAR VARYING
CHAR BYTE           = BINARY
BINARY VARYING      = VARBINARY
NCHAR LARGE OBJECT  = NCLOB
BINARY LARGE OBJECT = BLOB
BOOL                = BOOLEAN
VARBIT              = BIT VARYING
[]                  = ARRAY
[ <Integer> ]       = ARRAY [ <Integer> ]

The following are extensions and not standard SQL:

MySQL extensions: TINYINT, MEDIUMINT, *INT types with <Length>, UNSIGNED, SIGNED, ZEROFILL, BIT, BINARY, VARBINARY, TEXT, TINYTEXT, MEDIUMTEXT, LONGTEXT, TINYBLOB, MEDIUMBLOB, LONGBLOB, ENUM, SET

Postgres extensions: MONEY, BYTEA, UUID, IRDR, INET, MACADDR, POINT, LINE, LSEG, BOX, PATH, POLYGON, CIRCLE

Missing PostgreSQL: INTERVAL, XML, ENUM (CREATE TYPE).

<ColumnSpec>

  <Type>
| [ 'ColumnSpec' ] <Perl>
| <ColumnSpec> ( <ColumnSpecAttr> ... )
<ColumnSpecAttr>
  NULL | NOT NULL
| DEFAULT <Expr>
| AUTO_INCREMENT | NO AUTO_INCREMENT
| UNIQUE [KEY] | [ NOT UNIQUE | NO UNIQUE KEY ]
| PRIMARY | NOT PRIMARY
| KEY | NO KEY
| COMMENT <Expr>
| COLUMN_FORMAT [ FIXED | DYNAMIC | DEFAULT ]
| STORAGE [ DISK | MEMORY | DEFAULT ]

Many of these are MySQL extensions.

<CharUnit>

  CHARACTERS
| CODE_UNITS
| OCTETS

<Engine>

<Identifier>

<CharSet>

  [ [ <Identifier> '.' ] <Identifier> '.' ] <Identifier>
| [ 'CharSet' ] <Perl>

<Collate>

  [ [ <Identifier> '.' ] <Identifier> '.' ] <Identifier>
| [ 'Collate' ] <Perl>

<Index>

  [ [ <Identifier> '.' ] <Identifier> '.' ] <Identifier>
| [ 'Index' ] <Perl>

<Constraint>

  [ [ <Identifier> '.' ] <Identifier> '.' ] <Identifier>
| [ 'Constraint' ] <Perl>

<Transliteration>

  [ [ <Identifier> '.' ] <Identifier> '.' ] <Identifier>
| [ 'Transliteration' ] <Perl>

<Transcoding>

  [ [ <Identifier> '.' ] <Identifier> '.' ] <Identifier>
| [ 'Transcoding' ] <Perl>

IMPLEMENTATION

This module is implemented with the Filter::Simple package.

The lexer uses the Text::Balanced package for extracting embedded Perl.

The parser for SQL statements is a hand-written recursive descent parser. The lexer shifts pos() along as it scans the text, so the interface is well-suited for Text::Balanced. The author particularly likes pattern matching with m/\G.../gc.

SEE ALSO

This module uses or is related to Filter::Simple, Text::Balanced, and DBI.

There is a similar, smaller module that also uses source filtering: SQL::Interpolate::Filter. Similar to this module, it is activated by quotelike syntax, and it also uses the 'sql' prefix. (This is a coincidence, but then, what other prefix would you naturally use?) It is somewhat different from our approach, as it replaces variable names with bind places and does not really parse the complete SQL syntax, so it only handles values, not, say, column names, join clauses, etc.. It also only parses complete statements -- the parser cannot be requested to parse a single expression only.

A similar idea but without source filtering can be found in SQL::Interpolate (with the older name SQL::Interp).

Another module that uses source filtering is SQL::PreProc. It uses a different approach and allows using SQL statements directly in source code, without special markup around it. This leads to a different programming paradigm and looks very different from plain DBI usage.

Another source filter is Filter::SQL, which, like SQL::PreProc embeds statements directly into Perl without special markup.

Different approaches for making SQL usage safe are found in SQL::Abstract and SQL::DB, which provide SQL queries with a Perl-style interface, so the queries don't look like SQL anymore, but are also safe to use without possibility of SQL injections.

BUGS

Source filters are usually fragile, meaning that you can write Perl code that breaks the filter. To do it properly, it would be necessary to plug into the Perl parser itself. For example, currently, ${sql{a}} (for $sql{a}) will trigger filtering. (And even $sql{a} needed a hack to make it work.)

Moreover, unfortunately, it was unfeasible to use Filter::Simple in code_no_comments mode, because that filter is way to slow. This means that sql{...} is also considered inside comments and strings.

The supported syntax is currently mainly based on MySQL, and while I also looked at PostgreSQL and the SQL-2003 specs sometimes, I am pretty sure that a lot of useful stuff is missing for many DBs. Please don't hesitate to tell me what you're using so I can add it.

Finally, the resulting Perl code could be optimised more. This is on my TODO list.

There are some pretty bad problems when sql{...} is used inside comments. This may lead to syntax errors, because line breaks might be introduced. A more understanding parser I experimented with was too slow to be used in practice. This needs more work.

Missing Syntax

UNION is missing.

WITH ... SELECT is not yet implemented.

The MySQL REPLACE command is currently not supported; it is an extension. You can use DELETE+INSERT instead, which is more portable anyway. The REPLACE command will nevertheless be added later.

Several other SQL commands are also missing.

AUTHOR

Henrik Theiling <cpan@theiling.de>

COPYRIGHT AND LICENSE

Copyright 2010 by Henrik Theiling <cpan@theiling.de>

This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.