NAME

DBIx::MyParse - Perl API for MySQL's SQL Parser

SYNOPSIS

use DBIx::MyParse;
my $parser = DBIx::MyParse->new(
	database => 'database',
	groups => ['my_cnf_group'],
	options => ['--skip-networking'],
	datadir => '/tmp'
);
my $query = $parser->parse("SELECT field FROM table");
print $query->getCommand();

DESCRIPTION

This module provides access to MySQL's SQL parser, which is a full-featured lexx/yacc-based SQL parser, complete with subqueries and various MySQL extensions.

Please check the documentation for DBIx::MyParse::Query to see how you can access the parse tree produced by parse(). The parse tree itself consists of DBIx::MyParse::Item objects.

INSTALLATION

A binary RPM created using cpan2rpm on a Fedora Core 6 is available from http://www.sf.net/projects/myparse. Alternatively, please see the README for details on compiling the module from scratch. You will need to patch and compile the MySQL source.

CONSTRUCTOR

The constructor allows one to specify what options to be passed to libmysqld. Please make sure your options are all syntactically correct. An incorrect option can cause the constructor to exit() in a silent and untrappable way. If no options are specified, the following defaults are used, which provide low memory usage:

--skip-grant-tables --skip-networking --read_only
--key_buffer_size=1K --key_buffer=1K --sort_buffer_size=1K --myisam_sort_buffer_size=1K
plus --skip for all database engines except MyISAM which can not be --skip-ed

If you specify some options however do not skip loading the database engines, e.g. Innodb, data files may be created and a considerable ammount of memory may be occupied. You can also put your configuration options in /etc/my.cnf, however it is recommended that you use a separate group within that file, rather than the [mysqld] group.

It is also recommended that you specify a database in the constructor or call setDatabase() as soon as possible, because some SQL statements will fail to parse without a default database. At the same time, please note that the default database name may end up in your parse tree as if it was present in the SQL query itself.

It is also recommended that you also specify a datadir because some SQL statements require an existing datadir to parse correctly. Furthermore, within the datadir, you should have a subdirectory for each database you intend to use. There is no need for the directory to contain any .FRM files. If no datadir is specified, /tmp is used. Do NOT specify a datadir that contains useful information -- as this module grows, unforseen interactions can occur.

You can also specify a sql_mode to influence the parsing behavoir. From all possible SQL modes listed in section 5.2.6 of the MySQL manual, only those that pertain to parsing are useful, e.g. "PIPES_AS_CONCAT". Note that specifying non-parsing flags such as "ERROR_FOR_DIVISION_BY_ZERO" will not result in any division by zero errors in your SQL query being caught. Multiple flags are specified with a comma separator.

Even if you do not have the rest of the MySQL disribution installed, you still need to have the /usr/local/share/mysql/charsets directory. You can either run make install in the /sql/share of your MySql source tree or do a complete MySQL install to obtain it.

It is recommended that you only have one DBIx::MyParse object per script. Having several objects is possible, however options and sql_modes specified for the last object created will probably apply to all objects.

OPERATING ENVIRONMENT

At this time, the following options are passed to the MySQL library:

It is recommended that you do a setDatabase() because certain SQL statements will fail to parse if there is no default database. Furthermore, it is best to have a directory with the name of your database under /tmp.

UNSUPPORTED SQL STATEMENTS

A great deal of work has gone into supporting as much of the SQL syntax as possible.

The following SQL statements are not supported:

UNION
ALTER
CREATE except CREATE DATABASE
DROP except DROP DATABASE and DROP TABLE
RENAME DATABASE
HANDLER
LOAD DATA INFILE
HELP

The following esoteric SQL constructs are not currently supported:

* LOAD DATA INFILE
* SELECT INTO OUTFILE
* SELECT PROCEDURE
* CREATE DATABASE with CHARSET or COLLATION

The following SQL functions are not currenly supported:

* VALUES(field)
* ENCODE() and DECODE()
* GET_FORMAT()
* MAKE_SET()
* GROUP_CONCAT
* NAME_CONST()
* MATCH WITH QUERY EXPANSION
* MATCH IN BOOLEAN MODE
* TRIM()

ADVANTAGES OF THE APPROACH

This is a full-featured SQL parser, not a set of regular expressions that parse just the most common queries. It makes use of a complete parsing grammar taken from a real-life database, by virtue of the fact that it uses the MySQL parsing engine to do the dirty work.

This module will accept any input that is a valid MySQL command and will reject any input that is not a valid MySQL command. Accepting an imput is one thing, producing a complete and meaningful parse tree is a different thing, however the module currently produces parse trees of considerable complexity for almost all SQL constructs.

MySQL is unlikely to crash on SQL expressions of any complexity, and so is this parser API. In particular, weird functions, complex nested expressions and operator precedence are all handled correctly by definition. Subqueries and nested joins are also fully supported.

Errors are returned as both error numbers, error codes in English and language-specific long MySQL error messages, rather than as die() or carp().

The module's objects are completely hash-free, which should be considerably faster than a comparable hash-based implementation.

DISADVANTAGES

This module is hooked directly to MySQL's internals. Non-MySQL SQL features are not supported and can not be supported without changing the MySQL source code. Extending MySQL to support new functionality is far more complicated and rewarding than simply adding a few regexps to your home-grown SQL parser.

Some of MySQL's code is not friendly towards being (ab)used in the manner employed by this module. There are object methods declared Private for no obvious reasons.

MySQL is GPL, so this module is GPL, please see the COPYRIGHT section below for more information.

TESTING

Apart from the standard make test test suite, the following approaches were used to test this module:

* The MySQL test suite
* the crash-me script from the MySQL benchmark suite
* DBD::mysql tests

SEE ALSO

Please see the following sources for further information:

MySQL Internals Manual: http://dev.mysql.com/doc/internals/en/index.html

Doxygen documentation for MySQL 4.1 source: http://www.distlab.dk/mysql-4.1/html/

Doxygen documentation for MySQL 5 source: "/leithal.cool-tools.co.uk/sourcedoc/mysql509/html/index.html" in http::

DBIx::MyParse has a page at SourceForge: http://sourceforge.net/projects/myparse/

AUTHOR

Philip Stoev <philip@stoev.org>

COPYRIGHT AND LICENSE

Copyright (C) 2006 by Philip Stoev

This library is free software; you can redistribute it and/or modify it under the terms of the GNU General Public Licence

Please note that this module links to libmysqld which is distributed under GPL as well. If you intend to use this module in a commercial product, you are strongly advised to contact MySQL directly to obtain a commercial licence for the MySQL embedded server.

Please see the file named LICENCE for the full text of the GNU General Public Licence