NAME
DBIx::AutoUpgrade::NativeStrings - automatically upgrade Perl native strings to utf8 before sending them to the database
SYNOPSIS
use utf8;
use DBI;
use DBIx::AutoUpgrade::NativeStrings;
use Encode;
my $injector = DBIx::AutoUpgrade::NativeStrings->new(native => 'cp1252');
my $dbh = DBI->connect(@dbi_connection_params);
$injector->inject_callbacks($dbh);
# these strings are semantically equal, but have different internal representations
my $str_utf8 = "il était une bergère, elle vendait ses œufs en ¥, ça paie 5¾ ‰ de mieux qu’en €";
my $str_native = encode('cp1252', $str_utf8, Encode::LEAVE_SRC);
# Oracle example : check if strings passed to the database are equal
my $sql = "SELECT CASE WHEN ?=? THEN 'EQ' ELSE 'NE' END FROM DUAL";
my ($result) = $dbh->selectrow_array($sql, {}, $str_native, $str_utf8); # returns 'EQ'
DESCRIPTION
This module intercepts calls to DBI methods in order to automatically convert Perl native strings to utf8 strings before they reach the DBD driver.
There are two situations where it is useful:
- Some DBD drivers do not comply with this DBI specification:
Perl supports two kinds of strings: Unicode (utf8 internally) and non-Unicode (defaults to iso-8859-1 if forced to assume an encoding). Drivers should accept both kinds of strings and, if required, convert them to the character set of the database being used. Similarly, when fetching from the database character data that isn't iso-8859-1 the driver should convert it into utf8.
For example, with DBD::Oracle v1.83 and a client charset set to AL32UTF8, native strings with characters in the range 128 .. 255 are not converted to utf8 strings; therefore characters in that range become Unicode code points in the C1 control codes block, without any graphical display, which is not their intended meaning.
- Drivers that do attempt to comply with the DBI specification, such as DBD::SQLite or DBD::Pg, perform an automatic upgrade of native strings, but assume that the native character set is iso-8859-1 (Latin-1). However, some platforms have different native character sets; in particular, the default "codepage" on Windows machines is Windows-1252, where code points in the range 128-159 are mapped to various graphical characters. So if your native strings assume Windows-1252 encoding, such characters will not be stored correctly within the database server.
With the present module, clients explicitly specify the native encoding at initialization time. The module then automatically converts native strings to their proper Unicode counterparts before sending them to the database.
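To make the difference concrete, here is a minimal sketch that uses only the core Encode module, not the present module; the variable names are purely illustrative. It takes the single byte 0x80, which means "€" in Windows-1252, and compares Perl's default upgrade with an explicit decode:

```perl
use strict;
use warnings;
use Encode qw(decode);

# One byte, 0x80: the euro sign in Windows-1252, a C1 control code in Latin-1.
my $native = "\x80";

# Default Perl upgrade: the internal representation changes, but the
# code point stays 0x80 (the byte is interpreted as Latin-1).
my $upgraded = $native;
utf8::upgrade($upgraded);

# Explicit decode from cp1252: the byte becomes U+20AC EURO SIGN.
my $decoded = decode('cp1252', $native);

printf "upgraded: U+%04X\n", ord $upgraded;   # U+0080
printf "decoded : U+%04X\n", ord $decoded;    # U+20AC
```

The first result is what a Latin-1-assuming driver would send to the database; the second is what a Windows-1252 client actually meant.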
Of course this only makes sense when the connection to the database is in Unicode mode. Each DBD driver has its own specific way of setting the character set used for the connection, so be sure to configure your DBD driver accordingly when using the present module.
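As a rough illustration of such driver-specific settings, the fragment below collects connection attributes commonly used to put a few drivers into Unicode mode. The hash name %unicode_attrs is hypothetical, and the attribute names are taken from the respective driver documentation as an assumption; they may differ between driver versions, so check the documentation of the driver you actually use.

```perl
use strict;
use warnings;

# Assumed attribute names; verify them against your driver version.
my %unicode_attrs = (
    SQLite => { sqlite_unicode       => 1 },
    Pg     => { pg_enable_utf8       => 1 },
    mysql  => { mysql_enable_utf8mb4 => 1 },
);

# DBD::Oracle takes the client character set from the environment instead.
$ENV{NLS_LANG} = 'AMERICAN_AMERICA.AL32UTF8';

# Hypothetical usage (DSN and credentials are placeholders):
# my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
#                        { RaiseError => 1, %{ $unicode_attrs{SQLite} } });
```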
METHODS
new
my $injector = DBIx::AutoUpgrade::NativeStrings->new(%options);
Constructor for a callback injector object. Options are:
- native
The name of the native encoding. This should be one of:
  - a valid Perl encoding name, as listed in Encode::Encodings. Strings will be converted through "decode" in Encode;
  - the string 'locale', which will invoke Encode::Locale to automatically guess the native encoding;
  - the string 'default', which will use the default Perl upgrading mechanism through "utf8::upgrade" in utf8. This is the default value. It works well for latin-1 (iso-8859-1), but not for other native encodings.
- decode_check
A bitmask passed as third argument to "decode" in Encode (see "List of CHECK values" in Encode). Default is undef.
- debug
An optional coderef that will be called as $debug->($message). Default is undef. A simple debug coderef could be:
my $injector = DBIx::AutoUpgrade::NativeStrings->new(debug => sub {warn @_, "\n"});
- dbh_methods
An optional arrayref containing the list of $dbh method names that will receive a callback. The default list is:
do prepare selectrow_array selectrow_arrayref selectrow_hashref selectall_arrayref selectall_array selectall_hashref selectcol_arrayref
- sth_methods
An optional arrayref containing the list of $sth method names that will receive a callback. The default list is:
bind_param bind_param_array execute execute_array
- bind_type_is_string
An optional coderef that decides what to do with calls to the ternary form of "bind_param" in DBI, i.e.
$sth->bind_param($position, $value, $bind_type);
If $coderef->($bind_type) returns true, the $value is treated as a string and will be upgraded if needed, like arguments to other method calls; if the coderef returns false, the $value is left intact.
The default coderef returns true when the $bind_type is one of the DBI constants SQL_CHAR, SQL_VARCHAR, SQL_LONGVARCHAR, SQL_WLONGVARCHAR, SQL_WVARCHAR, SQL_WCHAR or SQL_CLOB.
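The effect of the decode_check bitmask can be explored with Encode alone. The sketch below does not use the present module; it assumes that byte 0x81 is unassigned in Perl's cp1252 table (as it is in the Unicode mapping file for Windows-1252) and compares the default behaviour with Encode::FB_CROAK:

```perl
use strict;
use warnings;
use Encode qw(decode);

# "café" in cp1252, followed by a byte assumed to have no cp1252 mapping.
my $bytes = "caf\xE9 \x81";

# Default CHECK: an unmappable byte is replaced by a substitution character.
my $lenient = decode('cp1252', $bytes);

# FB_CROAK: decoding dies instead of silently substituting.
# LEAVE_SRC keeps $bytes untouched even though CHECK is set.
my $strict = eval {
    decode('cp1252', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $error = $@;

printf "lenient result ends with U+%04X\n", ord substr($lenient, -1);
print  "strict decode ", (defined $strict ? "succeeded" : "died"), "\n";
```

A bitmask like the one used in the second call is the kind of value one would pass as decode_check to make invalid native data fail loudly rather than be stored as substitution characters.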
inject_callbacks
$injector->inject_callbacks($dbh);
Injects callbacks into the given database handle. If that handle already has callbacks for the same methods, the system will arrange for those other callbacks to be called after all string arguments have been upgraded to utf8.
ARCHITECTURAL NOTES
Object-orientedness
Although I'm a big fan of Moose and its variants, the present module is implemented in POPO (Plain Old Perl Object): since the object model is extremely simple, there was no reason to use a sophisticated object system.
Strings are modified in-place
String arguments to DBI methods are modified in-place. It is unlikely that this would affect your client program, but if it does, you need to make your own string copies before passing them to the DBI methods.
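The following sketch shows what in-place modification means for the caller. The function upgrade_in_place is a hypothetical stand-in, not the module's actual code; it relies on the fact that the elements of @_ are aliases to the caller's variables.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for a callback that converts its arguments in place.
# Assigning to an element of @_ changes the caller's variable.
sub upgrade_in_place {
    for my $arg (@_) {
        next if !defined $arg || utf8::is_utf8($arg);
        $arg = decode('cp1252', $arg);
    }
    return;
}

my $original = "\x80 100";   # cp1252 bytes for "€ 100"
my $copy     = $original;    # private copy made before the call

upgrade_in_place($original);

printf "original now starts with U+%04X\n", ord $original;
printf "copy still starts with   U+%04X\n", ord $copy;
```

If the client program needs the untouched native bytes later on, keeping such a copy before calling the DBI method is enough.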
Possible redundancies
DBI does not precisely document which of its public methods call each other. For example, one would think that execute() internally calls bind_param(), but this does not seem to be the case. So, to be on the safe side, callbacks installed here make no assumptions about string transformations performed by other callbacks. There might be some redundancies, but they do no harm since strings are never upgraded twice.
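One plausible way to obtain the "never upgraded twice" guarantee is to test Perl's internal utf8 flag before converting. The helper upgrade_once below is a hypothetical illustration under that assumption; the module's own test may differ.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode only strings that are still native
# (utf8 flag off), so that a second pass is a no-op.
sub upgrade_once {
    my ($str) = @_;
    return $str if utf8::is_utf8($str);
    return decode('cp1252', $str);
}

my $first  = upgrade_once("\x9Cuf");   # cp1252 bytes for "œuf"
my $second = upgrade_once($first);     # already Unicode: returned unchanged

print "second pass left the string unchanged\n" if $first eq $second;
```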
Caveats
The bind_param_inout() method is not covered -- the client program must do the proper updates if that method is used to send strings to the database.
AUTHOR
Laurent Dami, <dami at cpan.org>
COPYRIGHT AND LICENSE
Copyright 2023 by Laurent Dami.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.