NAME
Apache::Request::I18N - Internationalization extension to Apache::Request
SYNOPSIS
use Apache::Request::I18N;
my $apr = Apache::Request::I18N->new($r, DECODE_PARMS => 'utf-8');
Or, add something like this to your Apache httpd.conf:
PerlModule Apache::Request::I18N
<Location ...>
SetHandler perl-script
PerlHandler Apache::Request::I18N <your other handlers ...>
PerlSetVar DecodeParms utf-8
</Location>
DESCRIPTION
Apache::Request::I18N adds transparent support on top of Apache::Request for internationalized GET/POST parameters. Form field names and values are automatically decoded and converted either to Perl's internal UTF-8 format, or to another character encoding.
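As a rough illustration of what "decoded and converted" means, the sketch below unescapes one URL-encoded value and turns the resulting bytes into a Perl character string using only the core Encode module. The helper name decode_pair is hypothetical and is not part of this module; the module's actual implementation may differ.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, for illustration only: it is NOT provided by
# Apache::Request::I18N.
sub decode_pair {
    my ($raw, $charset) = @_;
    (my $bytes = $raw) =~ tr/+/ /;                    # '+' stands for a space
    $bytes =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # undo percent-escapes
    return decode($charset, $bytes);                  # bytes -> characters
}

# "cafe" with an acute accent, sent as UTF-8 and percent-escaped.
my $value = decode_pair('caf%C3%A9', 'utf-8');
printf "%d characters\n", length $value;
```

The raw value carries five bytes after unescaping, while the decoded string holds four characters.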
Since this module inherits from Apache::Request, it can be used as a drop-in replacement. (It is not a perfect replacement, though; see "COMPATIBILITY ISSUES" below.) It can also be used in a PerlHandler directive, in which case all subsequent handlers will -- if they play nicely -- automatically see the converted names and values.
CONSTRUCTORS
- new( REQ [, OPTIONS ] )
Creates and returns a new Apache::Request::I18N object. REQ is the Apache or Apache::Request object associated with the current request.
OPTIONS is an optional list of name/value pairs. Each option also has a corresponding mod_perl variable (listed in parentheses) that can be set via PerlSetVar in httpd.conf. Values in OPTIONS take precedence. The available options are:
- DECODE_PARMS (DecodeParms)
Required. Declares the character encoding that will be used by default when decoding form field names and values. This character encoding must be supported by the Encode module (see Encode::Supported for more details).
- ENCODE_PARMS (EncodeParms)
Declares the character encoding that will be used to re-encode form field names and values. If omitted, names and values will be in Perl's own internal UTF-8 format.
Apache::Request options can also be included (although they will be ignored if REQ is already an Apache::Request object).
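Because DECODE_PARMS and ENCODE_PARMS must name encodings known to Encode, it may be worth checking a name before putting it in httpd.conf. The following sketch uses Encode's find_encoding() for that; the list of names tried is purely illustrative.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Illustrative names only; 'no-such-charset' is deliberately bogus.
for my $name ('utf-8', 'EUC-JP', 'no-such-charset') {
    my $enc = find_encoding($name);    # undef if Encode does not know the name
    print "$name: ",
        ($enc ? 'supported as ' . $enc->name : 'not supported'), "\n";
}
```

A name reported as "not supported" here should not be expected to work as a value for either option.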
- instance( REQ [, OPTIONS ] )
Equivalent to the instance() method in Apache::Request, except that this method will return an Apache::Request::I18N object. Subsequent calls to Apache::Request->instance() will also return the same object. Apache::Request->instance() may also be called beforehand.
METHODS
Almost all Apache::Request methods are supported (see "COMPATIBILITY ISSUES" below for a list of exceptions), and will properly return values according to ENCODE_PARMS. (Apache methods, like args(), are not affected by this module.)
All arguments passed to a method must be encoded to ENCODE_PARMS beforehand, unless ENCODE_PARMS is empty. This also applies to each key/value of any Apache::Table passed to parms().
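A minimal sketch of preparing such an argument, assuming ENCODE_PARMS has been set to utf-8 (an assumption made for this example). Only the core Encode module is used; the call on the request object appears in a comment because it needs a running mod_perl request.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $encode_parms = 'utf-8';          # assumed value of ENCODE_PARMS
my $name = "pr\x{E9}nom";            # field name as a Perl character string

# Convert the character string to the bytes the object expects.
my $arg = encode($encode_parms, $name);
printf "%d characters become %d bytes\n", length $name, length $arg;

# Inside a handler one would then write something like:
#   my $value = $apr->param($arg);
```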
Additional methods
FILE UPLOADS
Uploads returned by the upload() method are Apache::Upload::I18N objects; they behave like Apache::Upload objects, and their name() and filename() methods will return values according to ENCODE_PARMS.
(This is however not the case within the upload hook; see "BUGS" below.)
HANDLER
This module provides a simple Apache handler that can be used in a PerlHandler directive. This is useful when used in combination with other handlers, which will then automatically access the decoded values. (This works as long as each handler takes care to call instance() instead of creating a new object.)
For example, you can use this module in combination with Mason:
SetHandler perl-script
PerlHandler +Apache::Request::I18N +HTML::Mason::ApacheHandler
PerlSetVar DecodeParms EUC-JP
Each Mason component will now see its arguments as true Perl character strings instead of EUC-JP byte strings.
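The difference between the two kinds of string can be seen with the core Encode module alone. The sketch below builds an EUC-JP byte string from two kanji (chosen arbitrarily for this example) and decodes it again; it does not involve Mason or this module.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars   = "\x{65E5}\x{672C}";          # two kanji as a character string
my $bytes   = encode('EUC-JP', $chars);    # what a browser would send
my $decoded = decode('EUC-JP', $bytes);    # what a component would see

printf "bytes: %d, characters: %d\n", length $bytes, length $decoded;
```

Operations such as length(), substr() and regular expressions work per character on the decoded form, rather than per byte.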
COMPATIBILITY ISSUES
Calling parms() is not supported if ENCODE_PARMS is empty, as Apache::Table cannot handle character strings. This also applies to calling param() in scalar context.
Query parameter keys may or may not be case-insensitive, depending on their contents and on ENCODE_PARMS.
Calling next() on an upload object is not currently supported.
BUGS
When using the multipart/form-data encoding, the proper encoding of form field names and filenames as specified by RFC 2184 is currently not supported. (This is due to a limitation in libapreq.)
Conversely, since some user-agents are known to encode such values via RFC 2047, we attempt decoding if possible. This means that a value supplied by a standards-compliant user-agent may be wrongly decoded.
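To show what an RFC 2047 encoded value looks like, the sketch below decodes one with Encode's MIME-Header encoding. The filename is made up for this example, and this is not necessarily how the module performs the decoding internally.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Made-up filename as an RFC 2047 "encoded-word" (Q encoding, ISO-8859-1).
my $sent    = '=?ISO-8859-1?Q?caf=E9.txt?=';
my $decoded = decode('MIME-Header', $sent);

printf "%d bytes sent, %d characters after decoding\n",
    length $sent, length $decoded;
```

A user-agent that sent those same bytes as a literal, unencoded filename would have it altered by this step, which is the ambiguity described above.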
When using the multipart/form-data encoding, each form field value may have its character encoding specified via the charset parameter of its Content-Type header. This value is currently ignored. (This is due to a limitation in libapreq.)
Similarly, the Content-Transfer-Encoding header is also ignored.
When using upload hooks, the upload object supplied to UPLOAD_HOOK will not have had its name() and filename() decoded yet.
When using the multipart/form-data encoding, this module will get confused if a form field appears in both the query string and the request body. In other words, don't try to do this:
<FORM METHOD=post ENCTYPE="multipart/form-data" ACTION=".../my_script?foo=1"> <INPUT NAME="foo" ...> ...
You should also avoid mixing file uploads and regular input within a single field name. In other words, don't try this either:
<INPUT TYPE=text NAME="foo"> <INPUT TYPE=file NAME="foo">
Since all query parameter keys are stored in encoded form within an Apache::Table (which is case-insensitive), it is possible for two distinct keys to be fused together if their encoded representations are similar.
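One way such a collision can arise is sketched below, under the assumption that the table folds only ASCII letters when comparing keys. In Shift_JIS, the second byte of a two-byte character may fall in the ASCII letter range, so two different characters can differ only by the "case" of that byte. The choice of Shift_JIS and of these two katakana is purely illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumed model of the table's key comparison: fold ASCII letters only.
sub fold_ascii {
    my ($bytes) = @_;
    $bytes =~ tr/A-Z/a-z/;
    return $bytes;
}

my $key_a = encode('shiftjis', "\x{30A2}");    # katakana A
my $key_b = encode('shiftjis', "\x{30C2}");    # katakana DI

print fold_ascii($key_a) eq fold_ascii($key_b)
    ? "keys collide\n"
    : "keys stay distinct\n";
```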
TODO
Allow changing DECODE_PARMS and ENCODE_PARMS after the object has been created.
Automatically decode the contents of a text/* file upload if a charset has been provided.
Allow for more than one DECODE_PARMS, and try to guess which one is appropriate.
Use the User-Agent header to figure out how far from the standards we must stray.
Write a short text about the various standards and issues.
SEE ALSO
<http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html>
RFC 1522 - MIME (Multipurpose Internet Mail Extensions) Part Two: Message Header Extensions for Non-ASCII Text
RFC 1806 - Communicating Presentation Information in Internet Messages: The Content-Disposition Header [2.3]
RFC 1866 - Hypertext Markup Language - 2.0 [8.2.1]
RFC 1867 - Form-based File Upload in HTML [3.3, 5.11]
RFC 2047 - MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text [5]
RFC 2070 - Internationalization of the Hypertext Markup Language [5.2]
RFC 2183 - Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field [2, 2.3]
RFC 2231 - MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations
RFC 2388 - Returning Values from Forms: multipart/form-data
AUTHOR
Frédéric Brière, <fbriere@fbriere.net>
COPYRIGHT AND LICENSE
Copyright (C) 2005 by Frédéric Brière
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.7 or, at your option, any later version of Perl 5 you may have available.