NAME
Apache::Dynagzip - mod_perl extension for Apache-1.3.X to compress the response with gzip format.
ABSTRACT
This Apache handler provides dynamic content compression of the response data stream for HTTP/1.0 and HTTP/1.1 requests.
Standard gzip compression is optionally combined with extra light compression, which eliminates leading blank spaces and/or blank lines within the source document. This extra light compression can be applied even when the client (browser) is not capable of decompressing the gzip format.
This handler helps to compress the outbound HTML content usually by 3 to 20 times, and provides a list of useful features.
This handler is particularly useful for compressing outgoing web content which is dynamically generated on the fly (using templates, DB data, XML, etc.), when at the time of the request it is impossible to determine the length of the document to be transmitted. Support for Perl, Java, and C source generators is provided.
Besides the benefits of reduced document size, this approach gains efficiency from being able to overlap the various phases of data generation, compression, transmission, and decompression. In fact, the browser can start to decompress a document which has not yet been completely generated.
SYNOPSIS
There is more than one way to configure Apache to use this handler...
Compress the regular (static) HTML files
======================================================
Static html file (size=149208) no light compression:
======================================================
httpd.conf:
PerlModule Apache::Dynagzip
<Files ~ "*\.html">
SetHandler perl-script
PerlHandler Apache::Dynagzip
</Files>
error_log:
[Fri May 31 12:36:57 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
is serving the main request for GET /html/wowtmovie.html HTTP/1.1
targeting /var/www/html/wowtmovie.html via /html/wowtmovie.html
Light Compression is Off. Source comes from Plain File.
The client Mozilla/4.0 (compatible; MSIE 6.0; Windows 98) accepts GZIP.
[Fri May 31 12:36:57 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
starts gzip using minChunkSizeSource = 32768 minChunkSize = 8 for /var/www/html/wowtmovie.html
[Fri May 31 12:36:57 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
is done OK for /var/www/html/wowtmovie.html
client-side log:
C05 --> S06 GET /html/wowtmovie.html HTTP/1.1
C05 --> S06 Accept: */*
C05 --> S06 Referer: http://devl4.outlook.net/html/
C05 --> S06 Accept-Language: en-us
C05 --> S06 Accept-Encoding: gzip, deflate
C05 --> S06 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
C05 --> S06 Host: devl4.outlook.net
C05 --> S06 Pragma: no-cache
C05 --> S06 Accept-Charset: ISO-8859-1
== Body was 0 bytes ==
C05 <-- S06 HTTP/1.1 200 OK
C05 <-- S06 Date: Fri, 31 May 2002 17:36:57 GMT
C05 <-- S06 Server: Apache/1.3.22 (Unix) Debian GNU/Linux mod_perl/1.26
C05 <-- S06 X-Module-Sender: Apache::Dynagzip
C05 <-- S06 Transfer-Encoding: chunked
C05 <-- S06 Expires: Friday, 31-May-2002 17:41:57 GMT
C05 <-- S06 Vary: Accept-Encoding
C05 <-- S06 Content-Type: text/html; charset=iso-8859-1
C05 <-- S06 Content-Encoding: gzip
C05 <-- S06 == Incoming Body was 9411 bytes ==
== Transmission: text gzip chunked ==
== Chunk Log ==
a (hex) = 10 (dec)
1314 (hex) = 4884 (dec)
3ed (hex) = 1005 (dec)
354 (hex) = 852 (dec)
450 (hex) = 1104 (dec)
5e6 (hex) = 1510 (dec)
0 (hex) = 0 (dec)
== Latency = 0.170 seconds, Extra Delay = 0.440 seconds
== Restored Body was 149208 bytes ==
======================================================
Static html file (size=149208) with light compression:
======================================================
httpd.conf:
PerlModule Apache::Dynagzip
<Files ~ "*\.html">
SetHandler perl-script
PerlHandler Apache::Dynagzip
PerlSetVar LightCompression On
</Files>
error_log:
[Fri May 31 12:49:06 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
is serving the main request for GET /html/wowtmovie.html HTTP/1.1
targeting /var/www/html/wowtmovie.html via /html/wowtmovie.html
Light Compression is On. Source comes from Plain File.
The client Mozilla/4.0 (compatible; MSIE 6.0; Windows 98) accepts GZIP.
[Fri May 31 12:49:07 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
starts gzip using minChunkSizeSource = 32768 minChunkSize = 8 for /var/www/html/wowtmovie.html
[Fri May 31 12:49:08 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
is done OK for /var/www/html/wowtmovie.html
client-side log:
C05 --> S06 GET /html/wowtmovie.html HTTP/1.1
C05 --> S06 Accept: */*
C05 --> S06 Referer: http://devl4.outlook.net/html/
C05 --> S06 Accept-Language: en-us
C05 --> S06 Accept-Encoding: gzip, deflate
C05 --> S06 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
C05 --> S06 Host: devl4.outlook.net
C05 --> S06 Pragma: no-cache
C05 --> S06 Accept-Charset: ISO-8859-1
== Body was 0 bytes ==
C05 <-- S06 HTTP/1.1 200 OK
C05 <-- S06 Date: Fri, 31 May 2002 17:49:06 GMT
C05 <-- S06 Server: Apache/1.3.22 (Unix) Debian GNU/Linux mod_perl/1.26
C05 <-- S06 X-Module-Sender: Apache::Dynagzip
C05 <-- S06 Transfer-Encoding: chunked
C05 <-- S06 Expires: Friday, 31-May-2002 17:54:06 GMT
C05 <-- S06 Vary: Accept-Encoding
C05 <-- S06 Content-Type: text/html; charset=iso-8859-1
C05 <-- S06 Content-Encoding: gzip
C05 <-- S06 == Incoming Body was 8515 bytes ==
== Transmission: text gzip chunked ==
== Chunk Log ==
a (hex) = 10 (dec)
119f (hex) = 4511 (dec)
3cb (hex) = 971 (dec)
472 (hex) = 1138 (dec)
736 (hex) = 1846 (dec)
0 (hex) = 0 (dec)
== Latency = 0.280 seconds, Extra Delay = 0.820 seconds
== Restored Body was 128192 bytes ==
The default values of minChunkSizeSource and minChunkSize will be in effect in this case. To override them try, for example:
<IfModule mod_perl.c>
PerlModule Apache::Dynagzip
<Files ~ "*\.html">
SetHandler perl-script
PerlHandler Apache::Dynagzip
PerlSetVar minChunkSizeSource 36000
PerlSetVar minChunkSize 9
</Files>
</IfModule>
Compress the output stream of the Perl scripts
===============================================================================
GET dynamically generated (by perl script) html file with no light compression:
===============================================================================
httpd.conf:
PerlModule Apache::Filter
PerlModule Apache::Dynagzip
<Directory /var/www/perl/>
SetHandler perl-script
PerlHandler Apache::RegistryFilter Apache::Dynagzip
PerlSetVar Filter On
PerlSetVar UseCGIHeadersFromScript Off
PerlSendHeader Off
PerlSetupEnv On
AllowOverride None
Options ExecCGI FollowSymLinks
Order allow,deny
Allow from all
</Directory>
error_log:
[Sat Jun 1 11:59:47 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
is serving the main request for GET /perl/start_example.cgi HTTP/1.1
targeting /var/www/perl/start_example.cgi via /perl/start_example.cgi
Light Compression is Off. Source comes from Filter Chain.
The client Mozilla/4.0 (compatible; MSIE 6.0; Windows 98) accepts GZIP.
[Sat Jun 1 11:59:47 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
starts gzip using minChunkSizeSource = 32768 minChunkSize = 8 for /var/www/perl/start_example.cgi
[Sat Jun 1 11:59:47 2002] [debug] /usr/local/share/perl/5.6.1/Apache/Dynagzip.pm(594):
[client 12.250.100.179] Apache::Dynagzip default_content_handler creates own HTTP headers
for GET /perl/start_example.cgi HTTP/1.1
[Sat Jun 1 11:59:47 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
is done OK for /var/www/perl/start_example.cgi
client-side log:
C05 --> S06 GET /perl/start_example.cgi HTTP/1.1
C05 --> S06 Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/msword, */*
C05 --> S06 Accept-Language: en-us
C05 --> S06 Accept-Encoding: gzip, deflate
C05 --> S06 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
C05 --> S06 Host: devl4.outlook.net
C05 --> S06 Accept-Charset: ISO-8859-1
== Body was 0 bytes ==
C05 <-- S06 HTTP/1.1 200 OK
C05 <-- S06 Date: Sat, 01 Jun 2002 16:59:47 GMT
C05 <-- S06 Server: Apache/1.3.22 (Unix) Debian GNU/Linux mod_perl/1.26
C05 <-- S06 X-Module-Sender: Apache::Dynagzip
C05 <-- S06 Transfer-Encoding: chunked
C05 <-- S06 Expires: Saturday, 01-June-2002 17:04:47 GMT
C05 <-- S06 Vary: Accept-Encoding
C05 <-- S06 Content-Type: text/html; charset=iso-8859-1
C05 <-- S06 Content-Encoding: gzip
C05 <-- S06 == Incoming Body was 758 bytes ==
== Transmission: text gzip chunked ==
== Chunk Log ==
a (hex) = 10 (dec)
2db (hex) = 731 (dec)
0 (hex) = 0 (dec)
== Latency = 0.220 seconds, Extra Delay = 0.050 seconds
== Restored Body was 1434 bytes ==
============================================================================
GET dynamically generated (by perl script) html file with light compression:
============================================================================
httpd.conf:
PerlModule Apache::Filter
PerlModule Apache::Dynagzip
<Directory /var/www/perl/>
SetHandler perl-script
PerlHandler Apache::RegistryFilter Apache::Dynagzip
PerlSetVar Filter On
PerlSetVar UseCGIHeadersFromScript Off
PerlSetVar LightCompression On
PerlSendHeader Off
PerlSetupEnv On
AllowOverride None
Options ExecCGI FollowSymLinks
Order allow,deny
Allow from all
</Directory>
error_log:
[Sat Jun 1 12:09:14 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
is serving the main request for GET /perl/start_example.cgi HTTP/1.1
targeting /var/www/perl/start_example.cgi via /perl/start_example.cgi
Light Compression is On. Source comes from Filter Chain.
The client Mozilla/4.0 (compatible; MSIE 6.0; Windows 98) accepts GZIP.
[Sat Jun 1 12:09:14 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
starts gzip using minChunkSizeSource = 32768 minChunkSize = 8 for /var/www/perl/start_example.cgi
[Sat Jun 1 12:09:14 2002] [debug] /usr/local/share/perl/5.6.1/Apache/Dynagzip.pm(594):
[client 12.250.100.179] Apache::Dynagzip default_content_handler creates own HTTP headers
for GET /perl/start_example.cgi HTTP/1.1
[Sat Jun 1 12:09:14 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
is done OK for /var/www/perl/start_example.cgi
client-side log:
C05 --> S06 GET /perl/start_example.cgi HTTP/1.1
C05 --> S06 Accept: */*
C05 --> S06 Accept-Language: en-us
C05 --> S06 Accept-Encoding: gzip, deflate
C05 --> S06 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
C05 --> S06 Host: devl4.outlook.net
C05 --> S06 Pragma: no-cache
C05 --> S06 Accept-Charset: ISO-8859-1
== Body was 0 bytes ==
C05 <-- S06 HTTP/1.1 200 OK
C05 <-- S06 Date: Sat, 01 Jun 2002 17:09:13 GMT
C05 <-- S06 Server: Apache/1.3.22 (Unix) Debian GNU/Linux mod_perl/1.26
C05 <-- S06 X-Module-Sender: Apache::Dynagzip
C05 <-- S06 Transfer-Encoding: chunked
C05 <-- S06 Expires: Saturday, 01-June-2002 17:14:14 GMT
C05 <-- S06 Vary: Accept-Encoding
C05 <-- S06 Content-Type: text/html; charset=iso-8859-1
C05 <-- S06 Content-Encoding: gzip
C05 <-- S06 == Incoming Body was 750 bytes ==
== Transmission: text gzip chunked ==
== Chunk Log ==
a (hex) = 10 (dec)
2d3 (hex) = 723 (dec)
0 (hex) = 0 (dec)
== Latency = 0.280 seconds, Extra Delay = 0.000 seconds
== Restored Body was 1416 bytes ==
Compress the outgoing stream from the CGI binary
====================================================================================
GET dynamically generated (by C-written binary) html file with no light compression:
====================================================================================
httpd.conf:
PerlModule Apache::Dynagzip
<Directory /var/www/cgi-bin/>
SetHandler perl-script
PerlHandler Apache::Dynagzip
AllowOverride None
Options +ExecCGI
PerlSetupEnv On
PerlSetVar BinaryCGI On
Order allow,deny
Allow from all
</Directory>
error_log:
[Fri May 31 18:18:17 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
is serving the main request for GET /cgi-bin/mylook.cgi HTTP/1.1
targeting /var/www/cgi-bin/mylook.cgi via /cgi-bin/mylook.cgi
Light Compression is Off. Source comes from Binary CGI.
The client Mozilla/4.0 (compatible; MSIE 6.0; Windows 98) accepts GZIP.
[Fri May 31 18:18:17 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
starts gzip using minChunkSizeSource = 32768 minChunkSize = 8 for /var/www/cgi-bin/mylook.cgi
[Fri May 31 18:18:17 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler has no notes.
[Fri May 31 18:18:17 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
is done OK for /var/www/cgi-bin/mylook.cgi
client-side log:
C05 --> S06 GET /cgi-bin/mylook.cgi HTTP/1.1
C05 --> S06 Accept: */*
C05 --> S06 Accept-Language: en-us
C05 --> S06 Accept-Encoding: gzip, deflate
C05 --> S06 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
C05 --> S06 Host: devl4.outlook.net
C05 --> S06 Pragma: no-cache
C05 --> S06 Accept-Charset: ISO-8859-1
== Body was 0 bytes ==
C05 <-- S06 HTTP/1.1 200 OK
C05 <-- S06 Date: Fri, 31 May 2002 23:18:17 GMT
C05 <-- S06 Server: Apache/1.3.22 (Unix) Debian GNU/Linux mod_perl/1.26
C05 <-- S06 X-Module-Sender: Apache::Dynagzip
C05 <-- S06 Transfer-Encoding: chunked
C05 <-- S06 Expires: Friday, 31-May-2002 23:23:17 GMT
C05 <-- S06 Vary: Accept-Encoding
C05 <-- S06 Content-Type: text/html; charset=iso-8859-1
C05 <-- S06 Content-Encoding: gzip
C05 <-- S06 == Incoming Body was 1002 bytes ==
== Transmission: text gzip chunked ==
== Chunk Log ==
a (hex) = 10 (dec)
3cf (hex) = 975 (dec)
0 (hex) = 0 (dec)
== Latency = 0.110 seconds, Extra Delay = 0.110 seconds
== Restored Body was 1954 bytes ==
=================================================================================
GET dynamically generated (by C-written binary) html file with light compression:
=================================================================================
httpd.conf:
PerlModule Apache::Dynagzip
<Directory /var/www/cgi-bin/>
SetHandler perl-script
PerlHandler Apache::Dynagzip
AllowOverride None
Options +ExecCGI
PerlSetupEnv On
PerlSetVar BinaryCGI On
PerlSetVar LightCompression On
Order allow,deny
Allow from all
</Directory>
error_log:
[Fri May 31 18:37:45 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
is serving the main request for GET /cgi-bin/mylook.cgi HTTP/1.1
targeting /var/www/cgi-bin/mylook.cgi via /cgi-bin/mylook.cgi
Light Compression is On. Source comes from Binary CGI.
The client Mozilla/4.0 (compatible; MSIE 6.0; Windows 98) accepts GZIP.
[Fri May 31 18:37:45 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
starts gzip using minChunkSizeSource = 32768 minChunkSize = 8 for /var/www/cgi-bin/mylook.cgi
[Fri May 31 18:37:45 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler has no notes.
[Fri May 31 18:37:45 2002] [info] [client 12.250.100.179] Apache::Dynagzip default_content_handler
is done OK for /var/www/cgi-bin/mylook.cgi
client-side log:
C05 --> S06 GET /cgi-bin/mylook.cgi HTTP/1.1
C05 --> S06 Accept: */*
C05 --> S06 Accept-Language: en-us
C05 --> S06 Accept-Encoding: gzip, deflate
C05 --> S06 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
C05 --> S06 Host: devl4.outlook.net
C05 --> S06 Pragma: no-cache
C05 --> S06 Accept-Charset: ISO-8859-1
== Body was 0 bytes ==
C05 <-- S06 HTTP/1.1 200 OK
C05 <-- S06 Date: Fri, 31 May 2002 23:37:45 GMT
C05 <-- S06 Server: Apache/1.3.22 (Unix) Debian GNU/Linux mod_perl/1.26
C05 <-- S06 X-Module-Sender: Apache::Dynagzip
C05 <-- S06 Transfer-Encoding: chunked
C05 <-- S06 Expires: Friday, 31-May-2002 23:42:45 GMT
C05 <-- S06 Vary: Accept-Encoding
C05 <-- S06 Content-Type: text/html; charset=iso-8859-1
C05 <-- S06 Content-Encoding: gzip
C05 <-- S06 == Incoming Body was 994 bytes ==
== Transmission: text gzip chunked ==
== Chunk Log ==
a (hex) = 10 (dec)
3c7 (hex) = 967 (dec)
0 (hex) = 0 (dec)
== Latency = 0.170 seconds, Extra Delay = 0.110 seconds
== Restored Body was 1862 bytes ==
Dynamic Setup/Configuration from the Perl Code
Alternatively, you can control this handler from your own Perl handler that serves an earlier phase of the request processing. For example, I install Apache::Dynagzip dynamically from my PerlTransHandler to serve the HTML content cache appropriately.
use Apache::RegistryFilter;
use Apache::Dynagzip;
. . .
$r->handler("perl-script");
$r->push_handlers(PerlHandler => \&Apache::RegistryFilter::handler);
$r->push_handlers(PerlHandler => \&Apache::Dynagzip::handler);
In your Perl code you can even extend the main config settings (for the current request) with, for example:
$r->dir_config->set(minChunkSizeSource => 36000);
$r->dir_config->set(minChunkSize => 6);
Common Notes
Over HTTP/1.0 the handler indicates the end of the data stream by closing the connection. Over HTTP/1.1 the outgoing data is compressed within a chunked outgoing stream, keeping the connection alive.
The appropriate combination of the HTTP headers
X-Module-Sender: Apache::Dynagzip
Transfer-Encoding: chunked
Content-Encoding: gzip
Vary: Accept-Encoding
will be added to the response when required. No Content-Length HTTP header is provided in any case.
INTRODUCTION
From a historical point of view this package was developed mainly to compress the output of a proprietary CGI binary written in C that was widely used by Outlook Technologies, Inc. to deliver uncompressed dynamically generated HTML content over the Internet using HTTP/1.0 since the mid-'90s. We were then presented with the challenge of using the content compression features over HTTP/1.1 on busy production servers, especially those serving heavy traffic on virtual hosts of popular American broadcasting companies.
Our very first attempts to apply the static gzip approach to the dynamic content helped us to scale the bandwidth of the BBC backend effectively, at the cost of significantly increased latency of the content delivery.
Actually, according to my own observations, the total download time of the content (up to the moment when the page is able to run the onLoad() JavaScript) did not increase even on fast connections, and it decreased significantly on dial-ups. Nevertheless, the BBC editors were not too happy to wait up to a minute in front of a sleeping screen while the backend updated some hundreds of Kbytes of the local content...
That is why I came up with the idea of using chunked transmission of the gzipped content. It overlaps in real time the server-side data creation and compression, the data transmission, and the client-side decompression and presentation, and it provides the end users with partially displayed content as soon as the particular conditions of the user's connection allow.
At the time we decided to go for dynamic compression there was no appropriate software on the market that could be customized to meet our goals effectively. Even later, in February 2002, Nicholas Oxhøj wrote to the mod_perl mailing list about his attempts to find an Apache gzipper for streaming outgoing content:
"... I have been experimenting with all the different Apache compression modules I have been able to find, but have not been able to get the desired result. I have tried Apache::GzipChain, Apache::Compress, mod_gzip and mod_deflate, with different results. One I cannot get to work at all. Most work, but seem to collect all the output before compressing it and sending it to the browser...
... Wouldn't it be nice to have some option to specify that the handler should flush and send the currently compressed output every time it had received a certain amount of input or every time it had generated a certain amount of output?..
... So I am basically looking for anyone who has had any success in achieving this kind of "streaming" compression, who could direct me at an appropriate Apache module."
Unfortunately for him, Apache::Dynagzip was not yet publicly available at that time...
Since its release, this handler has been most useful when you need to compress outgoing Web content that is dynamically generated on the fly (using templates, DB data, XML, etc.), and when at the moment of the request it is impossible to determine the length of the document you have to transmit.
You may benefit additionally from the fact that the handler begins the transmission of the compressed data as soon as the very first portion of outgoing data arrives from the main data source, at a moment when the big source HTML document has probably not been generated in full yet. Thus, part of the transmission happens while the document is still being created. On the other hand, the internal buffer within the handler prevents Apache from creating chunks that are too short (for HTTP/1.1).
In order to simplify the use of this handler on public/open_source sites, content compression over HTTP/1.0 has been available in this handler since version 0.06. This implementation helps to avoid the dynamic invocation of the Apache handler for the content generation phase, providing wider service from one and the same statically configured handler.
Acknowledgments
Thanks to Tom Evans, Valerio Paolini, and Serge Bizyayev for their valuable ideas and extensive testing. Thanks to Igor Sysoev and Henrik Nordstrom who helped me to better understand the HTTP/1.0 compression features.
Obviously, I hold the full responsibility for how all those contributions are used here.
DESCRIPTION
The main purpose of this package is to serve the content generation phase within the mod_perl enabled Apache 1.3.X, providing dynamic on-the-fly compression of web content. It is done with the use of the zlib library via the Compress::Zlib Perl interface to serve both HTTP/1.0 and HTTP/1.1 requests from those clients/browsers that understand the gzip format and can decompress this type of data on the fly.
This handler never gzips content for clients/browsers that fail to declare the ability to decompress the gzip format. In fact, this handler mainly serves as a kind of customizable filter of outbound web content for Apache 1.3.X.
This handler is supposed to be used in the Apache::Filter chain, mostly to serve outgoing content dynamically generated on the fly by Perl and/or Java. It can also serve regular CGI binaries (written in C, for example) as a standalone handler outside the Apache::Filter chain. As an extra option, this handler can be used to compress huge static files dynamically and to transfer the gzipped content as a stream back to the client browser. For this last purpose the Apache::Dynagzip handler should also be used as a standalone handler outside the Apache::Filter chain.
Working over HTTP/1.0, this handler indicates the end of the data stream by closing the connection. Over HTTP/1.1 the outgoing data is compressed within a chunked outgoing stream, keeping the connection alive. Reasonable control over the chunk size is provided in this case.
In order to better serve older web clients, the extra light compression is provided independently to remove unnecessary leading blank spaces and/or blank lines from the outgoing web content. This extra light compression can be combined with the main gzip compression when necessary.
The list of features of this handler includes:
- Support for both HTTP/1.0 and HTTP/1.1 requests.
- Reasonable control over the size of content chunks for HTTP/1.1.
- Support for Perl, Java, or C/C++ CGI applications in order to provide dynamic on-the-fly compression of outbound content.
- Optional extra light compression for all browsers, including older ones that are incapable of decompressing gzipped content.
- Optional control over the duration of the content's life in client/proxy local cache.
- Limited control over the proxy caching.
- Optional support for server-side caching of dynamically generated content.
Compression Features
Apache::Dynagzip provides content compression for both HTTP/1.0 and HTTP/1.1 when appropriate.
There are two types of compression, which could be applied to the outgoing content by this handler:
- extra light compression
- gzip compression
in any appropriate combination.
The extra light compression removes leading blank spaces and/or blank lines from the outgoing web content. It is supposed to serve ASCII data types like html, JavaScript, css, etc. The extra light compression is turned off by default. It can be turned on with the statement
PerlSetVar LightCompression On
in your httpd.conf. Any other value turns the extra light compression off.
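As a rough illustration of what the extra light compression does to a document, here is a minimal Perl sketch. It is not the code of this handler: the light_compress name and the choice to treat both spaces and tabs as leading blanks are my assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Apache::Dynagzip): remove leading
# spaces/tabs from every line and drop the lines that end up empty.
sub light_compress {
    my ($text) = @_;
    my @kept;
    for my $line (split /\n/, $text) {
        $line =~ s/^[ \t]+//;     # strip leading blanks
        next if $line eq '';      # skip blank lines
        push @kept, $line;
    }
    return join("\n", @kept) . "\n";
}

print light_compress("<html>\n   <body>\n\n      <p>Hi</p>\n   </body>\n</html>\n");
```

Note that a filter of this kind also changes the text inside elements where leading whitespace is significant (for example, preformatted blocks), so it is best suited to ordinary markup.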
The main gzip format is described in RFC 1952. This type of compression is applied when the client is recognized as capable of decompressing the gzip format on the fly. In this version the decision depends on whether the client sends the Accept-Encoding: gzip HTTP header.
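The decision described above can be pictured with a small Perl sketch. It only illustrates the idea and is not taken from the handler: the client_accepts_gzip name is hypothetical, and the test is deliberately simplified (it does not evaluate quality values such as gzip;q=0).

```perl
use strict;
use warnings;

# Hypothetical helper: decide from the value of the Accept-Encoding
# request header whether a gzip-encoded response may be sent.
# Simplification: q-values are ignored.
sub client_accepts_gzip {
    my ($accept_encoding) = @_;
    return 0 unless defined $accept_encoding;   # header was not sent
    return $accept_encoding =~ /\bgzip\b/i ? 1 : 0;
}

for my $value ('gzip, deflate', 'deflate', undef) {
    printf "%-15s => %d\n", defined $value ? $value : '(no header)',
        client_accepts_gzip($value);
}
```

Inside a mod_perl 1.x handler the header value would typically come from $r->header_in('Accept-Encoding').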
On HTTP/1.1, when the gzip compression is in effect, the handler keeps reasonable control over the size of the chunks and over the compression ratio using a combination of two internal variables, which can be set in your httpd.conf:
minChunkSizeSource
minChunkSize
minChunkSizeSource defines the minimum length of the source stream which zlib may accumulate in its internal buffer.
- Note:
-
The compression ratio depends on the length of the data accumulated in that buffer: the more data we keep there, the better the ratio we achieve...
When the length defined by minChunkSizeSource is exceeded, the handler flushes the internal buffer of zlib and transfers the accumulated portion of the compressed data to its own internal buffer in order to create appropriate chunk(s).
This buffer is not necessarily transferred to Apache immediately. The decision is under the control of the minChunkSize internal variable. When the size of the buffer exceeds the value of minChunkSize, the handler chunks the internal buffer and transfers the accumulated data to the client.
This approach helps to combine effective compression with limited latency.
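The two-stage buffering can be sketched in plain Perl with Compress::Zlib. This is only my illustration of the idea under stated assumptions, not the source of the handler: the stream_gzip name, its arguments, and the hand-written gzip header and trailer are hypothetical choices made for the example.

```perl
use strict;
use warnings;
use Compress::Zlib;   # deflateInit, crc32, Z_OK, Z_SYNC_FLUSH, MAX_WBITS

# Hypothetical sketch: gzip a list of source pieces and release the output
# step by step. $emit receives each block that would become one HTTP chunk.
sub stream_gzip {
    my ($pieces, $min_src, $min_out, $emit) = @_;

    # Negative WindowBits requests a raw deflate stream; the gzip header
    # and trailer (RFC 1952) are written by hand below.
    my ($d, $status) = deflateInit(-Level      => Z_BEST_COMPRESSION,
                                   -WindowBits => -MAX_WBITS());
    die "deflateInit failed: $status" unless $status == Z_OK;

    # 10-byte gzip header: magic, method 8 (deflate), no flags,
    # zero mtime, no extra flags, OS code 3 (Unix).
    $emit->(pack 'C10', 0x1f, 0x8b, 8, 0, 0, 0, 0, 0, 0, 3);

    my ($crc, $total, $pending_src, $out_buf) = (crc32(''), 0, 0, '');
    for my $piece (@$pieces) {
        $crc    = crc32($piece, $crc);
        $total += length $piece;

        my ($out, $st) = $d->deflate($piece);
        die "deflate failed: $st" unless $st == Z_OK;
        $out_buf     .= $out;
        $pending_src += length $piece;

        # Enough source has been accumulated: make zlib release its data.
        if ($pending_src >= $min_src) {
            ($out, $st) = $d->flush(Z_SYNC_FLUSH);
            die "flush failed: $st" unless $st == Z_OK;
            $out_buf    .= $out;
            $pending_src = 0;
        }
        # Enough compressed output is waiting: pass it on as one block.
        if (length($out_buf) >= $min_out) {
            $emit->($out_buf);
            $out_buf = '';
        }
    }

    my ($out, $st) = $d->flush();                   # finishes the stream
    die "final flush failed: $st" unless $st == Z_OK;
    $emit->($out_buf . $out . pack('V V', $crc, $total));  # CRC32 and size
    return;
}

my $source = join '', map { "<tr><td>row $_</td></tr>\n" } 1 .. 2000;
my @chunks;
stream_gzip([unpack '(a1000)*', $source], 16000, 8, sub { push @chunks, $_[0] });
printf "%d blocks; the first one is %d bytes (the gzip header)\n",
    scalar @chunks, length $chunks[0];
```

A larger first threshold lets zlib see more data before each flush, which favors the ratio; a smaller one releases data to the client sooner.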
For example, when I use
PerlSetVar minChunkSizeSource 16000
PerlSetVar minChunkSize 8
in my httpd.conf to compress dynamically generated content of some 54,000 bytes, the client-side log
C05 --> S06 GET /pipe/pp-pipe.pl/big.html?try=chunkOneMoreTime HTTP/1.1
C05 --> S06 Accept: */*
C05 --> S06 Accept-Language: en-us
C05 --> S06 Accept-Encoding: gzip, deflate
C05 --> S06 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
C05 --> S06 Host: devl4.outlook.net
C05 --> S06 Accept-Charset: ISO-8859-1
== Body was 0 bytes ==
## Sockets 6 of 4,5,6 need checking ##
C05 <-- S06 HTTP/1.1 200 OK
C05 <-- S06 Date: Thu, 21 Feb 2002 20:01:47 GMT
C05 <-- S06 Server: Apache/1.3.22 (Unix) Debian GNU/Linux mod_perl/1.26
C05 <-- S06 Transfer-Encoding: chunked
C05 <-- S06 Vary: Accept-Encoding
C05 <-- S06 Content-Type: text/html; charset=iso-8859-1
C05 <-- S06 Content-Encoding: gzip
C05 <-- S06 == Incoming Body was 6034 bytes ==
== Transmission: text gzip chunked ==
== Chunk Log ==
a (hex) = 10 (dec)
949 (hex) = 2377 (dec)
5e6 (hex) = 1510 (dec)
5c5 (hex) = 1477 (dec)
26e (hex) = 622 (dec)
0 (hex) = 0 (dec)
== Latency = 0.990 seconds, Extra Delay = 0.110 seconds
== Restored Body was 54655 bytes ==
shows that the first chunk consists of the gzip header only (10 bytes). This chunk was sent as soon as the handler received the first portion of the data generated by the foreign CGI script. The data itself at that moment was stored in zlib's internal buffer, because minChunkSizeSource is big enough.
- Note:
-
The longer we allow zlib to keep its internal buffer, the better the compression ratio it achieves for us...
In this example we have obtained a compression ratio of about 9 to 1 (a body of 54655 bytes was restored from 6034 transmitted bytes).
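The numbers in the log can be cross-checked with a few lines of Perl. The sketch below assumes standard HTTP/1.1 chunk framing (a hexadecimal size line, CRLF, the data, CRLF, and a terminating zero-size chunk); the wire_size name is hypothetical. Under that assumption the chunk log adds up exactly to the reported incoming body size.

```perl
use strict;
use warnings;

# Hypothetical helper: bytes on the wire for a chunked body, given the
# sizes of the data chunks (the final zero-size chunk is added here).
sub wire_size {
    my @sizes = @_;
    my $total = 0;
    for my $n (@sizes) {
        # hex size line + CRLF + data + CRLF
        $total += length(sprintf '%x', $n) + 2 + $n + 2;
    }
    return $total + 5;    # terminating "0\r\n\r\n"
}

my @chunks = (10, 2377, 1510, 1477, 622);     # from the chunk log above
my $wire   = wire_size(@chunks);
printf "on the wire: %d bytes\n", $wire;              # 6034
printf "compression ratio: %.2f\n", 54655 / $wire;    # 9.06
```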
In this version the handler provides defaults:
minChunkSizeSource = 32768
minChunkSize = 8
for your convenience.
In case of a gzip compressed response to an HTTP/1.0 request, the handler uses the minChunkSize and minChunkSizeSource values to limit the minimum size of its internal buffers in order to provide an appropriate compression ratio and to avoid multiple short outputs to the core Apache.
Chunking Features
On HTTP/1.1 this handler overrides the default Apache behavior and keeps its own control over the chunk size when possible. In fact, the handler provides only soft control over the chunk size: it never cuts the incoming string in order to create a chunk of a particular size. Instead, it controls only the minimum size of the chunk. I consider this approach reasonable, because to date the HTTP chunk size is not coordinated with the packet size on the transport level.
In case of gzipped output the minimum size of the chunk is under the control of the internal variable
minChunkSize
In case of uncompressed output, or the extra light compression only, the minimum size of the chunk is under the control of the internal variable
minChunkSizePP
In this version for your convenience the handler provides defaults:
minChunkSize = 8
minChunkSizePP = 8192
You may override the default values of these variables in your httpd.conf if necessary.
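The soft control over the chunk size can be pictured with the following Perl sketch. It is not the handler's own code: make_chunker and its callbacks are hypothetical names, and the framing is the standard HTTP/1.1 chunked coding. The buffer is written out whole once it reaches the minimum size, so an incoming string is never cut.

```perl
use strict;
use warnings;

# Hypothetical helper: returns two closures. add() buffers data and writes
# one chunk whenever at least $min_size bytes are waiting; finish() writes
# whatever is left plus the terminating zero-size chunk.
sub make_chunker {
    my ($min_size, $write) = @_;
    my $buf = '';
    my $flush = sub {
        $write->(sprintf "%x\r\n%s\r\n", length($buf), $buf);
        $buf = '';
    };
    my $add = sub {
        $buf .= $_[0];
        $flush->() if length($buf) >= $min_size;
    };
    my $finish = sub {
        $flush->() if length $buf;
        $write->("0\r\n\r\n");
    };
    return ($add, $finish);
}

my @wire;
my ($add, $finish) = make_chunker(8, sub { push @wire, $_[0] });
$add->('abc');        # too short, stays in the buffer
$add->('defghij');    # 10 bytes are waiting now, written as one chunk
$finish->();
print scalar(@wire), " writes\n";    # 2 writes
```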
- Note:
-
The internal variable minChunkSize should be treated carefully together with the minChunkSizeSource (see Compression Features).
In this version the handler does not keep control over the chunk size when it serves an internally redirected request. An appropriate warning is placed in error_log in this case.
In case of a gzip compressed response to an HTTP/1.0 request, the handler uses the minChunkSize and minChunkSizeSource values to limit the minimum size of its internal buffers in order to provide an appropriate compression ratio and to avoid multiple short outputs to the core Apache.
Filter Chain Features
As a member of the Apache::Filter chain, the Apache::Dynagzip handler is supposed to be the last filter in the chain because of the nature of its function: it produces the full set of required HTTP headers followed by the gzipped content within the chunked stream.
None of the other handlers in the Filter chain is allowed to issue
$r->send_http_header();
or
$r->send_cgi_header();
The only acceptable HTTP information from old CGI applications is the Content-Type CGI header, which should be the first line, followed by an empty line. This line is optional in accordance with the CGI/1.0 description, and many known old scripts omit it; in that case the content type should default to Content-Type: text/html. CGI/1.1 (see: http://cgi-spec.golux.com/draft-coar-cgi-v11-03-clean.html ) makes life even more complicated for system administrators.
This handler is partially CGI/1.1 compatible, except the internal redirect option, which is not guaranteed.
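To make the rule about the Content-Type line concrete, here is a small Perl sketch of how the output of an old CGI program could be split. It only illustrates the rule stated above and is not the parser used by this handler; the split_cgi_output name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: separate an optional leading Content-Type CGI header
# (first line, followed by an empty line) from the body. When the header is
# missing, the type defaults to text/html.
sub split_cgi_output {
    my ($raw) = @_;
    if ($raw =~ s/\AContent-Type:[ \t]*([^\r\n]*)\r?\n\r?\n//i) {
        my $type = $1;
        return ($type, $raw);
    }
    return ('text/html', $raw);
}

my ($type, $body) = split_cgi_output("Content-Type: text/plain\n\nhello\n");
print "$type: $body";
```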
POST Request Features
I have to serve the POST request option for the regular CGI binary only, because in this case the handler stands alone to serve the data flow in both directions at a moment when stdin is tied into Apache and cannot be exposed to the CGI binary transparently.
To solve the problem I replace POST with GET internally, doing the required transformations of the incoming data. This can cause a problem when you have a huge incoming stream from your client (more than 4K bytes).
Control over the Client Cache
The control over the lifetime of the response in the client's cache is provided with the Expires HTTP header (see RFC 2068):
The Expires entity-header field gives the date/time after which the response should be considered stale. A stale cache entry may not normally be returned by a cache (either a proxy cache or an user agent cache) unless it is first validated with the origin server (or with an intermediate cache that has a fresh copy of the entity). The format is an absolute date and time as defined by HTTP-date in section 3.3; it MUST be in rfc1123-date format: Expires = "Expires" ":" HTTP-date
This handler creates the Expires HTTP header by adding pageLifeTime to the date-time of the request. The internal variable pageLifeTime has the default value
pageLifeTime = 300 # sec.
which can be overridden in httpd.conf, for example:
PerlSetVar pageLifeTime 1800
to make pageLifeTime = 30 minutes.
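As a rough illustration of the arithmetic, the following Perl sketch adds pageLifeTime to the time of the request and prints the result in rfc1123-date form. The helper names http_date and expires_value are assumptions made for this example and do not describe the handler's internals.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: format an epoch time as an rfc1123-date (always GMT).
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec);
}

# Expires = time of the request + pageLifeTime (in seconds).
sub expires_value {
    my ($request_time, $page_life_time) = @_;
    return http_date($request_time + $page_life_time);
}

my $request_time = 1029101323;    # Sun, 11 Aug 2002 21:28:43 GMT
print 'Date: ',    http_date($request_time), "\n";
print 'Expires: ', expires_value($request_time, 300), "\n";
# prints "Expires: Sun, 11 Aug 2002 21:33:43 GMT"
```

With the default of 300 seconds, a request served at 21:28:43 GMT is considered fresh until 21:33:43 GMT; with pageLifeTime set to 1800 it would be fresh until 21:58:43 GMT.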
Within the lifetime the client (browser) will not even try to access the server when you reach the same URL again. Instead, it restarts the page from the local cache.
It is important to point out that all initial JavaScripts are still restarted, so you can rotate your advertisements and dynamic content when needed.
A second important point: when you click the "Refresh" button, the browser reloads the page from the server unconditionally. This is the right behavior, because it is exactly what the end-user expects from the "Refresh" button.
Note: the lifetime defined by Expires depends on the accuracy of the time settings on the client side. If your client's local clock is running 1 hour behind, the cached copy of the page will stay alive 60 minutes longer on that machine.
Support for the Server-Side Cache
To support the Server-Side Cache, I place a reference to the dynamically generated document into notes() when Server-Side Cache support is requested. The referenced document may already be compressed with extra light compression, if that was requested for the current request.
The effective gzip compression is supposed to take place within the log stage of the request processing.
Historically, the development of this handler was one stage of a wider project named Apache::ContentCache, which is meant to provide content caching for a wide range of arbitrary sites that are generated on the fly. In that project the Apache::Dynagzip handler is used in a dynamically generated chain of Apache handlers for the various phases of request processing, where it filters the content generation phase of the request. To be compatible with the Apache::ContentCache flow chart, the Apache::Dynagzip handler recognizes an optional reference in notes() named ref_cache_files. When ref_cache_files is defined within the notes() table, the Apache::Dynagzip handler creates one more reference in notes(), named ref_source, which refers to the full body of the uncompressed incoming document for the post-request processing phase.
You usually need not care about this feature of the Apache::Dynagzip handler unless you use it in your own chain of handlers for the various phases of request processing.
Control over the Proxy Cache
Control over a possible proxy cache is provided with the Vary HTTP header (see rfc2068 for details). In this version the header is always generated in the form
Vary: Accept-Encoding
for gzipped output only.
Advanced control over the proxy cache is provided since version 0.07 with an optional extension of the Vary HTTP header. This extension can be placed into your configuration file using the directive
PerlSetVar Vary <value>
In particular, it can be helpful for marking content that depends on conditions other than compression alone. For example, when the content is personalized, one might wish to use the * Vary extension to prevent any proxy caching.
When the outgoing content is gzipped, this extension is appended to the regular Vary header, as in the following example.
Using the following fragment within httpd.conf:
PerlModule Apache::Dynagzip
<Files ~ "*\.html">
SetHandler perl-script
PerlHandler Apache::Dynagzip
PerlSetVar LightCompression On
PerlSetVar Vary *
</Files>
We observe the following client-side log:
C05 --> S06 GET /devdoc/Dynagzip/Dynagzip.html HTTP/1.1
C05 --> S06 Accept: */*
C05 --> S06 Referer: http://devl4.outlook.net/devdoc/Dynagzip/
C05 --> S06 Accept-Language: en-us
C05 --> S06 Accept-Encoding: gzip, deflate
C05 --> S06 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
C05 --> S06 Host: devl4.outlook.net
C05 --> S06 Pragma: no-cache
C05 --> S06 Accept-Charset: ISO-8859-1
== Body was 0 bytes ==
C05 <-- S06 HTTP/1.1 200 OK
C05 <-- S06 Date: Sun, 11 Aug 2002 21:28:43 GMT
C05 <-- S06 Server: Apache/1.3.22 (Unix) Debian GNU/Linux mod_perl/1.26
C05 <-- S06 X-Module-Sender: Apache::Dynagzip
C05 <-- S06 Expires: Sunday, 11-August-2002 21:33:43 GMT
C05 <-- S06 Vary: Accept-Encoding,*
C05 <-- S06 Transfer-Encoding: chunked
C05 <-- S06 Content-Type: text/html; charset=iso-8859-1
C05 <-- S06 Content-Encoding: gzip
C05 <-- S06 == Incoming Body was 11311 bytes ==
== Transmission: text gzip chunked ==
== Chunk Log ==
a (hex) = 10 (dec)
1c78 (hex) = 7288 (dec)
f94 (hex) = 3988 (dec)
0 (hex) = 0 (dec)
== Latency = 0.160 seconds, Extra Delay = 0.170 seconds
== Restored Body was 47510 bytes ==
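The Chunk Log above can be checked by hand. The sketch below frames placeholder data of the logged sizes as HTTP/1.1 chunks and counts the bytes on the wire. The helper names frame_chunk and last_chunk are hypothetical, the payload bytes are dummies rather than real gzip output, and no trailer headers are assumed after the last chunk.

```perl
use strict;
use warnings;

# Frame one piece of data as an HTTP/1.1 chunk: size in hex, CRLF, data, CRLF.
sub frame_chunk {
    my ($data) = @_;
    return sprintf("%x\r\n", length $data) . $data . "\r\n";
}

# The zero-sized chunk that ends a chunked body (no trailer headers assumed).
sub last_chunk { return "0\r\n\r\n" }

# Payload sizes taken from the Chunk Log above; the bytes are placeholders.
my @sizes = (0xa, 0x1c78, 0xf94);
my $wire  = join('', map { frame_chunk('x' x $_) } @sizes) . last_chunk();

my $payload = 0;
$payload += $_ for @sizes;
printf "payload %d bytes, on the wire %d bytes\n", $payload, length $wire;
# prints "payload 11286 bytes, on the wire 11311 bytes"
```

The 11286 payload bytes plus the chunk-size lines, the CRLF pairs, and the terminating zero-sized chunk add up to the 11311 bytes reported as the incoming body. Measured against the 47510 bytes of the restored body, the gzipped payload is about 4.2 times smaller.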
CUSTOMIZATION
Do your best to avoid using this handler for internally redirected requests; it does not help much in that case. Read your error.log carefully to find the corresponding warnings. Tune your httpd.conf carefully to make the most of the opportunities offered by this handler.
To select the type of the content's source, follow these rules:
- use the Apache::Filter chain to serve any Perl or Java generated content. When your source is a very old CGI application which fails to provide the Content-Type CGI header, use
PerlSetVar UseCGIHeadersFromScript Off
in your httpd.conf to override the document Content-Type with the default text/html.
You may use the Apache::Filter chain to serve other sources when you know what you are doing. You might wish to write your own handler and include it in the Apache::Filter chain, emulating the CGI outgoing stream.
- use the directive
PerlSetVar BinaryCGI On
to indicate that the source generator is a CGI binary. Don't use the Apache::Filter chain in this case. Support for CGI/1.1 headers is always On for this type of source.
- plain file transfer is assumed when you use the stand-alone handler with no BinaryCGI directive. The document Content-Type is determined by Apache in this case.
To control the compression ratio and the minimum size of the chunk/buffer for gzipped content, you can optionally use the directives
PerlSetVar minChunkSizeSource <value>
PerlSetVar minChunkSize <value>
For example, you can try
PerlSetVar minChunkSizeSource 32768
PerlSetVar minChunkSize 8
which are the defaults in this version. Of course, you can use your own values when you know what you are doing...
Note: You can improve the compression ratio by increasing the value of minChunkSizeSource. You can control the _minimum_ size of the chunk with minChunkSize. Try playing with these values to find your best combination!
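One way to picture how two such thresholds can interact is sketched below in Perl with Compress::Zlib. This is not the handler's actual code: the helper compress_in_chunks and its buffering strategy are assumptions made for illustration, and deflateInit with default options emits a zlib-wrapped stream rather than the gzip format the handler sends.

```perl
use strict;
use warnings;
use Compress::Zlib;    # deflateInit, Z_OK, Z_SYNC_FLUSH

# Hypothetical helper, not the handler's code: compress @$pieces in batches
# of at least $min_source bytes and emit chunks of at least $min_chunk bytes.
sub compress_in_chunks {
    my ($pieces, $min_source, $min_chunk) = @_;
    my ($deflater, $status) = deflateInit();
    die "deflateInit failed: $status" unless $status == Z_OK;

    my @chunks;
    my ($source, $ready) = ('', '');
    for my $piece (@$pieces) {
        $source .= $piece;
        next if length($source) < $min_source;    # keep collecting source
        my ($out, $st) = $deflater->deflate($source);
        die "deflate failed: $st" unless $st == Z_OK;
        my ($tail, $fst) = $deflater->flush(Z_SYNC_FLUSH);
        die "flush failed: $fst" unless $fst == Z_OK;
        ($source, $ready) = ('', $ready . $out . $tail);
        if (length($ready) >= $min_chunk) {       # big enough to send
            push @chunks, $ready;
            $ready = '';
        }
    }
    # End of document: compress the remainder and close the stream.
    my ($out, $st) = $deflater->deflate($source);
    die "deflate failed: $st" unless $st == Z_OK;
    my ($tail, $fst) = $deflater->flush();
    die "final flush failed: $fst" unless $fst == Z_OK;
    push @chunks, $ready . $out . $tail;
    return @chunks;
}

my @pieces = map { "<p>line $_ of a generated page</p>\n" } 1 .. 2000;
my @chunks = compress_in_chunks(\@pieces, 32768, 8);
printf "%d chunks, %d compressed bytes\n",
    scalar @chunks, length join('', @chunks);
```

In this sketch a larger source threshold gives the compressor more text to work on between flushes, which tends to improve the ratio, while the output threshold only prevents very small chunks from being sent.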
To control the minimum size of the chunk for uncompressed content over HTTP/1.1 you can optionally use the directive
PerlSetVar minChunkSizePP <value>
To control the extra light compression you can optionally use the directive
PerlSetVar LightCompression <On/Off>
To turn On the extra light compression use the directive
PerlSetVar LightCompression On
Any other value turns the extra light compression Off (default).
To control the pageLifeTime in the client's local cache you can optionally use the directive
PerlSetVar pageLifeTime <value>
where the value stands for the lifetime in seconds.
PerlSetVar pageLifeTime 300
is the default in this version.
TROUBLESHOOTING
This handler cannot keep control over the chunk size when it serves an internally redirected request. At the same time it cannot provide gzip compression. A corresponding warning is written to error.log in this case. Adjust your configuration to avoid using this handler for internally redirected requests.
The handler logs error, warn, info, and debug messages to the Apache error.log file. Please read it first in case of any trouble.
DEPENDENCIES
This module requires these other modules and libraries:
Apache::Constants;
Apache::File;
Apache::Filter 1.019;
Apache::Log;
Apache::URI;
Apache::Util;
Fcntl;
FileHandle;
Compress::LeadingBlankSpaces;
Compress::Zlib 1.16;
Note: the Compress::Zlib 1.16 requires the Info-zip zlib 1.0.2 or better
(it is NOT compatible with versions of zlib <= 1.0.1).
The zlib compression library is available at http://www.gzip.org/zlib/
I did not test this handler with earlier versions of Apache::Filter.
Please let me know if you have a chance to do that...
AUTHOR
Slava Bizyayev <slava@cpan.org> - Freelance Software Developer & Consultant.
COPYRIGHT AND LICENSE
Copyright (C) 2002 Slava Bizyayev. All rights reserved.
This package is free software. You can use it, redistribute it, and/or modify it under the same terms as Perl itself.
The latest version of this module can be found on CPAN.
SEE ALSO
mod_perl at http://perl.apache.org
The Compress::LeadingBlankSpaces module can be found on CPAN.
The Compress::Zlib module can be found on CPAN.
The primary site for the zlib compression library is http://www.info-zip.org/pub/infozip/zlib/.
The Apache::Filter module can be found on CPAN.
http://www.ietf.org/rfc.html - rfc search by number (+ index list)
http://cgi-spec.golux.com/draft-coar-cgi-v11-03-clean.html CGI/1.1 rfc
http://perl.apache.org/docs/general/correct_headers/correct_headers.html "Issuing Correct HTTP Headers" by Andreas Koenig