NAME
Gzip::Faster - simple and fast gzip and gunzip
SYNOPSIS
use Gzip::Faster;
my $gzipped = gzip ($input);
my $roundtrip = gunzip ($gzipped);
if ($roundtrip ne $input) { die; }
gzip_to_file ($input, 'file.gz');
$roundtrip = gunzip_file ('file.gz');
if ($roundtrip ne $input) { die; }
VERSION
This documents version 0.14_01 of Gzip::Faster corresponding to git commit 178bd3eb5ae764576dde18945e3c65aec1185fc0 made on Wed Dec 7 14:24:44 2016 +0900.
DESCRIPTION
This module compresses to and decompresses from the gzip format.
The module offers two basic functions, "gzip" and "gunzip", which convert scalars to and from gzip format, and three convenience functions: "gzip_file" reads a file then compresses it; "gunzip_file" reads a file then uncompresses it; and "gzip_to_file" compresses a scalar and writes it to a file.
FUNCTIONS
gzip
my $zipped = gzip ($plain);
This compresses $plain into the gzip format. The return value is the compressed version of $plain.
gunzip
my $plain = gunzip ($zipped);
This uncompresses $zipped and returns the result of the uncompression. It returns the undefined value if $zipped is the undefined value or an empty string. Otherwise, it throws a fatal error if $zipped is not in the gzip format.
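Because "gunzip" throws a fatal error on input that is not in the gzip format, code that handles untrusted input may want to trap that error rather than die. The following is a minimal sketch that relies only on the behaviour described above; safe_gunzip is a hypothetical helper, not part of Gzip::Faster:

```perl
use strict;
use warnings;
use Gzip::Faster qw(gzip gunzip);

# Hypothetical helper (not part of Gzip::Faster). "gunzip" dies on
# data which is not in the gzip format, so trap the error with eval
# and return it alongside the result instead of dying.
sub safe_gunzip {
    my ($zipped) = @_;
    my $plain = eval { gunzip ($zipped) };
    # $@ is the empty string when gunzip succeeded.
    my $error = $@ ? $@ : undef;
    return ($plain, $error);
}

my ($plain, $error) = safe_gunzip (gzip ('hello'));
print "ok: $plain\n" unless $error;

my ($bad, $bad_error) = safe_gunzip ('this is not gzip data');
print "gunzip failed: $bad_error" if $bad_error;
```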
gzip_file
my $zipped = gzip_file ('file');
This reads the contents of file into memory and then runs "gzip" on the file's contents. The return value and the possible errors are the same as "gzip", plus this may also throw an error if open fails.
gunzip_file
my $plain = gunzip_file ('file.gz');
This reads the contents of file.gz into memory and then runs "gunzip" on the file's contents. The return value and the possible errors are the same as "gunzip", plus this may also throw an error if open fails.
gzip_to_file
gzip_to_file ($plain, 'file.gz');
This compresses $plain in memory using "gzip" and writes the compressed content to 'file.gz'. There is no return value. The errors are the same as "gzip", plus this may also throw an error if open fails. As of this version, it does not write any file information, such as the file name, into the gzip header of file.gz.
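The three file functions can be combined in a round trip. This is a minimal sketch, assuming a writable temporary directory; the file names and sample text are arbitrary:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Gzip::Faster qw(gzip gunzip gzip_file gunzip_file gzip_to_file);

# Work in a throwaway directory so the example leaves no files behind.
my $dir   = tempdir (CLEANUP => 1);
my $plain = "Some text to compress.\n" x 100;

# gzip_to_file: compress a scalar in memory and write it to disk.
my $gz_path = File::Spec->catfile ($dir, 'file.gz');
gzip_to_file ($plain, $gz_path);

# gunzip_file: read the compressed file back and uncompress it.
my $roundtrip = gunzip_file ($gz_path);
die "gunzip_file round trip failed" if $roundtrip ne $plain;

# gzip_file: read an uncompressed file and compress its contents.
my $txt_path = File::Spec->catfile ($dir, 'file.txt');
open my $out, '>:raw', $txt_path or die "Cannot open $txt_path: $!";
print {$out} $plain;
close $out or die "Cannot close $txt_path: $!";
my $zipped = gzip_file ($txt_path);
die "gzip_file round trip failed" if gunzip ($zipped) ne $plain;
```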
PERFORMANCE
This section compares the performance of Gzip::Faster with IO::Compress::Gzip / IO::Uncompress::Gunzip and Compress::Raw::Zlib.
Short text
The compression and decompression parts test the performance of the modules on a short piece of English text.
According to my results, Gzip::Faster is about four times faster to load, seven times faster to compress, and twenty-five times faster to uncompress than IO::Compress::Gzip and IO::Uncompress::Gunzip. Round trips are about ten times faster with Gzip::Faster.
Compared to Compress::Raw::Zlib, loading is about one and a half times faster, round trips are about three times faster, compression is nearly three times faster, and decompression is about six times faster.
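The "N times faster" figures are ratios of the Rate columns (operations per second) in the short-text tables. As a worked check, this sketch recomputes them from the rounded rates printed in those tables, using IO::Compress::Gzip's rate for the load comparison; the results are approximate because the printed rates are rounded:

```perl
use strict;
use warnings;

# Operations per second, copied from the short-text tables.
# For "load" the IO::Compress::Gzip figure is used.
my %gzip_faster = (load => 112,  roundtrip => 12929, gzip => 18286, gunzip => 69565);
my %io_compress = (load => 26.6, roundtrip => 1309,  gzip => 2553,  gunzip => 2819);
my %raw_zlib    = (load => 75.1, roundtrip => 3883,  gzip => 6465,  gunzip => 10903);

# "N times faster" is the ratio of the two rates. The percentages in
# the tables are roughly (ratio - 1) * 100.
sub speedup {
    my ($op, $other) = @_;
    return $gzip_faster{$op} / $other->{$op};
}

for my $op (qw(load roundtrip gzip gunzip)) {
    printf "%-9s %5.1fx vs IO::Compress::Gzip/IO::Uncompress::Gunzip, %4.1fx vs Compress::Raw::Zlib\n",
        $op, speedup ($op, \%io_compress), speedup ($op, \%raw_zlib);
}
```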
The versions used in this test are as follows:
$IO::Compress::Gzip::VERSION = 2.069
$IO::Uncompress::Gunzip::VERSION = 2.069
$Compress::Raw::Zlib::VERSION = 2.069
$Gzip::Faster::VERSION = 0.12_01
The size after compression is as follows:
IO::Compress::Gzip size is 830 bytes.
Compress::Raw::Zlib size is 830 bytes.
Gzip::Faster size is 830 bytes.
Here is a comparison of load times:
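            Rate Load IOUG Load IOCG Load CRZ Load GF
Load IOUG 25.4/s        --       -4%     -66%    -77%
Load IOCG 26.6/s        5%        --     -65%    -76%
Load CRZ  75.1/s      196%      183%       --    -33%
Load GF    112/s      340%      321%      49%      --
In this table, IOUG is IO::Uncompress::Gunzip, IOCG is IO::Compress::Gzip, CRZ is Compress::Raw::Zlib, and GF is Gzip::Faster.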
Here is a comparison of a round-trip:
                       Rate IO::Compress::Gzip Compress::Raw::Zlib Gzip::Faster
IO::Compress::Gzip   1309/s                 --                -66%         -90%
Compress::Raw::Zlib  3883/s               197%                  --         -70%
Gzip::Faster        12929/s               887%                233%           --
Here is a comparison of gzip (compression) only:
                                Rate IO::Compress::Gzip Compress::Raw::Zlib::Deflate Gzip::Faster
IO::Compress::Gzip            2553/s                 --                         -61%         -86%
Compress::Raw::Zlib::Deflate  6465/s               153%                           --         -65%
Gzip::Faster                 18286/s               616%                         183%           --
Here is a comparison of gunzip (decompression) only:
                                Rate IO::Uncompress::Gunzip Compress::Raw::Zlib::Inflate Gzip::Faster
IO::Uncompress::Gunzip        2819/s                     --                         -74%         -96%
Compress::Raw::Zlib::Inflate 10903/s                   287%                           --         -84%
Gzip::Faster                 69565/s                  2367%                         538%           --
The test file is in "bench/benchmark.pl" in the distribution.
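The tables above are in the format printed by the cmpthese function of Perl's core Benchmark module. The following is a minimal sketch of a round-trip comparison in that style; it is not the distribution's bench/benchmark.pl, and the sample text is an arbitrary assumption:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Gzip::Faster ();
use IO::Compress::Gzip ();
use IO::Uncompress::Gunzip ();

# Arbitrary short sample text for illustration.
my $input = "The quick brown fox jumps over the lazy dog. " x 20;

# Round trip through IO::Compress::Gzip and IO::Uncompress::Gunzip,
# called by their full names since both modules can export gzip/gunzip.
sub iocg_roundtrip {
    my ($plain) = @_;
    my ($zipped, $unzipped);
    IO::Compress::Gzip::gzip (\$plain => \$zipped)
        or die $IO::Compress::Gzip::GzipError;
    IO::Uncompress::Gunzip::gunzip (\$zipped => \$unzipped)
        or die $IO::Uncompress::Gunzip::GunzipError;
    return $unzipped;
}

# Round trip through Gzip::Faster.
sub gf_roundtrip {
    my ($plain) = @_;
    return Gzip::Faster::gunzip (Gzip::Faster::gzip ($plain));
}

# Run each for at least one CPU second and print a comparison table.
cmpthese (-1, {
    'IO::Compress::Gzip' => sub { iocg_roundtrip ($input) },
    'Gzip::Faster'       => sub { gf_roundtrip ($input) },
});
```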
Long text
This section compares the compression on a 2.2 megabyte file of Chinese text, which is the Project Gutenberg version of Journey to the West, http://www.gutenberg.org/files/23962/23962-0.txt, with the header and footer text removed.
The versions used in this test are as above.
The sizes are as follows:
IO::Compress::Gzip size is 995387 bytes.
Compress::Raw::Zlib size is 995387 bytes.
Gzip::Faster size is 995823 bytes.
Note that the size of the file compressed with the command-line gzip at the default compression level is identical to the size produced by Gzip::Faster's "gzip", except for 12 extra bytes in the file version, which store the file name (the 11 characters of chinese.txt plus a terminating zero byte):
$ gzip --keep chinese.txt
$ ls -l chinese.txt.gz
-rw-r--r-- 1 ben ben 995835 Oct 20 18:52 chinese.txt.gz
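In the gzip format (RFC 1952), the header begins with 10 fixed bytes; if bit 3 (0x08, FNAME) of the flag byte is set, a zero-terminated original file name follows. This minimal sketch reads that field; gzip_header_name is a hypothetical helper, and the header in the usage lines is hand-built in the same shape rather than taken from the real chinese.txt.gz:

```perl
use strict;
use warnings;

# Hypothetical helper: return the original file name stored in a gzip
# header (RFC 1952), or undef if the FNAME flag is not set.
sub gzip_header_name {
    my ($data) = @_;
    die "Not gzip data\n"
        unless length ($data) >= 10 && substr ($data, 0, 2) eq "\x1f\x8b";
    my $flags = ord (substr ($data, 3, 1));
    return unless $flags & 0x08;            # FNAME bit not set
    my $pos = 10;                           # end of the fixed header
    if ($flags & 0x04) {                    # FEXTRA: skip XLEN and extra bytes
        my $xlen = unpack ('v', substr ($data, $pos, 2));
        $pos += 2 + $xlen;
    }
    my $end = index ($data, "\0", $pos);
    die "Truncated file name\n" if $end < 0;
    return substr ($data, $pos, $end - $pos);
}

# Fixed part: ID1 ID2 CM FLG, then MTIME (4 bytes), XFL, OS.
my $header = "\x1f\x8b\x08\x08" . "\0" x 6 . "chinese.txt\0";
my $name   = gzip_header_name ($header);
printf "stored name '%s' takes %d bytes\n", $name, length ($name) + 1;
```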
Here is a comparison of a round-trip:
                      Rate IO::Compress::Gzip Compress::Raw::Zlib Gzip::Faster
IO::Compress::Gzip  4.43/s                 --                 -2%          -8%
Compress::Raw::Zlib 4.54/s                 3%                  --          -5%
Gzip::Faster        4.80/s                 8%                  6%           --
Here is a comparison of gzip (compression) only:
                               Rate IO::Compress::Gzip Compress::Raw::Zlib::Deflate Gzip::Faster
IO::Compress::Gzip           5.04/s                 --                          -1%          -6%
Compress::Raw::Zlib::Deflate 5.07/s                 1%                           --          -5%
Gzip::Faster                 5.36/s                 6%                           6%           --
Here is a comparison of gunzip (decompression) only:
                               Rate IO::Uncompress::Gunzip Compress::Raw::Zlib::Inflate Gzip::Faster
IO::Uncompress::Gunzip       36.6/s                     --                         -19%         -20%
Compress::Raw::Zlib::Inflate 45.1/s                    23%                           --          -1%
Gzip::Faster                 45.7/s                    25%                           1%           --
For longer files, Gzip::Faster is only slightly faster, because most of the time is spent inside zlib itself, so the underlying library's speed is the main factor.
BUGS
There is no deflate compression in the module. There is no way to select the level of compression. The level of compression offered by this module is zlib's default, which is what you get if you run the command-line program gzip on a file without options such as --best or --fast.
The module source code includes disabled functionality to round-trip Perl flags. I applied this to preserving Perl's "utf8" flag. However, the mechanism I used triggers a bug in the Firefox web browser, which then reports a content encoding error. Thus this functionality is disabled. Please refer to the file gzip-faster-perl.c in the distribution; the relevant parts are commented out with the macro COPY_PERL.
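Because the "utf8" flag is not round-tripped, one safe approach for character strings is to encode them to bytes before compression and decode them afterwards. This is a minimal sketch using the core Encode module; gzip_text and gunzip_text are hypothetical helper names, and UTF-8 is an assumed choice of encoding:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);
use Gzip::Faster qw(gzip gunzip);

# Hypothetical helpers: treat compressed data strictly as bytes by
# encoding character strings to UTF-8 before gzip and decoding after
# gunzip, since the module does not restore Perl's "utf8" flag.
sub gzip_text {
    my ($text) = @_;
    return gzip (encode_utf8 ($text));
}

sub gunzip_text {
    my ($zipped) = @_;
    return decode_utf8 (gunzip ($zipped));
}

my $text = "Journey to the West: \x{897f}\x{904a}\x{8a18}";
my $back = gunzip_text (gzip_text ($text));
print $back eq $text ? "Characters survived the round trip\n" : "Mismatch\n";
```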
The module doesn't check whether the input of "gzip" is already gzipped, and it doesn't check whether the compression was effective. That is, it doesn't check whether the output of "gzip" is actually smaller than the input.
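Callers who need these checks can make them around "gzip" themselves. This is a minimal sketch, assuming that input starting with the gzip magic bytes 0x1f 0x8b (RFC 1952) is already compressed; maybe_gzip is a hypothetical helper, not part of the module:

```perl
use strict;
use warnings;
use Gzip::Faster qw(gzip gunzip);

# Hypothetical helper (not part of Gzip::Faster): compress $plain only
# if it does not already look like gzip data and compression actually
# makes it smaller. Returns the data and a flag: 1 if compressed.
sub maybe_gzip {
    my ($plain) = @_;
    # Nothing to compress.
    return ($plain, 0) if !defined $plain || $plain eq '';
    # Every gzip stream starts with the magic bytes 0x1f 0x8b.
    return ($plain, 0) if substr ($plain, 0, 2) eq "\x1f\x8b";
    my $zipped = gzip ($plain);
    # Keep the original if the gzip overhead outweighs the savings.
    return length ($zipped) < length ($plain) ? ($zipped, 1) : ($plain, 0);
}

my ($data, $compressed) = maybe_gzip ("abc" x 1000);
print $compressed ? "compressed to " . length ($data) . " bytes\n" : "left as is\n";
```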
EXPORTS
The module exports "gzip", "gunzip", "gzip_file", "gunzip_file", and "gzip_to_file" by default. You can switch this blanket exporting off with
use Gzip::Faster ();
or
use Gzip::Faster 'gunzip';
whereby only gunzip is exported and not the other functions.
INSTALLATION
Installation follows the standard Perl methods. If you do not know what the standard Perl module install methods are, detailed instructions can be found in the file README in the distribution. The following are some extra notes for people who get stuck.
Gzip::Faster requires the compression library zlib (also called libz) to be installed. The following message printed during perl Makefile.PL:
You don't seem to have zlib available on your system.
or
Warning (mostly harmless): No library found for -lz
or the following message at run-time:
undefined symbol: inflate
indicate that Gzip::Faster was unable to link to libz.
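After installation, one quick way to confirm that the module loads and can call into libz is to load it at run time and round-trip a short string. This is a minimal sketch; the test string is arbitrary:

```perl
use strict;
use warnings;

# Load Gzip::Faster at run time so that a failure to load the module
# is reported with a clear message instead of a compile-time error.
my $loaded = eval { require Gzip::Faster; 1 };
die "Gzip::Faster could not be loaded: $@" unless $loaded;

# The round trip calls into zlib for both compression and decompression;
# if libz was not linked, an error such as "undefined symbol: inflate"
# may appear here.
my $check = Gzip::Faster::gunzip (Gzip::Faster::gzip ('zlib works'));
print $check eq 'zlib works'
    ? "Gzip::Faster and zlib are working\n"
    : "Round trip gave unexpected output\n";
```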
Ubuntu Linux
On Ubuntu Linux, you may need to install zlib1g-dev using the following command:
sudo apt-get install zlib1g-dev
Windows
Unfortunately at this time the module doesn't seem to install on ActiveState Perl. You can check the current status at http://code.activestate.com/ppm/Gzip-Faster/. However, the module seems to install without problems on Strawberry Perl, so if you cannot install via ActiveState, you could try that instead.
ACKNOWLEDGEMENTS
zgrim reported an important bug related to zlib.
Aristotle Pagaltzis contributed the benchmarking code for Compress::Raw::Zlib.
SEE ALSO
- gzip
Even within Perl, sometimes it's a lot easier to use the command line utility gzip as in
system ("gzip file");
or
`gzip file`
than it is to try to figure out how to use some module or another.
- mod_deflate and mod_gzip
These are Apache web server modules which compress web outputs on the fly.
- PerlIO::gzip
This is a Perl extension to provide a PerlIO layer to gzip/gunzip. That means you can just add :gzip when you open a file to read or write compressed files:
open my $in, "<:gzip", 'file.gz';
open my $out, ">:gzip", 'file.gz';
and you never have to deal with the gzip format.
- IO::Zlib
- Compress::Zlib
- Compress::Raw::Zlib
- CGI::Compress::Gzip
- IO::Compress::Gzip and IO::Uncompress::Gunzip
HISTORY
This module started as an experimental benchmark against IO::Compress::Gzip when profiling revealed that some web programs were spending the majority of their time in IO::Compress::Gzip. Because I also had some web programs in C, which use the raw zlib itself, I was aware that zlib itself was very fast, and I was surprised by the amount of time the Perl code was taking. I wrote this module to test IO::Compress::Gzip against a simplistic C wrapper. I released the module to CPAN because the results were very striking.
The code's ancestor is the example program zpipe supplied with zlib. See http://zlib.net/zpipe.c. Gzip::Faster is little more than zpipe reading from and writing to Perl scalars.
The reason this module only offers gzip and not deflate compression is that gzip is more common on the web, due to some browser incompatibilities. Deflate compression could easily be added, so if you are interested, please contact the author.
AUTHOR
Ben Bullock, <bkb@cpan.org>
COPYRIGHT & LICENCE
This package and associated files are copyright (C) 2014-2016 Ben Bullock.
You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.