—#!/usr/bin/env perl
use
strict;
use
warnings;
use
Getopt::Long;
my
%opt
= ();
my
$opt_help
;
GetOptions (
"expand-entities"
=> \
$opt
{expand_entities},
"process-xincludes"
=> \
$opt
{process_xincludes},
"remove-blanks-start"
=> \
$opt
{remove_blanks_start},
"remove-blanks-end"
=> \
$opt
{remove_blanks_end},
"remove-spaces-line-start"
=> \
$opt
{remove_spaces_line_start},
"remove-spaces-line-end"
=> \
$opt
{remove_spaces_line_end},
"remove-indent"
=> \
$opt
{remove_spaces_line_start},
"remove-empty-text"
=> \
$opt
{remove_empty_text},
"remove-cr-lf-everywhere"
=> \
$opt
{remove_cr_lf_everywhere},
"remove-spaces-everywhere"
=> \
$opt
{remove_spaces_everywhere},
"keep-comments"
=> \
$opt
{keep_comments},
"keep-cdata"
=> \
$opt
{keep_cdatas},
"keep-pi"
=> \
$opt
{keep_pi},
"keep-dtd"
=> \
$opt
{keep_dtd},
"ignore-dtd"
=> \
$opt
{ignore_dtd},
"no-prolog"
=> \
$opt
{no_prolog},
"version=s"
=> \
$opt
{version},
"encoding=s"
=> \
$opt
{encoding},
"aggressive"
=> \
$opt
{aggressive},
"agressive"
=> \
$opt
{aggressive},
"destructive"
=> \
$opt
{destructive},
"insane"
=> \
$opt
{insane},
"help"
=> \
$opt_help
) or
die
(
"Error in command line arguments (maybe \"$0 --help\" could help ?)\n"
);
(
$opt_help
) and pod2usage(1);
my
$string
;
while
(<>) {
$string
.=
$_
;
}
minify(
$string
,
%opt
);
__END__
=head1 NAME
xml-minifier - Minify XML files
=head1 SYNOPSIS
xml-minifier file.xml
OR
cat file.xml | xml-minifier
Options:
--expand-entities expand entities
--process-xincludes process xincludes
--remove-blanks-start remove blanks before text
--remove-blanks-end remove blanks after text
--remove-spaces-line-start remove spaces/tabs before text (each line)
--remove-spaces-line-end remove spaces/tabs after text (each line)
--remove-indent remove spaces/tabs before text (each line
--remove-empty-text remove (pseudo) empty text
--remove-cr-lf-everywhere remove cr and lf everywhere
--keep-comments keep comments
--keep-cdata keep cdata
--keep-pi keep processing instructions
--keep-dtd keep dtd
--ignore-dtd ignore dtd
--no-prolog remove prolog (version and encoding)
--version specify version for the xml
--encoding specify encoding for the xml
--aggressive enable aggressive mode
--destructive enable aggressive mode
--insane enable aggressive mode
--help brief help message
=head1 OPTIONS
=over 4
=item B<--expand-entities>
Expand entities. An entity is like &foo;
=item B<--process-xincludes>
Process xicnludes. An xinclude is like <xi:include href="inc.xml"/>
=item B<--remove-blanks-start>
Remove blanks (spaces, carriage return, line feed...) in front of text nodes.
For instance <tag> foo bar</tag> will become <tag>foo bar</tag>
Agressive and therefore lossy compression.
=item B<--remove-blanks-end>
Remove blanks (spaces, carriage return, line feed...) at the end of text nodes.
For instance <tag>foo bar </tag> will become <tag>foo bar</tag>
Agressive and therefore lossy compression.
=item B<--remove-spaces-line-start>
Remove spaces and tabs at the start of each line of text nodes.
Agressive and therefore lossy compression.
=item B<--remove-spaces-line-end>
Remove spaces and tabs at the end of each line of text nodes.
Agressive and therefore lossy compression.
=item B<--remove-indent>
Remove spaces and tabs at the start of each line of text nodes.
It is actually an alias of B<--remove-spaces-line-start>
Agressive and therefore lossy compression.
=item B<--remove-empty-text>
Remove (pseudo) empty text nodes (spaces, carriage return, line feed...).
=item B<--remove-cr-lf-everywhere>
Remove carriage returns and line feed everywhere (inside text !).
For instance <tag>foo\nbar</tag> will become <tag>foobar</tag>
Very aggressive and therefore lossy compression.
=item B<--keep-comments>
Keep comments, by default they are removed. A comment is like <!-- comment -->
=item B<--keep-cdata>
Keep cdata, by default they are removed. A CDATA is like <![CDATA[ my cdata ]]>
=item B<--keep-pi>
Keep processing instructions. A processing instruction is like <?xml-stylesheet href="style.css"/>
=item B<--keep-dtd>
Keep DTD.
=item B<--ignore-dtd>
Do not read DTD. The minifier reads the DTD to get informations about meaningful blanks.
=item B<--no-version>
Do not put any version.
=item B<--version>
Specify version.
=item B<--encoding>
Specify encoding.
=item B<--aggressive>
Enable aggressive mode. Enables options --remove-blanks-starts --remove-blanks-end --remove-empty-text if they are not defined only.
Other options still keep their value.
=item B<--destructive>
Enable destructive mode. Enable options --remove-spaces-line-starts --remove-spaces-line-end if they are not defined only.
Enable also aggressive mode.
Other options still keep their value.
=item B<--insane>
Enable insane mode. Enables options --remove-cr-lf-everywhere --remove-spaces-everywhere if they are not defined only.
Enable also destructive mode and insane mode.
Other options still keep their value.
=item B<--help>
Print a brief help message and exits.
=back
=head1 DESCRIPTION
B<This program> will read the standard input and minify
=over 4
=item Remove all useless formatting between nodes.
=item Remove dtd.
=item Remove processing instructions
=item Remove comments.
=item Remove CDATA.
=back
This is the default and should be perceived as lossyless minification in term of semantic (but it's not completely if you consider these things as data).
If you want a full lossyless minification, just use --keep arguments.
In addition, you could be morte brutal and remove characters in the text nodes (sort of "cleaning") :
=over 4
=item Remove empty text nodes.
=item Remove starting blanks (carriage return, line feed, spaces...).
=item Remove ending blanks (carriage return, line feed, spaces...).
=item Remove carriage returns and line feed into text node everywhere.
=item Remove spaces text node everywhere.
=item Remove indentation in text node.
=item Remove invisible spaces in text node.
=back
=cut