#!/usr/bin/env perl
use XML::Minifier 'minify';
use strict;
use Getopt::Long;
use Pod::Usage qw(pod2usage);
my %opt = ();
my $opt_help;
GetOptions (
"expand-entities" => \$opt{expand_entities},
"process-xincludes" => \$opt{process_xincludes},
"remove-blanks-start" => \$opt{remove_blanks_start},
"remove-blanks-end" => \$opt{remove_blanks_end},
"remove-spaces-line-start" => \$opt{remove_spaces_line_start},
"remove-spaces-line-end" => \$opt{remove_spaces_line_end},
"remove-indent" => \$opt{remove_spaces_line_start},
"remove-empty-text" => \$opt{remove_empty_text},
"remove-cr-lf-everywhere" => \$opt{remove_cr_lf_everywhere},
"remove-spaces-everywhere" => \$opt{remove_spaces_everywhere},
"keep-comments" => \$opt{keep_comments},
"keep-cdata" => \$opt{keep_cdatas},
"keep-pi" => \$opt{keep_pi},
"keep-dtd" => \$opt{keep_dtd},
"ignore-dtd" => \$opt{ignore_dtd},
"no-prolog" => \$opt{no_prolog},
"version=s" => \$opt{version},
"encoding=s" => \$opt{encoding},
"aggressive" => \$opt{aggressive},
"agressive" => \$opt{aggressive},
"destructive" => \$opt{destructive},
"insane" => \$opt{insane},
"help" => \$opt_help
) or die("Error in command line arguments (try \"$0 --help\")\n");
($opt_help) and pod2usage(1);
my $string;
while (<>) {
$string .= $_;
}
print minify($string, %opt);
__END__
=head1 NAME
xml-minifier - Minify XML files
=head1 SYNOPSIS
xml-minifier file.xml
OR
cat file.xml | xml-minifier
Options:
--expand-entities expand entities
--process-xincludes process xincludes
--remove-blanks-start remove blanks before text
--remove-blanks-end remove blanks after text
--remove-spaces-line-start remove spaces/tabs before text (each line)
--remove-spaces-line-end remove spaces/tabs after text (each line)
--remove-indent alias of --remove-spaces-line-start
--remove-empty-text remove (pseudo) empty text
--remove-cr-lf-everywhere remove cr and lf everywhere
--remove-spaces-everywhere remove spaces everywhere
--keep-comments keep comments
--keep-cdata keep cdata
--keep-pi keep processing instructions
--keep-dtd keep dtd
--ignore-dtd ignore dtd
--no-prolog remove prolog (version and encoding)
--version specify version for the xml
--encoding specify encoding for the xml
--aggressive enable aggressive mode
--destructive enable destructive mode
--insane enable insane mode
--help brief help message
=head1 OPTIONS
=over 4
=item B<--expand-entities>
Expand entities. An entity is like &foo;
=item B<--process-xincludes>
Process XIncludes. An XInclude is like <xi:include href="inc.xml"/>
=item B<--remove-blanks-start>
Remove blanks (spaces, carriage returns, line feeds...) at the start of text nodes.
For instance <tag> foo bar</tag> will become <tag>foo bar</tag>
Aggressive and therefore lossy compression.
=item B<--remove-blanks-end>
Remove blanks (spaces, carriage returns, line feeds...) at the end of text nodes.
For instance <tag>foo bar </tag> will become <tag>foo bar</tag>
Aggressive and therefore lossy compression.
=item B<--remove-spaces-line-start>
Remove spaces and tabs at the start of each line of text nodes.
Aggressive and therefore lossy compression.
=item B<--remove-spaces-line-end>
Remove spaces and tabs at the end of each line of text nodes.
Aggressive and therefore lossy compression.
=item B<--remove-indent>
Remove spaces and tabs at the start of each line of text nodes.
This is an alias of B<--remove-spaces-line-start>.
Aggressive and therefore lossy compression.
=item B<--remove-empty-text>
Remove (pseudo) empty text nodes, that is, text nodes containing only blanks (spaces, carriage returns, line feeds...).
=item B<--remove-cr-lf-everywhere>
Remove carriage returns and line feeds everywhere, including inside text.
For instance <tag>foo\nbar</tag> will become <tag>foobar</tag>
Very aggressive and therefore lossy compression.
=item B<--keep-comments>
Keep comments; by default they are removed. A comment is like <!-- comment -->
=item B<--keep-cdata>
Keep CDATA sections; by default they are removed. A CDATA section is like <![CDATA[ my cdata ]]>
=item B<--keep-pi>
Keep processing instructions; by default they are removed. A processing instruction is like <?xml-stylesheet href="style.css"?>
=item B<--keep-dtd>
Keep the DTD; by default it is removed.
=item B<--ignore-dtd>
Do not read the DTD. By default the minifier reads the DTD to get information about meaningful blanks.
=item B<--no-prolog>
Do not emit the XML prolog (version and encoding).
=item B<--version>
Specify the version for the XML. Takes a value.
=item B<--encoding>
Specify the encoding for the XML. Takes a value.
=item B<--aggressive>
Enable aggressive mode. Enables --remove-blanks-start, --remove-blanks-end and --remove-empty-text, but only for those that are not already defined.
Other options keep their value.
=item B<--destructive>
Enable destructive mode. Enables --remove-spaces-line-start and --remove-spaces-line-end, but only for those that are not already defined.
Also enables aggressive mode.
Other options keep their value.
=item B<--insane>
Enable insane mode. Enables --remove-cr-lf-everywhere and --remove-spaces-everywhere, but only for those that are not already defined.
Also enables destructive mode and aggressive mode.
Other options keep their value.
=item B<--help>
Print a brief help message and exit.
=back
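The three modes above cascade: insane implies destructive, which implies aggressive, and each mode only sets options that are still undefined. The following is a minimal sketch of that rule as the descriptions read; C<expand_modes> is a hypothetical helper written for illustration and is not taken from XML::Minifier, whose actual implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of XML::Minifier): expands the mode flags
# following the option descriptions. "//=" assigns only when the value is
# undefined, so explicit user choices (even 0) are preserved.
sub expand_modes {
    my (%opt) = @_;
    if ($opt{insane}) {
        $opt{destructive}              //= 1;
        $opt{remove_cr_lf_everywhere}  //= 1;
        $opt{remove_spaces_everywhere} //= 1;
    }
    if ($opt{destructive}) {
        $opt{aggressive}               //= 1;
        $opt{remove_spaces_line_start} //= 1;
        $opt{remove_spaces_line_end}   //= 1;
    }
    if ($opt{aggressive}) {
        $opt{remove_blanks_start} //= 1;
        $opt{remove_blanks_end}   //= 1;
        $opt{remove_empty_text}   //= 1;
    }
    return %opt;
}

# Destructive mode with one option explicitly switched off.
my %expanded = expand_modes(destructive => 1, remove_blanks_end => 0);
print join(", ", map { "$_=$expanded{$_}" } sort keys %expanded), "\n";
```

In this sketch an explicit C<remove_blanks_end =E<gt> 0> survives the cascade, which is what "if they are not defined" suggests.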
=head1 DESCRIPTION
B<This program> reads XML from the given files or from standard input, minifies it, and prints the result to standard output. By default it will:
=over 4
=item Remove all useless formatting between nodes.
=item Remove the DTD.
=item Remove processing instructions.
=item Remove comments.
=item Remove CDATA.
=back
This is the default and should be perceived as lossless minification in terms of semantics (although not completely, if you consider these things as data).
If you want fully lossless minification, use the --keep-* arguments.
In addition, you can be more brutal and remove characters inside the text nodes (a sort of "cleaning"):
=over 4
=item Remove empty text nodes.
=item Remove starting blanks (carriage returns, line feeds, spaces...).
=item Remove ending blanks (carriage returns, line feeds, spaces...).
=item Remove carriage returns and line feeds everywhere in text nodes.
=item Remove spaces everywhere in text nodes.
=item Remove indentation in text nodes.
=item Remove invisible spaces in text nodes.
=back
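To make the text-node cleaning above concrete, here is a rough sketch of what several of these steps could look like as plain substitutions on a single text node. C<clean_text> is a hypothetical function for illustration only; it assumes the option names used by this script and does not claim to reproduce how XML::Minifier works internally.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of XML::Minifier): applies some of the
# cleaning steps to the content of one text node.
sub clean_text {
    my ($text, %opt) = @_;
    $text =~ s/\A\s+//     if $opt{remove_blanks_start};       # leading blanks
    $text =~ s/\s+\z//     if $opt{remove_blanks_end};         # trailing blanks
    $text =~ s/^[ \t]+//mg if $opt{remove_spaces_line_start};  # indent of each line
    $text =~ s/[ \t]+$//mg if $opt{remove_spaces_line_end};    # spaces ending each line
    $text =~ s/[\r\n]+//g  if $opt{remove_cr_lf_everywhere};   # all CR and LF
    return $text;
}

print clean_text(" foo bar", remove_blanks_start => 1), "\n";      # prints "foo bar"
print clean_text("foo\nbar", remove_cr_lf_everywhere => 1), "\n";  # prints "foobar"
```

The two calls mirror the <tag> foo bar</tag> and <tag>foo\nbar</tag> examples given in the OPTIONS section.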
=cut