=
for
Pod::Coverage empty
=head1 NAME
File::Globstar::ListMatch - Match File Names Against List Of Patterns
=head1 SYNOPSIS
$matcher
= File::Globstar::ListMatch->new(
'.gitignore'
,
ignoreCase
=> 1,
isExclude
=> 1);
$matcher
= File::Globstar::ListMatch->new(STDIN,
ignoreCase
=> 0);
$matcher
= File::Globstar::ListMatch->new([
'src/**/*.o'
,
'.*'
,
'!.gitignore'
],
filename
=>
'exclude.txt'
);
$patterns
= <<EOF;
src/**/*.o
.*
.gitignore
EOF
$matcher
= File::Globstar::ListMatch->new(\
$pattern
);
$filename
=
'path/to/hello.o'
;
if
(
$matcher
->match(
$filename
)) {
print
"Ignore '$filename'.\n"
;
}
=head1 DESCRIPTION
Files containing lists of files are very common as arguments to command-line
options such as
"--ignore"
,
"--exclude"
or
"--include"
. A well-known example
This module implements the same functionality in Perl.
While the module will normally be used
for
matching against filenames,
no
filesystem operations are done. Only strings are compared. The reason
for
this is that it should be possible to match names of deleted files as
well as existing files.
When you
read
the documentation below, you may come to the conclusion that
using this module is really complicated
given
that there are so many special
rules about slashes, exclamation marks, asterisks/stars and so on. In fact,
these rules are only hard to describe precisely. But they pretty much
do
exactly what you expect from them and what you are used to from the moment
that you had first entered a
"*"
in a terminal window.
That being said,
do
not spend too much
time
thinking what the difference
between
"/src/backup"
and
"src/backup"
in a F<.gitignore> file is. Just
try
it out and leave it as it is, once it works. Only
if
you want to know
why
"/src/backup"
and
"src/backup"
do
exactly the same, you can
use
this
documentation as a reference.
Similarily, the description of the
"ignore mode"
and
"include mode"
below
sounds pretty convoluted. But the mere presence of a the word
"ignore"
in the
name of the F<.gitignore> file switches your brain to
"ignore mode"
and the
subtle difference between the two modes just reflect what your brain expects.
=head2 INPUT FILE FORMAT
Unless you pass a reference to an array of patterns as input, comments and
blank lines are discarded.
=over 4
=item B<Comments>
Comments are lines that start
with
a hash-sign. You can escape hash-signs that
are part of the pattern
with
a backslash:
"\#"
.
=item B<Blank Lines>
Blank lines are empty lines or lines consisting of whitespace only. Whitespace
is a sequence of US-ASCII whitespace characters: horizontal tabs
(ASCII 9), line feeds (ASCII 10), vertical tabs (ASCII 11), form feeds
(ASCII 12), carriage returns (ASCII 13), and space (ASCII 32). Other
characters
with
the Unicode property
"WSpace=Y"
are not considered whitespace.
=back
Leading whitespace (see above) is I<not> removed! Likewise, whitespace between
the leading negation
"!"
and the pattern is not removed! On the other hand,
in order to ignore a file named
" "
you have to backslash-escape
the first space character (
"\ "
). Otherwise, the pattern will be interpreted
as a blank line and ignored. This is consistent
with
the behavior of Git.
=head2 PATTERNS
Patterns undergo a little preprocessing:
=over 4
=item B<A leading exclamation is stripped off>
But the pattern is now negated. It produces a match
for
all files that
do
I<not> match the pattern. A literal exclamation mark can be escaped
with
a backslash.
=item B<A leading slash is stripped off>
But the pattern must match the entire significant
"path"
(actually string).
For example
"/foobar"
matches
for
"/foobar"
but not
for
"/sub/sub/foobar"
.
=item B<A trailing slash is stripped off>
But the pattern can only match
"directories"
. This is why the method
B<match()> below
has
an optional second argument that lets you specify,
whether the string to be matched is considered a directory or not.
=back
=head2 MATCHING ALGORITHM
The string (normally a filename) passed to the matcher is compared
subsequently to all patterns. If none of the patterns match or
if
the
last
match was against a negated pattern, the overall result is false. Otherwise it
is true.
If a patterrn contains a slash (
"/"
), the match is performed against the full
path name, otherwise just against the basename of the file
with
a leading
"directory"
part stripped off.
If a pattern starts
with
a leading slash, that slash is stripped off
for
the purpose of comparison but the string must match relatively to the
base
"directory"
.
The semantics of a match are the same as
for
B<fnmatchstar()> in
L<File::Globstar>. But keep in mind that a leading exclamation mark (
for
negation), a leading
"directory"
part, a leading slash, or a trailing slash
may be stripped off according to the rules outline above.
=head1 METHODS
=over 4
=item B<new INPUT[,
%OPTIONS
]>
Creates a new B<File::Globstar::ListMatch> object. B<INPUT> can be:
=over 8
=item B<FILE>
B<FILE> can be a filename, an
open
file handle, or a reference to an
open
file handle.
=item B<STRINGREF>
B<STRINGREF> is a reference to a string containing the patterns.
=item B<ARRAYREF>
You can also pass a list of patterns as an array reference. Leading
exclamation marks (
"!"
) followed by possible whitespace
for
negating patterns
are stripped off, but otherwise all patterns are taken as is, even blank
ones and patterns starting
with
a hash-sign (
"#"
).
=back
The input source can be followed by optional named arguments passed as
key-value pairs. Currently recogized:
=over 4
=item B<ignoreCase =E<gt> 0|1|
undef
>
Controls, whether to ignore case,
when
matching. A true value will cause
case to be ignored. The
default
value is false, so that matches are done
in a case-sensitive manner. This is an appropriate setting also
for
both
case-sensitive and case-preserving file systems.
=item B<filename =E<gt> FILENAME>
Use B<FILENAME> in messages
for
I/O errors.
=back
=item B<match STRING[, IS_DIRECTORY]>
Returns true
if
B<STRING> matches. If you pass a true value
for
the
optional argument B<IS_DIRECTORY>, B<STRING> is considered to be the name
of a directory.
A leading slash in B<STRING> is stripped off and is ignored.
A trailing slash is also ignored but B<STRING> is then considered to
be a directory name and B<IS_DIRECTORY> is ignored.
When comparing against a pattern that contains a slash (except
for
a trailing
slash), the full string is taken into account. Otherwise, only the part
after
the
last
(non-trailing) slash is taken into account.
Note that B<match()> assumes that you are excluding or ignoring files. This
I<exclude> mode implies that certain negations
do
not make sense and are
ignored. Thake this example:
/node_modules
!/node_modules/foobar
The second line gets ignored. In exclude mode it is assumed that you
recurse a directory calling B<match()>
for
every file you visit. If a file
matches it is always ignored. And
if
it is a directory, it is not only
ignored but the recursion also stops here.
The exact rule is: You cannot re-include a file by negating a pattern
if
one
of the file's parent directories would be excluded.
This is the behaviour of git,
when
evaluating ignore lists. If you want to
avoid that behaviour, see below
for
B<matchInclude()>.
On the other hand, the following is possible:
docs/_*
!docs/_posts
The only
"parent directory"
for
the negation is F<docs> and that is not
excluded. This is different to this example:
docs/_*
!docs/_posts/recent
The
"parent directories"
are F<docs> and F<docs/_posts> and F<docs/_posts>
matches against F<docs/_*>. The negation is therefore invalid and ignored.
=item B<matchExclude STRING[, IS_DIRECTORY]>
This is an alias
for
B<match()>
=item B<matchInclude STRING[, IS_DIRECTORY]>
Does the same as B<match()> but all negations are valid.
The metaphor
for
include mode is different from exclude mode that is
assumed by B<match()> resp. B<matchExclude()>. In include mode you
interpret all patterns as globbing patterns. A positive, non-negated
pattern causes the matching files to be added to the result list. A
negated pattern will remove the matching files from the result list. The
operation happens recursively
if
a directory gets added or removed.
Take almost the same example as
for
B<match()> above:
docs/_*
!docs/_posts/archive
Imagine line 1 would produce the following list:
=over 4
=item * F<docs/_views/>
=item * F<docs/_views/main.html>
=item * F<docs/_views/head/>
=item * F<docs/_views/head/meta.html>
=item * F<docs/_posts/new/>
=item * F<docs/_posts/new/post4321.html>
=item * F<docs/_posts/archive/>
=item * F<docs/_posts/archive/post1.html>
=item * F<docs/_posts/archive/post2.html>
Line 2 would then kickout the
last
three matches from the result list.
=back
=item B<patterns>
Returns the patterns as a list of compiled regular expressions. This is
probably only useful
for
testing the module itself.
=back
=head1 BUGS AND CAVEATS
The module interprets backslashes only
for
escaping. It does not assume any
path separator semantics
for
them. This should normally not be a problem.
Git ignores all hidden files by
default
. If you want the same behavior
for
B<File::Globstar::ListMatch>, put a
".*"
in front of the patterns.
=head1 COPYRIGHT
Copyright (C) 2016-2023 Guido Flohr <guido.flohr
@cantanea
.com>,
all rights reserved.
=head1 SEE ALSO
L<File::Globstar>, L<File::Glob>,
glob
(3),
glob
(7), fnmatch(3),
glob
(1),
perl(1)