NAME
Apache2::Expression - Apache2 Expressions
SYNOPSIS
use Apache2::Expression;
my $exp = Apache2::Expression->new( legacy => 1 );
my $hash = $exp->parse( $expression_string );
VERSION
v0.1.1
DESCRIPTION
Apache2::Expression parses Apache2 expressions such as those found in SSI (Server Side Includes).
METHODS
parse
This method takes a string representing an Apache2 expression as argument, and returns a hash containing the details of the elements that make up the expression.
It takes an optional hash of parameters, as follows:
legacy-
When this is provided with a positive value, this enables Apache2 legacy regular expressions. See Regexp::Common::Apache2 for more information on what this means.
trunk-
When this is provided with a positive value, this enables Apache2 experimental and advanced expressions. See Regexp::Common::Apache2 for more information on what this means.
For example:
$HTTP_COOKIE = /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/
would return :
{
    elements => [
        {
            elements => [
                {
                    elements => [
                        {
                            elements => [],
                            name => "HTTP_COOKIE",
                            raw => "\$HTTP_COOKIE",
                            re => { variable => "\$HTTP_COOKIE", varname => "HTTP_COOKIE" },
                            subtype => "variable",
                            type => "variable",
                        },
                        {
                            elements => [],
                            flags => undef,
                            pattern => "lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?",
                            raw => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                            re => {
                                regex => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                                regpattern => "lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?",
                                regsep => "/",
                            },
                            sep => "/",
                            type => "regex",
                        },
                    ],
                    op => "=",
                    raw => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                    re => {
                        comp => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                        comp_in_regexp_legacy => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                        comp_regexp => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                        comp_regexp_op => "=",
                        comp_word => "\$HTTP_COOKIE",
                    },
                    regexp => "/lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                    subtype => "regexp",
                    type => "comp",
                    word => "\$HTTP_COOKIE",
                },
            ],
            raw => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
            re => {
                cond => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
                cond_comp => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
            },
            subtype => "comp",
            type => "cond",
        },
    ],
    raw => "\$HTTP_COOKIE = /lang\\%22\\%3A\\%22([a-zA-Z]+\\-[a-zA-Z]+)\\%22\\%7D;?/",
}
The properties returned in the hash are:
elements-
An array reference of the sub elements contained, which provides a more granular definition.
Whatever the elements array reference contains is defined in one of the types below.
name-
The name of the element. For example, if this is a function, this would be the function name, or if this is a variable, this would be the variable name without its leading dollar or percent sign and without any surrounding curly braces.
raw-
The raw string, or chunk of string that was processed.
re-
This contains the hash of capture groups as provided by Regexp::Common::Apache2. It is made available to enable finer-grained control.
regexp-
The regular expression, including its separators, when the element is a comparison against a regular expression.
subtype-
A sub type that provides more information about the type of expression processed.
This can be any of the type values mentioned below plus the following ones: binary (for comparison), list (for word to list comparison), negative, parenthesis, rebackref, regexp, unary (for comparison).
See below for possible combinations.
type-
The main type matching the Apache2 expression. This can be comp, cond, digits, function, integercomp, quote (for quoted words), regex, stringcomp, listfunc, variable or word.
See below for possible combinations.
word-
If this is a word, this contains the word. In the example above, $HTTP_COOKIE would be the word used in the regular expression comparison.
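As an illustration of how the nested elements array references can be traversed, here is a minimal sketch. It does not call the module: $tree is a hand-written stand-in, trimmed to a few of the properties documented above, and outline is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hand-written stand-in for a parse() result (assumption: trimmed to a few documented keys).
my $tree = {
    raw      => '$HTTP_COOKIE = /lang/',
    elements => [
        {
            type     => 'cond',
            subtype  => 'comp',
            raw      => '$HTTP_COOKIE = /lang/',
            elements => [
                {
                    type     => 'comp',
                    subtype  => 'regexp',
                    op       => '=',
                    raw      => '$HTTP_COOKIE = /lang/',
                    elements => [
                        { type => 'variable', subtype => 'variable', name => 'HTTP_COOKIE', raw => '$HTTP_COOKIE', elements => [] },
                        { type => 'regex', pattern => 'lang', sep => '/', raw => '/lang/', elements => [] },
                    ],
                },
            ],
        },
    ],
};

# Hypothetical helper: recursively build one indented line per element,
# in the form "type/subtype: raw".
sub outline
{
    my( $node, $depth ) = @_;
    $depth = 0 if( !defined( $depth ) );
    my @lines;
    foreach my $el ( @{ $node->{elements} || [] } )
    {
        my $label = defined( $el->{type} ) ? $el->{type} : 'unknown';
        $label .= '/' . $el->{subtype} if( defined( $el->{subtype} ) );
        push( @lines, ( '  ' x $depth ) . $label . ': ' . $el->{raw} );
        push( @lines, outline( $el, $depth + 1 ) );
    }
    return( @lines );
}

my @outline = outline( $tree );
print( "$_\n" ) for( @outline );
```

With a real result, the same walk should apply to the hash returned by parse, since every element carries its own elements array reference.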
parse_args
Given a string that typically represents a function's arguments, this method uses PPI to parse it and returns an array of parameters as strings.
Parsing function arguments is non-trivial, as they can contain function calls within function calls.
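To show why a plain split on commas is not enough, here is a simplified sketch that splits only on top-level commas. It is not how parse_args works (the module relies on PPI); split_top_level_args is a hypothetical helper, and it deliberately ignores escaped quotes and commas inside a bare regular expression.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Apache2::Expression: split an argument
# string on top-level commas only, skipping commas inside (), {} or quotes.
sub split_top_level_args
{
    my $str   = shift;
    my @args;
    my $buf   = '';
    my $depth = 0;
    my $quote;
    foreach my $ch ( split( //, $str ) )
    {
        if( defined( $quote ) )
        {
            # Inside a quoted string: only the matching quote ends it.
            undef( $quote ) if( $ch eq $quote );
        }
        elsif( $ch eq '"' || $ch eq "'" )
        {
            $quote = $ch;
        }
        elsif( $ch eq '(' || $ch eq '{' )
        {
            $depth++;
        }
        elsif( $ch eq ')' || $ch eq '}' )
        {
            $depth--;
        }
        elsif( $ch eq ',' && $depth == 0 )
        {
            # A comma outside any nesting ends the current argument.
            push( @args, $buf );
            $buf = '';
            next;
        }
        $buf .= $ch;
    }
    push( @args, $buf ) if( length( $buf ) );
    # Trim surrounding whitespace from each argument.
    s/^\s+|\s+$//g for( @args );
    return( @args );
}

my @args = split_top_level_args( q{md5("a,b"), join({"x", "y"}, ", "), %{REQUEST_URI}} );
print( "$_\n" ) for( @args );
```

The commas inside the quoted string, the braces and the nested call are left alone, so three arguments come back instead of six.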
COMBINATIONS
- comp
-
Type: comp
Possible sub types:
binary-
When a binary operator is used, such as:
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
Example:
192.168.2.10 -ipmatch 192.168.2/24
Here, 192.168.2.10 would be captured in property worda, ipmatch (without leading dash) would be captured in property op, and 192.168.2/24 would be captured in property wordb.
The array reference in property elements will contain more information on worda and wordb.
Also, the details of elements for worda can be accessed with property worda_def as an array reference, and likewise for wordb with wordb_def.
function-
This contains the function name and arguments when the left-hand side word is compared to a list function.
For example:
192.168.1.10 in split( /\,/, $ip_list )
In this example, 192.168.1.10 would be captured in word and split( /\,/, $ip_list ) would be captured in function, with the array reference elements containing more information about the word and the function.
Also, the details of elements for word can be accessed with property word_def as an array reference, and likewise for function with function_def.
list-
Is true when the comparison is of a word on the left-hand side to a list of words, such as:
%{SOME_VALUE} in {"John", "Peter", "Paul"}
In this example, %{SOME_VALUE} would be captured in property word and "John", "Peter", "Paul" (without the enclosing curly braces or any spaces after and before them) would be captured in property list.
The array reference elements may contain more information on word and on each element in list.
Also, the details of elements for word can be accessed with property word_def as an array reference, and likewise for list with list_def.
regexp-
When the left-hand side word is being compared to a regular expression.
For example:
%{HTTP_COOKIE} =~ /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/
In this example, %{HTTP_COOKIE} would be captured in property word, /lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?/ would be captured in property regexp, and =~ would be captured in property op.
Check the array reference in property elements for more details about the word and the regular expression in regexp.
Also, the details of elements for word can be accessed with property word_def as an array reference, and likewise for regexp with regexp_def.
unary-
When one of the following operators is used against a word:
-d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R
For example:
-A /some/uri.html # (same as -U)
-d /some/folder # file is a directory
-e /some/folder/file.txt # file exists
-f /some/folder/file.txt # file is a regular file
-F /some/folder/file.txt # file is a regular file and is accessible to all (Apache2 does a sub query to check)
-h /some/folder/link.txt # true if file is a symbolic link
-n %{QUERY_STRING} # true if string is not empty (opposite of -z)
-s /some/folder/file.txt # true if file is not empty
-L /some/folder/link.txt # true if file is a symbolic link (same as -h)
-R 192.168.1.1/24 # remote IP matches this IP block; same as %{REMOTE_ADDR} -ipmatch 192.168.1.1/24
-T %{HTTPS} # false if string is empty, "0", "off", "false", or "no" (case insensitive). True otherwise.
-U /some/uri.html # check if the uri is accessible to all (Apache2 does a sub query to check)
-z %{QUERY_STRING} # true if string is empty (opposite of -n)
In this example, with -e /some/folder/file.txt, e (without leading dash) would be captured in op and /some/folder/file.txt would be captured in word.
Check the array reference in property elements for more information about the word in word.
Also, the details of elements for word can be accessed with property word_def as an array reference.
See here for more information: Regexp::Common::Apache2::comp
Available properties:
op-
Contains the operator used. See Regexp::Common::Apache2::comp, "stringcomp" in Regexp::Common::Apache2 and "integercomp" in Regexp::Common::Apache2.
For unary operators, this may be:
-d, -e, -f, -s, -L, -h, -F, -U, -A, -n, -z, -T, -R
For binary operators:
==, =, !=, <, <=, >, >=, -ipmatch, -strmatch, -strcmatch, -fnmatch
For integer comparison:
-eq, -ne, -lt, -le, -gt, -ge
For string comparison:
==, !=, <, <=, >, >=
For all the operators above, op contains the value without the leading dash, if any.
word-
The word being compared.
worda-
The first word being compared, and on the left of the operator. For example:
12 -ne 10
wordb-
The second word, being compared to, and on the right of the operator.
See "comp" in Regexp::Common::Apache2 for more information.
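The sub types above determine which properties are worth reading. The following sketch dispatches on subtype for a few hand-written comp elements; the hashes are stand-ins that only use the property names documented in this section, and summarise_comp is a hypothetical helper.

```perl
use strict;
use warnings;

# Hand-written stand-ins for "comp" elements (assumption: only documented property names are used).
my @comps = (
    { type => 'comp', subtype => 'binary', op => 'ipmatch', worda => '192.168.2.10', wordb => '192.168.2/24' },
    { type => 'comp', subtype => 'unary',  op => 'e', word => '/some/folder/file.txt' },
    { type => 'comp', subtype => 'regexp', op => '=~', word => '%{HTTP_COOKIE}', regexp => '/lang=(\w+)/' },
    { type => 'comp', subtype => 'list',   word => '%{SOME_VALUE}', list => '"John", "Peter", "Paul"' },
);

# Hypothetical helper: return a short summary, choosing properties by sub type.
sub summarise_comp
{
    my $c   = shift;
    my $sub = defined( $c->{subtype} ) ? $c->{subtype} : '';
    return( sprintf( '%s(%s, %s)', $c->{op}, $c->{worda}, $c->{wordb} ) ) if( $sub eq 'binary' );
    return( sprintf( '%s(%s)', $c->{op}, $c->{word} ) )                   if( $sub eq 'unary' );
    return( sprintf( '%s %s %s', $c->{word}, $c->{op}, $c->{regexp} ) )   if( $sub eq 'regexp' );
    return( sprintf( '%s in {%s}', $c->{word}, $c->{list} ) )             if( $sub eq 'list' );
    return( "unhandled sub type '$sub'" );
}

my @summaries = map( summarise_comp( $_ ), @comps );
print( "$_\n" ) for( @summaries );
```

Note that op is shown without its leading dash for ipmatch and e, as described for the op property.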
- cond
-
Type: cond
Possible sub types:
and-
When the condition is an ANDed expression such as:
$ap_true && $ap_false
In this case, $ap_true would be captured in property expr1 and $ap_false would be captured in property expr2.
Also, the details of elements for the variable can be accessed with property and_def as an array reference, as well as and_expr1_def and and_expr2_def.
comp-
Contains the expression when the condition is actually a comparison.
This will recurse, and you can see more information in the array reference in the property elements. For more information on what it will contain, check the comp type.
cond-
Default sub type
negative-
When the condition is negative, i.e. prefixed by an exclamation mark.
For example:
!-z /some/folder/file.txt
You need to check for the details in the array reference contained in property elements.
Also, the details of elements for the variable can be accessed with property negative_def as an array reference.
or-
When the condition is an ORed expression such as:
$ap_true || $ap_false
In this case, $ap_true would be captured in property expr1 and $ap_false would be captured in property expr2.
Also, the details of elements for the variable can be accessed with property and_def as an array reference, as well as and_expr1_def and and_expr2_def.
parenthesis-
When the condition is embedded within parentheses.
You need to check the array reference in property elements for information about the embedded condition.
Also, the details of elements for the variable can be accessed with property parenthesis_def as an array reference.
variable-
Contains the expression when the condition is based on a variable, such as:
%{REQUEST_URI}
Check the array reference in property elements for more details about the variable, especially the property name, which would contain the name of the variable; in this case: REQUEST_URI
Also, the details of elements for the variable can be accessed with property variable_def as an array reference.
Available properties:
args-
Function arguments. See the content of the elements array reference for more breakdown on the arguments provided.
is_negative-
If the condition is negative, this value is true.
name-
Function name
See "cond" in Regexp::Common::Apache2 for more information.
- function
-
Type: function
Possible sub types: none
Available properties:
args-
Function arguments. See the content of the elements array reference for more breakdown on the arguments provided.
Also, the details of elements for those args can be accessed with property args_def as an array reference.
name-
Function name
See "function" in Regexp::Common::Apache2 for more information.
- integercomp
-
Type: integercomp
Possible sub types: none
Available properties:
op-
Contains the operator used. See "integercomp" in Regexp::Common::Apache2
worda-
The first word being compared, and on the left of the operator. For example:
12 -ne 10
Also, the details of elements for worda can be accessed with property worda_def as an array reference.
wordb-
The second word, being compared to, and on the right of the operator.
Also, the details of elements for wordb can be accessed with property wordb_def as an array reference.
See "integercomp" in Regexp::Common::Apache2 for more information.
- join
-
Type: join
Possible sub types: none
Available properties:
list-
The list of strings to be joined. See the content of the elements array reference for more breakdown on the arguments provided.
Also, the details of elements for those args can be accessed with property list_def as an array reference.
word-
The word used to join the list. This parameter is optional.
Details for the word parameter, if any, can be found in the elements array reference or can be accessed with the word_def property.
For example:
join({"John Paul Doe"}, ', ')
# or
join({"John", "Paul", "Doe"}, ', ')
# or just
join({"John", "Paul", "Doe"})
See "join" in Regexp::Common::Apache2 for more information.
- listfunc
-
Type: listfunc
Possible sub types: none
Available properties:
args-
Function arguments. See the content of the elements array reference for more breakdown on the arguments provided.
Also, the details of elements for those args can be accessed with property args_def as an array reference.
name-
Function name
See "listfunc" in Regexp::Common::Apache2 for more information.
- regex
-
Type: regex
Possible sub types: none
Available properties:
flags-
The regular expression flags used. Example:
mgis
pattern-
Regular expression pattern, excluding enclosing separators.
sep-
Type of separators used. It can be: /, #, $, %, ^, |, ?, !, ', ", ",", ";", ":", ".", _, and -
See "regex" in Regexp::Common::Apache2 for more information.
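As a rough sketch of how the pattern and flags properties might be put to use, the code below rebuilds a Perl regular expression from a hand-written regex element. Two assumptions apply: that the Apache2 pattern is also valid for Perl's regular expression engine, and that only the flags i, m, s and x are carried over (the filtering is a choice made for this example, not something the module does).

```perl
use strict;
use warnings;

# Hand-written stand-in for a "regex" element; the flags value is set for illustration only.
my $el = {
    type    => 'regex',
    sep     => '/',
    flags   => 'gi',
    pattern => 'lang\%22\%3A\%22([a-zA-Z]+\-[a-zA-Z]+)\%22\%7D;?',
};

# Keep only the flags Perl accepts inline as (?flags); "g", for instance, is dropped.
my $flags  = defined( $el->{flags} ) ? $el->{flags} : '';
my $inline = join( '', grep { /^[imsx]$/ } split( //, $flags ) );
my $source = ( length( $inline ) ? "(?$inline)" : '' ) . $el->{pattern};
my $re     = qr/$source/;

# Sample cookie value (assumption: URL-encoded JSON, as the pattern suggests).
my $cookie = 'prefs=%7B%22LANG%22%3A%22en-GB%22%7D';
my( $lang ) = $cookie =~ $re;
print( defined( $lang ) ? "lang: $lang\n" : "no match\n" );
```

The sep property is not needed here because pattern already excludes the enclosing separators.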
- stringcomp
-
Type: stringcomp
Possible sub types: none
Available properties:
op-
Contains the operator used. See "stringcomp" in Regexp::Common::Apache2
worda-
The first word being compared, and on the left of the operator. For example:
12 -ne 10
Also, the details of elements for worda can be accessed with property worda_def as an array reference.
wordb-
The second word, being compared to, and on the right of the operator.
Also, the details of elements for wordb can be accessed with property wordb_def as an array reference.
See "stringcomp" in Regexp::Common::Apache2 for more information.
- variable
-
Type: variable
Possible sub types:
function-
%{md5:"some arguments"}
rebackref-
This is a regular expression back reference, such as $1, $2, etc. up to 9.
variable-
%{REQUEST_URI}
# or by enabling the legacy expressions
${REQUEST_URI}
Available properties:
args-
Function arguments. See the content of the elements array reference for more breakdown on the arguments provided.
name-
Function name, or variable name.
value-
The regular expression back reference value, such as 1, 2, etc.
See "variable" in Regexp::Common::Apache2 for more information.
- word
-
Type: word
Possible sub types:
digits-
When the word contains one or more digits.
dotted-
When the word contains words separated by dots, such as 192.168.1.10
function-
When the word is a function.
parens-
When the word is surrounded by parentheses.
quote-
When the word is surrounded by single or double quotes.
rebackref-
When the word is a regular expression back reference such as $1, $2, etc. up to 9.
regex-
This is an extension I added to make some functions work, such as:
split( /\w+/, $ip_list)
Without it, the regular expression would not be recognised as the Apache BNF stands.
variable-
When the word is a variable. For example: %{REQUEST_URI}. It can also be a variable like ${REQUEST_URI} if the legacy mode is enabled.
Available properties:
flags-
The regular expression flags used, such as mgis
parens-
Contains an array reference of the open and close parentheses, such as:
["(", ")"]
pattern-
The regular expression pattern.
quote-
Contains the type of quote used if the sub type is quote.
regex-
Contains the regular expression.
sep-
The separator used in the regular expression, such as /
value-
The value of the digits if the sub type is digits or rebackref.
word-
The word enclosed in quotes.
See "word" in Regexp::Common::Apache2 for more information.
CAVEAT
This module supports Apache2 expressions well. However, some expressions are difficult to process. For example:
Expressions with functions not using enclosing parentheses:
%{REMOTE_ADDR} -in split s/.*?IP Address:([^,]+)/$1/, PeerExtList('subjectAltName')
Instead, use:
%{REMOTE_ADDR} -in split(s/.*?IP Address:([^,]+)/$1/, PeerExtList('subjectAltName'))
There is no mechanism yet to prevent infinite recursion. This needs to be implemented.
CHANGES & CONTRIBUTIONS
Feel free to reach out to the author for possible corrections, improvements, or suggestions.
AUTHOR
Jacques Deguest <jack@deguest.jp>
SEE ALSO
Apache2::SSI, Regexp::Common::Apache2, https://httpd.apache.org/docs/current/expr.html
COPYRIGHT & LICENSE
Copyright (c) 2020 DEGUEST Pte. Ltd.
You can use, copy, modify and redistribute this package and associated files under the same terms as Perl itself.