NAME
App::ElasticSearch::Utilities::QueryString - CLI query string fixer
VERSION
version 6.0
SYNOPSIS
This class provides a pluggable architecture to expand query strings on the command-line into complex Elasticsearch queries.
ATTRIBUTES
context
Defaults to 'query', but can also be set to 'filter'. In the 'query' context, elements are added to the 'must' parameter; in the 'filter' context, they are added to the 'filter' parameter.
search_path
An array reference of additional namespaces to search for loading the query string processing plugins. Example:
$qs->search_path([qw(My::Company::QueryString)]);
This will search the following namespaces for query processing plugins:
App::ElasticSearch::Utilities::QueryString::*
My::Company::QueryString::*
default_join
When fixing up the query string, if two adjacent tokens are missing a joining token, they are joined using this token. Can be either AND or OR, and defaults to AND.
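The joining behavior can be illustrated with a minimal sketch. This is not the module's implementation: join_tokens is a hypothetical helper, and treating NOT as a prefix operator that needs no join after it is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): insert $default_join
# between adjacent tokens that lack a joining token.
sub join_tokens {
    my ( $default_join, @tokens ) = @_;
    my @out;
    for my $token (@tokens) {
        # A join is needed when the previous token is not an operator
        # and the current token is not itself AND or OR.
        # Assumption: NOT prefixes the next token, so nothing follows it.
        my $needs_join = @out
            && $out[-1] !~ /^(?:AND|OR|NOT)$/
            && $token   !~ /^(?:AND|OR)$/;
        push @out, $default_join if $needs_join;
        push @out, $token;
    }
    return join ' ', @out;
}

print join_tokens( 'AND', qw(dst:10.1.2.3 status:200 OR status:404) ), "\n";
```

With default_join set to AND, the two leading tokens are joined explicitly while the existing OR is left alone.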
plugins
Array reference of ordered query string processing plugins, lazily assembled.
METHODS
expand_query_string(@tokens)
This function takes a list of tokens, often from the command line via @ARGV. It uses a plugin infrastructure to allow customization.
Returns: App::ElasticSearch::Utilities::Query object
TOKENS
The token expansion plugins can return undefined, which is effectively a no-op on the token. A plugin can instead return a hash reference, which marks that token as handled so that no other plugins receive it. The hash reference may contain:
- query_string
This is the rewritten text that will be reassembled into the final query string.
- condition
This is usually a hash reference representing the condition going into the bool query. For instance:
{ terms => { field => [qw(alice bob charlie)] } }
Or
{ prefix => { user_agent => 'Go ' } }
These conditions will wind up in the must or must_not section of the bool query depending on the state of the invert flag.
- invert
This is used by the bareword "not" to track whether the token invoked a flip from the must to the must_not state. After each token is processed, if it didn't set this flag, the flag is reset.
- dangles
This is used for bare words like "not", "or", and "and" to denote that these terms cannot dangle from the beginning or end of the query_string. This allows the final pass of the query_string builder to strip these words to prevent syntax errors.
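The token handling described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the two plugins and process_tokens are hypothetical, plugins are modeled as plain code references, and dropping a leading AND or OR is an assumption about how dangling words are stripped.

```perl
use strict;
use warnings;

# Hypothetical plugins, tried in order. Each returns undef (token not
# handled) or a hash reference using the keys described above.
my @plugins = (
    sub {    # barewords: upper-case the operator and flag it
        my ($token) = @_;
        my %bareword = (
            and => { query_string => 'AND', dangles => 1 },
            or  => { query_string => 'OR',  dangles => 1 },
            not => { query_string => 'NOT', dangles => 1, invert => 1 },
        );
        return $bareword{ lc $token };
    },
    sub {    # _prefix_:field:value becomes a prefix condition
        my ($token) = @_;
        return unless $token =~ /^_prefix_:([^:]+):(.+)$/;
        return { condition => { prefix => { $1 => $2 } } };
    },
);

sub process_tokens {
    my (@tokens) = @_;
    my ( @qs, @dangling );
    my %bool   = ( must => [], must_not => [] );
    my $invert = 0;

    for my $token (@tokens) {
        my $part;
        for my $plugin (@plugins) {
            $part = $plugin->($token);
            last if defined $part;    # first plugin to answer wins
        }
        $part //= { query_string => $token };    # unhandled: pass through

        if ( $part->{dangles} ) {
            # Hold the word until we know whether a term follows it.
            push @dangling, $part->{query_string};
        }
        elsif ( exists $part->{query_string} ) {
            push @qs, @dangling, $part->{query_string};
            @dangling = ();
        }
        elsif ( exists $part->{condition} ) {
            my $target = $invert ? 'must_not' : 'must';
            push @{ $bool{$target} }, $part->{condition};
            @dangling = ();    # the held words applied to the condition
        }
        # The flag is reset unless this token set it.
        $invert = $part->{invert} ? 1 : 0;
    }

    # Words still held at the end are never emitted. Assumption: a
    # leading AND or OR has nothing to join, so it is dropped too.
    shift @qs while @qs && $qs[0] =~ /^(?:AND|OR)$/;
    return { query_string => join( ' ', @qs ), %bool };
}

my $result = process_tokens( 'not', '_prefix_:useragent:Go ', 'and', 'status:200', 'or' );
print "query_string: $result->{query_string}\n";
```

In this run the prefix condition lands in must_not because "not" set the invert flag, and the "and" and "or" words are stripped because they would otherwise dangle.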
Extended Syntax
The search string is pre-analyzed before being sent to ElasticSearch. The following plugins work to manipulate the query string and provide richer, more complete syntax for CLI applications.
App::ElasticSearch::Utilities::Barewords
The following barewords are transformed:
or => OR
and => AND
not => NOT
App::ElasticSearch::Utilities::QueryString::IP
If a field is an IP address that uses CIDR notation, it is expanded to a range query.
src_ip:10.0/8 => src_ip:[10.0.0.0 TO 10.255.255.255]
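The arithmetic behind this expansion can be sketched in plain Perl. The helper cidr_to_range is hypothetical and handles IPv4 only; padding omitted trailing octets with zeros is an assumption based on the 10.0/8 example above, and the plugin itself may compute the range differently.

```perl
use strict;
use warnings;

# Hypothetical helper: expand an IPv4 CIDR block, possibly with trailing
# octets omitted (10.0/8), into its first and last address.
sub cidr_to_range {
    my ($cidr) = @_;
    my ( $addr, $bits ) =
        $cidr =~ m{^(\d{1,3}(?:\.\d{1,3}){0,3})/(\d{1,2})$}
        or return;
    return if $bits > 32;

    my @octets = split /\./, $addr;
    return if grep { $_ > 255 } @octets;
    push @octets, 0 while @octets < 4;    # pad 10.0 to 10.0.0.0

    # Work on the address as a 32-bit integer.
    my $ip   = unpack 'N', pack 'C4', @octets;
    my $mask = $bits == 0 ? 0 : ( 0xFFFFFFFF << ( 32 - $bits ) ) & 0xFFFFFFFF;
    my $low  = $ip & $mask;                        # network address
    my $high = $low | ( ~$mask & 0xFFFFFFFF );     # last address in block

    return map { join '.', unpack 'C4', pack 'N', $_ } $low, $high;
}

my ( $first, $last ) = cidr_to_range('10.0/8');
print "src_ip:[$first TO $last]\n";
```

For 10.0/8 this prints the same range shown in the example above.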
App::ElasticSearch::Utilities::Range
This plugin translates some special comparison operators so you don't need to remember them anymore.
Example:
price:<100
Will translate into:
{ range: { price: { lt: 100 } } }
And:
price:>50,<100
Will translate to:
{ range: { price: { gt: 50, lt: 100 } } }
Supported Operators
gt via >, gte via >=, lt via <, lte via <=
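A minimal sketch of how such a token could be parsed, assuming the field:comparison[,comparison] shape shown in the examples. The helper range_condition is hypothetical and is not the plugin's actual code.

```perl
use strict;
use warnings;

# Operator mapping taken from the list above.
my %operator = ( '>' => 'gt', '>=' => 'gte', '<' => 'lt', '<=' => 'lte' );

# Hypothetical helper: returns a range condition, or undef when the
# token does not use the comparison syntax.
sub range_condition {
    my ($token) = @_;
    my ( $field, $spec ) = split /:/, $token, 2;
    return unless defined $spec && length $spec;

    my %range;
    for my $clause ( split /,/, $spec ) {
        # Two-character operators are tried first so ">=" is not read as ">".
        my ( $op, $value ) = $clause =~ /^(>=|<=|>|<)(.+)$/ or return;
        $range{ $operator{$op} } = $value;
    }
    return unless %range;
    return { range => { $field => \%range } };
}

my $condition = range_condition('price:>50,<100');
printf "gt=%s lt=%s\n", $condition->{range}{price}{gt}, $condition->{range}{price}{lt};
```

Returning undef for tokens without a comparison operator mirrors the convention described under TOKENS, where an undefined result leaves the token for other plugins.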
App::ElasticSearch::Utilities::Underscored
This plugin translates some special underscore surrounded tokens into the Elasticsearch Query DSL.
Implemented:
_prefix_
Example query string:
_prefix_:useragent:'Go '
Translates into:
{ prefix => { useragent => 'Go ' } }
App::ElasticSearch::Utilities::QueryString::FileExpansion
If the match ends in .dat, .txt, or .csv, then we attempt to read a file with that name and OR the condition:
$ cat test.dat
50 1.2.3.4
40 1.2.3.5
30 1.2.3.6
20 1.2.3.7
Or
$ cat test.csv
50,1.2.3.4
40,1.2.3.5
30,1.2.3.6
20,1.2.3.7
Or
$ cat test.txt
1.2.3.4
1.2.3.5
1.2.3.6
1.2.3.7
We can source that file:
src_ip:test.dat => src_ip:(1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7)
This makes it simple to use the --data-file output options and build queries based on previous queries. For .txt and .dat files, the delimiter for columns in the file must be a tab, a comma, or a semicolon. For files ending in .csv, Text::CSV_XS is used to parse the file format accurately.
You can also specify which column of the data file to use; the default is the last column (-1). Column indexes are zero-based: the first column is index 0, the second is 1, and so on. The previous example can be rewritten as:
src_ip:test.dat[1]
or: src_ip:test.dat[-1]
This option will iterate through the whole file and de-duplicate the elements of the list. They will then be transformed into an appropriate terms query.
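The column selection and de-duplication can be sketched as below. The helper column_values is hypothetical, it works on lines already read into memory rather than opening a file, and it covers only the .txt and .dat delimiters named above (tab, comma, semicolon), not the Text::CSV_XS path. The sample data is assumed to be tab-delimited.

```perl
use strict;
use warnings;

# Hypothetical helper: pick one column from delimited lines and
# de-duplicate the values, preserving first-seen order.
sub column_values {
    my ( $column, @lines ) = @_;
    my ( %seen, @values );
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;    # strip the line ending, if any
        next unless length $line;
        my @cols  = split /[\t,;]/, $line;
        my $value = $cols[$column];    # negative indexes count from the end
        next unless defined $value && length $value;
        push @values, $value unless $seen{$value}++;
    }
    return @values;
}

# Assumed sample data, with one repeated address.
my @lines = ( "50\t1.2.3.4", "40\t1.2.3.5", "30\t1.2.3.4" );
my @ips   = column_values( -1, @lines );

# The unique values become a terms condition on the field.
my $condition = { terms => { src_ip => \@ips } };
print 'src_ip terms: ', join( ' ', @{ $condition->{terms}{src_ip} } ), "\n";
```

Because the default column is -1, a single-column file and the last column of a multi-column file are handled by the same code path.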
AUTHOR
Brad Lhotsky <brad@divisionbyzero.net>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2012 by Brad Lhotsky.
This is free software, licensed under:
The (three-clause) BSD License