NAME
Net::API::CPAN::Filter - Meta CPAN API
SYNOPSIS
use Net::API::CPAN::Filter;
my $this = Net::API::CPAN::Filter->new(
query => {
regexp => { name => 'HTTP.*' },
},
) || die( Net::API::CPAN::Filter->error, "\n" );
VERSION
v0.1.0
DESCRIPTION
This class facilitates forming an Elastic Search query and stores its various components as an object, so that the query can be re-used or shared.
You can pass arguments to the methods "aggs", "fields", "filter", "from", "match_all", "query", "size", "sort" and "source" to affect the production of the query.
Alternatively, you can pass a hash reference of a fully formed Elastic Search query directly to "es", which takes precedence over all the other methods.
Calling "as_hash" collates all the components and caches the result. If any information is changed using any of the methods in this class, the cached hash produced by "as_hash" is removed.
You can get the resulting JSON by calling "as_json", which in turn calls "as_hash".
As far as the API documentation states, Meta CPAN uses version 2.4 of Elastic Search, and the documentation of the methods herein reflects that.
METHODS
aggregations
This is an alias for "aggs"
aggs
Sets or gets a hash reference of query aggregations (post filter). It returns a hash object, or undef if nothing was set.
Example from the Elastic Search documentation:
{
aggs => {
models => {
terms => { field => "model" },
},
},
query => {
bool => {
filter => [
{
term => { color => "red" },
},
{
term => { brand => "gucci" },
},
],
},
},
}
See also Elastic Search documentation, and here
apply
Provided with a hash or hash reference of parameters, this applies each value to the method matching its key, if that method exists.
It returns the current object for chaining.
as_hash
Read-only. Returns the various components of the query as an hash reference.
The resulting hash of data is cached, so you can call it multiple times without additional overhead. Any change made through any of the methods here resets that cache.
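The caching behaviour described above can be illustrated with a minimal sketch. The package My::CachedQuery, its "builds" counter and its internal layout are hypothetical and only serve the illustration; this is not how Net::API::CPAN::Filter is actually implemented.

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only: the hash is cached until a setter is called.
package My::CachedQuery;

sub new
{
    my $class = shift( @_ );
    my $self  = { size => undef, cache => undef, builds => 0 };
    return( bless( $self, $class ) );
}

# Setter/getter: any change discards the cached hash
sub size
{
    my $self = shift( @_ );
    if( @_ )
    {
        $self->{size}  = shift( @_ );
        $self->{cache} = undef;
    }
    return( $self->{size} );
}

# Builds the hash once, then returns the cached copy
sub as_hash
{
    my $self = shift( @_ );
    return( $self->{cache} ) if( defined( $self->{cache} ) );
    $self->{builds}++;
    my $hash = {};
    $hash->{size} = $self->{size} if( defined( $self->{size} ) );
    return( $self->{cache} = $hash );
}

package main;

my $q = My::CachedQuery->new;
$q->size(10);
my $first  = $q->as_hash;   # built
my $second = $q->as_hash;   # served from the cache
$q->size(20);               # resets the cache
my $third  = $q->as_hash;   # built again
```

Here, the second call returns the very same hash reference as the first, and only the call following the change to size triggers a new build.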
as_json
my $json = $filter->as_json;
my $json_in_utf8 = $filter->as_json( encoding => 'utf-8' );
Read-only. Returns the various components of the query as JSON data encoded in Perl internal utf-8 encoding.
If a hash or hash reference of options is provided with a property encoding set to utf-8 or utf8, then the JSON data returned will be encoded in utf-8.
es
This takes a hash reference of Elastic Search query parameters.
See "ELASTIC SEARCH QUERY" for a brief overview of valid parameters.
Otherwise you are encouraged to call "query", which will format the Elastic Search query for you.
Returns a hash object.
fields
Sets or gets an array of fields onto which the query will be applied.
It returns an array object.
{
query => {
terms => { name => "Japan Folklore" }
},
fields => [qw( name abstract distribution )],
}
Field names can also contain wildcards:
{
query => {
terms => { name => "Japan Folklore" }
},
fields => [qw( name abstract dist* )],
}
The importance of some fields can also be boosted using the caret notation ^:
{
query => {
terms => { name => "Japan Folklore" }
},
fields => [qw( name^3 abstract dist* )],
}
Here, the field name is treated as three times as important as the others.
See Elastic Search documentation for more information.
filter
Sets or gets a hash of filters to affect the Elastic Search query result.
{
query => {
bool => {
must => [
{ match => { name => "Folklore-Japan-v1.2.3" }},
{ match => { abstract => "Japan Folklore Object Class" }}
],
filter => [
{ term => { status => "latest" }},
{ range => { date => { gte => "2023-07-01" }}}
]
}
}
}
It returns a hash object.
from
Sets or gets a positive integer to return the desired results page. It returns the current value, if any, as a number object, or undef if there is no value set.
{
from => 0,
query => {
term => { user => "kimchy" },
},
size => 10,
}
As per the Elastic Search documentation, "[p]agination of results can be done by using the from and size parameters. The from parameter defines the offset from the first result you want to fetch. The size parameter allows you to configure the maximum amount of hits to be returned".
For example, with a size of 10 elements per page, the first page starts at offset (a.k.a. from) 0 and ends at offset 9, and page 2 runs from offset 10 to 19. Thus, to get the second page, you would set the value for from to 10.
See also the more efficient scroll approach to pagination of query results.
Keep in mind this is different from the from option supported in some endpoints of the MetaCPAN API, which typically starts at 1 instead of 0.
See Elastic Search documentation for more information.
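As a minimal sketch of the arithmetic above, a 1-based page number can be converted into the 0-based from offset. The helper page_to_from is hypothetical and not part of this class.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 1-based page number into a 0-based "from" offset
sub page_to_from
{
    my( $page, $size ) = @_;
    die( "Page must be a positive integer\n" ) unless( defined( $page ) && $page =~ /^[1-9][0-9]*$/ );
    die( "Size must be a positive integer\n" ) unless( defined( $size ) && $size =~ /^[1-9][0-9]*$/ );
    return( ( $page - 1 ) * $size );
}

# Second page with 10 results per page: offsets 10 to 19
my $query =
{
    from  => page_to_from( 2, 10 ),
    query => { term => { user => "kimchy" } },
    size  => 10,
};
```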
match_all
# Enabled
$filter->match_all(1);
# Disabled (default)
$filter->match_all(0);
# or
$filter->match_all(undef);
# or with explicit score
$filter->match_all(1.12);
Boolean. If true, this will match all documents by Elastic Search with an identical score of 1.0.
If the value provided is a number other than 1 or 0, then it will be interpreted as an explicit score to use instead of the default 1.0.
For example:
$filter->match_all(1.12)
would produce:
{ match_all => { boost => 1.12 }}
See Elastic Search for more information.
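The interpretation of the value described above could be sketched as follows. The function match_all_clause is a hypothetical illustration of the rule and is not the code used by this class.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the value given to match_all() into a query clause
sub match_all_clause
{
    my $value = shift( @_ );
    # undef, 0 or an empty string: disabled, so no clause is produced
    return( undef ) unless( $value );
    # 1: plain boolean true, the default score of 1.0 applies
    return( { match_all => {} } ) if( $value == 1 );
    # Any other number is used as an explicit score
    return( { match_all => { boost => $value } } );
}

my $disabled = match_all_clause( 0 );
my $enabled  = match_all_clause( 1 );
my $boosted  = match_all_clause( 1.12 );
```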
name
Sets or gets the optional query name. It always returns a scalar object.
If set, it will be added to the filter:
{
bool => {
filter => {
terms => { _name => "test", "name.last" => [qw( banon kimchy )] },
},
should => [
{
match => { "name.first" => { _name => "first", query => "shay" } },
},
{
match => { "name.last" => { _name => "last", query => "banon" } },
},
],
},
}
See Elastic Search documentation for more information.
query
This takes a hash reference of parameters and formats the query in compliance with Elastic Search. You can also provide the Elastic Search structure directly by calling "es" with the proper hash reference of parameters.
Queries can be straightforward such as:
{ name => 'Taro Momo' }
or
{ pauseid => 'MOMOTARO' }
or using a simple regular expression:
{ name => 'Taro *' }
This would find all the people whose name starts with Taro.
To produce more complex search queries, you can use the special keywords all, either and not, which correspond respectively to the Elastic Search keywords must, should and must_not. You can use the Elastic Search keywords interchangeably if you prefer. Thus:
{
either => [
{ name => 'John *' },
{ name => 'Peter *' },
]
}
is the same as:
{
should => [
{ name => 'John *' },
{ name => 'Peter *' },
]
}
and
{
all => [
{ name => 'John *' },
{ email => '*gmail.com' },
]
}
is the same as:
{
must => [
{ name => 'John *' },
{ email => '*gmail.com' },
]
}
Likewise
{
either => [
{ name => 'John *' },
{ name => 'Peter *' },
],
not => [
{ email => '*gmail.com' },
],
}
can also be expressed as:
{
should => [
{ name => 'John *' },
{ name => 'Peter *' },
],
must_not => [
{ email => '*gmail.com' },
],
}
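The correspondence between the two sets of keywords can be sketched as a simple key translation. The function normalise_keywords and the %ALIASES table are hypothetical names used only for this illustration; the actual formatting done by "query" is not shown here.

```perl
use strict;
use warnings;

# Keyword aliases as documented above; 'not' is quoted because it is also a Perl operator
my %ALIASES =
(
    'all'    => 'must',
    'either' => 'should',
    'not'    => 'must_not',
);

# Hypothetical helper: replace each alias by its Elastic Search keyword
sub normalise_keywords
{
    my $query = shift( @_ );
    my %bool;
    foreach my $key ( keys( %$query ) )
    {
        my $es_key = exists( $ALIASES{ $key } ) ? $ALIASES{ $key } : $key;
        $bool{ $es_key } = $query->{ $key };
    }
    return( \%bool );
}

my $bool = normalise_keywords({
    'either' => [ { name => 'John *' }, { name => 'Peter *' } ],
    'not'    => [ { email => '*gmail.com' } ],
});
# $bool now has the keys 'should' and 'must_not'
```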
reset
When called with any arguments, no matter their value, this resets the cached hash reference computed by "as_hash".
It returns the current object for chaining.
size
Sets or gets a positive integer to set the maximum number of hits of query results. It returns the current value, if any, as a number object, or undef if there is no value set.
See "from" for more information.
{
from => 0,
query => {
term => { user => "kimchy" },
},
size => 10,
}
See also the Elastic Search documentation
sort
Sets or gets an array reference of sort parameters to affect the order of the query results.
It always returns an array object, which might be empty if nothing was specified.
{
query => {
term => { user => "kimchy" },
},
sort => [
{
post_date => { order => "asc" },
},
"user",
{ name => "desc" },
{ age => "desc" },
"_score",
],
}
The order option can have the following values:
asc
Sort in ascending order
desc
Sort in descending order
Elastic Search supports sorting by array or multi-valued fields. The mode option controls what array value is picked for sorting the document it belongs to. The mode option can have the following values:
{
query => {
term => { user => "kimchy" },
},
sort => [
{
price => {
order => "asc",
mode => "avg"
}
}
]
}
min
Pick the lowest value.
max
Pick the highest value.
sum
Use the sum of all values as sort value. Only applicable for number based array fields.
avg
Use the average of all values as sort value. Only applicable for number based array fields.
median
Use the median of all values as sort value. Only applicable for number based array fields.
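To make the effect of each mode concrete, here is a minimal sketch that picks the sort value from an array of numbers. The function sort_value is hypothetical, and the handling of the median for an even number of values (the mean of the two middle values) is an assumption of this sketch rather than a statement about Elastic Search internals.

```perl
use strict;
use warnings;
use List::Util qw( max min sum );

# Hypothetical helper: compute the value used for sorting a multi-valued field
sub sort_value
{
    my( $mode, @values ) = @_;
    return( undef ) unless( @values );
    return( min( @values ) ) if( $mode eq 'min' );
    return( max( @values ) ) if( $mode eq 'max' );
    return( sum( @values ) ) if( $mode eq 'sum' );
    return( sum( @values ) / scalar( @values ) ) if( $mode eq 'avg' );
    if( $mode eq 'median' )
    {
        my @sorted = sort{ $a <=> $b } @values;
        my $mid = int( scalar( @sorted ) / 2 );
        # Assumption: with an even count, use the mean of the two middle values
        return( scalar( @sorted ) % 2 ? $sorted[ $mid ] : ( $sorted[ $mid - 1 ] + $sorted[ $mid ] ) / 2 );
    }
    die( "Unknown sort mode '$mode'\n" );
}

my @prices = ( 10, 20, 60 );
my $avg    = sort_value( 'avg', @prices );
my $median = sort_value( 'median', @prices );
```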
You can also sort by geo distance with _geo_distance, such as:
{
query => {
term => { user => "kimchy" },
},
sort => [
{
_geo_distance => {
distance_type => "sloppy_arc",
mode => "min",
order => "asc",
"pin.location" => [-70, 40],
# or, as lat/long
# "pin.location" => {
# lat => 40,
# lon => -70
# },
# or, as string
# "pin.location" => "40,-70",
# or, as GeoHash
# "pin.location" => "drm3btev3e86",
unit => "km",
},
},
],
}
See also Elastic Search documentation
source
This sets or gets a string or an array reference of query source filtering.
It returns the current value, which may be undef if nothing was specified.
By default Elastic Search returns the contents of the _source field unless you have used the fields parameter or if the _source field is disabled.
You can set it to false to disable it. A false value can be 0 or an empty string "", but not undef, which will disable this option entirely.
$filter->query({
user => 'kimchy'
});
$filter->source(0);
would produce the following hash returned by "as_hash":
{
_source => \0,
query => {
term => { user => "kimchy" },
},
}
For complete control, you can specify both include and exclude patterns:
$filter->query({
user => 'kimchy'
});
$filter->source({
exclude => ["*.description"],
include => ["obj1.*", "obj2.*"],
});
would produce the following hash returned by "as_hash":
{
_source => { exclude => ["*.description"], include => ["obj1.*", "obj2.*"] },
query => {
term => { user => "kimchy" },
},
}
See Elastic Search documentation for more information.
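As an illustration of why the false value is stored as \0, the following sketch encodes such a hash with the core module JSON::PP, which serialises the scalar references \0 and \1 as the JSON literals false and true. Using JSON::PP here is an assumption of this example; it is not necessarily the encoder used by "as_json".

```perl
use strict;
use warnings;
use JSON::PP;

# The kind of hash shown above, with source filtering disabled
my $hash =
{
    _source => \0,
    query   => { term => { user => "kimchy" } },
};

# canonical() sorts the keys, so the output is reproducible
my $json = JSON::PP->new->canonical->encode( $hash );
print( $json, "\n" );
```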
ELASTIC SEARCH QUERY
Query and Filter
Example:
The following will instruct Meta CPAN Elastic Search to find module releases where all the following conditions are met:
The name field contains the word Folklore-Japan-v1.2.3.
The abstract field contains Japan Folklore Object Class.
The status field contains the exact word latest.
The date field contains a date from 1 July 2023 onwards.
{
query => {
bool => {
must => [
{ match => { name => "Folklore-Japan-v1.2.3" }},
{ match => { abstract => "Japan Folklore Object Class" }}
],
filter => [
{ term => { status => "latest" }},
{ range => { date => { gte => "2023-07-01" }}}
]
}
}
}
Match all
{ match_all => {} }
or with an explicit score of 1.12
{ match_all => { boost => 1.12 } }
Match Query
{
match => { name => "Folklore-Japan-v1.2.3" }
}
or
{
match => {
name => {
query => "Folklore-Japan-v1.2.3",
# Defaults to 'or'
operator => 'and',
# The minimum number of optional 'should' clauses to match
minimum_should_match => 1,
# Set to true (\1 is translated as 'true' in JSON) to ignore exceptions caused by data-type mismatches
lenient => \1,
# Set the fuzziness value: 0, 1, 2 or AUTO
fuzziness => 'AUTO',
# True by default
fuzzy_transpositions => 1,
# 'none' or 'all'; defaults to 'none'
zero_terms_query => 'all',
cutoff_frequency => 0.001,
}
}
}
minimum_should_match
See Elastic Search documentation for valid values for minimum_should_match
fuzziness
See Elastic Search documentation for valid values
zero_terms_query
See Elastic Search documentation for valid values
cutoff_frequency
See Elastic Search documentation for valid values
See also the Elastic Search documentation on match query for more information on its valid parameters.
Match Phrase
{
match_phrase => {
abstract => "Japan Folklore Object Class",
}
}
which is the same as:
{
match => {
abstract => {
query => "Japan Folklore Object Class",
type => 'phrase',
}
}
}
Match Phrase Prefix
As per Elastic Search documentation, this is a poor-man’s autocomplete.
{
match_phrase_prefix => {
abstract => "Japan Folklore O"
}
}
It is designed to allow expansion on the last term of the query. The maximum number of expansions is controlled with the parameter max_expansions:
{
match_phrase_prefix => {
abstract => {
query => "Japan Folklore O",
max_expansions => 10,
}
}
}
The documentation recommends the use of the completion suggester instead.
Multi Match Query
This performs a query on multiple fields:
{
multi_match => {
query => 'Japan Folklore',
fields => [qw( name abstract distribution )],
}
}
Field names can contain wildcards:
{
multi_match => {
query => 'Japan Folklore',
fields => [qw( name abstract dist* )],
}
}
The importance of some fields can also be boosted using the caret notation ^:
{
multi_match => {
query => 'Japan Folklore',
fields => [qw( name^3 abstract dist* )],
}
}
Here, the field name is treated as three times as important as the others.
To affect the way the multiple match query is performed, you can set the type value to best_fields, most_fields, cross_fields, phrase or phrase_prefix:
{
multi_match => {
query => 'Japan Folklore',
fields => [qw( name^3 abstract dist* )],
type => 'best_fields',
}
}
It accepts the same other parameters as the match query described in "Match Query".
See Elastic Search documentation for more details.
Common Terms Query
As per Elastic Search documentation, the "common terms query is a modern alternative to stopwords which improves the precision and recall of search results (by taking stopwords into account), without sacrificing performance."
{
common => {
abstract => {
query => 'Japan Folklore',
cutoff_frequency => 0.001,
}
}
}
The number of terms which should match can be controlled with the minimum_should_match parameter.
See the Elastic Search documentation for more information.
Query String Query
This uses a query parser to parse the content of the query.
{
query_string => {
default_field => "abstract",
query => "this AND that OR thus",
fields => [qw( abstract name )],
# Default is 'OR'
default_operator => 'AND',
# \1 (true) or \0 (false)
allow_leading_wildcard => \1,
# Default to true
lowercase_expanded_terms => \1,
# Default to true
enable_position_increments => \1,
# Defaults to 50
fuzzy_max_expansions => 10,
# Defaults to 'AUTO'
fuzziness => 'AUTO',
# Defaults to 0
fuzzy_prefix_length => 0,
# Defaults to 0
phrase_slop => 0,
# Defaults to 1.0
boost => 0,
# Defaults to true
analyze_wildcard => \1,
# Defaults to false
auto_generate_phrase_queries => \0,
# Defaults to 10000
max_determinized_states => 10000,
minimum_should_match => 2,
# Defaults to true,
lenient => \1,
locale => 'ROOT',
time_zone => 'Asia/Tokyo',
}
}
Wildcard searches can be run on individual terms, using ? to replace a single character, and * to replace zero or more characters:
qu?ck bro*
Regular expressions can also be used. As per the Elastic Search documentation, "regular expression patterns can be embedded in the query string by wrapping them in forward-slashes ("/")":
name:/joh?n(ath[oa]n)/
Fuzziness, i.e., terms that are similar to, but not exactly like our search terms, can be expressed with the fuzziness operator:
quikc~ brwn~ foks~
An edit distance can be specified:
quikc~1
"fox quick"~5
A range can be specified for date, numeric or string fields. Inclusive ranges are specified with square brackets [min TO max] and exclusive ranges with curly brackets {min TO max}.
All days in 2023:
date:[2023-01-01 TO 2023-12-31]
Numbers 1..5
count:[1 TO 5]
Tags between alpha and omega, excluding alpha and omega:
tag:{alpha TO omega}
Numbers from 10 upwards
count:[10 TO *]
Dates before 2023
date:{* TO 2023-01-01}
Numbers from 1 up to but not including 5
count:[1 TO 5}
Ranges with one side unbounded can use the following syntax:
age:>10
age:>=10
age:<10
age:<=10
age:(>=10 AND <20)
age:(+>=10 +<20)
However, it is better to use a range query:
{
range => {
age => {
gte => 10,
lte => 20,
boost => 2.0
}
}
}
quick brown +fox -news
fox must be present
news must not be present
quick and brown are optional — their presence increases the relevance
(quick OR brown) AND fox
status:(active OR pending) title:(full text search)^2
See the Elastic Search documentation and the query string syntax for more information.
{
query_string => {
fields => [qw( abstract name )],
query => "this AND that"
}
}
is equivalent to:
{ query_string => { query => "(abstract:this OR name:this) AND (abstract:that OR name:that)" } }
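The equivalence above can be reproduced with a small sketch that expands each term of a query over a list of fields. The function expand_fields is hypothetical and deliberately naive: it only splits on the AND operator and is meant to illustrate the expansion, not to replicate the Elastic Search query parser.

```perl
use strict;
use warnings;

# Hypothetical helper: expand "a AND b" over the given fields
sub expand_fields
{
    my( $query, @fields ) = @_;
    my @groups;
    foreach my $term ( split( /\s+AND\s+/, $query ) )
    {
        # Each term may match in any one of the fields
        my @alternatives = map( $_ . ':' . $term, @fields );
        push( @groups, '(' . join( ' OR ', @alternatives ) . ')' );
    }
    return( join( ' AND ', @groups ) );
}

my $expanded = expand_fields( "this AND that", qw( abstract name ) );
print( $expanded, "\n" );
```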
"Simple wildcard can also be used to search "within" specific inner elements of the document":
{
query_string => {
fields => ["metadata.*"],
# or, even, to give 5 times more importance to the sub elements of metadata
# fields => [qw( abstract metadata.*^5 )],
query => "this AND that OR thus",
use_dis_max => \1,
}
}
Field names
The query string can reference field names, such as:
where the status field contains latest
status:latest
where the abstract field contains quick or brown. If you omit the OR operator the default operator will be used
abstract:(quick OR brown)
abstract:(quick brown)
where the author field contains the exact phrase john smith
author:"John Smith"
where any of the fields metadata.abstract, metadata.name or metadata.date contains quick or brown (note how we need to escape the * with a backslash):
metadata.\*:(quick brown)
where the field resources.bugtracker has no value (or is missing):
_missing_:resources.bugtracker
where the field resources.repository has any non-null value:
_exists_:resources.repository
Simple Query String Query
See Elastic Search documentation for more information.
These queries never throw an exception; invalid parts of the query are discarded.
{
simple_query_string => {
query => "\"fried eggs\" +(eggplant | potato) -frittata",
analyzer => "snowball",
fields => [qw( body^5 _all )],
default_operator => "and",
}
}
Supported special characters:
+ signifies AND operation
| signifies OR operation
- negates a single token
" wraps a number of tokens to signify a phrase for searching
* at the end of a term signifies a prefix query
( and ) signify precedence
~N after a word signifies edit distance (fuzziness)
~N after a phrase signifies slop amount
Flags can be specified to indicate which features to enable when parsing:
{
simple_query_string => {
query => "foo | bar + baz*",
flags => "OR|AND|PREFIX",
}
}
The available flags are: ALL, NONE, AND, OR, NOT, PREFIX, PHRASE, PRECEDENCE, ESCAPE, WHITESPACE, FUZZY, NEAR, and SLOP.
Term Queries
{
term => { author => "John Doe" }
}
A boost parameter can also be used to give a term more importance:
{
query => {
bool => {
should => [
{
term => {
status => {
value => "latest",
boost => 2.0
}
}
},
{
term => {
status => "deprecated"
}
}]
}
}
}
See Elastic Search documentation for more information.
Terms Query
{
constant_score => {
filter => {
terms => { pauseid => [qw( momotaro kintaro )]}
}
}
}
See Elastic Search documentation for more information.
Range Query
{
range => {
age => {
gte => 10,
lte => 20,
boost => 2.0,
}
}
}
The range query accepts the following parameters:
gte
Greater-than or equal to
gt
Greater-than
lte
Less-than or equal to
lt
Less-than
boost
Sets the boost value of the query, defaults to 1.0
When using range on a date, ranges can be specified using Date Math:
+1h
Add one hour
-1d
Subtract one day
/d
Round down to the nearest day
Supported time units are: y (year), M (month), w (week), d (day), h (hour), m (minute), and s (second).
For example:
now+1h
The current time plus one hour, with ms resolution.
now+1h+1m
The current time plus one hour plus one minute, with ms resolution.
now+1h/d
The current time plus one hour, rounded down to the nearest day.
2023-01-01||+1M/d
2023-01-01 plus one month, rounded down to the nearest day.
Date formats in range queries can be specified with the format argument:
{
range => {
born => {
gte => "01/01/2022",
lte => "2023",
format => "dd/MM/yyyy||yyyy",
# With a time zone
# alternatively: Asia/Tokyo
time_zone => "+09:00",
}
}
}
See Elastic Search documentation for more information.
Exists Query
Search for values that are non-null.
{
exists => { field => "author" }
}
You can change the definition of what is null with the null_value parameter.
The missing query is equivalent to negating the exists query:
bool => {
must_not => {
exists => {
field => "author"
}
}
}
See Elastic Search documentation for more information.
Prefix Query
Search for documents that have fields containing terms with a specified prefix.
For example, the author field that contains a term starting with ta:
{
prefix => { author => "ta" }
}
or, using the boost parameter:
{
prefix => {
author => {
value => "ta",
boost => 2.0,
}
}
}
See Elastic Search documentation for more information.
Wildcard Query
{
wildcard => { pauseid => "momo*o" }
}
or
{
wildcard => {
pauseid => {
value => "momo*o",
boost => 2.0,
}
}
}
See Elastic Search documentation for more information.
Regexp Query
This enables the use of regular expression syntax:
{
regexp => {
"metadata.author" => "Ta.*o"
}
}
or
{
regexp => {
"metadata.author" => {
value => "Ta.*o",
boost => 1.2,
flags => "INTERSECTION|COMPLEMENT|EMPTY",
}
}
}
Possible flags values are: ALL (default), ANYSTRING, COMPLEMENT, EMPTY, INTERSECTION, INTERVAL, or NONE.
See also the regular expression syntax supported by Elastic Search.
See Elastic Search documentation for more information.
Fuzzy Query
{
fuzzy => { pauseid => "momo" }
}
With more advanced parameters:
{
fuzzy => {
user => {
value => "momo",
boost => 1.0,
fuzziness => 2,
prefix_length => 0,
max_expansions => 100
}
}
}
With number fields:
{
fuzzy => {
price => {
value => 12,
fuzziness => 2,
}
}
}
With date fields:
{
fuzzy => {
created => {
value => "2023-07-29T12:05:07",
fuzziness => "1d"
}
}
}
See Elastic Search documentation for more information.
Constant Score Query
As per the Elastic Search documentation, this is a "query that wraps another query and simply returns a constant score equal to the query boost for every document in the filter".
{
constant_score => {
filter => {
term => { pauseid => "momotaro"}
},
boost => 1.2,
}
}
See Elastic Search documentation for more information.
Bool Query
As per the Elastic Search documentation, this is a "query that matches documents matching boolean combinations of other queries."
The occurrence types are:
must
The clause (query) must appear in matching documents and will contribute to the score.
filter
The clause (query) must appear in matching documents. However unlike must the score of the query will be ignored.
should
The clause (query) should appear in the matching document. In a boolean query with no must or filter clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter.
must_not
The clause (query) must not appear in the matching documents.
{
bool => {
must => {
term => { author => "momotaro" }
},
filter => {
term => { tag => "tech" }
},
must_not => {
range => {
age => { from => 10, to => 20 }
}
},
should => [
{
term => { tag => "wow" }
},
{
term => { tag => "elasticsearch" }
}
],
minimum_should_match => 1,
boost => 1.0,
}
}
See Elastic Search documentation for more information.
Dis Max Query
As per the Elastic Search documentation, this is a "query that generates the union of documents produced by its subqueries".
{
dis_max => {
tie_breaker => 0.7,
boost => 1.2,
queries => [
{
term => { age => 34 }
},
{
term => { age => 35 }
}
]
}
}
See Elastic Search documentation for more information.
Function Score Query
As per the Elastic Search documentation, the "function_score allows you to modify the score of documents that are retrieved by a query. This can be useful if, for example, a score function is computationally expensive and it is sufficient to compute the score on a filtered set of documents.
To use function_score, the user has to define a query and one or more functions, that compute a new score for each document returned by the query."
function_score => {
query => {},
boost => "boost for the whole query",
FUNCTION => {},
boost_mode => "(multiply|replace|...)"
}
Multiple functions can also be provided:
function_score => {
query => {},
boost => "boost for the whole query",
functions => [
{
filter => {},
FUNCTION => {},
weight => $number,
},
{
FUNCTION => {},
},
{
filter => {},
weight => $number,
}
],
max_boost => $number,
score_mode => "(multiply|max|...)",
boost_mode => "(multiply|replace|...)",
min_score => $number
}
score_mode can have the following values:
multiply
Scores are multiplied (default)
sum
Scores are summed
avg
Scores are averaged
first
The first function that has a matching filter is applied
max
Maximum score is used
min
Minimum score is used
boost_mode can have the following values:
multiply
Query score and function score are multiplied (default)
replace
Only function score is used, the query score is ignored
sum
Query score and function score are added
avg
Average
max
Max of query score and function score
min
Min of query score and function score
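The boost_mode values listed above amount to simple arithmetic on two numbers. The following sketch, with the hypothetical function combine_scores, shows how a query score and a function score would be combined under each mode; it only illustrates the definitions given here and is not taken from Elastic Search.

```perl
use strict;
use warnings;
use List::Util qw( max min );

# Hypothetical helper: combine a query score and a function score according to boost_mode
sub combine_scores
{
    my( $mode, $query_score, $function_score ) = @_;
    # 'multiply' is the documented default
    $mode = 'multiply' unless( defined( $mode ) );
    my %combine =
    (
        'multiply' => sub{ $_[0] * $_[1] },
        'replace'  => sub{ $_[1] },
        'sum'      => sub{ $_[0] + $_[1] },
        'avg'      => sub{ ( $_[0] + $_[1] ) / 2 },
        'max'      => sub{ max( $_[0], $_[1] ) },
        'min'      => sub{ min( $_[0], $_[1] ) },
    );
    my $code = $combine{ $mode } || die( "Unknown boost_mode '$mode'\n" );
    return( $code->( $query_score, $function_score ) );
}

# With a query score of 2 and a function score of 3
my $multiplied = combine_scores( 'multiply', 2, 3 );
my $replaced   = combine_scores( 'replace', 2, 3 );
```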
To exclude documents that do not meet a certain score threshold, the min_score parameter can be set to the desired score threshold.
See the Elastic Search documentation for the list of functions that can be used.
See Elastic Search documentation for more information.
Boosting Query
As per the Elastic Search documentation, the "boosting query can be used to effectively demote results that match a given query. Unlike the "NOT" clause in bool query, this still selects documents that contain undesirable terms, but reduces their overall score".
{
boosting => {
positive => {
term => {
field1 => "value1",
},
},
negative => {
term => {
field2 => "value2",
},
},
negative_boost => 0.2,
}
}
See Elastic Search documentation for more information.
Indices Query
{
indices => {
indices => [qw( index1 index2 )],
query => {
term => { tag => "wow" }
},
no_match_query => {
term => { tag => "kow" }
}
}
}
See Elastic Search documentation for more information.
Joining Queries
Elastic Search provides 2 types of joins that are "designed to scale horizontally": nested and has_child / has_parent.
See Elastic Search documentation for more information.
Nested Query
As per the Elastic Search documentation, the "nested query allows to query nested objects / docs".
{
nested => {
path => "obj1",
score_mode => "avg",
query => {
bool => {
must => [
{
match => { "obj1.name" => "blue" }
},
{
range => { "obj1.count" => { gt => 5 } }
},
]
}
}
}
}
The score_mode sets how the matching of inner children affects the scoring of the parent. It defaults to avg, but can be sum, min, max and none.
See Elastic Search documentation for more information.
Geo Queries
Elastic Search supports two types of geo data: geo_point and geo_shape.
See Elastic Search documentation for more information.
Geo Bounding Box Query
A query that filters hits based on a point location using a bounding box.
{
bool => {
must => {
match_all => {},
},
filter => {
geo_bounding_box => {
"author.location" => {
top_left => {
lat => 40.73,
lon => -74.1,
},
# or, using an array reference [long, lat]
# top_left => [qw( -74.1 40.73 )],
# or, using a string "lat, long"
# top_left => "40.73, -74.1"
# or, using GeoHash:
# top_left => "dr5r9ydj2y73",
bottom_right => {
lat => 40.01,
lon => -71.12,
},
# or, using an array reference [long, lat]
# bottom_right => [qw( -71.12 40.01 )],
# or, using a string "lat, long"
# bottom_right => "40.01, -71.12",
# or, using GeoHash:
# bottom_right => "drj7teegpus6",
},
# Set to true to accept invalid latitude or longitude (default to false)
ignore_malformed => \1,
}
}
}
}
or, using vertices
{
bool => {
must => {
match_all => {},
},
filter => {
geo_bounding_box => {
"author.location" => {
top => 40.73,
left => -74.1,
bottom => 40.01,
right => -71.12,
},
# Set to true to accept invalid latitude or longitude (default to false)
ignore_malformed => \1,
}
}
}
}
See Elastic Search documentation for more information.
Geo Distance Query
As per the Elastic Search documentation, this "filters documents that include only hits that exists within a specific distance from a geo point."
{
bool => {
must => {
match_all => {},
},
filter => {
geo_distance => {
distance => "200km",
"author.location" => {
lat => 40,
lon => -70,
}
# or, using an array reference [long, lat]
# "author.location" => [qw( -70 40 )],
# or, using a string "lat, long"
# "author.location" => "40, -70",
# or, using GeoHash
# "author.location" => "drm3btev3e86",
}
}
}
}
See Elastic Search documentation for more information.
Geo Distance Range Query
As per the Elastic Search documentation, this "filters documents that exists within a range from a specific point".
{
bool => {
must => {
match_all => {}
},
filter => {
geo_distance_range => {
from => "200km",
to => "400km",
"pin.location" => {
lat => 40,
lon => -70,
}
}
}
}
}
This supports the same geo point options as "Geo Distance Query".
It also "support the common parameters for range (lt, lte, gt, gte, from, to, include_upper and include_lower)."
See Elastic Search documentation for more information.
Geo Polygon Query
This allows "to include hits that only fall within a polygon of points".
{
bool => {
must => {
match_all => {}
},
filter => {
geo_polygon => {
"person.location" => {
points => [
{ lat => 40, lon => -70 },
{ lat => 30, lon => -80 },
{ lat => 20, lon => -90 }
# or, as an array [long, lat]
# [-70, 40],
# [-80, 30],
# [-90, 20],
# or, as a string "lat, long"
# "40, -70",
# "30, -80",
# "20, -90"
# or, as GeoHash
# "drn5x1g8cu2y",
# "30, -80",
# "20, -90"
]
},
# Set to true to ignore invalid geo points (defaults to false)
ignore_malformed => \1,
}
}
}
}
See Elastic Search documentation for more information.
GeoHash Cell Query
See Elastic Search documentation for more information.
More Like This Query
As per the Elastic Search documentation, the "More Like This Query (MLT Query) finds documents that are "like" a given set of documents".
"The simplest use case consists of asking for documents that are similar to a provided piece of text".
For example, querying for all module releases that have some text similar to "Application Programming Interface" in their "abstract" and in their "description" fields, limiting the number of selected terms to 12.
{
more_like_this => {
fields => [qw( abstract description )],
like => "Application Programming Interface",
min_term_freq => 1,
max_query_terms => 12,
# optional
# unlike => "Python",
# Defaults to 30%
# minimum_should_match => 2,
# boost_terms => 1,
# Defaults to false
# include => \1,
# Defaults to 1.0
# boost => 1.12
}
}
See Elastic Search documentation for more information.
Template Query
As per the Elastic Search documentation, this "accepts a query template and a map of key/value pairs to fill in template parameters".
{
query => {
template => {
inline => { match => { text => "{{query_string}}" }},
params => {
query_string => "all about search",
}
}
}
}
would be translated to:
{
query => {
match => {
text => "all about search",
}
}
}
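The translation above can be mimicked with a small sketch that substitutes the {{name}} placeholders in the JSON form of the inline query. The function render_template is hypothetical, relies on the core module JSON::PP, and is a simplification: it does not escape the parameter values for JSON, nor does it implement the full template syntax supported by Elastic Search.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: replace {{name}} placeholders with the matching parameter value
sub render_template
{
    my( $template, $params ) = @_;
    # Unknown placeholders are replaced by an empty string in this sketch
    $template =~ s/\{\{\s*(\w+)\s*\}\}/exists( $params->{ $1 } ) ? $params->{ $1 } : ''/ge;
    return( $template );
}

my $json   = JSON::PP->new->canonical;
my $inline = $json->encode( { match => { text => "{{query_string}}" } } );
my $text   = render_template( $inline, { query_string => "all about search" } );
my $query  = $json->decode( $text );
# $query is now { match => { text => "all about search" } }
```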
See Elastic Search documentation for more information.
Script Query
As per the Elastic Search documentation, this is used "to define scripts as queries. They are typically used in a filter context". For example:
bool => {
must => {
# query details go here
# ...
},
filter => {
script => {
script => "doc['num1'].value > 1"
}
}
}
See Elastic Search documentation for more information.
Span Term Query
As per the Elastic Search documentation, this matches "spans containing a term".
{
span_term => { pauseid => "momotaro" }
}
See Elastic Search documentation for more information.
Span Multi Terms Query
The span_multi query allows you to wrap a multi term query (one of wildcard, fuzzy, prefix, term, range or regexp query) as a span query, so it can be nested.
{
span_multi => {
match => {
prefix => { pauseid => { value => "momo" } }
}
}
}
See Elastic Search documentation for more information.
Span First Query
As per the Elastic Search documentation, this matches "spans near the beginning of a field".
{
span_first => {
match => {
span_term => { pauseid => "momotaro" }
},
end => 3,
}
}
See Elastic Search documentation for more information.
Span Near Query
As per the Elastic Search documentation, this matches "spans which are near one another. One can specify slop, the maximum number of intervening unmatched positions, as well as whether matches are required to be in-order".
{
span_near => {
clauses => [
{ span_term => { field => "value1" } },
{ span_term => { field => "value2" } },
{ span_term => { field => "value3" } },
],
collect_payloads => \0,
in_order => \0,
slop => 12,
},
}
The clauses element is a list of one or more other span type queries and the slop controls the maximum number of intervening unmatched positions permitted.
See Elastic Search documentation for more information.
Span Or Query
As per the Elastic Search documentation, this matches "the union of its span clauses".
{
span_or => {
clauses => [
{ span_term => { field => "value1" } },
{ span_term => { field => "value2" } },
{ span_term => { field => "value3" } },
],
},
}
The clauses element is a list of one or more other span type queries.
See Elastic Search documentation for more information.
Span Not Query
As per the Elastic Search documentation, this removes "matches which overlap with another span query".
{
span_not => {
exclude => {
span_near => {
clauses => [
{ span_term => { field1 => "la" } },
{ span_term => { field1 => "hoya" } },
],
in_order => \1,
slop => 0,
},
},
include => { span_term => { field1 => "hoya" } },
},
}
The include and exclude clauses can be any span type query.
See Elastic Search documentation for more information.
Span Containing Query
As per the Elastic Search documentation, this returns "matches which enclose another span query".
{
span_containing => {
big => {
span_near => {
clauses => [
{ span_term => { field1 => "bar" } },
{ span_term => { field1 => "baz" } },
],
in_order => \1,
slop => 5,
},
},
little => { span_term => { field1 => "foo" } },
},
}
The big and little clauses can be any span type query. Matching spans from big that contain matches from little are returned.
See Elastic Search documentation for more information.
Span Within Query
As per the Elastic Search documentation, this returns "matches which are enclosed inside another span query".
{
span_within => {
big => {
span_near => {
clauses => [
{ span_term => { field1 => "bar" } },
{ span_term => { field1 => "baz" } },
],
in_order => \1,
slop => 5,
},
},
little => { span_term => { field1 => "foo" } },
},
}
The big and little clauses can be any span type query. Matching spans from little that are enclosed within big are returned.
See Elastic Search documentation for more information.
AUTHOR
Jacques Deguest <jack@deguest.jp>
SEE ALSO
Net::API::CPAN::Scroll, Net::API::CPAN::List
COPYRIGHT & LICENSE
Copyright(c) 2023 DEGUEST Pte. Ltd.
All rights reserved
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.