NAME
Elastic::Manual::QueryDSL::Queries - Overview of the queries available in Elasticsearch
VERSION
version 0.29_2
INTRODUCTION
Queries should be used instead of filters where "relevance scoring" is appropriate.
While a filter can be used for:
Give me docs that include the tag "perl" or "python"
... a query can do:
Give me docs that include the tag "perl" or "python", sorted by relevance
This is particularly useful for full text search, where there isn't a simple binary Yes/No answer. Instead, we're looking for the most relevant results which match a complex phrase like "perl unicode cookbook".
QUERY TYPES
There are 5 main query types:
- Analyzed queries
  These are used for full text search on unstructured text. The search keywords are analyzed into terms before being searched on. For instance:
    WHERE matches(content, 'perl unicode')
- Exact queries
  These are used for exact matching. For instance:
    WHERE tags IN ('perl','python')
- Combining queries
  These combine multiple queries together, eg "and" or "or".
- Scoring queries
  These can be used to alter how the relevance score is calculated.
- Joining queries
  These work on parent-child relationships, or on "nested" docs.
BOOST
"Boost" is a way of increasing the relevance of part of a query. For instance, if I'm searching for the words "perl unicode" in either the title or content field of a post, I could do:
    $view->queryb([
        content => 'perl unicode',
        title   => 'perl unicode',
    ]);
But it is likely that documents with those words in the title are more relevant than if those words appear only in the content, so we can boost the title field:
    $view->queryb([
        content => 'perl unicode',
        title   => {
            '=' => {
                query => 'perl unicode',
                boost => 2
            }
        },
    ]);
Or in the native Query DSL:
    $view->query(
        bool => {
            should => [
                { text => { content => 'perl unicode' } },
                { text => {
                    title => {
                        query => 'perl unicode',
                        boost => 2
                    }
                  }
                }
            ]
        }
    );
The boost is multiplied with the _score, so a boost less than 1 will decrease relevance. Also see "explain" in Elastic::Model::Result for help when debugging relevance scoring.
ANALYZED QUERIES
The search keywords are analyzed before being searched on. The analyzer is chosen from the first item in this list which is set:
- The analyzer specified in the query
- The search_analyzer specified on the field being searched
- The analyzer specified on the field being searched
- The default analyzer for the type being searched on
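The precedence order listed above can be sketched as a simple "first value that is set" lookup. This is only an illustration: the choose_analyzer function and its argument names are hypothetical, and the actual selection happens inside Elasticsearch, not in Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first analyzer that is set, following
# the precedence order described in the text. All names are illustrative.
sub choose_analyzer {
    my (%args) = @_;
    return $args{query_analyzer}           # analyzer specified in the query
        // $args{field_search_analyzer}    # search_analyzer on the field
        // $args{field_analyzer}           # analyzer on the field
        // $args{type_default_analyzer};   # default analyzer for the type
}

# No analyzer in the query and no search_analyzer on the field,
# so the field's own analyzer wins over the type default.
my $chosen = choose_analyzer(
    field_analyzer        => 'english',
    type_default_analyzer => 'standard',
);
print "$chosen\n";
```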
Simple text queries
- SearchBuilder
    # where title matches "perl unicode"
    $view->queryb( title => 'perl unicode' );
    $view->queryb( title => { '=' => 'perl unicode' });

    # where the _all field matches "perl unicode"
    $view->queryb( 'perl unicode' );
    $view->queryb( _all => 'perl unicode');
See "= | -text | != | <> | -not_text" in ElasticSearch::SearchBuilder.
- QueryDSL
    # where title matches "perl unicode"
    $view->query( text => { title => 'perl unicode' } );

    # where the _all field matches "perl unicode"
    $view->query( match => { _all => 'perl unicode' });
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
Phrase queries
Phrase queries match all words in the phrase, in the same order.
- SearchBuilder
    # where title matches the phrase "perl unicode"
    $view->queryb( title => { '==' => 'perl unicode' });

    # where 'perl' and 'unicode' appear within 5 words of each other
    $view->queryb(
        title => {
            '==' => {
                query => 'perl unicode',
                slop  => 5
            }
        }
    );

    # where title contains a phrase starting with "perl unic"
    $view->queryb( title => { '^' => 'perl unic' });
See "== | -phrase | -not_phrase" in ElasticSearch::SearchBuilder and "^ | -phrase_prefix | -not_phrase_prefix" in ElasticSearch::SearchBuilder.
- QueryDSL
    # where title matches the phrase "perl unicode"
    $view->query( match_phrase => { title => 'perl unicode' } );

    # where 'perl' and 'unicode' appear within 5 words of each other
    $view->query(
        match_phrase => {
            title => {
                query => 'perl unicode',
                slop  => 5
            }
        }
    );

    # where title contains a phrase starting with "perl unic"
    $view->query( match_phrase_prefix => { title => 'perl unic' } );
Lucene query parser syntax
The query_string and field queries use the Lucene query parser syntax, allowing complex queries like (amongst other features):
- Logic
    'mac AND big NOT apple' or '+mac +big -apple'
- Phrases
    'these words and "exactly this phrase"'
- Wildcards
    'test?ng wild*rd'
- Fields
    'title:(big mac) content:"this exact phrase"'
- Boosting
    'title:(perl unicode)^2 content:(perl unicode)'
- Proximity
    '"quick brown dog"~10' (within 10 words of each other)
The query_string query can also be used for searching across multiple fields.
There are two downsides to this query:
- The syntax must be correct, otherwise your query will fail.
- Users can search any field using the "field:" syntax.
You can use "filter_keywords()" in ElasticSearch::Util for a simple filter, or ElasticSearch::QueryParser for a more flexible solution.
- SearchBuilder
    # where the title field matches '+big +mac -apple'
    $view->queryb( title => { -qs => '+big +mac -apple' });

    # where the _all field matches '+big +mac -apple'
    $view->queryb( _all => { -qs => '+big +mac -apple' });

    # where the title or content fields match '+big +mac -apple'
    $view->queryb(
        -qs => {
            query  => '+big +mac -apple',
            fields => ['title^2','content']  # boost the title field
        }
    );
See "-qs | -query_string | -not_qs | -not_query_string" in ElasticSearch::SearchBuilder.
- QueryDSL
    # where the title field matches '+big +mac -apple'
    $view->query( field => { title => '+big +mac -apple' } );

    # where the _all field matches '+big +mac -apple'
    $view->query( query_string => { query => '+big +mac -apple' });
    $view->query( field => { _all => '+big +mac -apple' } );

    # where the title or content fields match '+big +mac -apple'
    $view->query(
        query_string => {
            query  => '+big +mac -apple',
            fields => ['title^2','content']  # boost the title field
        }
    );
More-like-this and Fuzzy-like-this
The more-like-this query tries to find documents similar to the search keywords, across multiple fields. It is useful for clustering related documents.
See "-mlt | -not_mlt" in ElasticSearch::SearchBuilder, http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html.
The fuzzy-like-this query is similar to more-like-this, but additionally "fuzzifies" all the search terms (finds all terms within a certain Levenshtein edit distance).
See "-flt | -not_flt" in ElasticSearch::SearchBuilder, http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-flt-query.html.
EXACT QUERIES
These queries do not have an analysis phase. They try to match the actual terms stored in Elasticsearch. But unlike filters, the result of these queries is included in the relevance scoring.
Match all
Matches all docs.
- SearchBuilder
    # All docs
    $view->queryb();
    $view->queryb( -all => 1 );
- QueryDSL
    # All docs
    $view->query();
    $view->query( match_all => {} );
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-all-query.html
Equality
- SearchBuilder:
    # WHERE status = 'active'
    $view->queryb( status => 'active' );

    # WHERE count = 5
    $view->queryb( count => 5 );

    # WHERE tags IN ('perl','python')
    $view->queryb( tags => [ 'perl', 'python' ]);
- QueryDSL:
    # WHERE status = 'active'
    $view->query( term => { status => 'active' } );

    # WHERE count = 5
    $view->query( term => { count => 5 } );

    # WHERE tags IN ('perl','python')
    $view->query( terms => { tags => [ 'perl', 'python' ] } );
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-query.html and http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html.
Range
- SearchBuilder:
    # WHERE date BETWEEN '2012-01-01' AND '2013-01-01'
    $view->queryb(
        date => {
            gte => '2012-01-01',
            lt  => '2013-01-01'
        }
    );
- QueryDSL:
    # WHERE date BETWEEN '2012-01-01' AND '2013-01-01'
    $view->query(
        range => {
            date => {
                gte => '2012-01-01',
                lt  => '2013-01-01'
            }
        }
    );
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
Prefix, wildcard and fuzzy
A "fuzzy" query matches terms within a certain Levenshtein edit distance of the search terms.
Warning: These queries do not perform well. First they have to load all terms into memory to find those that match the prefix/wildcard/fuzzy conditions. Then they query all matching terms.
If you find yourself wanting to use any of these, then you should rather analyze your fields in a way that you can use a simple query on them instead, for instance, using the edge_ngram token filter or one of the phonetic token filters.
- SearchBuilder
    # WHERE code LIKE 'abc%'
    $view->queryb( code => { '^' => 'abc' });

    # WHERE code LIKE 'ab?c%'
    $view->queryb( code => { '*' => 'ab?c*' });

    # where code contains terms similar to "purl unikode"
    $view->queryb( code => { fuzzy => 'purl unikode' });
See "PREFIX (FILTERS)" in ElasticSearch::SearchBuilder and "WILDCARD AND FUZZY QUERIES" in ElasticSearch::SearchBuilder.
- QueryDSL
    # WHERE code LIKE 'abc%'
    $view->query( prefix => { code => 'abc' });

    # WHERE code LIKE 'ab?c%'
    $view->query( wildcard => { code => 'ab?c*' });

    # where code contains terms similar to "purl unikode"
    $view->query( fuzzy => { code => 'purl unikode' });
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html, http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html and http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html.
COMBINING QUERIES
These queries allow you to combine multiple queries together.
Filtered query
By default, queries are run on all documents. You can use a filtered query to reduce which documents are queried. This is the same query that is used to combine the query and filter attributes of Elastic::Model::View.
For instance, if you only want to query documents where status = 'active', then you can filter your documents with that restriction. A filter does not affect the relevance score.
- SearchBuilder
    # document where status = 'active', and title matches 'perl unicode'
    $view->queryb(
        title   => 'perl unicode',
        -filter => { status => 'active' }
    );
See "QUERY / FILTER CONTEXT" in ElasticSearch::SearchBuilder
- QueryDSL
    # document where status = 'active', and title matches 'perl unicode'
    $view->query(
        filtered => {
            query  => { text => { title => 'perl unicode' } },
            filter => { term => { status => 'active' } }
        }
    );
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html.
Bool queries
bool queries are the equivalent of "and", "or" and "not", except that they use must, should and must_not instead. The difference is that you can specify the minimum number of should clauses that have to match (default 1).
In the SearchBuilder syntax, these use the same syntax as "and", "or" and "not", but you can also use the -bool operator directly if you want to use minimum_number_should_match.
Note: the scores of all matching clauses are combined together.
- SearchBuilder:
  See "AND|OR LOGIC" in ElasticSearch::SearchBuilder and "-bool" in ElasticSearch::SearchBuilder
  - And
      # WHERE title matches 'perl unicode' AND status = 'active'
      $view->queryb( title => 'perl unicode', status => 'active' );
  - Or
      # WHERE title matches 'perl unicode' OR status = 'active'
      $view->queryb([ title => 'perl unicode', status => 'active' ]);
  - Not
      # WHERE status <> 'active'
      $view->queryb( status => { '!=' => 'active' });

      # WHERE tags NOT IN ('perl','python')
      $view->queryb( tags => { '!=' => ['perl', 'python'] });

      # WHERE NOT ( x = 1 AND y = 2 )
      $view->queryb( -not => { x => 1, y => 2 });

      # WHERE NOT ( x = 1 OR y = 2 )
      $view->queryb( -not => [ x => 1, y => 2 ]);
  - minimum_number_should_match
      # where title matches 'object oriented'
      # and status <> 'inactive'
      # and tags contain 2 or more of 'perl','python','ruby'
      $view->queryb(
          -bool => {
              must     => [{ title  => 'object oriented' }],
              must_not => [{ status => 'inactive' }],
              should   => [
                  { tags => 'perl' },
                  { tags => 'python' },
                  { tags => 'ruby' },
              ],
              minimum_number_should_match => 2,
          }
      );
- QueryDSL:
  See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
  - And
      # WHERE title matches 'perl unicode' AND status = 'active'
      $view->query(
          bool => {
              must => [
                  { text => { title  => 'perl unicode' }},
                  { term => { status => 'active' }}
              ]
          }
      );
  - Or
      # WHERE title matches 'perl unicode' OR status = 'active'
      $view->query(
          bool => {
              should => [
                  { text => { title  => 'perl unicode' }},
                  { term => { status => 'active' }}
              ]
          }
      );
  - Not
      # WHERE status <> 'active'
      $view->query(
          bool => {
              must_not => [
                  { term => { status => 'active' }}
              ]
          }
      );

      # WHERE tags NOT IN ('perl','python')
      $view->query(
          bool => {
              must_not => [
                  { terms => { tags => [ 'perl','python' ] }}
              ]
          }
      );

      # WHERE NOT ( x = 1 AND y = 2 )
      $view->query(
          bool => {
              must_not => [
                  { term => { x => 1 }},
                  { term => { y => 2 }}
              ]
          }
      );

      # WHERE NOT ( x = 1 OR y = 2 )
      $view->query(
          bool => {
              must_not => [
                  { bool => {
                      should => [
                          { term => { x => 1 }},
                          { term => { y => 2 }}
                      ]
                    }
                  }
              ]
          }
      );
  - minimum_number_should_match
      # where title matches 'object oriented'
      # and status <> 'inactive'
      # and tags contain 2 or more of 'perl','python','ruby'
      $view->query(
          bool => {
              must     => [{ text => { title  => 'object oriented' }}],
              must_not => [{ term => { status => 'inactive' }}],
              should   => [
                  { term => { tags => 'perl' }},
                  { term => { tags => 'python' }},
                  { term => { tags => 'ruby' }},
              ],
              minimum_number_should_match => 2,
          }
      );
Dis_max / Disjunction max query
While the "Bool queries" combine the scores of each matching clause, the dis_max query uses the highest score of any matching clause. For instance, if we want to search for "perl unicode" in the title and content fields, we could do:
    $view->queryb(
        title   => 'perl unicode',
        content => 'perl unicode'
    );
But we could have a doc which matches 'perl' in both fields, and 'unicode' in neither. As a boolean query, these two matches for 'perl' would be added together. As a dis_max query, the higher score of the title or the content clause match would be used.
The tie_breaker can be used to give a slight advantage to docs where both clauses match with the same score.
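The difference between the two scoring strategies can be illustrated with a little arithmetic. The sketch below is a simplification: it assumes made-up clause scores, treats a bool score as a plain sum (ignoring any coordination or normalization that the search engine applies), and computes dis_max as the best clause score plus tie_breaker times the scores of the remaining clauses. The function names are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Simplified bool scoring: add the scores of all matching clauses.
sub bool_score { return sum(@_) }

# Simplified dis_max scoring: the best clause score, plus tie_breaker
# times the sum of the other matching clause scores.
sub dis_max_score {
    my ( $tie_breaker, @scores ) = @_;
    my $best = max(@scores);
    return $best + $tie_breaker * ( sum(@scores) - $best );
}

# Hypothetical scores for one doc: (title clause, content clause)
my @scores = ( 1.0, 0.5 );

printf "bool:              %.2f\n", bool_score(@scores);
printf "dis_max (tie 0):   %.2f\n", dis_max_score( 0,   @scores );
printf "dis_max (tie 0.7): %.2f\n", dis_max_score( 0.7, @scores );
```

With a tie_breaker of 0 only the best clause counts. As the tie_breaker approaches 1 the result approaches the plain sum.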
- SearchBuilder
    # without tie_breaker:
    $view->queryb(
        -dis_max => [
            { title   => 'perl unicode' },
            { content => 'perl unicode' }
        ]
    );

    # with tie_breaker:
    $view->queryb(
        -dis_max => {
            tie_breaker => 0.7,
            queries     => [
                { title   => 'perl unicode' },
                { content => 'perl unicode' }
            ]
        }
    );
- QueryDSL
    $view->query(
        dis_max => {
            tie_breaker => 0.7,
            queries     => [
                { text => { title   => 'perl unicode' } },
                { text => { content => 'perl unicode' } }
            ]
        }
    );
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html.
Indices
The indices query can be used to execute different queries on different indices.
- SearchBuilder
    # On index_one or index_two, only allow status = 'active'
    # On any other index, allow status IN ('active','pending')
    $view->queryb(
        -indices => {
            indices        => [ 'index_one','index_two' ],
            query          => { status => 'active' },
            no_match_query => { status => [ 'active','pending' ]}
        }
    );
- QueryDSL
    # On index_one or index_two, only allow status = 'active'
    # On any other index, allow status IN ('active','pending')
    $view->query(
        indices => {
            indices        => [ 'index_one','index_two' ],
            query          => { term  => { status => 'active' }},
            no_match_query => { terms => { status => [ 'active','pending' ] }}
        }
    );
See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-indices-query.html.
SCORING QUERIES
These queries allow you to tweak the relevance _score, making certain docs more or less relevant.
Scoring with filters
The custom_filters_score query allows you to boost documents that match a filter, either with a boost parameter, or with a custom script.
This is a very powerful and efficient way to boost results which depend on matching unanalyzed fields, eg a tag or a date. Because the filters can be cached, it performs very well.
- SearchBuilder
    # include recency in the relevance score
    $view->queryb(
        -custom_filters_score => {
            query      => { title => 'perl unicode' },
            score_mode => 'first',
            filters    => [
                {
                    filter => { date => { gte => '2012-01-01' }},
                    boost  => 5
                },
                {
                    filter => { date => { gte => '2011-01-01' }},
                    boost  => 3
                },
            ]
        }
    );
See "-custom_filters_score" in ElasticSearch::SearchBuilder.
- QueryDSL
    # include recency in the relevance score
    $view->query(
        custom_filters_score => {
            query      => { text => { title => 'perl unicode' }},
            score_mode => 'first',
            filters    => [
                {
                    filter => { range => { date => { gte => '2012-01-01' }}},
                    boost  => 5
                },
                {
                    filter => { range => { date => { gte => '2011-01-01' }}},
                    boost  => 3
                },
            ]
        }
    );
Other scoring queries
Boosting
Documents which match a query (eg "apple pear") can be "demoted" (made less relevant) if they also match a second query (eg "computer").
See "-boosting" in ElasticSearch::SearchBuilder or http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-boosting-query.html
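As a rough sketch, the native boosting query for the example above could be built as the following structure, which could then be handed to the view's query method like the other QueryDSL examples in this document. The content field name and the negative_boost value are assumptions chosen for illustration, and the text query is used only to stay consistent with the other examples here.

```perl
use strict;
use warnings;
use JSON::PP;

# Docs matching "apple pear" are returned; those that also match
# "computer" have their score reduced by the negative_boost factor.
# The field name and the boost value are illustrative assumptions.
my $boosting_query = {
    boosting => {
        positive       => { text => { content => 'apple pear' } },
        negative       => { text => { content => 'computer' } },
        negative_boost => 0.2,
    }
};

# Print the structure as JSON to show what would be sent to Elasticsearch
print JSON::PP->new->canonical->pretty->encode($boosting_query);
```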
Custom score
A custom_score query uses a script to calculate the _score for each matching doc.
See "-custom_score" in ElasticSearch::SearchBuilder or http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-custom-score-query.html
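A minimal sketch of what a native custom_score query might look like, assuming a hypothetical numeric popularity field on each doc and the 0.90-era query format described at the link above. The script multiplies the text relevance score by that field's value.

```perl
use strict;
use warnings;
use JSON::PP;

# "popularity" is a hypothetical numeric field; the script is evaluated
# by Elasticsearch for each matching doc, not by Perl.
my $custom_score_query = {
    custom_score => {
        query  => { text => { title => 'perl unicode' } },
        script => "_score * doc['popularity'].value",
    }
};

# Print the structure as JSON to show what would be sent to Elasticsearch
print JSON::PP->new->canonical->pretty->encode($custom_score_query);
```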
Custom boost factor
The custom_boost query allows you to multiply the scores of another query by the specified boost factor. This is a bit different from a standard boost parameter, which is normalized.
See "-custom_boost" in ElasticSearch::SearchBuilder or http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-custom-boost-factor-query.html
Constant score
The constant_score query does no relevance calculation - all docs are returned with the same score.
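For illustration, a native constant_score query wrapping a filter might be structured as below. The status field and the boost value are assumptions; every doc that passes the filter would receive the same score.

```perl
use strict;
use warnings;
use JSON::PP;

# Every doc with status = 'active' gets the same score.
# The field name and boost value are illustrative assumptions.
my $constant_score_query = {
    constant_score => {
        filter => { term => { status => 'active' } },
        boost  => 1.2,
    }
};

# Print the structure as JSON to show what would be sent to Elasticsearch
print JSON::PP->new->canonical->pretty->encode($constant_score_query);
```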
JOINING QUERIES
Parent-child queries
Parent-child relationships are not yet supported natively in Elastic::Model. They will be soon.
In the meantime, see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-query.html
Nested queries
See Elastic::Manual::QueryDSL::Nested.
SEE ALSO
AUTHOR
Clinton Gormley <drtech@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2014 by Clinton Gormley.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.