NAME
es-search.pl - Provides a CLI for quick searches of data in ElasticSearch daily indexes
VERSION
version 5.4
SYNOPSIS
es-search.pl [search string]
Options:
--help print help
--manual print full manual
--show Comma separated list of fields to display, default is ALL, switches to tab output
--tail Continue the query until CTRL+C is sent
--top Perform an aggregation on the fields, by a comma separated list of up to 2 items
--by Perform an aggregation using the result of this, example: --by cardinality:@fields.src_ip
--with Perform a sub aggregation on the query
--match-all Enables the ElasticSearch match_all operator
--interval When running aggregations, wrap the aggregation in a date_histogram with this interval
--prefix Takes "field:string" and enables the Lucene prefix query for that field
--exists Field which must be present in the document
--missing Field which must not be present in the document
--size Result size, default is 20
--all Don't consider result size, just give me *everything*
--asc Sort by ascending timestamp
--desc Sort by descending timestamp (Default)
--sort List of fields for custom sorting
--format When --show isn't used, use this method for outputting the record, supported: json, yaml
--pretty Where possible, use JSON->pretty
--no-header Do not show the header with field names in the query results
--fields Display the field list for this index!
--bases Display the index base list for this cluster.
--timestamp Field to use as the date object, default: @timestamp
From App::ElasticSearch::Utilities:
--local Use localhost as the elasticsearch host
--host ElasticSearch host to connect to
--port HTTP port for your cluster
--proto Defaults to 'http', can also be 'https'
--http-username HTTP Basic Auth username
--http-password HTTP Basic Auth password (if not specified and --http-username is, you will be prompted)
--password-exec Script to run to get the user's password
--noop Any operations other than GET are disabled, can be negated with --no-noop
--timeout Timeout to ElasticSearch, default 30
--keep-proxy Do not remove any proxy settings from %ENV
--index Index to run commands against
--base For daily indexes, reference only those starting with "logstash"
(same as --pattern logstash-* or logstash-DATE)
--datesep Date separator, default '.' (also --date-separator)
--pattern Use a pattern to operate on the indexes
--days If using a pattern or base, how many days back to go, default: all
See also the "CONNECTION ARGUMENTS" and "INDEX SELECTION ARGUMENTS" sections from App::ElasticSearch::Utilities.
From CLI::Helpers:
--data-file Path to a file to write lines tagged with 'data => 1'
--color Boolean, enable/disable color; default follows git settings
--verbose Incremental, increase verbosity (Alias is -v)
--debug Show developer output
--debug-class Show debug messages originating from a specific package, default: main
--quiet Show no output (for cron)
--syslog Generate messages to syslog as well
--syslog-facility Default "local0"
--syslog-tag The program name, default is the script name
--syslog-debug Enable debug messages to syslog if in use, default false
DESCRIPTION
This tool takes a search string parameter to search the cluster. The search string uses the Lucene query string syntax.
Examples might include:
# Search for past 10 days vhost admin.example.com and client IP 1.2.3.4
es-search.pl --days=10 --size=100 dst:"admin.example.com" AND src_ip:"1.2.3.4"
# Search for all apache logs past 5 days with status 500
es-search.pl program:"apache" AND crit:500
# Search for all apache logs past 5 days with status 500 show only file and out_bytes
es-search.pl program:"apache" AND crit:500 --show file,out_bytes
# Search for ip subnet client IP 1.2.3.0 to 1.2.3.255 or 1.2.0.0 to 1.2.255.255
es-search.pl --size=100 dst:"admin.example.com" AND src_ip:"1.2.3.*"
es-search.pl --size=100 dst:"admin.example.com" AND src_ip:"1.2.*"
# Show the top src_ip for 'www.example.com'
es-search.pl --base access dst:www.example.com --top src_ip
# Tail the access log for www.example.com 404's
es-search.pl --base access --tail --show src_ip,file,referer_domain dst:www.example.com AND crit:404
OPTIONS
- help
Print this message and exit
- manual
Print detailed help with examples
- show
Comma separated list of fields to display in the dump of the data
--show src_ip,crit,file,out_bytes
- sort
Use this option to sort your documents on fields other than @timestamp. Fields are given as a comma separated list:
--sort field1,field2
To specify per-field sort direction use:
--sort field1:asc,field2:desc
Using this option together with --asc, --desc or --tail is not possible.
- format
Output format to use when the full record is dumped. The default is 'yaml', but 'json' is also supported.
--format json
- tail
Repeats the query every second until CTRL+C is hit, displaying new results. Due to the implementation, this mode enforces that only the most recent indices are searched. Also, given the output is continuous, you must specify --show with this option.
- top
Perform an aggregation returning the top field. Limited to a single field at this time. This option is not available when using --tail.
--top src_ip
- by
Perform a sub aggregation on the top terms aggregation and order by the result of this aggregation. Aggregation syntax is as follows:
--by <type>:<field>
A full example might look like this:
$ es-search.pl --base access dst:www.example.com --top src_ip --by cardinality:@fields.acct
This will show the top source IPs ordered by the cardinality (count of distinct values) of accounts logging in from each source IP, instead of by the source IPs with the most records.
Supported sub aggregations and formats:
cardinality:<field>
min:<field>
max:<field>
avg:<field>
sum:<field>
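As a rough sketch (not necessarily the exact request body es-search.pl builds), the --by example above corresponds to a terms aggregation ordered by a cardinality sub aggregation, written in the hash notation used elsewhere in this document; the names 'ips' and 'by_agg' are illustrative only:
{ aggs => {
    ips => {    # illustrative bucket name
        terms => { field => 'src_ip', order => { by_agg => 'desc' } },    # order buckets by the sub aggregation
        aggs  => { by_agg => { cardinality => { field => '@fields.acct' } } },    # 'by_agg' is an illustrative name
    },
} }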
- with
Perform a sub aggregation on the top terms and report the sub aggregation details in the output. The format is:
--with <aggregation>:<field>:<size>
The default size is 3. The default aggregation is 'terms'.
field is the only required element.
e.g.
$ es-search.pl --base logstash error --top program --size 2 --by cardinality:host --with host:5
This will show the top 2 programs with log messages containing the word error, ordered by the cardinality (count of distinct values) of the host field, and report the top 5 hosts for each program.
Without the --with, the results might look like this:
112314 sshd
21224 ntp
The --with option would expand that output to look like this:
112314 host bastion-804 12431 sshd
112314 host bastion-803 10009 sshd
112314 host bastion-805 9768 sshd
112314 host bastion-801 8789 sshd
112314 host bastion-802 4121 sshd
21224 host webapp-324 21223 ntp
21224 host mail-42 1 ntp
This may be specified multiple times; the result is more rows, not more columns, e.g.
$ es-search.pl --base logstash error --top program --size 2 --by cardinality:host --with host:5 --with dc:2
Produces:
112314 dc arlington 112314 sshd
112314 host bastion-804 12431 sshd
112314 host bastion-803 10009 sshd
112314 host bastion-805 9768 sshd
112314 host bastion-801 8789 sshd
112314 host bastion-802 4121 sshd
21224 dc amsterdam 21223 ntp
21224 dc la 1 ntp
21224 host webapp-324 21223 ntp
21224 host mail-42 1 ntp
You may sub aggregate using any bucket aggregation as long as the aggregation provides a key element. Additionally, doc_count, score, and bg_count will be reported in the output.
- interval
When performing aggregations, wrap those aggregations in a date_histogram of this interval. This helps flush out "what changed in the last hour."
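For example, assuming the standard Elasticsearch date_histogram interval names (e.g. 'hour', 'day') are accepted:
es-search.pl --base access dst:www.example.com --top src_ip --interval hour
This reports the top src_ip per hour rather than once for the entire date range.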
- match-all
Apply the ElasticSearch "match_all" search operator to query on all documents in the index.
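In the Query DSL notation used below in "Extended Syntax", this is simply:
{ match_all => {} }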
- prefix
Takes a "field:string" combination; multiple --prefix options may be given and will be AND'd together.
Example:
--prefix useragent:'Go '
Will search for documents where the useragent field matches a prefix search on the string 'Go '
JSON Equivalent is:
{ "prefix": { "useragent": "Go " } }
- exists
Filter results to those containing a valid, non-null value for the field.
--exists referer
Only show records with a referer field in the document.
- missing
Filter results to those missing a valid, non-null value for the field.
--missing referer
Only show records without a referer field in the document.
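The two filters may be combined, e.g. to show only records that have a referer but no src_ip:
es-search.pl --base access --exists referer --missing src_ip dst:www.example.com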
- bases
Display a list of bases that can be used with the --base option.
Use with --verbose to show age information on the indexes in each base.
- fields
Display a list of searchable fields
- index
Search only this index for data; may also be a comma separated list.
- days
The number of days back to search; the default is 5.
- base
Index base name; it will be expanded using the days back parameter. The default is 'logstash', which expands to 'logstash-YYYY.MM.DD'.
- timestamp
The field in your documents that we'll treat as a "date" type in our queries.
- size
The number of results to show, default is 20.
- all
If specified, ignore the --size parameter and show me everything within the date range I specified. In the case of --top, this limits the result set to 1,000,000 results.
Extended Syntax
The search string is pre-analyzed before being sent to ElasticSearch. The following plugins work to manipulate the query string and provide richer, more complete syntax for CLI applications.
App::ElasticSearch::Utilities::Barewords
The following barewords are transformed:
or => OR
and => AND
not => NOT
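For example, a query string written as:
src_ip:1.2.3.4 and not dst:www.example.com
is rewritten to:
src_ip:1.2.3.4 AND NOT dst:www.example.com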
App::ElasticSearch::Utilities::QueryString::IP
If a field value is an IP address wildcard, it is transformed:
src_ip:10.* => src_ip:[10.0.0.0 TO 10.255.255.255]
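Since the Lucene range syntax [a TO b] is inclusive on both ends, this behaves like the Query DSL range query (a sketch of the equivalent, in the notation used below):
{ range => { src_ip => { gte => '10.0.0.0', lte => '10.255.255.255' } } }    # inclusive on both ends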
App::ElasticSearch::Utilities::Underscored
This plugin translates some special underscore surrounded tokens into the Elasticsearch Query DSL.
Implemented:
_prefix_
Example query string:
_prefix_:useragent:'Go '
Translates into:
{ prefix => { useragent => 'Go ' } }
App::ElasticSearch::Utilities::QueryString::FileExpansion
If the match ends in .dat, .txt, or .csv, then we attempt to read a file with that name and OR the condition:
$ cat test.dat
50 1.2.3.4
40 1.2.3.5
30 1.2.3.6
20 1.2.3.7
Or
$ cat test.csv
50,1.2.3.4
40,1.2.3.5
30,1.2.3.6
20,1.2.3.7
Or
$ cat test.txt
1.2.3.4
1.2.3.5
1.2.3.6
1.2.3.7
We can source that file:
src_ip:test.dat => src_ip:(1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7)
This makes it simple to use the --data-file output option and build queries based off previous queries. For .txt and .dat files, the delimiter for columns in the file must be either a tab, comma, or semicolon. For files ending in .csv, Text::CSV_XS is used for accurate parsing of the file format.
You can also specify which column of the data file to use, the default being the last column, or (-1). Columns use zero-based indexing: the first column is index 0, the second is 1, and so on. The previous example can be rewritten as:
src_ip:test.dat[1]
or: src_ip:test.dat[-1]
This option will iterate through the whole file and de-duplicate the elements of the list. They will then be transformed into an appropriate terms query.
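For the test.dat example above, the generated filter is roughly equivalent to:
{ terms => { src_ip => [ '1.2.3.4', '1.2.3.5', '1.2.3.6', '1.2.3.7' ] } }    # values read and de-duplicated from the file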
Meta-Queries
Helpful in building queries are the --bases and --fields options, which list the index bases and fields:
es-search.pl --bases
es-search.pl --fields
es-search.pl --base access --fields
AUTHOR
Brad Lhotsky <brad@divisionbyzero.net>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2012 by Brad Lhotsky.
This is free software, licensed under:
The (three-clause) BSD License