NAME
logstatsd - generate summary statistics from log files
SYNOPSIS
logstatsd [OPTIONS]
logstatsd -f status:0 -f duration:5 -l /path/to/logfile --xml
logstatsd -f status:0 -f duration:5 -l /path/to/logfile --xml /path/to/report.xml
# for more examples and explanations, see the EXAMPLES section below.
DESCRIPTION
Monitoring an application frequently involves monitoring its log file(s). Log files may contain hundreds or thousands of events per minute. Parsing the entire log file can be a very CPU-intensive task, making near-real-time reporting or monitoring difficult or impossible.
logstatsd was designed to solve these problems and more while being extremely simple to use and configure. logstatsd can monitor log files, parse entries as they enter the log, and store summary data. logstatsd can then be signaled to export current summary data for populating an RRD or feeding data to a monitoring application.
logstatsd parses log entries into fields and extracts the fields that you find interesting, e.g. transaction name, status, duration, date/time, end user locations, back-end server names, etc. Summary data can be collected for each interesting field. For example, if a transaction field is specified, the number of hits for each unique transaction will be counted. If a duration field is available in the log, information about the average response time of each transaction will also be recorded.
Additionally, summary data may be collected for grouped fields. For example, if you collect summary statistics about transaction name grouped with the status, you will see information about the numbers of success and failures of each transaction. If you collect summary statistics about status grouped with time, you can then see statistics about the successful and unsuccessful transactions per minute.
Also, thresholds may be defined to categorize response times (see THRESHOLDS section below).
logstatsd is designed to run as a daemon on the server where the log file resides. When run in daemon mode, it will tail the log file and process new entries as they arrive in the log. Summary data may be extracted by sending a "kill -USR1" to the logstatsd process id.
Data can be exported to an XML report, or to a script that can be used to populate an RRD.
logstatsd is designed to parse formatted data in log files. Unlike other log processing tools, which run a series of regexps on each log entry and count each match, logstatsd splits each entry into a series of fields using a single regexp. This makes it useful for files like an apache access log or CSV files, but less useful for files with less predictable contents, like an apache error log.
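The split-into-fields idea can be sketched with ordinary shell tools; the CSV layout below is an assumption borrowed from the EXAMPLES section (status in column 0, duration in column 5), not logstatsd's own code:

```shell
# One hypothetical CSV log entry; cut splits it once into columns,
# which is the same model logstatsd uses (0-indexed columns).
line='SUCCESS,web,app01,login.do,foo,0.42'
status=$(printf '%s' "$line" | cut -d, -f1)     # column 0
duration=$(printf '%s' "$line" | cut -d, -f6)   # column 5
echo "status=$status duration=$duration"
```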
OPTIONS
The following options are supported by this command:
-l, --logfile=LOGFILE

Specify the log file to be summarized.
--field=[NAME][:COLUMN][|THRESHOLD1][|THRESHOLD2...]

Specify a field from the log that should be summarized. Multiple --field options may be specified. The index of the first column is 0.

For example, if your file is a CSV and the first column is "status", the field definition would be --field status:0.

If a duration field is specified, thresholds can be associated with the durations (see THRESHOLDS below).

Field names should not contain dashes.
--group=[NAME1]:[NAME2...][|THRESHOLD1][|THRESHOLD2...]

Define two fields that should be grouped for summary statistics. Multiple --group options may be specified.

For example, you might want to keep statistics about each transaction based on status. In this case, simply use the option "--group transaction:status".

Note that order matters for display purposes. transaction:status displays each transaction, and then each status for that transaction; status:transaction displays each status, and then each transaction with the associated status.

For display purposes, results usually look best when the field with the fewest possible values comes first.

Log::Statistics can handle groups with any number of members, but at this point logstatsd only handles groups with two or three fields.
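The grouped-count idea can be sketched with coreutils rather than logstatsd itself; the CSV layout here is hypothetical (status in column 0, transaction in column 3):

```shell
# Count hits per (status, transaction) pair: extract the two grouped
# columns, then tally each unique combination.
printf '%s\n' \
    'SUCCESS,web,app01,login.do' \
    'SUCCESS,web,app01,login.do' \
    'FAIL,web,app01,login.do' \
    'SUCCESS,web,app02,view.do' |
    cut -d, -f1,4 | sort | uniq -c
```

This prints one count per status/transaction pair, which is essentially what a --group status:transaction summary contains.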
-c, --conf=CONFIGFILE

Specify the location of the config file. A config file is a convenient way to store default information about a type of log file. For example, create a section called "mylog" that contains your field definitions and time regexp:

[mylog]
time_regexp = (\d\d\d\d\/\d\d\/\d\d\s\d\d\:\d\d)\:
field_list =<<EOF
status:0
type:1
system:2
transaction:3
duration:5
time:7
EOF

Then, from the command line, simply specify the config file and the section "mylog", and you can reference fields by name without having to specify the column number:

logstatsd -c /path/to/config -s mylog -l /path/to/logfile --xml - -f transaction --group transaction:status
-s, --section=SECTION

Specify the section to be read from the config file.
-a, --all

On start, parse the entire log file.

May be combined with -d.
-d, --daemon

Enable daemon mode. In daemon mode, the log file will be opened in tail mode (using File::Tail). Each new line that arrives in the log file will be processed. Data may be obtained from the running daemon by sending a USR1 signal (kill -USR1 <pid>).

May be combined with -a.
--xml [file]

Generate an XML file containing all currently captured summary data. If "-" is specified, the XML will be printed to stdout.
--rrd [file]

Generate a shell script of "rrdtool update" commands to update a set of RRD files. RRD files are not generated directly at this time, since the script is much more efficient for transport to a centralized monitoring server, where the RRD files can be updated. If "-" is specified, the rrd commands will be printed to stdout.

Once in daemon mode, RRD commands will be generated using the current time stamp and the currently available summary data.

To specify which counters should be used to build RRD files, use the --rrdupdate option.

If the --rrd option is combined with -a, and if a "time" field and a time-regexp were both defined, then times will be parsed from the logs, and "rrd update" commands will be generated for each minute. Currently this behaviour is only available for the total summary data and not for any defined rrdupdate fields.

Note that currently all RRDs assume that you have defined 4 thresholds. If you define fewer thresholds, your RRDs will be a little larger than necessary. If you define more, your RRDs will only track the first 4.
--rrdupdate [field1|field2|field3][|field4]

Specify the RRD databases that should be updated when running in daemon mode. Any number of --rrdupdate options may be specified.

The fields in this option specify keys used to look up the option in the internal group data. To look up a *field* directly, use the definition "fields|fieldname|fieldvalue". For example, if you specified a field called "status", you can build an RRD from all entries with status "SUCCESS" by using this rrdupdate definition:

fields|status|SUCCESS

In order to track *group* fields (i.e. those specified with --group), use the definition "groups|name1-name2|value1|value2". For example, if you are grouping status by transaction, to build RRDs for all transactions with status FAIL and name mytrans.do, use this:

groups|status-transaction|FAIL|mytrans.do
-t, --time-regexp <regexp>

Specify the regexp used to parse the time field, if one is specified. The regexp should include a single capture expression which, when run on the date field, returns the date and time.

Ideally you should attempt to capture the year, month, day, hour, and minute. Do not capture seconds unless you really want summary data broken down per second.
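A time regexp can be tried out by hand before it goes in the config. The timestamp format below is assumed from the example config's time_regexp; note that the match stops at the minute:

```shell
# Extract a to-the-minute timestamp from a sample log line.
ts='2006/10/02 13:45:07: SUCCESS,web,app01,login.do'
printf '%s\n' "$ts" | grep -oE '^[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}'
```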
--line-regexp <regexp>

Specify the regexp used to parse the entire log entry. The regexp should capture each field in the log, which can then be referenced using the usual column number. For a simple example:

--line-regexp "^([^,]*),([^,]*),([^,]*)"

This would capture the first three comma-separated fields from the log entry and make them available as columns 0, 1, and 2.
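A line regexp can also be checked by hand: sed replaces a sample entry with a single captured group, so each column can be inspected in turn (the sample entry is hypothetical):

```shell
# Pull out the second captured group (column 1) from a sample entry.
entry='SUCCESS,web,app01'
printf '%s\n' "$entry" | sed -E 's/^([^,]*),([^,]*),([^,]*)$/\2/'
```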
--ssh [servername]

Experimental. Specify the remote server on which the log file lives.

When using this option, you should install Craig H. Rowland's 'logtail' program on the target server, and specify its location using the logtail config param. Using the ssh option without the logtail option may be unstable and is not recommended.

Note that the --all flag is not yet supported when using ssh.
--logtail [/path/to/logtail]

Experimental. Can only be used with --ssh. Specify the path to the logtail program written in C by Craig H. Rowland. From the logtail documentation:

This program will read in a standard text file and create an offset marker when it reads the end. The offset marker is read the next time logtail is run and the text file pointer is moved to the offset location. This allows logtail to read in the next lines of data following the marker. This is good for marking log files for automatic log file checkers to monitor system events.

Note that on the first processing of a new file using logtail, all log entries will be read in and processed. On subsequent restarts, logtail will only process lines not previously seen.

It is recommended that you also define the config param logtail_offset in your config file to specify the location of the offset file created by logtail. If this option is not defined, logtail will create a number of offset files.
--version

Display version information.
THRESHOLDS
Thresholds allow monitoring the number of long response times. For example, a given transaction might be expected to complete within 5 seconds. In addition to measuring the average response time of the transaction, you may also wish to measure how many transactions did not complete within 5 seconds. You may define any number of thresholds, so you could measure those that you consider to be fast (under 3 seconds), good (under 5 seconds), slow (over 10 seconds), and very slow (over 20 seconds).
NOTE: If a duration field was not defined, then response-time threshold statistics cannot be calculated.
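Since threshold lists are separated by "|" characters, which the shell would otherwise treat as pipes, quote the field definition on the command line. A minimal sketch (the column number and threshold values here are illustrative):

```shell
# Keep the "|" separators literal by single-quoting the definition.
fielddef='status:0|3|5|10|20'
printf '%s\n' "$fielddef"
# e.g.: logstatsd -a -l /path/to/logfile -f "$fielddef" -f duration:5 --xml -
```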
DIAGNOSTICS
Coming Soon...
CONFIGURATION AND ENVIRONMENT
The config file is a simple .ini style config file. Here is an example config file:
[test]
time_regexp = (\d\d\d\d\/\d\d\/\d\d\s\d\d\:\d\d)\:
xml = /Users/wu/tmp/test.xml
logfile = /Users/wu/projects/logs/test.log.mini
field_list =<<EOF
status:0
type:1
system:2
transaction:3
duration:5
time:7
EOF
rrdupdate =<<EOF
fields|status|GOOD
fields|status|BAD
groups|status-transaction|BAD|mytrans1
groups|status-transaction|GOOD|mytrans2
EOF
rrd_step = 60
rrd_create =<<EOF
DS:duration:COUNTER:1200:0:5000
DS:hits:COUNTER:1200:0:5000
DS:over1:COUNTER:1200:0:5000
DS:over2:COUNTER:1200:0:5000
DS:over3:COUNTER:1200:0:5000
DS:over4:COUNTER:1200:0:5000
RRA:AVERAGE:0.5:1:1440
RRA:AVERAGE:0.5:5:1440
RRA:AVERAGE:0.5:30:1440
RRA:AVERAGE:0.5:120:144
EOF
Most params can be defined in the config file or on the command line. Params on the command line override those in the config file.
EXAMPLES
# parse entire CSV file, column 1 contains status, and column 6
# contains duration. generate an xml report of number of responses
# and average response time data for each status.
logstatsd -a -f status:0 -f duration:5 -l /path/to/logfile --xml -
# parse entire CSV file, column 1 contains status, and column 6
# contains duration. generate an xml report of number of responses
# and average response time data for each status, including the
# number of responses that were under 5 seconds, those that were
# between 5-10 seconds, 10-20 seconds, and over 20 seconds.
logstatsd -a -f "status:0|5|10|20" -f duration:5 -l /path/to/logfile --xml -
# parse entire CSV file. Column 1 contains status, column 3 contains
# transaction name, and column 6 contains duration. generate an xml
# report of responses for each status, for each transaction, and also
# break down response data for each transaction based on status.
logstatsd -a -f transaction:3 -f status:0 -f duration:5 --group status:transaction -l /path/to/logfile --xml -
# monitor CSV file for new incoming hits. generate an xml report on
# "kill -USR1 <logstats pid>"
logstatsd -d -f status:0 -f duration:5 -l /path/to/logfile --xml /path/to/report.xml
# monitor CSV file for new incoming hits. generate a script to
# update an RRD database on receipt of "kill -USR1 <logstatsd pid>"
logstatsd -d -f status:0 -f duration:5 -l /path/to/logfile --rrd /path/to/rrd_script.sh
# parse entire CSV file, and then begin monitoring for incoming
# hits. update xml report on completion of full parsing, and then
# on receipt of "kill -USR1 <logstats pid>"
logstatsd -f status:0 -f duration:5 -l /path/to/logfile --xml /path/to/report.xml
DEPENDENCIES
Benchmark - generating stats about long parsing times
File::Tail - for monitoring incoming data in a log
XML::Simple - for exporting data to xml
Config::IniFiles - for parsing the logstatsd.conf config file
Log::Log4perl - can be disabled by simply commenting out the use line.
Log::Statistics - logstatsd comes bundled with Log::Statistics, available from CPAN
Getopt::Long - command line options processing
Pod::Usage - for command line help
SEE ALSO
http://www.geekfarm.org/twiki/bin/view/Main/LogStatistics
BUGS AND LIMITATIONS
There are no known bugs in this script. Please report problems to VVu@geekfarm.org
Patches are welcome.
AUTHOR
VVu@geekfarm.org
LICENCE AND COPYRIGHT
Copyright (c) 2006, VVu@geekfarm.org All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of geekfarm.org nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.