NAME
Parse::Syslog::Line - Simple syslog line parser
VERSION
version 6.1
SYNOPSIS
I wanted a very simple log parser for network based syslog input. Nothing existed that simply took a line and returned a hash ref all parsed out.
use Parse::Syslog::Line qw(parse_syslog_line);
$Parse::Syslog::Line::AutoDetectJSON = 1;
$Parse::Syslog::Line::AutoDetectKeyValues = 1;
my $href = parse_syslog_line( $msg );
#
# $href = {
# preamble => '13',
# priority => 'notice',
# priority_int => 5,
# facility => 'user',
# facility_int => 8,
# date => 'YYYY-MM-DD',
# time => 'HH::MM:SS',
# epoch => 1361095933,
# datetime_local => ISO 8601 datetime, in local timezone (potentially buggy)
# datetime_str => ISO 8601 datetime, in message timezone
# datetime_utc => ISO 8601 datetime, in UTC
# datetime_raw => 'Feb 17 11:12:13'
# host_raw => 'hostname', # Hostname as it appeared in the message
# host => 'hostname', # Hostname without domain
# domain => 'blah.com', # if provided
# program_raw => 'sshd(blah)[pid]',
# program_name => 'sshd',
# program_sub => 'pam_unix',
# program_pid => 20345,
# content => 'the rest of the message'
# message => 'program[pid]: the rest of the message',
# message_raw => 'The message as it was passed',
# ntp => 'ok', # Only set for Cisco messages
# version => 1,
# SDATA => { ... }, # RFC Structured data, decoded JSON, or K/V Pairs in the message
# };
...
EXPORT
Exported by default: parse_syslog_line( $one_line_of_syslog_message );
Optional Exports: :preamble preamble_priority preamble_facility
:constants
%LOG_FACILITY
%LOG_PRIORITY
:with_timezones
set_syslog_timezone
get_syslog_timezone
use_utc_syslog
VARIABLES
ExtractProgram
If this variable is set to 1 (the default), parse_syslog_line() will try it's best to extract a "program" field from the input. This is the most expensive set of regex in the module, so if you don't need that pre-parsed, you can speed the module up significantly by setting this variable.
Vendors who do proprietary non-sense with their syslog formats are to blame for this setting.
Usage:
$Parse::Syslog::Line::ExtractProgram = 0;
DateParsing
If this variable is set to 0 raw date will not be parsed further into components (datetime_str date time epoch). Default is 1 (parsing enabled).
Usage:
$Parse::Syslog::Line::DateParsing = 0;
TimeMomentFormatString
This defaults to "%FT%T%f%z"
. See "EXAMPLE FORMAT STRINGS" in Time::Moment for syntax and usage.
EpochCreate
If this variable is set to 1, the default, the number of seconds from UNIX epoch will be returned in the $m->{epoch} field. Setting this to false will only delete the epoch before returning the hash reference.
FmtDate
You can pass your own formatter/parser here. Given a raw datetime string it should output a list containing date, time, epoch, datetime_str, in your wanted format.
use Parse::Syslog::Line;
local $Parse::Syslog::Line::FmtDate = sub {
my ($raw_datestr) = @_;
my @elements = (
#date
#time
#epoch
#datetime_str
);
return @elements;
};
NOTE: No further date processing will be done, you're on your own here.
AutoDetectJSON
Default is false. If true, we'll autodetect the presence of JSON in the syslog message and use JSON::MaybeXS to decode it. The detection/decoding is simple. If a '{' is detected, everything until the end of the message is assumed to be JSON. The decoded JSON will be added to the SDATA
field.
$Parse::Syslog::Line::AutoDetectJSON = 1;
AutoDetectKeyValues
Default is false. If true, we'll autodetect the presence of Splunk style key/value pairds in the message stream. That format is k1=v1, k2=v2
. Resulting K/V pairs will be added to the SDATA
field.
$Parse::Syslog::Line::AutoDetectKeyValues = 1;
RFC5424StructuredData
Default is true. When enabled, this will extract the RFC standard structured data from the message content. That content will be stripped from the message content
field.
Some examples:
# Input
[foo x=1] some words [bar x=2]
# To (YAML for brevity)
---
SDATA:
bar:
x: 2
foo:
x: 1
content: some words
# Input
[x=1] some words
# To (YAML for brevity)
---
SDATA:
x: 1
content: some words
To disable:
$Parse::Syslog::Line::RFC5424StructuredData = 0;
RFC5424StructuredDataStrict
Require the format:
[namespace@id property="value"][namespace@id property="value"]
Defaults to 0, set to 1 to only parse the RFC5424 formatted structured data.
PruneRaw
This variable defaults to 0, set to 1 to delete all keys in the return hash ending in "_raw"
Usage:
$Parse::Syslog::Line::PruneRaw = 1;
PruneEmpty
This variable defaults to 0, set to 1 to delete all keys in the return hash which are undefined.
Usage:
$Parse::Syslog::Line::PruneEmpty = 1;
PruneFields
This should be an array of fields you'd like to be removed from the hash reference.
Usage:
@Parse::Syslog::Line::PruneFields = qw(facility_int priority_int);
FUNCTIONS
parse_syslog_line
Returns a hash reference of syslog message parsed data.
NOTE: Date/time parsing is hard. This module has been optimized to balance common sense and processing speed. Care is taken to ensure that any data input into the system isn't lost, but with the varieties of vendor and admin crafted date formats, we don't always get it right. Feel free to override date processing using by setting the $FmtDate
variable or completely disable it with $DateParsing
set to 0.
Dates and Version 6+
As of version 6.0
and later, the date parsing is handled by Time::Moment. Ideally, I would use Date for performance reasons, but it requires some heavy XS toolkits to build which don't work on my MacBookPro out of the box. This made the decision to use Time::Moment
kinda automatic. If you are seriously concerned with performance, enough to figure out how to package and run Date successfully, you can use the $FmtDate
parameter to inject your own date processing logic.
Time::Moment
's API and known limitations informed updates to the API and output of dates in this module. It is drastic enough a shift to warrant a major version bump.
"The Effect of Daylight Saving Time" in Time::Moment explains that to properly convert times during DST transitions, things get messy. This caused issues in testing and warrants words of caution here, ALWAYS use datetime_utc
or epoch
fields for datetime portability.
The changes to the API and fields returned are as follows:
- API Changes
-
DateTimeCreate
is deprecated-
DateTime is slow and memory heavy. I never should've added support for it in this module. This release removes it. If you need DateTime objects, you'll need to build it yourself.
HiResFmt
is deprecated, useTimeMomentFormatString
NormalizeToUTC
is deprecated, every log now returnsdatetime_utc
OutputTimeZone
is deprecated, useTimeMomentFormatString
- Field Changes
-
datetime_utc
-
Present in every document, use this for portability.
datetime_str
-
Now represents the parsed datetime as from the log without modifying the timezone.
datetime_local
-
Attempts to represent the datetime in the timezone local to the program. This is prone to errors around DST, I don't advise using this, but it's provided as footgun for future generations.
offset
renamed totz
Fields Returned
- preamble
-
Syslog preamble without the brackets, i.e.,
13
. - priority
-
String representation of the priority, i.e.,
"warn"
- priority_int
-
Integer representation of the priority, i.e.,
1
- facility
-
String representation of the facility, i.e.,
"daemon"
- priority_int
-
Integer representation of the facility, i.e.,
1
- datetime_raw
-
The datetime string from the log as it was discovered
- epoch
-
Numeric representation of the UNIX time as parsed by the
datetime_str
. This is the most portable format for computers and I recommend using it, and only it for passing onto to computer systems. - datetime_utc
-
UTC representation of the
datetime_raw
in ISO8601 format (viaTimeMomentFormatString
). If you must use a string format, this is the one you should pass to other computers. - datetime_str
-
ISO8601 representation of the
datetime_raw
(viaTimeMomentFormatString
), without manipulating timezones. - datetime_local
-
ISO8601 representation of the
datetime_raw
(viaTimeMomentFormatString
) attempting to manipulate into the timezone of the local computer or the timezone set byset_syslog_timezone()
.NOTE: This does not handle DST well as the logic for that requires DateTime::TimeZone when using Time::Moment. Adding
DateTime
back into this module will kill performance, so I accept the inaccuracy here as you should never use this.It is provided for those living in Arizona to mock the rest of us for our stupid DST sins.
- date
-
The date portion of
datetime_str
- time
-
The time portion of
datetime_str
- tz
-
The timezone offset of
datetime_str
- host_raw
-
The source host of the log as parsed, i.e.,
"host.example.com"
- host
-
Host portion of the
host_raw
, i.e.,"host"
- domain
-
Domain portion of the
host_raw
, i.e.,"example.com"
- origin
-
If relayed, contains the origin of the message, i.e., "host.example.com"
- origin_date
-
If relayed, contains the origin timestamp, this is unparsed.
- program_raw
-
The program, appname, or syslogtag in full, save the final colon, i.e.,
sshd(pam_unix)[35454]
. - program_name
-
Program name parsed from
program_raw
, i.e.,sshd
. - program_pid
-
The PID as parsed from the
program_raw
, i.e.,35454
. - program_sub
-
The program context as parsed from
program_raw
, i.e.,pam_unix
. - content
-
Everything after the syslog tag, except when using
AutoDetectJSON
orAutoDetectKeyValues
. When detecting structured data, successfully parsed chunks of the message are removed from the string.As an example, if the message is:
2015-09-30T06:26:06.779373-05:00 my-host my-script.pl: updating data {"lunchTime":1443612366.442}
By default,
content
will be:updating data {"lunchTime":1443612366.442}
However, if
AutoDetectJSON
is set, thencontent
will be:updating data
And the JSON will be decoded into the
SDATA
field. - SDATA
-
The structured data from the log message. This include RFC5424 Structured Data as well as anything extracted by
AutoDetectJSON
and/orAutoDetectKeyValues
. - message
-
Everything from the syslogtag onward, i.e.,
"program_raw content"
- message_raw
-
The entire message passed into the function.
set_syslog_timezone($timezone_name)
Sets a timezone $timezone_name
for parsed messages. This timezone will be used to calculate offset from UTC if a timezone designation is not present in the message being parsed. This timezone will also serve as the source timezone for the datetime_local
field.
get_syslog_timezone()
Returns the name of the timezone currently set by set_syslog_timezone.
use_utc_syslog()
A convenient function which sets the syslog timezone to UTC.
parse_syslog_lines
Returns a list of hashes of the lines interpretted.
When passed one or more line of text, attempts to parse that text as syslog data. This function varies from parse_syslog_line
in that it handles multi-line messages. The caveat to this, is after the last iteration of the loop, you to call the function by itself to get the last message.
use strict;
use warnings;
use DDP;
use Parse::Syslog::Line qw(parse_syslog_lines);
while(<>) {
foreach my $log ( parse_syslog_lines($_) ) {
p($log);
}
}
p($_) for parse_syslog_lines();
This function holds a parsing buffer which it flushes any time it encounters a line in the stream that starts with non-whitespace. Any lines beginning with whitespace will be assumed to be a continuation of the previous line.
It is not exported by default.
psl_enable_sdata
Call this to turn on all the Structured Data Parsing Options
preamble_priority
Takes the Integer portion of the syslog messsage and returns a hash reference as such:
$prioRef = {
'preamble' => 13
'as_text' => 'notice',
'as_int' => 5,
};
preamble_facility
Takes the Integer portion of the syslog messsage and returns a hash reference as such:
$facRef = {
'preamble' => 13
'as_text' => 'user',
'as_int' => 8,
};
ENVIRONMENT VARIABLES
There are environment variables that affect how we operate. They are not options as they are not intended to be used by our users. Use at your own risk.
PARSE_SYSLOG_LINE_DEBUG
Outputs debugging information about the parser, not really intended for end-users.
PARSE_SYSLOG_LINE_QUIET
Disables warnings in the parse_syslog_line() function
TEST_ACTIVE / TEST2_ACTIVE
Disables warnings in the parse_syslog_line() function
DEVELOPMENT
This module is developed with Dist::Zilla. To build from the repository, use Dist::Zilla:
dzil authordeps --missing |cpanm
dzil listdeps --missing |cpanm
dzil build
dzil test
AUTHOR
Brad Lhotsky <brad@divisionbyzero.net>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2017 by Brad Lhotsky.
This is free software, licensed under:
The (three-clause) BSD License
CONTRIBUTORS
Bartłomiej Fulanty <starlight@cpan.org>
Csillag Tamas <cstamas@digitus.itk.ppke.hu>
Keedi Kim <keedi.k@gmail.com>
Mateu X Hunter <mhunter@maxmind.com>
Neil Bowers <neil@bowers.com>
Shawn Wilson <swilson@korelogic.com>
Tomohiro Hosaka <bokutin@bokut.in>
SUPPORT
Websites
The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.
MetaCPAN
A modern, open-source CPAN search engine, useful to view POD in HTML format.
Bugs / Feature Requests
This module uses the GitHub Issue Tracker: https://github.com/reyjrar/Parse-Syslog-Line/issues
Source Code
This module's source code is available by visiting: https://github.com/reyjrar/Parse-Syslog-Line