NAME
Net::Traces::TSH - Analyze IP traffic traces in TSH format
SYNOPSIS
use Net::Traces::TSH qw(:traffic_analysis);
# Enable progress information display
#
verbose;
# process the trace in file some_trace.tsh
#
process_trace 'some_trace.tsh';
# Then, write a summary of the trace contents to some_trace.csv, in
# Comma-Separated Values (CSV) format
#
write_trace_summary 'some_trace.csv';
ABSTRACT
Net::Traces::TSH provides methods to analyze IP packet traces in Time Sequenced Headers (TSH) format. Trace summary statistics are stored in comma separated values (CSV), a platform independent text format. Use Net::Traces::TSH to gather general information about a TSH packet trace, measure Transport protocol, DiffServ and ECN usage, and generate packet and segment size distributions. In addition, you can extract all TCP traffic present in a TSH trace in a tcpdump-like text format.
INSTALLATION
To install Net::Traces::TSH type the following:
perl Makefile.PL
make
make test
make install
Moreover,
perldoc perlmodinstall
provides more information and options about installing Perl modules.
DESCRIPTION
With Net::Traces::TSH you can analyze IP packet traces in Time Sequenced Headers (TSH), a binary network trace format. Each 44-byte TSH record corresponds to an IP packet passing by a monitoring point. Although there are no explicit section delimiters, each record is composed of three distinct sections.
- Time and Interface
-
The first section uses 8 bytes to store the time (with microsecond granularity) and the interface number of the corresponding packet, as recorded by the (passive) monitor.
- IP
-
The next 20 bytes contain the standard IP packet header. IP options are not recorded.
- TCP
-
The third and last section contains the first 16 bytes of the standard TCP segment header. The TCP checksum, urgent pointer, and TCP options (if any) are not included in a TSH record.
If a record does not correspond to a TCP segment, it is not clear how to interpret the last section. As such, Net::Traces::TSH makes no assumptions, and does not analyze in detail packets from protocols other than TCP. That is, Net::Traces::TSH reports on protocols other than TCP based on the second section (IP header) only.
The following diagram illustrates a TSH record.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1  Section
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 0 |                      Timestamp (seconds)                      | Time
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 1 | Interface No. |           Timestamp (microseconds)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 2 |Version|  IHL  |Type of Service|          Total Length         | IP
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 3 |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 4 |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 5 |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 6 |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 7 |          Source Port          |       Destination Port        | TCP
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 8 |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 9 |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |       |C|E|U|A|P|R|S|F|                               |
10 | Offset|RSRV-ed|W|C|R|C|S|S|Y|I|            Window             |
   |       |       |R|E|G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
This diagram is a modified version of the original TSH diagram (found on the NLANR PMA web site), which reflects the changes due to the addition of Explicit Congestion Notification (ECN) in the TCP header flags. Keep in mind that recent RFCs have modified the meaning of the IP header Type of Service field to accommodate Differentiated Services and Explicit Congestion Notification.
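The record layout can be decoded with Perl's built-in unpack. The following is a minimal sketch, not the module's own parser: decode_tsh_record() is a hypothetical helper, and it assumes that all fields are stored in network (big-endian) byte order.

```perl
use strict;
use warnings;

# Sketch only: decode one 44-byte TSH record, assuming network byte order.
# decode_tsh_record() is a hypothetical helper, not part of Net::Traces::TSH.
sub decode_tsh_record {
    my ($record) = @_;
    return unless defined $record && length($record) == 44;

    my ($sec, $if_usec,                                   # words 0-1
        $ver_ihl, $tos, $total_len, $id, $flags_frag,     # words 2-3
        $ttl, $proto, $checksum, $src, $dst,              # words 4-6
        $sport, $dport, $seq, $ack, $off_flags, $window)  # words 7-10
        = unpack 'N N C C n n n C C n N N n n N N n n', $record;

    return {
        interface => $if_usec >> 24,                  # top 8 bits of word 1
        timestamp => $sec + ($if_usec & 0xFFFFFF) / 1e6,
        version   => $ver_ihl >> 4,
        tos       => $tos,
        length    => $total_len,
        protocol  => $proto,                          # 6 = TCP, 17 = UDP
        src       => $src,
        dst       => $dst,
        # The remaining fields are meaningful for TCP records only.
        src_port  => $sport,
        dst_port  => $dport,
        seq_num   => $seq,
        ack_num   => $ack,
        tcp_flags => $off_flags & 0xFF,               # CWR ECE URG ACK PSH RST SYN FIN
        window    => $window,
    };
}

# Build a synthetic record (made-up values) to exercise the decoder.
my $record = pack 'N N C C n n n C C n N N n n N N n n',
    1073132115, (1 << 24) | 250_000,
    0x45, 0, 40, 1, 0x4000, 64, 6, 0, 167772172, 167772174,
    80, 1080, 1000, 2000, (5 << 12) | 0x12, 65535;

my $pkt = decode_tsh_record($record);
printf "interface %d, protocol %d, ports %d -> %d\n",
    @{$pkt}{qw(interface protocol src_port dst_port)};
```

Reading a trace then amounts to opening the file in binary mode and calling read() for 44 bytes at a time.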
For example, you can use Net::Traces::TSH to gather information from a TSH packet trace, perform statistical analysis on Transport protocol, DiffServ and ECN usage, and obtain packet and segment size distributions. The trace summary statistics are stored in comma separated values (CSV), a platform independent text format.
Data Structures
The data collected from a trace is stored in a hash called %Trace_Summary, the main data structure in Net::Traces::TSH. %Trace_Summary is initialized and populated by process_trace. The recommended way to get the trace summary information is by calling write_trace_summary, which stores the contents of %Trace_Summary in a CSV-formatted text file, as shown in SYNOPSIS.
%Trace_Summary is not exported by default and it is not intended to be accessed directly by user code. However, if you know what you are doing, you can get a reference to %Trace_Summary by calling get_trace_summary_href. If you choose to do so, the following subsections explain how you can access some of the information stored in %Trace_Summary. See also Taking advantage of %Trace_Summary.
General Trace Information
- $Trace_Summary{filename}
-
The trace FILENAME.
- $Trace_Summary{log}
-
The trace summary FILENAME.
- $Trace_Summary{starts}
-
The first trace timestamp, in seconds.
- $Trace_Summary{ends}
-
The last trace timestamp, in seconds.
- $Trace_Summary{records}
-
Number of records in the trace.
- $Trace_Summary{unidirectional}
-
True, if each interface carries unidirectional traffic. False, if there is bidirectional traffic in at least one interface. undef if traffic directionality was not examined.
- $Trace_Summary{'Link Capacity'}
-
The capacity of the monitored link in bits per second (b/s).
Internet Protocol
- $Trace_Summary{IP}{'Total Packets'}
- $Trace_Summary{IP}{'Total Bytes'}
-
Number of IP packets and bytes, respectively, in the trace. The number of IP packets should equal the number of records in the trace.
Fragmentation
- $Trace_Summary{IP}{'DF Packets'}
- $Trace_Summary{IP}{'DF Bytes'}
-
Number of IP packets and bytes, respectively, requesting no fragmentation ('Do not Fragment').
- $Trace_Summary{IP}{'MF Packets'}
- $Trace_Summary{IP}{'MF Bytes'}
-
Number of IP packets and bytes, respectively, indicating that 'More Fragments' follow.
Differentiated Services
- $Trace_Summary{IP}{'Normal Packets'}
- $Trace_Summary{IP}{'Normal Bytes'}
-
Number of IP packets and bytes, respectively, requesting no particular treatment (best effort traffic). None of the DiffServ and ECN bits are set.
- $Trace_Summary{IP}{'Class Selector Packets'}
- $Trace_Summary{IP}{'Class Selector Bytes'}
-
Number of IP packets and bytes, respectively, with Class Selector bits set.
- $Trace_Summary{IP}{'AF PHB Packets'}
- $Trace_Summary{IP}{'AF PHB Bytes'}
-
Number of IP packets and bytes, respectively, requesting Assured Forwarding Per-Hop Behavior (PHB).
- $Trace_Summary{IP}{'EF PHB Packets'}
- $Trace_Summary{IP}{'EF PHB Bytes'}
-
Number of IP packets and bytes, respectively, requesting Expedited Forwarding Per-Hop Behavior (PHB).
Explicit Congestion Notification
- $Trace_Summary{IP}{'ECT Packets'}
- $Trace_Summary{IP}{'ECT Bytes'}
-
Number of IP packets and bytes, respectively, with either of the ECT bits set. These packets carry ECN-capable traffic.
- $Trace_Summary{IP}{'CE Packets'}
- $Trace_Summary{IP}{'CE Bytes'}
-
Number of IP packets and bytes, respectively, with the CE bit set. These packets carry ECN-capable traffic that has been marked at an ECN-aware router.
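The DiffServ and ECN categories above are all derived from the single byte that used to be the Type of Service field. The sketch below shows one way to split that byte, following the field layouts in RFC 2474, RFC 2597, RFC 2598 and RFC 3168. classify_tos() is a hypothetical helper, and the exact rules the module applies may differ.

```perl
use strict;
use warnings;

# Sketch only: classify the former Type of Service byte.
# classify_tos() is a hypothetical helper, not part of Net::Traces::TSH.
sub classify_tos {
    my ($tos) = @_;
    my $dscp = ($tos >> 2) & 0x3F;      # upper six bits: DiffServ codepoint
    my $ecn  = $tos & 0x03;             # lower two bits: ECN field

    my $class = $dscp >> 3;             # top three DSCP bits
    my $drop  = ($dscp >> 1) & 0x03;    # AF drop precedence
    my $is_af = !($dscp & 1) && $class >= 1 && $class <= 4 && $drop >= 1;

    my $phb = $dscp == 0            ? 'Default'
            : $dscp == 46           ? 'EF'               # 101110
            : ($dscp & 0x07) == 0   ? 'Class Selector'   # xxx000
            : $is_af                ? 'AF'               # AF11 .. AF43
            :                         'Other';

    my $ecn_name = ('Not-ECT', 'ECT(1)', 'ECT(0)', 'CE')[$ecn];
    return ($phb, $ecn_name);
}

printf "0x%02X: %s, %s\n", $_, classify_tos($_) for 0x00, 0x28, 0xB8, 0x03;
```

A packet counts as "Normal" in the sense used above when the whole byte is zero, that is, when the result is ('Default', 'Not-ECT').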
Transport Protocols
Besides the summary information about the trace itself and statistics about IP, %Trace_Summary maintains information about the transport protocols present in the trace. Based on the IP header, %Trace_Summary maintains the same statistics mentioned in the previous section for TCP, UDP and other transport protocols with an IANA assigned number. For example,
- $Trace_Summary{Transport}{TCP}{'Total Packets'}
- $Trace_Summary{Transport}{TCP}{'Total Bytes'}
-
Number of TCP segments and the corresponding bytes (including the IP and TCP headers) in the trace.
- $Trace_Summary{Transport}{UDP}{'Total Packets'}
- $Trace_Summary{Transport}{UDP}{'Total Bytes'}
-
Ditto for UDP.
- $Trace_Summary{Transport}{ICMP}{'DF Packets'}
- $Trace_Summary{Transport}{ICMP}{'DF Bytes'}
-
Number of ICMP packets and bytes, respectively, with the DF bit set.
Taking advantage of %Trace_Summary
The following example creates the trace summary file only if the TCP traffic in terms of bytes accounts for more than 90% of the total IP traffic in the trace.
# Explicitly import process_trace(), write_trace_summary(), and
# get_trace_summary_href():
use Net::Traces::TSH qw( process_trace write_trace_summary
get_trace_summary_href
);
# Process a trace file...
#
process_trace "some.tsh";
# Get a reference to %Trace_Summary
#
my $ts_href = get_trace_summary_href;
# ...and create a summary only if the condition is met.
#
write_trace_summary
if ( ( $ts_href->{Transport}{TCP}{'Total Bytes'}
/ $ts_href->{IP}{'Total Bytes'}
) > 0.9);
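The same idea extends to every transport protocol in the summary. The following sketch computes the byte share of each protocol and guards against an empty trace, where the total byte count would be zero. byte_shares() is a hypothetical helper, and %mock_summary holds made-up numbers standing in for the hash that get_trace_summary_href refers to.

```perl
use strict;
use warnings;

# Sketch only: byte share of each transport protocol.
# byte_shares() is a hypothetical helper; it expects a hash reference
# shaped like the one returned by get_trace_summary_href().
sub byte_shares {
    my ($ts_href) = @_;
    my %share;
    my $total = $ts_href->{IP}{'Total Bytes'} || 0;
    return \%share unless $total;       # empty trace: avoid dividing by zero

    for my $proto (keys %{ $ts_href->{Transport} || {} }) {
        my $bytes = $ts_href->{Transport}{$proto}{'Total Bytes'} || 0;
        $share{$proto} = $bytes / $total;
    }
    return \%share;
}

# Made-up numbers standing in for a real trace summary.
my %mock_summary = (
    IP        => { 'Total Bytes' => 1000 },
    Transport => {
        TCP => { 'Total Bytes' => 920 },
        UDP => { 'Total Bytes' => 80 },
    },
);

my $share = byte_shares(\%mock_summary);
printf "%-4s %5.1f%%\n", $_, 100 * $share->{$_} for sort keys %$share;
```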
FUNCTIONS
Net::Traces::TSH does not export any functions by default. The following functions, listed in alphabetical order, are exportable.
date_of
date_of FILENAME
Converts the epoch timestamp, typically part of a TSH trace FILENAME downloaded from http://pma.nlanr.net/Traces, to a human-readable date. If FILENAME contains a valid timestamp, date_of() returns the corresponding GMT date as a string. Otherwise, it returns an empty string, i.e. false.
For example,
date_of 'ODU-1073132115.tsh'
returns Sat Jan 3 12:15:15 2004 GMT.
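The conversion itself needs nothing beyond core Perl. The sketch below is not the module's implementation: trace_date() is a hypothetical helper, and the filename pattern (a 9 or 10 digit epoch immediately before the .tsh suffix) is an assumption based on the example above.

```perl
use strict;
use warnings;

# Sketch only: extract an epoch timestamp from a trace filename and
# format it as a GMT date. trace_date() and the filename pattern are
# assumptions, not the module's implementation.
sub trace_date {
    my ($filename) = @_;
    my ($epoch) = $filename =~ /(\d{9,10})\.tsh\z/ or return '';
    return scalar(gmtime($epoch)) . ' GMT';
}

print trace_date('ODU-1073132115.tsh'), "\n";
```

As with date_of(), a filename without a recognizable timestamp yields an empty string, which is false in a boolean test.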
get_IP_address
get_IP_address INTEGER
Converts a 32-bit integer to an IP address. For example,
get_IP_address(167772172)
returns 10.0.0.12.
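For reference, the conversion can be done in either direction with pack and unpack. This is a sketch with hypothetical helper names, not the module's code.

```perl
use strict;
use warnings;

# Sketch only: hypothetical helpers, not part of Net::Traces::TSH.

# 32-bit integer -> dotted quad, e.g. 167772172 -> "10.0.0.12"
sub int_to_dotted_quad {
    my ($n) = @_;
    return join '.', unpack('C4', pack('N', $n));
}

# Dotted quad -> 32-bit integer, handy for building sender keys.
sub dotted_quad_to_int {
    my ($ip) = @_;
    return unpack('N', pack('C4', split(/\./, $ip)));
}

print int_to_dotted_quad(167772172), "\n";
print dotted_quad_to_int('10.0.0.14'), "\n";
```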
get_trace_summary_href
get_trace_summary_href
Returns a hash reference to %Trace_Summary.
process_trace
process_trace FILENAME
process_trace FILENAME, NUMBER
process_trace FILENAME, NUMBER, TEXT_FILENAME
If called in a void context, process_trace() examines the binary TSH trace stored in FILENAME, and populates %Trace_Summary.
NUMBER specifies the capacity of the monitored link in bits per second (b/s). If not specified, it defaults to 155,520,000.
If called in a list context, process_trace() gathers the same statistics and, in addition, extracts all TCP flows and TCP data-carrying segments from the trace, returning two hash references. For example:
my ($senders_href, $packets_href) = process_trace 'trace.tsh';
Here $senders_href is a reference to a hash which contains an entry for each TCP sender in the trace file. Each hash entry is a list of timestamps extracted from the trace record and stored after being "normalized" (start of trace = 0.0 seconds, always). In theory, all records should have different timestamps. In practice, although it is not very likely that two data segments have the same timestamp, I encountered a few traces that did have duplicate timestamps. process_trace() checks for such cases and implements a hash collision avoidance algorithm. If the collision threshold of trace records with the same timestamp is exceeded, process_trace() aborts as this is a hint that the trace is corrupted. The collision threshold is currently set to 4.
A TCP sender is identified by the ordered 4-tuple
(src, src port, dst, dst port)
where src and dst are the 32-bit integers corresponding to the IP addresses of the sending and receiving hosts, respectively. Similarly, src port and dst port are the port numbers of the sending and receiving processes. Senders are categorized on a per interface basis. For example, the following accesses the list of segments sent from 10.0.0.12:80 to 10.0.0.14:1080 (in interface 1):
$senders_href->{1}{167772172,80,167772174,1080}
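A comma-separated list inside a hash subscript is joined by Perl with the $; variable ("\034" by default), so each sender key is a single string. When iterating over the senders, the key can be split back into its four parts. The sketch below uses a made-up structure shaped like $senders_href; sender_key() and describe_sender() are hypothetical helpers.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);   # $SUBSEP is the readable name for $;

# Sketch only: hypothetical helpers, not part of Net::Traces::TSH.

# Build the key explicitly; equivalent to writing the list in the subscript.
sub sender_key { return join $SUBSEP, @_ }

# Turn a key back into "src:port -> dst:port" with dotted-quad addresses.
sub describe_sender {
    my ($key) = @_;
    my ($src, $sport, $dst, $dport) = split /\Q$SUBSEP\E/, $key;
    my $quad = sub { join '.', unpack('C4', pack('N', shift)) };
    return sprintf '%s:%d -> %s:%d',
        $quad->($src), $sport, $quad->($dst), $dport;
}

# Made-up data: one sender on interface 1 with two normalized timestamps.
my %senders;
$senders{1}{ sender_key(167772172, 80, 167772174, 1080) } = [ 0.0, 0.000014 ];

for my $key (sort keys %{ $senders{1} }) {
    printf "%s (%d segments)\n",
        describe_sender($key), scalar @{ $senders{1}{$key} };
}
```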
The second returned value, $packets_href, is another hash reference, which can be used to access any individual data-carrying TCP segment in the trace. Again, packets are categorized on a per interface basis. Three values are stored per packet: the total number of bytes in the packet (including IP and TCP headers, and application payload), the segment sequence number, and whether the segment was retransmitted or not.
For example, assuming the first record corresponds to a TCP segment, here is how you can print its packet size and the sequence number carried in the TCP header:
my $interface = 1;
my $timestamp = 0.0;
print $packets_href->{$interface}{$timestamp}{bytes};
print $packets_href->{$interface}{$timestamp}{seq_num};
You can also check whether a packet was retransmitted or not:
if ( $packets_href->{$interface}{$timestamp}{retransmitted} ) {
print "Packet was retransmitted by the TCP sender.";
}
else {
print "Packet must have been acknowledged by the TCP receiver.";
}
Please note that process_trace() only initializes the "retransmitted" value to false (0). It is write_sojourn_times() that detects retransmitted segments and updates the "retransmitted" entry to true, if it is determined that the segment was retransmitted.
CAVEAT: write_sojourn_times() has not been finalized yet, and as such it is not included in this version. Contact me if you want to get the most recent version.
If TEXT_FILENAME is specified, process_trace() generates a text file based on the trace records in a format similar to the modified output of tcpdump, as presented in TCP/IP Illustrated Volume 1 by W. R. Stevens. The format is explained in more detail in TCP/IP Illustrated Volume 1, pp. 230-231.
You can use such an output as input to other tools, present real traffic scenarios in a classroom, or simply "eyeball" the trace. For example, here are the first ten lines of the contents of such a file:
0.000000000 10.0.0.1.6699 > 10.0.0.2.55309: . ack 225051666 win 65463
0.000014000 10.0.0.3.80 > 10.0.0.4.14401: S 457330477:457330477(0) ack 810547499 win 34932
0.000014000 10.0.0.1.6699 > 10.0.0.2.55309: . 3069529864:3069531324(1460) ack 225051666 win 65463
0.000024000 10.0.0.5.12119 > 10.0.0.6.80: F 2073668891:2073668891(0) ack 183269290 win 64240
0.000034000 10.0.0.7.4725 > 10.0.0.8.445: S 3152140131:3152140131(0) win 16384
0.000067000 10.0.0.1.6699 > 10.0.0.2.55309: P 3069531324:3069531944(620) ack 225051666 win 65463
0.000072000 10.0.0.11.3381 > 10.0.0.12.445: S 1378088462:1378088462(0) win 16384
0.000083000 10.0.0.13.1653 > 10.0.0.1.6699: P 3272208349:3272208357(8) ack 501563814 win 32767
0.000093000 10.0.0.14.1320 > 10.0.0.15.445: S 3127123478:3127123478(0) win 64170
0.000095000 10.0.0.4.14401 > 10.0.0.3.80: R 810547499:810547499(0) ack 457330478 win 34932
Note that the text output is similar to what tcpdump with options -n and -S would have produced. The only missing field is the TCP options negotiated during connection setup. Unfortunately, TSH records include only the first 16 bytes of the TCP header, making it impossible to record the options from the segment header.
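If you feed this text output to your own tools, a single regular expression is enough to pick each line apart. The sketch below was written against the sample lines shown above only; parse_segment_line() is a hypothetical helper and may need adjusting for lines that look different.

```perl
use strict;
use warnings;

# Sketch only: parse_segment_line() is a hypothetical helper whose
# pattern was written against the sample lines shown above.
sub parse_segment_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^ (\d+\.\d+) \s+                        # normalized timestamp
        (\d+(?:\.\d+){3}) \. (\d+) \s+ > \s+    # source address and port
        (\d+(?:\.\d+){3}) \. (\d+) : \s+        # destination address and port
        (\S+)                                   # flags, a dot when none are shown
        (?: \s+ (\d+) : (\d+) \( (\d+) \) )?    # first and next sequence number, payload bytes
        (?: \s+ ack \s+ (\d+) )?                # acknowledgment number
        \s+ win \s+ (\d+)                       # advertised window
    }x;

    return {
        time  => $1, src => $2, src_port => $3, dst => $4, dst_port => $5,
        flags => $6, seq => $7, next_seq => $8, bytes => $9,
        ack   => $10, win => $11,
    };
}

my $seg = parse_segment_line(
    '0.000014000 10.0.0.1.6699 > 10.0.0.2.55309: . 3069529864:3069531324(1460) ack 225051666 win 65463'
);
printf "%s:%d sent %d bytes\n", $seg->{src}, $seg->{src_port}, $seg->{bytes};
```

Fields that are absent from a line, such as the sequence numbers of a pure acknowledgment, come back as undef.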
records_in
records_in FILENAME
Estimates the number of records in FILENAME and returns the "expected" number of records in the trace, which must be an integer. If the estimate is not an integer, records_in() returns false.
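Since every TSH record is 44 bytes long, one plausible way to arrive at such an estimate is to divide the file size by the record size. The sketch below rests on that assumption; count_records() is a hypothetical stand-in and not necessarily how records_in() is implemented.

```perl
use strict;
use warnings;

# Sketch only: count_records() is a hypothetical stand-in for
# records_in(), assuming the estimate is file size / record size.
use constant TSH_RECORD_BYTES => 44;

sub count_records {
    my ($filename) = @_;
    my $size = -s $filename;
    return 0 unless $size;                     # missing or empty file
    return 0 if $size % TSH_RECORD_BYTES;      # truncated, or not a TSH trace
    return $size / TSH_RECORD_BYTES;
}

my $trace = shift @ARGV;
if (defined $trace) {
    my $n = count_records($trace);
    print $n ? "$trace: $n records\n" : "$trace: not a whole number of records\n";
}
```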
verbose
verbose
As you might expect, this function sets the verbosity level of the module. By default Net::Traces::TSH remains "silent". Call verbose() to see trace processing progress indicators on standard error.
write_trace_summary
write_trace_summary FILENAME
write_trace_summary
Writes the contents of %Trace_Summary to FILENAME in comma separated values (CSV) format, a platform independent text format, excellent for storing tabular data. CSV is both human-readable and suitable for further analysis using Perl or direct import to a spreadsheet application. Although not required, it is recommended that FILENAME should have a .csv suffix.
If FILENAME is not specified, write_trace_summary() will create one for you by appending the suffix .csv to the filename of the trace being processed.
If you want FILENAME to contain meaningful data you should call write_trace_summary() after calling process_trace().
DEPENDENCIES
EXPORTS
None by default.
Exportable
date_of() get_IP_address() get_trace_summary_href() numerically() process_trace() records_in() verbose() write_trace_summary()
In addition, the following export tags are defined
- :traffic_analysis
-
verbose() process_trace() write_trace_summary()
- :trace_information
-
date_of() records_in()
Finally, all exportable functions can be imported with
use Net::Traces::TSH qw(:all);
VERSION
This is Net::Traces::TSH version 0.05.
SEE ALSO
The NLANR MOAT Passive Measurement and Analysis (PMA) web site at http://pma.nlanr.net/PMA provides more details on the process of collecting packet traces. The site features a set of Perl programs you can download, including several converters from other packet trace formats to TSH.
TSH trace files can be downloaded from the NLANR/PMA trace repository at http://pma.nlanr.net/Traces . The site contains a variety of traces gathered from several monitoring points at university campuses and (Giga)PoPs connected to a variety of large and small networks.
DiffServ
If you are not familiar with Differentiated Services (DiffServ), good starting points are the following RFCs:
K. Nichols et al., Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers, RFC 2474. Available at http://www.ietf.org/rfc/rfc2474.txt
S. Blake et al., An Architecture for Differentiated Services, RFC 2475. Available at http://www.ietf.org/rfc/rfc2475.txt
See also RFC 2597 and RFC 2598.
ECN
If you are not familiar with Explicit Congestion Notification (ECN), make sure to read
K. K. Ramakrishnan et al., The Addition of Explicit Congestion Notification (ECN) to IP, RFC 3168. Available at http://www.ietf.org/rfc/rfc3168.txt
AUTHOR
Kostas Pentikousis, kostas@cpan.org.
ACKNOWLEDGMENTS
Professor Hussein Badr provided invaluable guidance while crafting the main algorithms of this module.
Many thanks to Wall, Christiansen and Orwant for writing Programming Perl 3/e. It has been indispensable while developing this module.
COPYRIGHT AND LICENSE
Copyright 2003, 2004 by Kostas Pentikousis. All Rights Reserved.
This library is free software with ABSOLUTELY NO WARRANTY. You can redistribute it and/or modify it under the same terms as Perl itself.