NAME

Net::NfDump - Perl API for manipulating nfdump files, based on the libnf.net library

SYNOPSIS

use Net::NfDump;

#
#
# Example 1: reading nfdump file(s)
# 

$flow = new Net::NfDump(
            InputFiles => [ 'nfdump_file1', 'nfdump_file2' ], 
            Filter => 'icmp and src net 10.0.0.0/8',
            Fields => 'proto, bytes' ); 

$flow->query();

while (my ($proto, $bytes) = $flow->fetchrow_array() )  {
    $h{$proto} += $bytes;
}
$flow->finish();

foreach ( keys %h ) {
    printf "%s %d\n", $_, $h{$_};
}

#
#
# Example 2: reading nfdump file(s) with aggregation and sorting
# 

$flow = new Net::NfDump(
            InputFiles => [ 'nfdump_file1', 'nfdump_file2' ], 
            Filter => 'icmp and src net 10.0.0.0/8',
            Fields => 'srcip/24/64, bytes', 
            Aggreg => 1, OrderBy => "bytes" ); 

$flow->query();

while (my ($ip, $bytes) = $flow->fetchrow_array() )  {
    printf "%s %d\n", $ip, $bytes;
}
$flow->finish();


#
#
# Example 3: creating and writing records to nfdump file
#

$flow = new Net::NfDump(
            OutputFile => 'output.nfcap',
            Fields => 'srcip,dstip' );

$flow->storerow_arrayref( [ txt2ip('147.229.3.10'), txt2ip('1.2.3.4') ] );

$flow->finish();


#
#
# Example 4: reading/writing (merging two input files) and swapping
#            the source and destination address if the destination port 
#            is 80/http (I know it doesn't make much sense).
#

$flow1 = new Net::NfDump( 
             InputFiles => [ 'nfdump_file1', 'nfdump_file2' ], 
             Fields => 'srcip, dstip, dstport' ); 

$flow2 = new Net::NfDump( 
             OutputFile => 'nfdump_file_out', 
             Fields => 'srcip, dstip, dstport' ); 

$flow1->query();
$flow2->create();

while (my $ref = $flow1->fetchrow_arrayref() )  {

    if ( $ref->[2] == 80 ) { 
        ($ref->[0], $ref->[1]) = ($ref->[1], $ref->[0]);
    }

    $flow2->clonerow($flow1);
    $flow2->storerow_arrayref($ref);

}

$flow1->finish();
$flow2->finish();

DESCRIPTION

Nfdump http://nfdump.sourceforge.net/ is a very popular toolset for collecting, storing and processing NetFlow/SFlow/IPFIX data. One of the key tools is a command line utility bearing the same name as the whole toolset (nfdump). Although this utility can process data very fast, it is cumbersome for some applications.

This module implements basic operations and allows reading, creating and writing flow records in the binary files produced by the nfdump tool. The module tries to keep the same naming conventions for methods as are used in the DBI modules/API, so developers who are used to working with that interface should find this one familiar.

The module uses the original nfdump sources to implement the necessary functions. This makes it possible to keep compatibility with the original nfdump quite easily and to cope with future versions of the nfdump tool with minimal effort.

The architecture is as follows:

       APPLICATION 
+------------------------+
|                        |  Implements all methods and functions 
| Net::NfDump API (perl) |  described in this document.
|                        |
+------------------------+
|                        |  The code converts internal nfdump 
| libnf - glue code (C)  |  structures into perl and back to C.
|                        |  See http://libnf.net for more information.
+------------------------+
|                        |  All original nfdump source files. There  
|   nfdump sources (C)   |  are no changes in these files. All  
|                        |  changes are placed into libnf code.
+------------------------+  
      NFDUMP FILES

We always try to keep Net::NfDump updated to the latest version of nfdump available on https://github.com/phaag/nfdump. Support for NSEL code is enabled.

WARNING FOR VERSION >= 0.13

Files created by Net::NfDump version >= 0.13 can be read only by nfdump 1.6.12 and newer. For reading, all formats starting with nfdump 1.6 are supported.

METHODS, OPTIONS AND RELATED FUNCTIONS

Options

Options can be passed to various methods. The basic options can be set in the constructor and later modified by methods such as $obj->query() or $obj->create(). A combined example using several of these options is shown after the list below.

The values after => indicate the default value for the item.

  • InputFiles => []

    List of files to read (arrayref).

  • Filter => 'any'

    Filter that is applied on input records. It uses nfdump/tcpdump syntax.

  • Fields => '*'

    List of fields to read or to update. Any supported field can be used here. See the chapter "SUPPORTED ITEMS" for the full list. The special field * can be used to select all fields.

  • Aggreg => 0

    Create an aggregated result. When the method ->query() is called, the library loads data into a memory structure and performs aggregation according to the Fields attribute.

  • OrderBy => '<none>'

    Sort the final result according to the specified field. It can be used only for aggregated results.

  • TimeWindowStart, TimeWindowEnd => 0

    Filter flows that start or end in the specified time window. The options take Unix timestamp values, or 0 if the filter should not be applied.

  • OutputFile => undef

    Output file for the storerow_* methods.

  • Compressed => 1

    Flag indicating whether the output files should be compressed or not.

  • Anonymized => 0

    Flag indicating that the output file contains anonymized data.

  • Ident => ''

    String identifier of the file. The value is stored in the file header.

  • CompatMode => 0

    Enable nfdump compatibility features. Some features are implemented differently compared to the original nfdump. Currently this option only enables LNF_OPT_COMP_STATSCMP for aggregated statistics computation.
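
For illustration, here is a sketch combining several of the options above. The file names, filter and time window values are arbitrary, and ip2txt is assumed to be imported as described in the conversion functions section below:

    use Net::NfDump qw ':all';

    # Arbitrary example files, filter and time window (Unix timestamps).
    my $flow = new Net::NfDump(
                InputFiles      => [ 'nfcapd.202001010000', 'nfcapd.202001010005' ],
                Filter          => 'proto tcp and dst port 443',
                Fields          => 'srcip, bytes',
                Aggreg          => 1,
                OrderBy         => 'bytes',
                TimeWindowStart => 1577836800,    # 2020-01-01 00:00:00 UTC
                TimeWindowEnd   => 1577840400 );  # 2020-01-01 01:00:00 UTC

    $flow->query();
    while ( my ($srcip, $bytes) = $flow->fetchrow_array() ) {
        printf "%s %d\n", ip2txt($srcip), $bytes;
    }
    $flow->finish();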

Constructor, status information methods

  • $obj = new Net::NfDump( %opts )

    my $obj = new Net::NfDump( InputFiles => [ 'file1']  );

    The constructor. Any of the options described above can be specified as parameters.

  • $ref = $obj->info()

    my $i = $obj->info();
    print Dumper($i);

    Informs about the current state of processing the input files. It returns information about already processed files, blocks and records. The information may be useful for estimating the time needed to process the whole dataset. The returned hashref contains the following items:

    total_files           - total number of files to process
    elapsed_time          - elapsed time 
    remaining_time        - estimated remaining time to process all records
    percent               - estimated percentage of processed records
    
    processed_files       - total number of processed files
    processed_records     - total number of processed records
    processed_blocks      - total number of processed blocks
    processed_bytes       - total number of processed bytes 
                            (bytes read from the file 
                            system after decompression)
    
    current_filename      - the name of the file currently processed
    current_total_blocks  - the number of blocks in the currently 
                            processed file 
    current_processed_blocks -  the number of processed blocks in the 
                            currently processed file
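
    For illustration only, the progress information can be used inside a fetch loop; the reporting interval below is arbitrary and remaining_time is assumed to be reported in seconds:

    my $cnt = 0;
    while ( my $ref = $obj->fetchrow_arrayref() ) {
        if ( ++$cnt % 100_000 == 0 ) {
            my $i = $obj->info();
            printf STDERR "%s: %.1f%% done, ~%d s remaining\n",
                $i->{current_filename}, $i->{percent}, $i->{remaining_time};
        }
    }
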
  • $obj->finish()

    $obj->finish();

    Closes all open file handles. It is necessary to call this method, especially when a new file is created. The method flushes the records remaining in the memory buffer and updates the file statistics in the header. Without calling this method, the output file might be corrupted.

Methods for reading data

  • $obj->query( %opts )

    $obj->query( Filter => 'src host 10.10.10.1' );

    This method has to be called before any of the fetchrow_* methods is used. Any option described above can be used as a parameter of this method.

    After executing the query it is possible to access $obj->{NUM_OF_FIELDS} and $obj->{NAME} to get the number of returned fields and the field names. Here is an example of code that accesses the field names:

    foreach my $colno ( 0 .. $obj->{NUM_OF_FIELDS} - 1 ) {
        print $obj->{NAME}->[$colno] . "\t";
    }
  • $ref = $obj->fetchrow_arrayref()

    while (my $ref = $obj->fetchrow_arrayref() ) {
        print Dumper($ref);
    }

    This method should be used after the query method; however, $obj->query() is called automatically if it has not been called before.

    It returns an array reference with the record and advances to the next record. It returns a true value while there are more records to read, or "undef" when the end of the record set has been reached.

  • @array = $obj->fetchrow_array()

    while ( my @array = $obj->fetchrow_array() ) { 
      print Dumper(\@array);
    }

    It has the same function as fetchrow_arrayref; however, it returns the items in an array instead.

  • $ref = $obj->fetchrow_hashref()

    while ( $ref = $obj->fetchrow_hashref() ) {
       print Dumper($ref);
    }

    The same as fetchrow_arrayref; however, the items are returned in a hash reference as key => value pairs.

    NOTE: This method can be very inefficient in some cases, please see the PERFORMANCE section.

Methods for writing data

  • $obj->create( %opts )

    $obj->create( OutputFile => 'output.nfcapd' );

    This method creates a new nfdump file and has to be called before any of the $obj->storerow_* methods.

  • $obj->storerow_arrayref( [ @array ] )

    $obj->storerow_arrayref( [ $srcip, $dstip ] );

    The method inserts the data defined in the arrayref into the file opened by $obj->create(). The number of fields and their order have to follow the order defined in the Fields option passed to $obj->new() or $obj->create().

  • $obj->storerow_array( @array )

    $obj->storerow_array( $srcip, $dstip );

    The same as storerow_arrayref; however, the items are passed as a plain array.

  • $obj->storerow_hashref ( \%hash )

    $obj->storerow_hashref( { 'srcip' =>  $srcip, 'dstip' => $dstip } );

    It inserts the structure defined as a hash reference into the output file.

    NOTE: This method can be very inefficient in some cases, please see the PERFORMANCE section.

  • $obj->clonerow( $obj2 )

    $obj->clonerow( $obj2 );

    This method copies the full content of the current row from the source object (instance). It is useful for writing efficient scripts. See the PERFORMANCE chapter below.

Extra conversion and support functions

The module also provides extra conversion functions which allow converting the binary format of IP addresses, MAC addresses and MPLS label tags into text format and back.

These functions are not exported by default, therefore they have to be either called with the full module name or imported when the module is loaded. To import all support functions the :all tag may be used.

use Net::NfDump qw ':all';
  • $txt = ip2txt( $bin )

  • $bin = txt2ip( $txt )

    $ip = txt2ip('10.10.10.1');
    print ip2txt($ip);

    Converts both IPv4 and IPv6 addresses into text form and back. The standard inet_ntop/inet_pton functions can be used instead to provide the same results.

    The function txt2ip returns the binary format of the IP address or "undef" if the conversion is not possible.

  • $txt = mac2txt( $bin )

  • $bin = txt2mac( $txt )

    $mac = txt2mac('aa:02:c2:2d:e0:12');
    print mac2txt($mac);

    It converts a MAC address to the xx:yy:xx:yy:xx:yy format and back. The function mac2txt accepts an address in any of the following formats:

    aabbccddeeff
    aa:bb:cc:dd:ee:ff
    aa-bb-cc-dd-ee-ff
    aabb-ccdd-eeff

    It returns the binary format of the address or "undef" if the conversion is not possible.

  • $txt = family2txt( $bin )

  • $bin = txt2family( $txt )

    $fam = txt2family('ipv6');
    print family2txt($fam);

    It converts the internal address family (AF_INET, AF_INET6) to the string ipv4 or ipv6 and back.

    The function txt2family returns the binary representation of the family on the particular platform or "undef" if the conversion is not possible.

  • $txt = mpls2txt( $mpls )

  • $mpls = txt2mpls( $txt )

    $mpls = txt2mpls('1002-6-0 1003-6-0 1004-0-1');
    print mpls2txt($mpls);

    It converts MPLS label information into the Lbl-Exp-S format and back.

    Where:

    Lbl - Value given to the MPLS label by the router. 
    Exp - Value of the experimental bit. 
    S   - Value of the end-of-stack bit: Set to 1 for the oldest 
          entry in the stack and to zero for all other entries. 
  • $ref = flow2txt( \%row )

  • $ref = txt2flow( \%row )

    The function flow2txt takes a hash reference to the items returned by fetchrow_hashref and converts all items into a human-readable text format. It applies the functions ip2txt, mac2txt and mpls2txt to the items where it makes sense. The function txt2flow does the exact opposite.
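
    A minimal sketch of converting one fetched record into readable form, assuming the conversion functions were imported via the :all tag:

    use Data::Dumper;

    my $row = $obj->fetchrow_hashref();   # binary values (IPs, MACs, ...)
    my $txt = flow2txt($row);             # the same record in text form
    print Dumper($txt);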

  • $ref = file_info( $file_name )

    $ref = file_info('file.nfcap');
    print Dumper($ref);

    It reads information from the nfdump file header and provides various attributes such as the number of blocks, version, flags, statistics, etc. As a result the following items are returned:

    version
    ident
    blocks
    catalog
    anonymized
    compressed
    sequence_failures
    
    first
    last
    
    flows, bytes, packets
    
    flows_tcp, flows_udp, flows_icmp, flows_other
    bytes_tcp, bytes_udp, bytes_icmp, bytes_other
    packets_tcp, packets_udp, packets_icmp, packets_other

SUPPORTED ITEMS

An up-to-date list of supported items is available in Net::NfDump::Fields.

 Time items
 =====================
 first - Timestamp of the first packet seen (in milliseconds)
 last - Timestamp of the last packet seen (in milliseconds)
 received - Timestamp of when the packet was received by the collector 

 Statistical items
 =====================
 bytes - The number of bytes 
 pkts - The number of packets 
 outbytes - The number of output bytes 
 outpkts - The number of output packets 
 flows - The number of flows (aggregated) 

 Layer 4 information
 =====================
 srcport - Source port 
 dstport - Destination port 
 tcpflags - TCP flags  

 Layer 3 information
 =====================
 srcip - Source IP address 
 dstip - Destination IP address 
 nexthop - IP next hop 
 srcmask - Source mask 
 dstmask - Destination mask 
 tos - Source type of service 
 dsttos - Destination type of service 
 srcas - Source AS number 
 dstas - Destination AS number 
 nextas - BGP Next AS 
 prevas - BGP Previous AS 
 bgpnexthop - BGP next hop 
 proto - IP protocol  

 Layer 2 information
 =====================
 srcvlan - Source vlan label 
 dstvlan - Destination vlan label 
 insrcmac - In source MAC address 
 outsrcmac - Out source MAC address 
 indstmac - In destination MAC address 
 outdstmac - Out destination MAC address 

 MPLS information
 =====================
 mpls - MPLS labels 

 Layer 1 information
 =====================
 inif - SNMP input interface number 
 outif - SNMP output interface number 
 dir - Flow directions ingress/egress 
 fwd - Forwarding status 

 Exporter information
 =====================
 router - Exporting router IP 
 systype - Type of exporter 
 sysid - Internal SysID of exporter 

 NSEL fields, see: http://www.cisco.com/en/US/docs/security/asa/asa81/netflow/netflow.html
 =====================
 eventtime - NSEL The time that the flow was created
 connid - NSEL An identifier of a unique flow for the device 
 icmpcode - NSEL ICMP code value 
 icmptype - NSEL ICMP type value 
 xevent - NSEL Extended event code
 xsrcip - NSEL Mapped source IPv4 address 
 xdstip - NSEL Mapped destination IPv4 address 
 xsrcport - NSEL Mapped source port 
 xdstport - NSEL Mapped destination port 
 The input ACL that permitted or denied the flow (NSEL):
 iacl - Hash value or ID of the ACL name
 iace - Hash value or ID of the ACL name 
 ixace - Hash value or ID of an extended ACE configuration 
 The output ACL that permitted or denied a flow (NSEL):
 eacl - Hash value or ID of the ACL name
 eace - Hash value or ID of the ACL name
 exace - Hash value or ID of an extended ACE configuration
 username - NSEL username

 NEL (NetFlow Event Logging) fields
 =====================
 ingressvrfid - NEL NAT ingress vrf id 
 eventflag -  NAT event flag (always set to 1 by nfdump)
 egressvrfid -  NAT egress VRF ID

 NEL Port Block Allocation (added 2014-04-19)
 =====================
 blockstart -  NAT pool block start
 blockend -  NAT pool block end 
 blockstep -  NAT pool block step
 blocksize -  NAT pool block size

 Extra/special fields
 =====================
 cl - nprobe latency client_nw_delay_usec 
 sl - nprobe latency server_nw_delay_usec
 al - nprobe latency appl_latency_usec

PERFORMANCE

It is obvious that the performance of the perl interface is lower in comparison with the highly optimized nfdump utility. While nfdump is able to process up to 2 million records per second, Net::NfDump is not able to process more than 1 million. However, there are several rules that keep the code optimized:

  • Use $obj->fetchrow_arrayref() and $obj->storerow_arrayref() instead of the *_array and *_hashref equivalents. Arrayref passes only a reference to the structure with the data. Avoid using the *_hashref functions; they can be up to 5 times slower.

  • Pass to the perl API only the items which are necessary in the code. It is always more efficient to define Fields => 'srcip,dstip,...' instead of Fields => '*'.

  • Using the $obj->clonerow($obj2) method is highly recommended. This method copies data between two instances directly in the C code of the libnf layer.

    Following code:

    $obj1->query( Fields => '*' );
    $obj2->create( Fields => '*' );
    
    while ( my $ref = $obj1->fetchrow_arrayref() ) {
        # do something with srcip 
        $obj2->storerow_arrayref($ref);
    }

    can be written in a more efficient way (several times faster):

    $obj1->query( Fields => 'srcip' );
    $obj2->create( Fields => 'srcip' );
    
    while ( my $ref = $obj1->fetchrow_arrayref() ) {
        # do something with srcip 
        $obj2->clonerow($obj1);
        $obj2->storerow_arrayref($ref);
    }

NOTE ABOUT 32BIT PLATFORMS

Nfdump primarily uses 64 bit counters and other items to store single integer values. However, native 64 bit support is not compiled into every perl. In cases where only 32 bit integer values are supported, Net::NfDump uses the Math::Int64 module.

The build scripts detect the platform automatically, and the Math::Int64 module is required only on platforms where the available perl does not support 64 bit integer values.
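
Whether a particular perl has native 64 bit integers can be checked via the standard Config module; this is only an illustration, the build scripts perform their own detection:

    use Config;

    # 'use64bitint' is 'define' when perl was built with native 64 bit integers;
    # on other builds Net::NfDump falls back to the Math::Int64 module.
    if ( ( $Config{use64bitint} // '' ) eq 'define' ) {
        print "native 64 bit integers available\n";
    } else {
        print "Math::Int64 will be required\n";
    }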

EXAMPLES OF USE

There are several examples in the examples and bin directories.

nfasnupd - A script for updating the information about AS numbers and country codes based on a BGP and geolocation database. Every flow can be extended with the src/dst AS number and also with the src/dst country code.

The nfasnupd script periodically checks and downloads the BGP database which is available as part of the libnf.net project. After that it updates the AS (or country code) information in the nfdump file. It can be run as an extra command (-x option of nfcapd) to update the information whenever a new file is available.

The information about the src/dst country works in a similar way. It uses the MaxMind database and the Geo::IP module. However, nfdump does not support any field to store this kind of information, so the xsrcport and xdstport fields are used instead. The country code is converted into a 16 bit value (8 bits for the first character of the country code and another 8 bits for the second one).
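
A minimal sketch of the 16 bit encoding described above (8 bits per character); the helper names are hypothetical and the exact packing used by nfasnupd may differ:

    # Hypothetical helpers illustrating the 8 + 8 bit country code packing.
    sub cc2num { my ($c1, $c2) = split //, uc shift; return ( ord($c1) << 8 ) | ord($c2); }
    sub num2cc { my $n = shift; return chr( ( $n >> 8 ) & 0xFF ) . chr( $n & 0xFF ); }

    my $num = cc2num('CZ');       # 0x435A
    print num2cc($num), "\n";     # prints "CZ"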

SEE ALSO

nfdump project - https://github.com/phaag/nfdump
libnf C interface - http://libnf.net/

AUTHOR

Tomas Podermanski, <tpoder@vut.cz>, Brno University of Technology
NetX Networks a.s., <info@netx.as>

COPYRIGHT AND LICENCE

Copyright (C) 2012 - 2019 by Brno University of Technology
Copyright (C) 2020 by NetX Networks a.s.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

If you are satisfied with using Net::NfDump, please send us a postcard, preferably with a picture of your location / city, to:

Brno University of Technology 
CVIS
Tomas Podermanski 
Antoninska 1
601 90 Brno
Czech Republic 
