
NAME

Nmap::Parser - Nmap parser for xml scan data

SYNOPSIS

  use Nmap::Parser;

        #PARSING
  my $np = new Nmap::Parser;

  $nmap_exe = '/usr/bin/nmap';
  $np->parsescan($nmap_exe,'-sT -p1-1023', @ips);


  #or
  $np->parsefile('nmap_output.xml'); #using filenames

        #GETTING SCAN INFORMATION

  print "Scan Information:\n";
  $si = $np->get_scaninfo();
  #Now I can get scan information by calling methods
  print
  'Number of services scanned: '.$si->num_of_services()."\n",
  'Start Time: '.$si->start_time()."\n",
  'Scan Types: ',(join ' ',$si->scan_types())."\n";

        #GETTING HOST INFORMATION

   print "Hosts scanned:\n";
   for my $host_obj ($np->get_host_objects()){
   print
  'Hostname  : '.$host_obj->hostname()."\n",
  'Address   : '.$host_obj->addr()."\n",
  'OS match  : '.$host_obj->os_match()."\n",
  'Open Ports: '.(join ',',$host_obj->tcp_ports('open'))."\n";
        #... you get the idea...
   }

  $np->clean(); #frees memory
  # ... do other stuff if you want ...

Note: You can pass the $np object either a filehandle (for example, nmap output piped using the nmap '-oX -' option) or a filename. You can retrieve the information the standard way, using methods, or by using callbacks (see the rest of this document).

DESCRIPTION

This is a stand-alone output parser for nmap XML reports. It uses the XML::Twig library, which is fast and memory efficient. This module does not perform an nmap scan (see Nmap::Scanner for that functionality). It can either parse an nmap xml file, or take a filehandle that is piped from a running nmap scan using the '-oX -' switch. This module was developed to speed up network security tool development when using nmap.

This module is meant to balance ease of use and efficiency (leaning toward ease of use). I have added filtering capabilities and use various options of the twig library in order to increase parsing speed and save on memory usage. If you need information from nmap xml-output that is not available in this release, please send your request (see below).

OVERVIEW

Using this module is very simple. (hopefully).

Set your Options

You first set any filters you want on the information you will parse. This is optional, but if you want the parser to be more efficient, don't parse information you don't need. Other options (such as the osfamily list) can also be set. (See Pre-Parsing Methods)

For example, if you only want to retain the information of the hosts that nmap found to be up (active), then set the filter:

 $np->parse_filters({only_active => 1});

Usually you won't have much information about hosts that are down from nmap anyways.

Run the parser

Parse the info. Use $np->parse() or $np->parsefile() to parse the nmap xml information. This information is parsed and stored internally.

Get the Scan Info

Use the $si = $np->get_scaninfo() to obtain the Nmap::Parser::ScanInfo object. Then you can call any of the ScanInfo methods on this object to retrieve the information. See Nmap::Parser::ScanInfo below.

Get the Host Info

Use the $np->get_host($addr) to obtain the Nmap::Parser::Host object of the current address. Using this object you can call any methods in the Nmap::Parser::Host object to retrieve the information that nmap obtained from this scan.

 $np->get_host($ip_addr);

You can use any of the other methods to filter or obtain different lists.

        #returns all ip addresses that were scanned
 $np->get_host_list()

        #returns all ip addresses that have osfamily = $os
 $np->filter_by_osfamily($os)
         #See get_os_list() and set_os_list()
         #etc. (see other methods)

        #returns all host objects from the information parsed.
        #All are Nmap::Parser::Host objects
 $np->get_host_objects()

Clean up

This step is semi-optional: when the files are not that long, you can skip it. If you are working under memory constraints and dealing with large nmap xml-output files, this little effort helps. After you are done with everything, call $np->clean() to free the memory used to hold the scan and host information. A much more efficient approach is to delete each host object as soon as you are done using it.

                #Getting all IP addresses parsed
 for my $host ($np->get_host_list())
        {       #Getting the host object for that address
        my $h = $np->get_host($host);
                #Calling methods on that object
        print "Addr: $host  OS: ".$h->os_match()."\n";
        $np->del_host($host); #frees memory
        }

        #Or when you are done with everything use $np->clean()

Or you could skip the $np->del_host() calls and, after you are done, perform a
$np->clean(), which resets all the internal trees. Of course there are much
better ways to clean up (using perl idioms).

METHODS

Pre-Parsing Methods

new()

Creates a new Nmap::Parser object with default handlers and the default osfamily list. In this document the current Nmap::Parser object will be referred to as $np.

 my $np = new Nmap::Parser; #NPX = Nmap Parser XML for those curious

set_osfamily_list($hashref)

Sets the list used to decide the osfamily name of a given system.

Takes a hash reference that maps osfamily names to their keyword lists. Shown here is the default. Calling this method will overwrite the whole list, not append to it. Use get_osfamily_list() first to get the current listing.

  $np->set_osfamily_list({
        linux   => [qw(linux mandrake redhat slackware)],
        mac     => [qw(mac osx)],
        solaris => [qw(solaris sparc sun)],
        switch  => [qw(ethernet cisco netscout router switch bridge)],
        unix    => [qw(unix hp-ux hpux bsd immunix aix)],
        wap     => [qw(wireless wap)],
        win     => [qw(win microsoft workgroup)]
            });

Example: osfamily_name = solaris if the os string being matched contains any of the keywords solaris, sparc or sun.

The reason for maintaining this list separately, rather than relying on the 'osclass' tag in the xml output, is that the 'osclass' tag is not generated all the time. Usually new versions of nmap will generate the 'osclass' tags. These will be available through the Nmap::Parser::Host methods. (See below).
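
The keyword matching described above can be sketched in plain Perl. This is only an illustration under stated assumptions: match_osfamily() is a hypothetical helper, not part of Nmap::Parser, and the case-insensitive substring match is a guess at the behaviour rather than the module's actual implementation.

```perl
use strict;
use warnings;

# A subset of the default list shown above.
my %osfamily_list = (
    linux   => [qw(linux mandrake redhat slackware)],
    solaris => [qw(solaris sparc sun)],
    unix    => [qw(unix hp-ux hpux bsd immunix aix)],
    win     => [qw(win microsoft workgroup)],
);

# Hypothetical helper (not part of Nmap::Parser): returns the comma-delimited
# osfamily names that have at least one keyword appearing in the OS string.
sub match_osfamily {
    my ($os_string, $list) = @_;
    my @matched;
    for my $family (sort keys %$list) {
        # Assumed rule: case-insensitive substring match on each keyword.
        push @matched, $family
            if grep { $os_string =~ /\Q$_\E/i } @{ $list->{$family} };
    }
    return join ',', @matched;
}

print match_osfamily('Sun Solaris 9', \%osfamily_list), "\n";
print match_osfamily('Linux Kernel 2.4.0 - 2.5.20', \%osfamily_list), "\n";
```

A substring rule like this is deliberately loose, so a short keyword such as 'win' would also match an OS string containing 'Darwin'. That is one reason to review the keyword list before relying on the result.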

get_osfamily_list()

Returns a hashref containing the current osfamily names (keys) and an arrayref pointing to the list of corresponding keywords (values). See set_osfamily_list() for an example.

parse_filters($hashref)

This function takes a hash reference that will set the corresponding filters when parsing the xml information. All filter names passed will be treated as case-insensitive. NOTE: This version of the parser will ignore the 'addport' tag in the xml file. If you feel the need for this tag, send your feedback.

 $np->parse_filters({
        osfamily        => 1, #same as any variation. Ex: osfaMiLy
        only_active     => 0,  #same here
        portinfo        => 1,
                });
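
The case-insensitive handling of filter names can be pictured with the short sketch below. normalize_filters() is a hypothetical helper written for this illustration, and overlaying the caller's values on the defaults is an assumption; the module itself may keep previously set values instead.

```perl
use strict;
use warnings;

# Defaults as listed under reset_filters().
my %default_filters = (
    osfamily  => 1, scaninfo => 1, only_active => 0, sequences  => 1,
    portinfo  => 1, uptime   => 1, extraports  => 1, osinfo     => 1,
);

# Hypothetical helper (not part of Nmap::Parser): lower-cases the caller's
# keys and overlays them on the defaults, skipping unknown filter names.
sub normalize_filters {
    my ($user) = @_;
    my %filters = %default_filters;
    for my $name (keys %$user) {
        my $key = lc $name;
        next unless exists $filters{$key};
        $filters{$key} = $user->{$name} ? 1 : 0;
    }
    return \%filters;
}

my $f = normalize_filters({ osfaMiLy => 0, ONLY_ACTIVE => 1, bogus => 1 });
print "$_ => $f->{$_}\n" for sort keys %$f;
```

Lower-casing the key before the lookup is what makes 'osfaMiLy' and 'osfamily' equivalent; everything else is a plain hash merge.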

EXTRAPORTS

If set to true, (the default), it will parse the extraports tag.

ONLY_ACTIVE

If set to true, it will ignore hosts that nmap found to be in state 'down'. If set to perl-wise false, it will parse all the hosts. This is the default. Note that if you do not place this filter, it will parse and store (in memory) hosts that do not have much information. So calling a Nmap::Parser::Host method on one of these hosts that were 'down', will return undef.

OSFAMILY

If set to true, (the default), it will match the OS guessed by nmap with a osfamily name that is given in the OS list. See set_osfamily_list(). If false, it will disable this matching (a bit of speed up in parsing).

OSINFO

If set to true (default) it will parse any OS information found (osclass and osmatch tags). Otherwise, it will ignore these tags (faster parsing).

PORTINFO

If set to true, parses the port information. (You usually want this enabled). This is the default.

SCANINFO

If set to true, parses the scan information. This includes the 'scaninfo', 'nmaprun' and 'finished' tags. This is set to true by default. If you don't care about the scan information of the file, then turn this off to enhance speed and memory usage.

SEQUENCES

If set to true, parses the tcpsequence, ipidsequence and tcptssequence information. This is the default.

UPTIME

If set to true, parses the uptime information (lastboot, uptime-seconds..etc). This is the default.

reset_filters()

Resets the value of the filters to the default values:

 osfamily       => 1
 scaninfo       => 1
 only_active    => 0
 sequences      => 1
 portinfo       => 1
 uptime         => 1
 extraports     => 1
 osinfo         => 1

register_host_callback

Sets a callback function that will be called whenever a host is found. The callback receives as its argument the Nmap::Parser::Host object that was just parsed. After the callback returns (back to Nmap::Parser, which keeps parsing other hosts), that host is deleted, so you don't have to delete it yourself. This saves a lot of memory, since each Nmap::Parser::Host object is removed from the tree once you have performed the actions you wanted on it.

 $np->register_host_callback(\&host_handler);

 sub host_handler {
     my $host_obj = shift; #an instance of Nmap::Parser::Host (for current)

     # ... do stuff with $host_obj ... (see Nmap::Parser::Host doc)

     return; # $host_obj will be deleted (similar to the del_host() method)
 }

reset_host_callback

Resets the host callback function, and does normal parsing.

Parse Methods

parse($source [, opt => opt_value [...]])

This method is inherited from XML::Parser. The $source parameter should either be a string containing the whole XML document, or it should be an open IO::Handle (filehandle). Constructor options to XML::Parser::Expat given as keyword-value pairs may follow the $source parameter. These override, for this call, any options or attributes passed through from the XML::Parser instance.

A die call is thrown if a parse error occurs. This method wraps the parsing in an "eval" block. $@ contains the error message on failure. NOTE: parsing still stops as soon as an error is detected; there is no way to keep going after an error.

If you get an error or your program dies due to parsing, please check that the xml information is compliant. If you are using parsescan() or an open filehandle, make sure that the nmap scan that you are performing is successful in returning xml information. (Sometimes using loopback addresses causes nmap to fail).

parsescan($nmap_exe, $args, @ips) Experimental

This method takes as arguments the path to the nmap executable (it could just be 'nmap' too), nmap command line options and a list of IP addresses. It then runs an nmap scan that is piped directly into the Nmap::Parser parser. This enables you to perform an nmap scan against a series of hosts and automatically have the Nmap::Parser module parse it.

 #Example:
 my @ips = qw(127.0.0.1 10.1.1.1);
 my $nmap_exe = '/usr/bin/nmap';
 $np->parsescan($nmap_exe,'-sT -p1-1023', @ips);
 #   ... then do stuff with Nmap::Parser object

 my $host_obj = $np->get_host("127.0.0.1");
 #   ... and so on and so forth ...

Note: None of the nmap options may be '-oX', '-oN' or '-oG'. Your program will die if you try to pass any of these options, because they decide the type of output nmap will generate. The IP addresses can be nmap-formatted addresses (see nmap(1)).

If you get an error or your program dies due to parsing, please check that the xml information is compliant. If you are using parsescan() or an open filehandle, make sure that the nmap scan that you are performing is successful in returning xml information. (Sometimes using loopback addresses causes nmap to fail).

parsefile($filename [, opt => opt_value [...]])

This method is inherited from XML::Parser. This is the same as parse() except that it takes a filename, which it will open and parse. The file is closed no matter how parsefile() returns.

A die call is thrown if a parse error occurs. This method wraps the parsing in an "eval" block. $@ contains the error message on failure. NOTE: parsing still stops as soon as an error is detected; there is no way to keep going after an error.

If you get an error or your program dies due to parsing, please check that the xml information is compliant.

clean()

Frees up memory by cleaning the current tree hashes and purging the current information in the XML::Twig object. Returns the Nmap::Parser object.

Post-Parse Methods

get_host_list([$status])

Returns all the IP addresses that were scanned in the nmap run. $status is optional and can be either 'up' or 'down'. If $status is given, then only IP addresses in that state will be returned. Example: with $status = 'up', it returns all IP addresses that were found to be up (network talk for active).

get_host($ip_addr)

Returns the complete host object of the corresponding IP address.

del_host($ip_addr)

Deletes the corresponding host object from the main tree. (Frees up memory of unwanted host structures).

get_host_objects()

Returns all the host objects of all the IP addresses that nmap had run against. See Nmap::Parser::Host.

filter_by_osfamily(@osfamily_names)

This returns all the IP addresses whose osfamily_names field matches any of the names in @osfamily_names. See set_osfamily_list() for an example of osfamily names. This makes it easier to sift through lists of IP addresses if you are trying to split them up by platform (Windows and Unix machines, for example).

filter_by_status($status)

This returns an array of hosts addresses that are in the $status state. $status can be either 'up' or 'down'. Default is 'up'.

get_scaninfo()

Returns the current Nmap::Parser::ScanInfo. Methods can be called on this object to retrieve information about the parsed scan. See Nmap::Parser::ScanInfo below.

sort_ips(@ips)

Given an array of IP addresses, it returns an array of IP addresses which is correctly sorted according to the network address. An example would be that 10.99.99.99 would come before 10.100.99.99. It takes each quad from an IP address and compares it to corresponding quad number on the other IP address. (So 99 would come before 100).
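
The quad-by-quad comparison can be sketched as follows. sort_ipv4() is a hypothetical stand-in written for this illustration, assuming well-formed dotted-quad IPv4 input; it is not the module's own sort_ips() code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for sort_ips(): splits each address into its four
# quads once, then compares the quads numerically from left to right.
sub sort_ipv4 {
    my @ips = @_;
    return map  { $_->[0] }
           sort {    $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]
                  || $a->[3] <=> $b->[3] || $a->[4] <=> $b->[4] }
           map  { [ $_, split(/\./, $_) ] } @ips;
}

my @sorted = sort_ipv4(qw(10.100.99.99 192.168.1.10 10.99.99.99 192.168.1.9));
print "$_\n" for @sorted;
```

A plain string sort would place 10.100.99.99 before 10.99.99.99 because '1' sorts before '9'; comparing each quad as a number avoids that.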


Nmap::Parser::ScanInfo

The scaninfo object. This package contains methods to easily access all the parameters and values of the nmap scan recorded in the currently parsed xml file or filehandle.

 $si = $np->get_scaninfo();
 print  'Nmap Version: '.$si->nmap_version()."\n",
        'Scan Types: '.(join ',', $si->scan_types() )."\n",
        'Total time: '.($si->finish_time() - $si->start_time()).' seconds';
        #... you get the idea...

num_of_services([$scan_type])

If given a scan type, it returns the number of services that were scanned by nmap for that scan type. If $scan_type is omitted, then num_of_services() returns the total number of services scanned by all scan types.

start_time()

Returns the start time of the nmap scan.

finish_time()

Returns the finish time of the nmap scan.

nmap_version()

Returns the version of nmap that ran.

xml_version()

Returns the xml-output version of nmap-xml information.

args()

Returns the command line parameters that nmap was run with.

scan_types()

Returns an array containing the names of the scan types that were selected.

proto_of_scan_type($scan_type)

Returns the protocol of the specific scan type.

Nmap::Parser::Host

The host object. This package contains methods to easily access the information of a host that was scanned.

  $host_obj = $np->get_host($ip_addr);
   #Now I can get information about this host whose ip = $ip_addr
   print
  'Hostname: '.$host_obj->hostnames(1)."\n",
  'Address:  '.$host_obj->addr()."\n",
  'OS match: '.$host_obj->os_match()."\n",
  'Last Reboot: '.$host_obj->uptime_lastboot()."\n";
  #... you get the idea...

If you would like for me to add more advanced information (such as TCP Sequences), let me know.

status()

Returns the status of the host system: either 'up' or 'down'.

addr()

Returns the IP address of the system.

addrtype()

Returns the address type of the IP address returned by addr(). Ex. 'ipv4'

hostname()

Returns the first hostname found of the current host object. This is a short-cut to using hostnames(1).

 $host_obj->hostname() eq $host_obj->hostnames(1) #Always true

hostnames($number)

If $number is omitted (or false), returns an array containing all of the host names. If $number is given, then returns the host name in that particular index. The index starts at 1.

 $host_obj->hostnames();  #returns an array containing the hostnames found
 $host_obj->hostnames(1); #returns the 1st hostname found
 $host_obj->hostnames(4); #returns the 4th. (you get the idea..)
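
The 1-based index convention (shared with os_matches()) can be illustrated with a small stand-alone sketch. The names() sub and the @names data are hypothetical and exist only for this illustration; returning undef for an index past the end of the list is this sketch's behaviour, not a documented guarantee of the module.

```perl
use strict;
use warnings;

# Hypothetical data standing in for the hostnames stored for one host.
my @names = qw(www.example.com web1.example.com);

# Hypothetical accessor (not part of Nmap::Parser) following the
# convention described above.
sub names {
    my ($number) = @_;
    return @names unless $number;     # omitted or false: the whole list
    return $names[ $number - 1 ];     # index 1 maps to the first element
}

my @all   = names();     # both names
my $first = names(1);    # the first name
my $none  = names(5);    # no fifth name in this sketch
print "first: $first\n";
print "count: ", scalar(@all), "\n";
```

Because a false argument means "return everything", an index of 0 behaves like no argument at all in this sketch.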

extraports_state()

Returns the state of the extra ports found by nmap. (The 'state' attribute in the extraports tag).

extraports_count()

Returns the number of extra ports that nmap found to be in a given state. (The 'count' attribute in the extraports tag).

tcp_ports([$state]), udp_ports([$state])

Returns a sorted array containing the tcp/udp ports that were scanned. If the optional 'state' parameter is passed, it will only return the ports that nmap found to be in that state. The value of $state can be either 'closed', 'filtered' or 'open'. NOTE: If you used a parsing filter such as setting portinfo = 0, then all ports will return undef.

 my @ports = $host_obj->tcp_ports; #all ports
 my $port = pop @ports;

 if($host_obj->tcp_port_state($port) ne 'closed'){

         $host_obj->tcp_service_name($port);  #ex: rpcbind
         $host_obj->tcp_service_proto($port); #ex: rpc (may not be defined)
         $host_obj->tcp_service_rpcnum($port);#ex: 100000 (only if proto is rpc)
 }

Again, you could filter what ports you wish to receive:

 #it can be either 'filtered', 'closed' or 'open'

 my @filtered_ports = $host_obj->tcp_ports('filtered');
 my @open_ports = $host_obj->tcp_ports('open');
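
What the optional state argument does can be pictured with plain Perl. The %tcp_state table and ports_in_state() are hypothetical, written only to show the idea of filtering by state and sorting numerically; they do not reflect how the module stores port data.

```perl
use strict;
use warnings;

# Hypothetical port table for one host: port number => state.
my %tcp_state = (
    22 => 'open', 80 => 'open', 111 => 'filtered',
    443 => 'closed', 8080 => 'open',
);

# Hypothetical helper (not part of Nmap::Parser): returns port numbers in
# ascending numeric order, optionally restricted to a single state.
sub ports_in_state {
    my ($ports, $state) = @_;
    my @wanted = grep { !defined $state || $ports->{$_} eq $state }
                 keys %$ports;
    return sort { $a <=> $b } @wanted;
}

my @open = ports_in_state(\%tcp_state, 'open');
my @all  = ports_in_state(\%tcp_state);
print "open: @open\n";
print "all:  @all\n";
```

The numeric comparison matters here: a default string sort would put 8080 before 80.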

tcp_ports_count(), udp_ports_count()

Returns the number of tcp/udp ports found. This is a short-cut function (but more efficient) to:

 scalar @{[$host->tcp_ports]} == $host->tcp_ports_count;

tcp_port_state($port), udp_port_state($port)

Returns the state of the given tcp/udp port.

tcp_service_extrainfo($port), udp_service_extrainfo($port)

Returns any extra information about the running service. This information is usually available when the scan performed was a version scan (-sV).

NOTE This attribute is only available in new versions of nmap (3.40+).

tcp_service_name($port), udp_service_name($port)

Returns the name of the service running on the given tcp/udp $port. (if any)

tcp_service_product($port), udp_service_product($port)

Returns the service product information from the nmap service information. This information is available when the scan performed was a version scan (-sV).

NOTE This attribute is only available in new versions of nmap (3.40+).

tcp_service_proto($port), udp_service_proto($port)

Returns the protocol type of the given port. This can be tcp, udp, or rpc as given by nmap.

tcp_service_rpcnum($port), udp_service_rpcnum($port)

Returns the rpc number of the service on the given port. This value only exists if the protocol on the given port was found to be RPC by nmap.

tcp_service_version($port), udp_service_version($port)

Returns the version content of the service running on the given tcp/udp $port. (if any)

NOTE This attribute is only available in new versions of nmap (3.40+).

os_match

Same as os_matches(), except this is a short-cut function for obtaining the first OS guess provided by nmap. The statements are equivalent:

 $host_obj->os_matches(1) eq $host_obj->os_match() #true

os_matches([$number])

If $number is omitted, returns an array of possible matching os names. If $number is given, then returns that index entry of possible os names. The index starts at 1.

 $host_obj->os_matches();  #returns an array containing the os names found
 $host_obj->os_matches(1); #returns the 1st os name found
 $host_obj->os_matches(5); #returns the 5th. (you get the idea...)

os_port_used($state)

Returns the port number that was used in determining the OS of the system. If $state is set to 'open', then the port id that was used in state open is returned. If $state is set to 'closed', then the port id that was used in state closed is returned. By default, the open port number is returned.

os_family()

Returns the osfamily_name(s) that was matched to the given host. It is comma delimited. This osfamily value is determined by the list given in the *_osfamily_list() functions. (Example of value: 'solaris,unix')

Note: see set_osfamily_list()

os_class([$number])

Returns the os_family, os_generation, os_vendor and os_type that were guessed by nmap. The os_class tag does not appear in all nmap OS fingerprinting scans; it appears in newer nmap versions, so you should check whether these values are present. If you want a customized (and sure) way of determining an os_family value, use the *_osfamily_list() functions to set it. These determine what os_family value to give depending on the osmatches recovered from the scan.

There can be more than one os_class (different kernels of Linux for example). To access the extra os_class information, pass an index number to the function. If no number is given, the total number of osclass tags parsed will be returned. The index starts at 1.

  #returns the number of osclass tags parsed
 $num_of_os_classes = $host_obj->os_class();

  #returns the first set
 ($os_family,$os_gen,$os_vendor,$os_type) = $host_obj->os_class(1);

  #returns os_gen value only. Example: '2.4.x' if is a Linux 2.4.x kernel.
  $os_gen                      = ($host_obj->os_class(1))[1];# os_gen only

You can play with perl to get the values you want easily.

Note: This tag is usually available in new versions of nmap. You can define your own os_family customizing the os_family lists using the Nmap::Parser functions: set_osfamily_list() and get_osfamily_list().

os_osfamily([$number])

Given an index number, it returns the osfamily value of that given osclass information. The index starts at 1.

os_gen([$number])

Given an index number, it returns the os-generation value of that given osclass information. The index starts at 1.

os_vendor([$number])

Given an index number, it returns the os vendor value of that given osclass information. The index starts at 1.

os_type([$number])

Given an index number, it returns the os type value of that given osclass information. Usually this is nmap's guess at what the machine is used for. Example: 'general purpose', 'web proxy', 'firewall'. The index starts at 1.

tcpsequence_class()

Returns the tcpsequence class information.

tcpsequence_values()

Returns the tcpsequence values information.

tcpsequence_index()

Returns the tcpsequence index information.

ipidsequence_class()

Returns the ipidsequence class information.

ipidsequence_values()

Returns the ipidsequence values information.

tcptssequence_class()

Returns the tcptssequence class information.

tcptssequence_values()

Returns the tcptssequence values information.

uptime_seconds()

Returns the number of seconds the host has been up (since boot).

uptime_lastboot()

Returns the time and date the given host was last rebooted.

EXAMPLES

These are a couple of examples to help you create custom security audit tools using some of the features of the Nmap::Parser module.

Using ParseScan

You can run an nmap scan and have the parser parse the information automagically. The only thing is that you cannot use '-oX', '-oN', or '-oG' as one of your arguments for the nmap command line options passed to parsescan().

 use Nmap::Parser;

 my $np = new Nmap::Parser;
 #this is a simple example (no input checking done)

 my @hosts = @ARGV; #Get hosts from the command line

 #runs the nmap command with hosts and parses it at the same time
 #do not use -oX, -oN or -oG as one of your arguments. It is not allowed here.
 $np->parsescan('nmap','-sS -O -p 1-1023',@hosts);

 print "Active Hosts Scanned:\n";
 for my $ip ($np->get_host_list('up')){print $ip."\n";}

 #... do more stuff with $np ...

 __END__

Using Register-Callback

This is probably the easiest way to write a script using Nmap::Parser, if you don't need the general scan information. During the parsing process, the parser obtains the information of every host from the xml scan output. The callback function is called for every host the parser encounters, once that host has been completely parsed. When the callback returns (that is, when you have finished what you need to do for that host), the parser deletes all information about the host it passed to the callback.

 use Nmap::Parser;
 my $np = new Nmap::Parser;

 #NOTE: the callback function must be set up before parsing begins
 $np->register_host_callback( \&my_function_here );

 #parsing will begin
 $np->parsefile('scanfile.xml');

 sub my_function_here {
         #you will receive an Nmap::Parser::Host object for the current host
         #that has just finished being parsed

     my $host = shift;
     print 'Scanned IP: '.$host->addr()."\n";
         # ... do more stuff with $host ...

         #when this function returns, the parser will delete the host
         #information that it was holding (referring to $host).

     return;

 }

BUG REPORTS AND SUPPORT

Please submit any bugs to: http://sourceforge.net/tracker/?group_id=97509&atid=618345

SEE ALSO

 nmap, XML::Twig

The Nmap::Parser page can be found at: http://npx.sourceforge.net/. It contains the latest developments on the module. The nmap security scanner homepage can be found at: http://www.insecure.org/nmap/. This project is also on sourceforge.net: http://sourceforge.net/projects/npx/

ACKNOWLEDGEMENTS

Thanks to everyone who has provided feedback to improve and enhance this module.

Special Thanks to:

Max Schubert, Sebastian Wolfgarten

AUTHOR

Anthony G Persaud <ironstar@iastate.edu>

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

http://www.opensource.org/licenses/gpl-license.php