NAME

Text::XmlMatch - Pattern-matching and grouping via XML configuration file

SYNOPSIS

use Text::XmlMatch;
my $matcher = Text::XmlMatch->new('ConfigurationFile.xml');

# Find matching groups; results are returned as a hash reference
my $results = $matcher->findMatch('09460-3640-2-s-x.csc.na.testdomain.com');
print "Group Name\t--   Group Type\n";   # print the header once, not per match
foreach my $name (keys %$results) {
  print "$name\t--   $results->{$name}\n";
}

Sample XML Configuration "ConfigurationFile.xml":

<config>
 <!-- Find FQDN's that match a particular datacenter -->
 <pattern name="DATACENTER-ndc">
   <inclusion>^corp.*\.net</inclusion>
   <tag>datacenter</tag>
 </pattern>

 <!-- Find devices that match a particular market -->
 <pattern name="Market CSC">
   <inclusion>\S+-\S+-\d+-\w-\w\.csc</inclusion>
   <tag>market</tag>
 </pattern>
</config>

DESCRIPTION

This module provides matching and grouping functions driven by a configuration file written in XML. The configuration defines pattern names and their inclusion criteria; strings passed to the created object are then matched against those criteria and grouped under the corresponding pattern names. Optional exclusion criteria and an optional descriptor may also be specified to further refine the matching and the returned results.

This grouping and classification function is required frequently in network management systems, where hundreds and often thousands of items need to be grouped according to a variety of criteria. Such grouping can be discrete or overlapping depending on the configuration specified. In complex management systems where multiple platforms are required, this module can ease administrative burdens by allowing multiple systems to share a common configuration file. Each system can then be configured to respond only to items matching a specific pattern name.

In addition, this module allows for dynamic group name creation via support of back-references. By following the convention of Perl's memory variables, grouping can be accomplished such that a pattern name depends on the content of what is being matched. All of this behavior is determined by way of a simple XML configuration file.

METHODS

findMatch(string)

Using the XML configuration file that was specified when the Text::XmlMatch object was created, findMatch() returns all matches for the supplied string as a hash reference. The keys of this hash are the pattern names from the XML configuration file whose criteria matched. The values of the hash contain the tag information for each pattern, or the value '0' if no tag was specified.

A "match" for a pattern name in this module implies the following are both true:

  • At least one regular expression wrapped in <inclusion> tags matches the supplied string. Multiple <inclusion> tags per pattern are allowed.

  • None of the regular expressions wrapped in <exclusion> tags match the supplied string. Multiple <exclusion> tags per pattern are allowed.
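
The two conditions above can be sketched in plain Perl. This is a minimal illustration of the matching rule only, not the module's actual implementation; the helper name is_match and the '\.lab\.' exclusion are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::XmlMatch: returns 1 when $string
# matches at least one inclusion regex and none of the exclusion regexes.
sub is_match {
    my ($string, $inclusions, $exclusions) = @_;
    my $included = grep { $string =~ /$_/ } @$inclusions;   # count of inclusion hits
    my $excluded = grep { $string =~ /$_/ } @$exclusions;   # count of exclusion hits
    return ($included && !$excluded) ? 1 : 0;
}

my @inclusions = ('\S+-\S+-\d+-\w-\w\.csc');
my @exclusions = ('\.lab\.');   # illustrative exclusion, not from the sample file

for my $name ('09460-3640-2-s-x.csc.na.testdomain.com',
              '09460-3640-2-s-x.csc.lab.testdomain.com') {
    my $verdict = is_match($name, \@inclusions, \@exclusions) ? 'match' : 'no match';
    print "$name => $verdict\n";
}
```

The second name satisfies the inclusion regex but is vetoed by the exclusion regex, so only the first is reported as a match.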

listGroups()

This returns an array, or a reference to an array, containing all the pattern names that were derived from the XML configuration file. The caller's context determines whether an array or a reference is returned.
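
Context-sensitive return values of this kind are commonly written in Perl with wantarray. The sketch below uses a hypothetical stand-in function, list_groups, with hard-coded pattern names; it shows how the two calling contexts differ and is not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical stand-in for listGroups(); the pattern names are hard-coded.
sub list_groups {
    my @names = ('DATACENTER-ndc', 'Market CSC');
    return wantarray ? @names : \@names;   # list in list context, reference otherwise
}

my @as_list = list_groups();   # list context: a plain list of names
my $as_ref  = list_groups();   # scalar context: an array reference

print "list: ", join(', ', @as_list), "\n";
print "ref:  ", join(', ', @$as_ref), "\n";
```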

XML Configuration file

The format for the XML configuration file is as follows:

<config>
  <pattern name="group_name_goes_here">
    <inclusion>regular_expression_#1_here</inclusion>
    <inclusion>regular_expression_#2_here</inclusion>
    <exclusion>optional_regular_expression_#1_here</exclusion>
    <exclusion>optional_regular_expression_#2_here</exclusion>
    <tag>optional_descriptor</tag>
  </pattern>
</config>

config tag

A mandatory tag that specifies the start of the XML configuration file. Any valid configuration file for this module must open with this tag.

pattern name tag

A mandatory tag that specifies the name of the group to be established. When findMatch() is called with a string that matches the criteria specified for this pattern, this name is returned as a key of the result hash.

Note that each pattern element must contain at least one inclusion tag (see below). The following keywords are reserved and must not be used as a pattern name: 'inclusion', 'tag', 'name'.

inclusion tag

A tag that contains a regular expression. At least one of these must be defined within the pattern name tags, but multiple regular expressions can be specified by including multiple <inclusion> tags. Note, if multiple inclusion tags are specified, they are treated as a logical OR. If a string submitted via findMatch() matches any one of the regular expressions identified by an inclusion tag, then that string is considered a match for the group if and only if the string does not match a regular expression identified within an exclusion tag (see below).

exclusion tag

This is an optional tag that contains a regular expression. Multiple regular expressions can be specified by including multiple exclusion tags. Note, if multiple exclusion tags are specified, they are treated as a logical OR. For a given pattern name, a string that matches any regular expression contained within an exclusion tag will cause the pattern name to not be returned by findMatch().

tag tag

Other than having an unfortunate choice of name, this provides an optional descriptor for each pattern section. As an example, if one wants to establish pattern name "types," the tag section could be set so that all matches can be further categorized later. The user would then have the option of using the results of the individual patterns along with more sophisticated grouping based on the returned tags.
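
As one way to use the returned tags, the sketch below inverts a result hash so that pattern names are grouped by tag. The hash is sample data shaped like a findMatch() result (pattern name => tag, with '0' for an untagged pattern); the pattern names in it are assumptions for illustration only.

```perl
use strict;
use warnings;

# Sample data shaped like a findMatch() result: pattern name => tag.
my $results = {
    'Market CSC' => 'market',
    'COID-09460' => 'facility',
    'Untagged'   => '0',        # '0' stands for "no tag specified"
};

# Group pattern names under their tag.
my %by_tag;
for my $name (sort keys %$results) {
    push @{ $by_tag{ $results->{$name} } }, $name;
}

for my $tag (sort keys %by_tag) {
    print "$tag: ", join(', ', @{ $by_tag{$tag} }), "\n";
}
```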

XML Configuration with Back-references

The XML configuration supports dynamic pattern name creation using up to five memory variables: $1, $2, $3, $4, and $5. By using standard memory parentheses in the <inclusion> tags, the memory variables may be directly referenced in the pattern name. For example, take the following configuration:

<!-- Standard COID based facility name -->
 <pattern name="COID-$1">
   <inclusion>^(\d{5})(\w{2})?-\w{4}-\d+-\w-\w\.\w{3}</inclusion>
   <tag>facility</tag>
 </pattern>

The text captured by the first set of memory parentheses, (\d{5}), will be present in the final group name "COID-$1," where $1 is replaced by the captured text.
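
The substitution described above can be sketched in plain Perl. The helper expand_name is hypothetical and only illustrates the idea of replacing $1 through $5 in a pattern name with captured text; how the module performs this internally is not shown in this documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: expands $1..$5 in $name using the captures obtained
# by matching $string against $regex. Returns undef when there is no match.
sub expand_name {
    my ($name, $regex, $string) = @_;
    my @captures = ($string =~ /$regex/);
    return unless @captures;
    # Replace each $N with the Nth capture; unset captures become ''.
    $name =~ s/\$([1-5])/defined $captures[$1 - 1] ? $captures[$1 - 1] : ''/ge;
    return $name;
}

my $group = expand_name(
    'COID-$1',                                    # single quotes keep $1 literal
    '^(\d{5})(\w{2})?-\w{4}-\d+-\w-\w\.\w{3}',
    '09460-3640-2-s-x.csc.na.testdomain.com',
);
print(defined($group) ? "$group\n" : "no match\n");   # prints "COID-09460"
```

Here the first capture is "09460", so the string is grouped under "COID-09460"; a device name with a different five-digit prefix would produce a different group name from the same pattern.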

LIMITATIONS

Configuration files that contain duplicated pattern names will cause undesired behavior. Instead of specifying a pattern name more than once, consider using multiple <inclusion> tags under a single pattern name, or create multiple Text::XmlMatch objects pointing to different configuration files.

PREREQUISITES

This module requires XML::Simple 2.14.

SEE ALSO

The extras directory contains sample XML configuration files that are also used to provide the configurations for the test scripts.

AUTHOR

Jason A. Lee <leeja@cpan.org>

COPYRIGHT

Text::XmlMatch version 1.0006

Copyright 2007, Jason Lee
  All rights reserved.