NAME

tstregex - A hybrid regex diagnostic tool (single-file library module and command-line tool) that shows the longest regular expression match and highlights the rejected part

Example:

$ perl lib/tstregex.pm '/^[a-z]*\d{3}$/' 'abc123' 'abc12a'

abc123

abc12a (^[a-z]*\d{3}$)

SYNOPSIS

# Example command and its terminal output:

$ perl lib/tstregex.pm '/^[a-z]*\d{3}$/' 'abc123' 'abc12a'

abc123

abc12a (^[a-z]*\d{3}$)

# The bold parts above highlight the rejected string and regex token.

OPTIONS (CLI)

-h --help

Shows this help message.

-v --verbose

Shows key information about what matched and what did not.

-d --diag

Triggers the Enriched Diagnostic View. It displays:

- The string with the failing part highlighted.
- The exact token in the regex that caused the break.
- A visual pointer (^--- HERE) aligned with the regex syntax.
- Execution time (useful for spotting ReDoS/exponential backtracking).
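As an illustration only, the aligned pointer and the timing can be approximated in a few lines of Perl. The helper name pointer_line and the hand-picked offset are assumptions made for this sketch; they are not part of tstregex, and the real diagnostic view may be built differently.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper (not part of tstregex): build a marker line whose
# caret sits under column $offset (0-based) of the text printed above it.
sub pointer_line {
    my ($offset) = @_;
    return (' ' x $offset) . '^--- HERE';
}

my $regex  = '^[a-z]*\d{3}$';
my $string = 'abc12a';

# Time a plain match attempt; an unexpectedly large value hints at
# heavy backtracking.
my $start   = time();
my $matched = ($string =~ /$regex/) ? 1 : 0;
my $elapsed = time() - $start;

print "$regex\n";
print pointer_line(7), "\n";    # 7 = offset of \d{3} in the regex, chosen by hand
printf "matched=%d in %.6f s\n", $matched, $elapsed;
```

Time::HiRes is used because the built-in time() only has one-second resolution, which is too coarse for a single match attempt.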

-a --assert

Runs the built-in test suite: a large collection of regexp tests exercised through tstregex.

Perl Module SYNOPSIS

use tstregex;
my $ctx = tstregex_init_desc('/^\d{3}/');    # parse the regex once
tstregex($ctx, '12a');                        # run the diagnostic; results are stored in $ctx
if (!tstregex_is_full_match($ctx))
    {
    my $token = tstregex_get_fail_token($ctx);
    my $pos   = tstregex_get_match_len($ctx);
    print "Failure on token '$token' at column $pos\n";
    }

API

tstregex_init_desc($raw_re)

Pre-parses the regex, handles delimiters (m!!, //, etc.), extracts modifiers (i, s, m, x), and prepares the nibbling steps. Returns a context hash.
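A minimal sketch of what this kind of pre-parsing might involve, assuming only delimiters that use the same character on both sides (such as // or m!!). The name split_regex_literal and the returned hash keys are hypothetical; the actual context hash built by tstregex_init_desc may look quite different.

```perl
use strict;
use warnings;

# Hypothetical helper (not the tstregex implementation): split a regex
# literal such as '/^\d{3}/i' or 'm!^\d{3}!x' into pattern and modifiers.
# Paired delimiters like m{...} are deliberately not handled here.
sub split_regex_literal {
    my ($raw) = @_;
    if ($raw =~ /\A m? ([^\w\s]) (.*) \1 ([a-z]*) \z/xs) {
        return { pattern => $2, modifiers => $3 };
    }
    # No recognizable delimiters: treat the whole input as the pattern.
    return { pattern => $raw, modifiers => '' };
}

for my $raw ('/^\d{3}/', 'm!^[a-z]+!i') {
    my $parsed = split_regex_literal($raw);
    print "$raw => pattern '$parsed->{pattern}', modifiers '$parsed->{modifiers}'\n";
}
```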

tstregex($ctx, $string)

Executes the diagnostic. Updates the context.

tstregex_is_full_match

Returns the match status of the input string (boolean: 0 or 1).

tstregex_get_match_portion

Returns the matching portion in case of a full match (it may be shorter than the input string, depending on anchors).

tstregex_get_match_len

Returns the length of the matching substring.

tstregex_get_fail_token

Returns the failing token in the regexp

tstregex_get_re_clean

Returns the sub-part of the regexp that matched.

tstregex_get_re_raw

Returns the internal representation of the regexp

tstregex_get_prefix_offset

Returns the offset of the original regexp in the raw regexp

DESCRIPTION

tstregex is designed to solve the "Black Box" problem of Regular Expressions. When a complex regex fails, Perl usually just says "No Match". This tool identifies exactly where and why it failed by finding the longest possible partial match.

EXAMPLE

$ perl lib/tstregex.pm '/^[a-z]*\d{3}$/' 'abc123' 'abc12a'
abc123
abcB<12a> (B<^[a-z]*>\d{3}$)

The tool highlights the part of the string where the match failed (marked B<...> above), along with the corresponding regex token.

The "Nibbling" Engine

The diagnostic logic uses a "Nibbling" (grignotage) strategy:

1. Decomposition

The engine breaks down your regex into a hierarchy of valid sub-patterns (lexical groups, atoms, and quantifiers) from longest to shortest.

2. Iterative Testing

It iteratively tests these sub-patterns against the input string. The question is not just whether the start matches, but what is the longest sequence of instructions the engine could follow before hitting a wall.

3. Failure Point Identification

Once the longest matching sub-pattern is found, the tool identifies the very next token in your regex syntax. This is your "Point of Failure".
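The three steps can be illustrated with a deliberately simplified sketch. It assumes the regex has already been split into tokens (done by hand here) and only tries token prefixes from longest to shortest. The function longest_prefix_match and its result keys are hypothetical; this is not the tstregex engine, whose decomposition is described above as hierarchical.

```perl
use strict;
use warnings;

# Simplified illustration of the nibbling idea, not the tstregex engine.
sub longest_prefix_match {
    my ($tokens, $string) = @_;
    # Try token prefixes from longest to shortest.
    for (my $n = @$tokens; $n > 0; $n--) {
        my $sub_re = join '', @{$tokens}[0 .. $n - 1];
        # Skip prefixes that are not valid regexes on their own.
        my $compiled = eval { qr/$sub_re/ } or next;
        if ($string =~ $compiled) {
            return {
                matched_re => $sub_re,
                match_len  => $+[0] - $-[0],
                # The token right after the matching prefix is the failure point.
                fail_token => $n < @$tokens ? $tokens->[$n] : undef,
            };
        }
    }
    return { matched_re => '', match_len => 0, fail_token => $tokens->[0] };
}

my @tokens = ('^', '[a-z]*', '\d{3}', '$');
for my $input ('abc123', 'abc12a') {
    my $r    = longest_prefix_match(\@tokens, $input);
    my $fail = defined $r->{fail_token} ? $r->{fail_token} : '(none)';
    print "$input: matched '$r->{matched_re}' over $r->{match_len} chars, failing token: $fail\n";
}
```

For 'abc12a' the longest prefix that still matches is ^[a-z]* (covering 'abc'), so the sketch reports \d{3} as the next token. Going from longest to shortest means the first success is the answer, at the cost of one match attempt per prefix.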

AUTHOR

Olivier Delouya - 2026

LICENSE

Artistic License 2.0