NAME

Mock::Data::Regex - Generator that uses a Regex as a template to generate strings

SYNOPSIS

# Automatically used when you give a Regexp ref to Mock::Data
my $mock= Mock::Data->new(generators => { word => qr/\w+/ });

# or use stand-alone
my $email= Mock::Data::Regex->new( qr/ [-a-z]+\d{0,2} @ [a-z]{2,20} \. (com|net|org) /xa );
say $email->generate;  # o25@nskwprtpqlqbeg.org

# define attributes, or override them on demand
say Mock::Data::Regex->new($regex)->generate($mock, { max_repetition => 50 });
say Mock::Data::Regex->new(regex => $regex, max_repetition => 50)->generate($mock);

# constrain the characters selected
my $any= Mock::Data::Regex->new(qr/.+/);
say $any->generate($mock, { min_codepoint => 0x20, max_codepoint => 0xFFFF });

# surround generated regex-match with un-matched prefix/suffix
say $email->generate($mock, { prefix => q{<a href="mailto:}, suffix => q{">Contact</a>} });

DESCRIPTION

This generator creates strings that match a user-supplied regular expression.

CONSTRUCTOR

new

my $gen= Mock::Data::Regex->new( $regex_ref );
                       ...->new( \%options );
                       ...->new( %options );

The constructor accepts a key/value list of attributes, a hashref of attributes, or a single argument, which is assumed to be a regular expression.

Any attribute may be supplied in %options. The regular expression is required, and it is parsed immediately to check whether this module supports it. (This module lacks support for several regex features, such as lookaround assertions and backreferences.)
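
As a rough sketch of the three calling conventions described above, the following builds the same kind of generator three ways. The final lines assume that an unsupported pattern makes new() throw an exception; the text only says the regex is checked immediately, so the code handles either outcome.

```perl
use strict;
use warnings;
use Mock::Data::Regex;

# The three calling conventions: a bare regex, a key/value list, or a hashref.
my $by_regex = Mock::Data::Regex->new(qr/[a-z]{3}[0-9]/);
my $by_list  = Mock::Data::Regex->new(regex => qr/[a-z]{3}[0-9]/, max_repetition => 5);
my $by_hash  = Mock::Data::Regex->new({ regex => qr/[a-z]{3}[0-9]/, max_repetition => 5 });

# Assumption: an unsupported feature (here, a lookahead) makes new() die.
# The eval keeps the script running whichever way the check is reported.
my $gen = eval { Mock::Data::Regex->new(qr/foo(?=bar)/) };
print defined $gen ? "pattern accepted\n" : "pattern rejected: $@\n";
```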

ATTRIBUTES

regex

The regular expression this generator is matching. This will always be a regex-ref, even if you gave a string to the constructor.

regex_parse_tree

A data structure describing the regular expression. WARNING: The API of this data structure may change in future versions.

min_codepoint

The minimum codepoint to be considered when processing the regular expression or generating strings from it. You might set this to, e.g., 0x20 to avoid generating control characters. This only affects selection from character sets; literal control characters in the pattern will still be returned.

max_codepoint

The maximum codepoint to be considered when processing the regular expression or generating strings from it. Setting this to a low value (like 127 for ASCII) can speed up the algorithm in many cases. This is set to 127 automatically if the "regex" has the /a flag.
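
A minimal sketch of the two codepoint limits working together, assuming they are given as constructor attributes. With both set, a pattern as broad as "." should only select printable ASCII characters.

```perl
use strict;
use warnings;
use Mock::Data::Regex;

# Restrict "." to printable ASCII: space (0x20) through tilde (0x7E).
my $printable = Mock::Data::Regex->new(
    regex         => qr/.+/,
    min_codepoint => 0x20,
    max_codepoint => 0x7E,
);
my $str = $printable->generate;

# Every character selected from the charset should fall inside the range.
print "all characters in range\n" if $str =~ /\A[\x20-\x7E]+\z/;
```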

max_repetition

max_repetition => '+8',
max_repetition => 10,

Whenever a regex has an unbounded repetition, this determines the upper bound on the random number of repetitions. Set this to a plain number to specify an absolute maximum, or to a string with a leading plus sign ("+$n") to specify a maximum relative to the minimum. The default is "+8".
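
The arithmetic behind the two forms can be illustrated with a small stand-alone helper. resolve_max_repetition is hypothetical and not part of this module. Clamping an absolute maximum that is lower than the minimum is an assumption made for the sketch, not documented behavior.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mock::Data::Regex: work out the upper
# bound of an unbounded repetition from its minimum and a max_repetition value.
sub resolve_max_repetition {
    my ($min, $setting) = @_;
    # "+N" means N more than the minimum.
    return $min + $1 if $setting =~ /\A\+(\d+)\z/;
    # A plain number is an absolute maximum.
    # Assumption: never return less than the minimum.
    return $setting > $min ? $setting : $min;
}

print resolve_max_repetition(1, '+8'), "\n";  # \w+ with the default "+8": 9
print resolve_max_repetition(0, '+8'), "\n";  # \w* with the default "+8": 8
print resolve_max_repetition(2, 10),   "\n";  # \w{2,} with max_repetition => 10: 10
```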

prefix

->new(regex => qr/foo/,   prefix => '_')->generate # returns "_foo"
->new(regex => qr/^foo/,  prefix => '_')->generate # returns "foo"
->new(regex => qr/^foo/m, prefix => '_')->generate # returns "_\nfoo"

A generator or template to prepend to the output whenever the regex is not anchored at the start, or is multi-line. If the regex is multi-line and anchored with '^', the prefix is joined to the output with a "\n".

suffix

->new(regex => qr/foo/,   suffix => '_')->generate # returns "foo_"
->new(regex => qr/foo$/,  suffix => '_')->generate # returns "foo"
->new(regex => qr/foo$/m, suffix => '_')->generate # returns "foo\n_"

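A generator or template to append to the output whenever the regex is not anchored at the end, or is multi-line. If the regex is multi-line and anchored with '$', the suffix is joined to the output with a "\n".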
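
The following sketch combines both attributes. It uses plain strings without template syntax for the prefix and suffix, and shows that anchors on a non-multi-line pattern suppress them.

```perl
use strict;
use warnings;
use Mock::Data::Regex;

# Unanchored pattern: the prefix and suffix surround the generated match.
my $wrapped = Mock::Data::Regex->new(
    regex => qr/[a-z]{3}/, prefix => 'id-', suffix => '-end',
)->generate;

# Pattern anchored at both ends (no /m): prefix and suffix are left out.
my $bare = Mock::Data::Regex->new(
    regex => qr/^[a-z]{3}$/, prefix => 'id-', suffix => '-end',
)->generate;

print "$wrapped\n$bare\n";
```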

METHODS

generate

my $str= $generator->generate($mockdata, \%options);

Return a string matching the regular expression. The %options may override the following attributes: "min_codepoint", "max_codepoint", "max_repetition", "prefix", "suffix".

compile

Return a generator coderef that calls "generate" on this object.
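
A short sketch of using the compiled coderef. It assumes the coderef forwards its arguments to "generate", so calling it with no arguments behaves like calling generate with none.

```perl
use strict;
use warnings;
use Mock::Data::Regex;

# The /a flag keeps \d limited to ASCII digits.
my $gen  = Mock::Data::Regex->new(qr/[A-Z]{2}\d{4}/a);
my $code = $gen->compile;

# Assumption: $code->() is equivalent to $gen->generate.
my @codes = map { $code->() } 1 .. 3;
print "$_\n" for @codes;
```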

parse

Parse a regular expression, returning a parse tree describing it. This can be called as a class method.

get_charset

If the regular expression is nothing more than a charset (or a repetition of one charset), this returns that charset. If the regular expression is more complicated than a simple charset, this returns undef.
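
For illustration, the check below contrasts a pattern that is a repetition of a single charset with one that also contains literal text. The type of the returned charset object is not inspected here; only whether it is defined.

```perl
use strict;
use warnings;
use Mock::Data::Regex;

my $simple  = Mock::Data::Regex->new(qr/[a-f0-9]+/);  # one charset, repeated
my $complex = Mock::Data::Regex->new(qr/id-[0-9]+/);  # literal text plus a charset

for my $gen ($simple, $complex) {
    my $charset = $gen->get_charset;
    printf "%s: %s\n", $gen->regex,
        defined $charset ? 'simple charset' : 'not a simple charset';
}
```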

SEE ALSO

String::Random::Regexp::regxstring

Probably a better implementation, but depends on a C++ compiler.

String::Random
Regexp::Genex

AUTHOR

Michael Conrad <mike@nrdvana.net>

VERSION

version 0.04

COPYRIGHT AND LICENSE

This software is copyright (c) 2024 by Michael Conrad.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.