NAME
SPVM::Regex - Regular expression
SYNOPSIS
use Regex;
# Pattern match
{
my $re = Regex->new("ab*c");
my $target = "zabcz";
my $match = $re->match($target, 0);
}
# Pattern match - UTF-8
{
my $re = Regex->new("あ+");
my $target = "いあああい";
my $match = $re->match($target, 0);
}
# Pattern match - Character class and its negation
{
my $re = Regex->new("[A-Z]+[^A-Z]+");
my $target = "ABCzab";
my $match = $re->match($target, 0);
}
# Pattern match with captures
{
my $re = Regex->new("^(\w+) (\w+) (\w+)$", "a");
my $target = "abc1 abc2 abc3";
my $match = $re->match($target, 0);
if ($match) {
my $cap1 = $re->captures->[0];
my $cap2 = $re->captures->[1];
my $cap3 = $re->captures->[2];
}
}
# Replace
{
my $re = Regex->new("abc");
my $target = "ppzabcz";
# "ppzABCz"
my $result = $re->replace($target, 0, "ABC");
my $replace_count = $re->replace_count;
}
# Replace with a callback and capture
{
my $re = Regex->new("a(bc)");
my $target = "ppzabcz";
# "ppzABbcCz"
my $result = $re->replace_cb($target, 0, method : string ($re : Regex) {
return "AB" . $re->captures->[0] . "C";
});
}
# Replace all
{
my $re = Regex->new("abc");
my $target = "ppzabczabcz";
# "ppzABCzABCz"
my $result = $re->replace_all($target, 0, "ABC");
}
# Replace all with a callback and capture
{
my $re = Regex->new("a(bc)");
my $target = "ppzabczabcz";
# "ppzABCbcPQRSzABCbcPQRSz"
my $result = $re->replace_all_cb($target, 0, method : string ($re : Regex) {
return "ABC" . $re->captures->[0] . "PQRS";
});
}
# Single line mode - "." also matches "\n"
{
my $re = Regex->new("(.+)", "s");
my $target = "abc\ndef";
my $match = $re->match($target, 0);
unless ($match) {
return 0;
}
unless ($re->captures->[0] eq "abc\ndef") {
return 0;
}
}
DESCRIPTION
Regex provides regular expression functions.
This module is much less stable than other modules, and many changes are expected.
REGULAR EXPRESSION SYNTAX
Regex provides a subset of Perl regular expression syntax. The target string and the regex string are interpreted as UTF-8 strings.
# Quantifiers
+      1 or more repetitions
*      0 or more repetitions
?      0 or 1 repetition
{m,n}  between m and n repetitions
# Special characters
^    start of the string
$    end of the string
.    any character except "\n"
# Escape sequences
     Default mode     ASCII mode
\d   Not supported    [0-9]
\D   Not supported    not \d
\s   Not supported    " ", "\t", "\f", "\r", "\n"
\S   Not supported    not \s
\w   Not supported    [a-zA-Z0-9_]
\W   Not supported    not \w
# Character class and its negation
[a-z0-9]
[^a-z0-9]
# Capture
(foo)
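Regex options:
s    single line mode ("." also matches "\n")
a    ASCII mode (enables \d, \D, \s, \S, \w, and \W)
Options are given as a string of option characters, passed as the second argument of the "new" method or of the "new_with_options" method.
my $re = Regex->new("^ab+c", "sa");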
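A minimal sketch of ASCII mode, in the style of the SYNOPSIS. It assumes, as the escape table above indicates, that \d is accepted only when the "a" option is given. The pattern and target string are illustrative only.

```spvm
use Regex;

# ASCII mode: the "a" option enables \d, \D, \s, \S, \w, and \W
# (assumption based on the escape table; default mode rejects them)
{
  my $re = Regex->new("(\d+)-(\d+)", "a");
  my $target = "tel: 012-345";
  if ($re->match($target, 0)) {
    my $first = $re->captures->[0];  # "012"
    my $second = $re->captures->[1]; # "345"
  }
}
```

"\d+" is followed by "-", which is not a digit, so this pattern is not affected by the limitations listed below.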
Limitations:
Regex does not support a quantified item followed by an item that matches the same set of characters.
# An exception is thrown
Regex->new("a*a");
Regex->new("a?a");
Regex->new("a+a");
Regex->new("a{1,3}a");
A pattern is also invalid if an item whose quantifier allows zero repetitions sits between a quantified item and a later item that matches the same set of characters.
# An exception is thrown
Regex->new("\d+\D*\d+", "a");
Regex->new("\d+\D?\d+", "a");
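The rejected single-character patterns above can usually be rewritten so that no quantified item is followed by the same character. The sketch below assumes the check only rejects a quantified item followed by an item matching the same set of characters, so a quantifier at the end of the pattern is accepted. Each rewrite matches exactly the same strings as the pattern in its comment.

```spvm
use Regex;

# Equivalent rewrites of the rejected patterns (assumed to be accepted)
{
  my $re1 = Regex->new("a+");     # instead of "a*a": 1 or more "a"
  my $re2 = Regex->new("a{1,2}"); # instead of "a?a": 1 or 2 "a"
  my $re3 = Regex->new("aa+");    # instead of "a+a": 2 or more "a"
  my $re4 = Regex->new("a{2,4}"); # instead of "a{1,3}a": 2 to 4 "a"
}
```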
CLASS METHODS
new
static method new : Regex ($re_str_and_options : string[]...)
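Creates a new Regex object and compiles the regex. The first argument is the regex string, and an optional second argument is a string of option characters.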
my $re = Regex->new("^ab+c");
my $re = Regex->new("^ab+c", "s");
new_with_options
static method new_with_options : Regex ($re_str : string, $option_chars : string)
Creates a new Regex object and compiles the regex with the given option characters.
my $re = Regex->new_with_options("^ab+c", "s");
INSTANCE METHODS
captures
method captures : string[] ()
Gets the strings captured by the "match" method.
match_start
method match_start : int ()
Gets the start byte offset of the string matched by the "match" method.
match_length
method match_length : int ()
Gets the byte length of the string matched by the "match" method.
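Because the target is treated as UTF-8, "match_start" and "match_length" count bytes, not characters. The sketch below reuses the UTF-8 pattern from the SYNOPSIS to make the difference visible.

```spvm
use Regex;

# match_start and match_length are byte counts in the UTF-8 string
{
  my $re = Regex->new("あ+");
  my $target = "いあああい";
  if ($re->match($target, 0)) {
    # "い" is 3 bytes in UTF-8, so the match begins at byte offset 3
    my $start = $re->match_start;   # 3
    # "あああ" is 3 characters but 9 bytes
    my $length = $re->match_length; # 9
  }
}
```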
replace_count
method replace_count : int ()
Gets the number of replacements made by the "replace" or "replace_all" method.
match
method match : int ($target : string, $target_offset : int)
Executes pattern matching against the target string, starting at the given byte offset.
If the pattern match succeeds, 1 is returned; otherwise, 0 is returned.
After a successful match, you can get the captured strings with the "captures" method, the byte offset of the whole match with the "match_start" method, and the byte length of the whole match with the "match_length" method.
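The start offset argument makes it possible to find every match by restarting "match" just after the end of the previous one. This is a sketch, not a documented idiom. It assumes that "match_start" is measured from the beginning of the target (not from the start offset) and that "match" returns 0 once the offset reaches the end of the string.

```spvm
use Regex;

# Count all matches by restarting "match" after each previous match
{
  my $re = Regex->new("[0-9]+");
  my $target = "a1b22c333";
  my $offset = 0;
  my $count = 0;
  while ($re->match($target, $offset)) {
    $count++;
    # "[0-9]+" never matches an empty string, so the offset always advances
    $offset = $re->match_start + $re->match_length;
  }
}
```

Under those assumptions, $count ends at 3. With a pattern that can match the empty string, such as "[0-9]*", the offset would have to be advanced by at least one character to avoid an endless loop.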
replace
method replace : string ($target : string, $target_offset : int, $replace : string)
Replaces the first match in the target string, searching from the given byte offset, with the replacement string, and returns the resulting string.
replace_cb
method replace_cb : string ($target : string, $target_offset : int, $replace_cb : Regex::Replacer)
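Replaces the first match in the target string, searching from the given byte offset, with the string returned by the callback, and returns the resulting string. The callback must implement the Regex::Replacer interface. It receives the Regex object, so it can read the captures.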
replace_all
method replace_all : string ($target : string, $target_offset : int, $replace : string)
Replaces every match in the target string, searching from the given byte offset, with the replacement string, and returns the resulting string.
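The difference between "replace" and "replace_all", and what "replace_count" reports after each one, can be seen by running both on the same input. This is a sketch based on the SYNOPSIS examples and on the description of "replace_count" above.

```spvm
use Regex;

# "replace" changes only the first match; "replace_all" changes every match
{
  my $re = Regex->new("abc");
  my $target = "ppzabczabcz";

  my $first = $re->replace($target, 0, "ABC"); # "ppzABCzabcz"
  my $first_count = $re->replace_count;        # 1

  my $all = $re->replace_all($target, 0, "ABC"); # "ppzABCzABCz"
  my $all_count = $re->replace_count;            # 2
}
```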
replace_all_cb
method replace_all_cb : string ($target : string, $target_offset : int, $replace_cb : Regex::Replacer)
Replaces every match in the target string, searching from the given byte offset, with the string returned by the callback, and returns the resulting string. The callback must implement the Regex::Replacer interface.
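A callback can rearrange captured groups. The sketch below swaps two groups in every match, using the "cap1" and "cap2" aliases inside a callback written in the same anonymous-method form as the SYNOPSIS. The pattern is chosen so that neither quantified group is followed by the same set of characters.

```spvm
use Regex;

# Swap the two captured groups in every match
{
  my $re = Regex->new("(a+)(b+)");
  my $target = "aabzab";
  # Matches "aab" (cap1 "aa", cap2 "b") and "ab" (cap1 "a", cap2 "b")
  my $result = $re->replace_all_cb($target, 0, method : string ($re : Regex) {
    return $re->cap2 . $re->cap1;
  });
  # $result is "baazba"
}
```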
cap1
method cap1 : string ()
An alias for $re->captures->[0].
cap2
method cap2 : string ()
An alias for $re->captures->[1].
cap3
method cap3 : string ()
An alias for $re->captures->[2].
cap4
method cap4 : string ()
An alias for $re->captures->[3].
cap5
method cap5 : string ()
An alias for $re->captures->[4].
cap6
method cap6 : string ()
An alias for $re->captures->[5].
cap7
method cap7 : string ()
An alias for $re->captures->[6].
cap8
method cap8 : string ()
An alias for $re->captures->[7].
cap9
method cap9 : string ()
An alias for $re->captures->[8].
cap10
method cap10 : string ()
An alias for $re->captures->[9].