Sq::Manual::Parser::Intro

In this Intro we use Sq::Parser to parse an integer. First i create an exhausted example so you can see how to use the Parser, this doesn't mean the example we build here is the best version. A better approach is shown at the end.

When you load Sq::Parser it imports a series of functions all prefixed with p_. There is only a function based interface.

Parsing an int

The first most primitive function we have for parsing is p_strc. It creates a parser that checks for a string. You can write.

my $zero = p_strc('0');

$zero now represent a Parser that only can parse the character 0 and that's it. You can execute a parser by running it with p_run.

my $opt = p_run($zero, "0");

In this case $opt will be Some( [0] ). When the Parsing is successful it returns Some($array) otherwise when parsing fails you get a None.

is(p_run($zero,   "0"), Some([0]), 'parses 0');
is(p_run($zero,   "1"),      None, 'do not parse 1');
is(p_run($zero, "012"), Some([0]), 'parses 0');

Consider the last example. Also here it is succesfull. Because the string "012" starts with 0. At this point all of it is much the same as the following regex approach.

if ( $str =~ m/\A0/ ) {
    ...
}

Or: Alternatives

Parsing just a predefined string will not help for parsing an integer. First we need a way to say that different digits are allowed. There is a function p_or that expects multiple parsers and is successfull as long one of the parser is valid.

my $digit = p_or(
    p_strc('0'),
    p_strc('1'),
    p_strc('2'),
    p_strc('3'),
    p_strc('4'),
    p_strc('5'),
    p_strc('6'),
    p_strc('7'),
    p_strc('8'),
    p_strc('9'),
);

is(p_run($digit, "012"), Some([0]), 'parses 0');
is(p_run($digit, "123"), Some([1]), 'parses 1');
is(p_run($digit, "666"), Some([6]), 'parses 6');
is(p_run($digit, "a12"),      None, 'no digit');

Sure, we also can write typical Perl code.

my $digit = p_or(map { p_strc($_) } 0 .. 9);

is(p_run($digit, "012"), Some([0]), 'parses 0');
is(p_run($digit, "123"), Some([1]), 'parses 1');
is(p_run($digit, "666"), Some([6]), 'parses 6');
is(p_run($digit, "a12"),      None, 'no digit');

And: Chaining

The next very important function will be p_and. It expects multiple parsers and all of them are run one after another and all of them must succeed. For example we can parse three digits this way.

my $digit  = p_or(map { p_strc($_) } 0 .. 9);
my $digit3 = p_and($digit, $digit, $digit);

is(p_run($digit3, "012"), Some([0,1,2]), 'parses 012');
is(p_run($digit3, "123"), Some([1,2,3]), 'parses 123');
is(p_run($digit3, "666"), Some([6,6,6]), 'parses 666');
is(p_run($digit3, "a12"),          None, 'no digit');

The important idea is that we are not just parsing/checking, but also extracting what we succesfully parse. We get a result representing the single matched characters.

map

At certain places we want to work with the extracted data so far. For example instead of three single digits we want to join the results back into a single string again. In general we can use p_map to transform the values into any other value.

my $digit3 = assign {
    my $digit = p_or(map { p_strc($_) } 0 .. 9);
    my $three = p_and($digit, $digit, $digit);
    return p_map(sub(@xs) { join '', @xs }, $three);
};

is(p_run($digit3, "012"), Some(["012"]), 'parses 012');
is(p_run($digit3, "123"), Some(["123"]), 'parses 123');
is(p_run($digit3, "666"), Some(["666"]), 'parses 666');
is(p_run($digit3, "a12"),          None, 'no digit');

Joining

But joining a string is such a common operation, so there is also a p_join function.

my $digit3 = assign {
    my $digit = p_or(map { p_strc($_) } 0 .. 9);
    return p_join('', p_and($digit, $digit, $digit));
};

is(p_run($digit3, "012"), Some(["012"]), 'parses 012');
is(p_run($digit3, "123"), Some(["123"]), 'parses 123');
is(p_run($digit3, "666"), Some(["666"]), 'parses 666');
is(p_run($digit3, "a12"),          None, 'no digit');

Quantity

Up so far we parse for exactly three digits. We also need to pass $digit three times to p_and. How about defining a minimum and maximum range instead? We can do that using p_qty.

my $digit10 = assign {
    my $digit = p_or(map { p_strc($_) } 0 .. 9);
    return p_join('', p_qty(1, 10, $digit));
};

is(p_run($digit10,       "0"),      Some(["0"]), 'parses 0');
is(p_run($digit10,      "12"),     Some(["12"]), 'parses 12');
is(p_run($digit10,  "666666"), Some(["666666"]), 'parses 666666');
is(p_run($digit10, "666f666"),    Some(["666"]), 'parses 666');
is(p_run($digit10,     "a12"),             None, 'no digit');

This is the same as \d{1,10} in a regex. It expects 1 upto 10 digits.

+: One or Many

In regex we often use + for meaning at least one, and up so many that is possible. The function p_many does the same.

my $digits = assign {
    my $digit = p_or(map { p_strc($_) } 0 .. 9);
    return p_join('', p_many($digit));
};

is(p_run($digits,       "0"),      Some(["0"]), 'parses 0');
is(p_run($digits,      "12"),     Some(["12"]), 'parses 12');
is(p_run($digits,  "666666"), Some(["666666"]), 'parses 666666');
is(p_run($digits, "666f666"),    Some(["666"]), 'parses 666');
is(p_run($digits,     "a12"),             None, 'no digit');

?: maybe

How about negative integers? We want the ability that an integer can be prefixed wih either '+' or '-'. And additionally this sign must not be present.

my $digits = assign {
    my $digit       = p_or(map { p_strc($_) } 0 .. 9);
    my $sign        = p_maybe(p_or(p_strc('+'), p_strc('-')));
    my $sign_digits = p_and($sign, p_many($digit));
    return p_join('', $sign_digits);
};

is(p_run($digits,       "0"),      Some(["0"]), 'parses 0');
is(p_run($digits,      "12"),     Some(["12"]), 'parses 12');
is(p_run($digits,     "+13"),    Some(["+13"]), 'parses +13');
is(p_run($digits,  "666666"), Some(["666666"]), 'parses 666666');
is(p_run($digits, "666f666"),    Some(["666"]), 'parses 666');
is(p_run($digits, "-666f666"),  Some(["-666"]), 'parses -666');
is(p_run($digits,     "a12"),             None, 'no digit');

No variables

Assigning variables is not necessary, we also can inline most stuff, I would prefer to write it this way.

my $digits =
    p_join('',
        p_and(
            p_maybe(p_or(p_strc('+'), p_strc('-'))),  # sign
            p_many (p_or(map { p_strc($_) } 0 .. 9)), # many digits
        )
    );

is(p_run($digits,        "0"),      Some(["0"]), 'parses 0');
is(p_run($digits,       "12"),     Some(["12"]), 'parses 12');
is(p_run($digits,      "+13"),    Some(["+13"]), 'parses +13');
is(p_run($digits,   "666666"), Some(["666666"]), 'parses 666666');
is(p_run($digits,  "666f666"),    Some(["666"]), 'parses 666');
is(p_run($digits, "-666f666"),  Some(["-666"]),  'parses -666');
is(p_run($digits,      "a12"),             None, 'no digit');

*: Zero or more

Between the sign and the starting of the digit we want to allow zero or more spaces. The function p_many0 does that.

my $int =
    p_join('',
        p_and(
            p_maybe(p_or(p_strc('+'), p_strc('-'))),  # sign
            p_many0(p_strc(' ')),                     # zero or more ws
            p_many (p_or(map { p_strc($_) } 0 .. 9)), # many digits
        )
    );

is(p_run($int,        "0"),     Some(["0"]), 'parses 0');
is(p_run($int,       "12"),    Some(["12"]), 'parses 12');
is(p_run($int,      "+13"),   Some(["+13"]), 'parses +13');
is(p_run($int,    "- 666"), Some(["- 666"]), 'parses - 666');
is(p_run($int,    "+ 666"), Some(["+ 666"]), 'parses + 666');
is(p_run($int,  "666f666"),   Some(["666"]), 'parses 666');
is(p_run($int, "-666f666"),  Some(["-666"]), 'parses -666');
is(p_run($int,      "a12"),            None, 'no digit');

No capture

By default p_strc captures everything it matches. But in the above example we don't want the whitespace to appear in the output. When we are not interested in the capture we use p_str instead.

my $int =
    p_join('',
        p_and(
            p_maybe(p_or(p_strc('+'), p_strc('-'))),  # sign
            p_many0(p_str(' ')),                      # zero or more ws
            p_many (p_or(map { p_strc($_) } 0 .. 9)), # many digits
        )
    );

is(p_run($int,        "0"),    Some(["0"]), 'parses 0');
is(p_run($int,       "12"),   Some(["12"]), 'parses 12');
is(p_run($int,      "+13"),  Some(["+13"]), 'parses +13');
is(p_run($int,    "- 666"), Some(["-666"]), 'parses -666');
is(p_run($int,    "+ 666"), Some(["+666"]), 'parses +666');
is(p_run($int,  "666f666"),  Some(["666"]), 'parses 666');
is(p_run($int, "-666f666"), Some(["-666"]), 'parses -666');
is(p_run($int,      "a12"),           None, 'no digit');

Using regex

As you can see we have a regex like function based parser. But up so far we only used p_strc and p_str for parsing that only relies on matching characters. The goal is not to replace Perls regex. Perl's regex are powerful and fast. This parser also fully supports creating single parser out of regexes by p_match. This is maybe the function you should use most of the time.

So all we have written can be replaced like this.

my $int = p_match(qr/([+-]? \s* \d+)/x);

is(p_run($int,        "0"),     Some(["0"]), 'parses 0');
is(p_run($int,       "12"),    Some(["12"]), 'parses 12');
is(p_run($int,      "+13"),   Some(["+13"]), 'parses +13');
is(p_run($int,    "- 666"), Some(["- 666"]), 'parses -666');
is(p_run($int,    "+ 666"), Some(["+ 666"]), 'parses +666');
is(p_run($int,  "666f666"),   Some(["666"]), 'parses 666');
is(p_run($int, "-666f666"),  Some(["-666"]), 'parses -666');
is(p_run($int,      "a12"),            None, 'no digit');

p_match automatically extracts all captures. As we only have a single parenthesis we also only get a single match. We also could create two matches.

my $int = p_match(qr/([+-])? \s* (\d+)/x);

is(p_run($int,        "0"),    Some([undef, "0"]), 'parses 0');
is(p_run($int,       "12"),   Some([undef, "12"]), 'parses 12');
is(p_run($int,      "+13"),     Some(["+", "13"]), 'parses +13');
is(p_run($int,    "- 666"),    Some(["-", "666"]), 'parses -666');
is(p_run($int,    "+ 666"),    Some(["+", "666"]), 'parses +666');
is(p_run($int,  "666f666"),  Some([undef, "666"]), 'parses 666');
is(p_run($int, "-666f666"),    Some(["-", "666"]), 'parses -666');
is(p_run($int,      "a12"),                  None, 'no digit');

match and transform

We also could p_map the result of p_match again and transform the two values into an integer. But p_matchf does it in a single call. The function we pass to p_matchf only executes when the match was succesfull. The result of that function is then used as the parsing result.

my $int = p_matchf(qr/([+-])? \s* (\d+)/x, sub($sign,$num) {
    if ( defined $sign ) {
        return $sign eq '-' ? $num * -1 : $num;
    }
    return $num;
});

is(p_run($int,        "0"),    Some([0]), 'parses 0');
is(p_run($int,       "12"),   Some([12]), 'parses 12');
is(p_run($int,      "+13"),   Some([13]), 'parses 13');
is(p_run($int,    "- 666"), Some([-666]), 'parses -666');
is(p_run($int,    "+ 666"),  Some([666]), 'parses 666');
is(p_run($int,  "666f666"),  Some([666]), 'parses 666');
is(p_run($int, "-666f666"), Some([-666]), 'parses -666');
is(p_run($int,      "a12"),         None, 'no digit');

All the captures of the regex are passed as function arguments to p_matchf.

match and filter

An important feature is that we not only can return different things, basically any data-structure, object and so on you can think of. But we also can filter or return multiple arguments. But when nothing is returned, then parsing is considered as a failure.

Let's say we not only want to parse any integer, but than we want to restrict the number to be between 0 and 100 we could do.

my $hundred = p_matchf(qr/([+-])? \s* (\d+)/x, sub($sign,$num) {
    my $result = $num;
    if ( defined $sign && $sign eq '-' ) {
        $result = $result * -1;
    }
    if ( $result >= 0 && $result <= 100 ) {
        return $result;
    }
    return;
});

is(p_run($hundred,        "0"),  Some([0]), 'parses 0');
is(p_run($hundred,       "12"), Some([12]), 'parses 12');
is(p_run($hundred,      "+13"), Some([13]), 'parses 13');
is(p_run($hundred,    "- 666"),       None, 'not valid');
is(p_run($hundred,    "+ 666"),       None, 'not valid');
is(p_run($hundred,  "666f666"),       None, 'not valid');
is(p_run($hundred, "-666f666"),       None, 'not valid');
is(p_run($hundred,      "a12"),       None, 'not valid');

integer list

Consider that p_matchf also just returns a parser. And this parser can be used with all the functions you have seen so far. For example consider a string that either is just a single integer or contains multiple integers separated by a colon.

my $int = p_matchf(qr/([+-])? \s* (\d+)/x, sub($sign,$num) {
    my $result = $num;
    $result *= -1 if defined $sign && $sign eq '-';
    return $result;
});

my $int_list =
    p_and(
        $int,
        p_many0(p_and(p_match(qr/\s* , \s*/x), $int)),
    );

is(p_run($int_list,       "1"),           Some([1]), '1 int');
is(p_run($int_list,     "1,2"),         Some([1,2]), '2 int');
is(p_run($int_list,  "12,-23"),      Some([12,-23]), '2 int');
is(p_run($int_list, "12,- 23, 0"), Some([12,-23,0]), '3 int');

Consider the p_match(qr/\s* , \s*/x) call. Here no capture is defined in the regex. This means that the regex still must match to be succesfull but it doesn't capture anything.

Conclusion

I hope you get an understanding of the working and how you can use regular expressions as single pieces to put a whole parser with transformation together.

There are still some more functions to cover but those shown here are the basics you can already use to built some complicated stuff.