Parsing Numbers with suffixes

Here is a full example for parsing numbers with suffixes.

my $num = assign {
    my $to_num = sub($num,$suffix) {
        return $num                      if $suffix eq 'b';
        return $num * 1024               if $suffix eq 'kb';
        return $num * 1024 * 1024        if $suffix eq 'mb';
        return $num * 1024 * 1024 * 1024 if $suffix eq 'gb';
    };

    p_many(
        p_maybe(p_match(qr/\s* , \s*/x)), # optional ,
        p_map(
            $to_num,
            p_many (p_strc(0 .. 9)), # digits
            p_match(qr/\s*/),        # whitespace
            p_strc (qw/b kb mb gb/), # suffix
        )
    );
};

is(p_run($num, "1  b, 1kb"),         Some([1, 1024]), '1 b & 1kb');
is(p_run($num, "1 kb, 1gb"), Some([1024,1073741824]), '1 kb & 1gb');
is(p_run($num, "1 mb"),              Some([1048576]), '1 mb');
is(p_run($num, "1 gb"),           Some([1073741824]), '1 gb');

assign is used here to create a new scope. The value of the last statement is returned and assigned to the variable. This way the function $to_num is scoped to the block and isn't accessible to other code.

We could inline the function directly in the p_map call, but extracting it sometimes makes the code more readable. At the same time we don't want $to_num exposed to other code, because it is directly tied to the parsing construct.
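The same scoping idea can be sketched in plain Perl with a do block, which also evaluates to its last statement. This is only an analogy under the assumption that assign behaves as described above; it says nothing about how Sq implements assign, and the names $double and $double_all are made up for the illustration.

```perl
use strict;
use warnings;

my $double_all = do {
    # $double is visible only inside this block.
    my $double = sub { $_[0] * 2 };

    # The last statement is the value of the whole do block.
    sub { [ map { $double->($_) } @_ ] };
};

# $double_all is usable here, but $double is not in scope anymore.
my $result = $double_all->(1, 2, 3);
```

After the block ends, only $double_all remains reachable, which mirrors how $to_num stays private to the parser definition.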

You can also see that p_map can be passed multiple parsers. Multiple parsers are combined with p_and.

So writing

p_map($func, $parser1, $parser2, $parser3);

is the same as

p_map($func, p_and($parser1, $parser2, $parser3));

p_many is passed two parsers here. The first one expects a comma and consumes any whitespace before and after it, but it is optional. After that optional comma, a number with a suffix must follow.

The number parser is wrapped in p_map so that every extracted number is converted to its full representation. For example, 1 mb turns into 1048576.

Also consider that this way ",1b,1kb" is valid too. We could change the parsing construct so that a leading comma is not allowed. It's up to you to decide how forgiving you want to be about which input is valid.

Use Regexes

The above is just an example of how you can combine the different functions. But regexes by themselves can already solve a lot without relying too much on the Parser API. For example, a better approach for parsing the above would be

my $num = assign {
    my $to_num = sub($num,$suffix) {
        return $num                      if $suffix eq 'b';
        return $num * 1024               if $suffix eq 'kb';
        return $num * 1024 * 1024        if $suffix eq 'mb';
        return $num * 1024 * 1024 * 1024 if $suffix eq 'gb';
    };

    p_many(
        p_matchf(qr/\s* ,? \s* (\d+) \s* (b|kb|mb|gb)/xi, $to_num),
    );
};

is(p_run($num, "1  b, 1kb"),         Some([1, 1024]), '1 b & 1kb');
is(p_run($num, "1 kb, 1gb"), Some([1024,1073741824]), '1 kb & 1gb');
is(p_run($num, "1 mb"),              Some([1048576]), '1 mb');
is(p_run($num, "1 gb"),           Some([1073741824]), '1 gb');

Here you can see that the function $to_num stays the same, but the whole parsing is basically replaced by a single regex. Only p_many is used to repeat the regex one or more times.

In theory, this repetition is also easy in a regex: just put (?: ... )+ around whatever you want to repeat one or more times. But the conversion and extraction of each number would not be as easy, although even that is possible with Perl regexes.
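To illustrate the last point, here is a minimal sketch in plain Perl, without Sq, that repeats a regex and converts each number as it goes. It uses a \G-anchored match in a loop instead of (?: ... )+, because a capture group inside a repeated group only keeps its last match. The helper parse_sizes and the %factor table are hypothetical names, and rejecting leftover input is a choice made for this sketch, not necessarily what the parser above does.

```perl
use strict;
use warnings;

# Multiplier for each suffix; keys are lowercase.
my %factor = (
    b  => 1,
    kb => 1024,
    mb => 1024 ** 2,
    gb => 1024 ** 3,
);

# Hypothetical helper, not part of Sq. Returns an array reference of
# byte counts, or nothing when the input contains unexpected text.
sub parse_sizes {
    my ($input) = @_;
    my @sizes;
    pos($input) = 0;

    # \G anchors every match where the previous one ended, so nothing
    # between two entries is skipped silently. The c flag keeps the
    # position when the final attempt fails.
    while ( $input =~ /\G \s* ,? \s* (\d+) \s* (b|kb|mb|gb)/xgci ) {
        push @sizes, $1 * $factor{ lc $2 };
    }

    # Require at least one entry and allow only trailing whitespace.
    return unless @sizes && $input =~ /\G \s* \z/xgc;
    return \@sizes;
}

my $sizes = parse_sizes("1  b, 1kb");
```

This works, but the loop, the position handling, and the conversion are all written by hand, which is the part that p_many and p_matchf take care of.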

The p_matchf version also has one advantage. The modifier i at the end of the regex also lets it match "MB", "KB" or "Kb", as it ignores upper and lower case. Note that $to_num as written compares the suffix against lowercase strings with eq, so it would have to normalize the suffix (for example with lc) before these variants are converted correctly.

It also shows another important aspect you will encounter in Sq. $to_num is actually re-usable. You can pass it to p_map or p_matchf and it works. You can even pass this function to an Option::map call on an option that has two values stored in it. For example, this also would work.
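Because eq compares strings case-sensitively, a conversion function that should handle "MB" or "Kb" has to normalize the suffix first. A minimal sketch in plain Perl, assuming a lookup table is acceptable in place of the chain of return statements; the %factor table is a name introduced for this illustration, and the sketch unpacks @_ only to stay independent of signature support.

```perl
use strict;
use warnings;

# Multiplier for each suffix; keys are lowercase.
my %factor = (b => 1, kb => 1024, mb => 1024 ** 2, gb => 1024 ** 3);

# Variant of $to_num that accepts any capitalization of the suffix.
my $to_num = sub {
    my ($num, $suffix) = @_;
    my $factor = $factor{ lc $suffix };

    # Unknown suffix: return nothing rather than a wrong number.
    return unless defined $factor;
    return $num * $factor;
};

my $bytes = $to_num->(1, 'MB');
```

The table also makes the unknown-suffix case explicit instead of falling off the end of the function.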

my $num = Some('1', 'kb')->map($to_num);