Parsing Numbers with suffixes
Here is a full example for parsing numbers with suffixes.
my $num = assign {
my $to_num = sub($num,$suffix) {
return $num if $suffix eq 'b';
return $num * 1024 if $suffix eq 'kb';
return $num * 1024 * 1024 if $suffix eq 'mb';
return $num * 1024 * 1024 * 1024 if $suffix eq 'gb';
};
p_many(
p_maybe(p_match(qr/\s* , \s*/x)), # optional ,
p_map(
$to_num,
p_many (p_strc(0 .. 9)), # digits
p_match(qr/\s*/), # whitespace
p_strc (qw/b kb mb gb/), # suffix
)
);
};
is(p_run($num, "1 b, 1kb"), Some([1, 1024]), '1 b & 1kb');
is(p_run($num, "1 kb, 1gb"), Some([1024,1073741824]), '1 kb');
is(p_run($num, "1 mb"), Some([1048576]), '1 mb');
is(p_run($num, "1 gb"), Some([1073741824]), '1 gb');
assign
here is used to create a new scope. The last statement is returned and assigned to the variable. The reason for this is that this way the function $to_num
is scoped and isn't accessible by other functions.
We could inline the function directly in the p_map
call, but extracting it makes it sometimes better readable. But we don't want $to_num
exposed to other code, because it is directly tied to the parsing construct.
Additionally you also can see that p_map
also can be passed multiple parsers. Multiple parsers are again assembled together with p_and
So writing.
p_map($func, $parser1, $parser2, $parser3);
is the same as
p_map($func, p_and($parser1, $parser2, $parser3));
p_many
here is passed two parsers. The first one expects a colon and consumes any whitespace before and after it. But it's optional. Then after that colon a number with suffix must be passed.
It's wrapped in a p_map
because every single number that is extracted is converted to its full representation. For example 1 mb turns into 1048576.
Also consider that this way also ",1b,1kb" is valid. We can change the parsing construct that the leading colon is not allowed. It's up to you to decide how forgiving you want to be for input to be valid or not.
Use Regexes
The above is just an example on how you can use the different functions to combine stuff. But Regexes in itself already can solve a lot without relying too much on the Parser API. For example a better approach for parsing the above would be.
my $num = assign {
my $to_num = sub($num,$suffix) {
return $num if $suffix eq 'b';
return $num * 1024 if $suffix eq 'kb';
return $num * 1024 * 1024 if $suffix eq 'mb';
return $num * 1024 * 1024 * 1024 if $suffix eq 'gb';
};
p_many(
p_matchf(qr/\s* ,? \s* (\d+) \s* (b|kb|mb|gb)/xi, $to_num),
);
};
is(p_run($num, "1 b, 1kb"), Some([1, 1024]), '1 b & 1kb');
is(p_run($num, "1 kb, 1gb"), Some([1024,1073741824]), '1 kb & 1gb');
is(p_run($num, "1 mb"), Some([1048576]), '1 mb');
is(p_run($num, "1 gb"), Some([1073741824]), '1 gb');
Here you can see that the function $to_num
stays the same, but the whole parsing is basically replaces by a single regex. Only p_many
is used to repeat the regex one or multiple times.
Theoretically this also is easy in a Regex. Just put (?: ... )+
around whatever you wanna repeat one or more times. But you wouldn't get the conversion and extractin of the number so easy. But even that would be possible in Perl Regexes.
This regex version also has one advantage. The modifier i
at the end of the regex also allows for "MB", "KB" or "Kb" as it ignores upper and lower case.
It also shows another important aspect you will encounter in Sq
. $to_num
is actually re-usable. You can pass it to p_map
or p_matchf
and it works. Actually you also can pass this function to a Option::map
call that has two values stored in it. For example this also would work.
my $num = Some('1', 'kb')->map($to_num);