NAME
Data::Tubes::Plugin::Parser
DESCRIPTION
This module contains factory functions to generate tubes that ease parsing of input records.
Each of the generated tubes has the following contract:
- the input record MUST be a hash reference;
- one field in the hash (according to factory argument input, set to raw by default) points to the input text that has to be parsed;
- one field in the hash (according to factory argument output, set to structured by default) is set to the output of the parsing operation.
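The contract above can be illustrated with a hand-written tube in plain Perl. This is only a sketch of the record shape under stated assumptions, not the module's implementation: make_parser_tube and the comma-splitting callback are hypothetical names introduced here, and returning the record from the tube is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Tubes): wraps a parsing
# callback into a sub that follows the contract described above.
sub make_parser_tube {
   my ($parser, %args) = @_;
   my $input  = defined $args{input}  ? $args{input}  : 'raw';
   my $output = defined $args{output} ? $args{output} : 'structured';
   return sub {
      my ($record) = @_;
      die "record MUST be a hash reference\n"
        unless ref($record) eq 'HASH';

      # read the text from the input field, store the parsed result in
      # the output field, and hand the same record back (assumption)
      $record->{$output} = $parser->($record->{$input});
      return $record;
   };
}

# a stand-in "parser" that splits on commas
my $contract_tube   = make_parser_tube(sub { [split /,/, $_[0]] });
my $contract_record = $contract_tube->({raw => 'a,b,c'});

# $contract_record->{structured} now holds ['a', 'b', 'c'], while
# $contract_record->{raw} is left untouched
```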
The factory functions below have two names, one starting with parse_ and the other without this prefix. The two are perfectly equivalent; the short version can be handier, e.g. when using tube or pipeline from Data::Tubes.
FUNCTIONS
- by_format
- parse_by_format
-
my $tube = by_format(%args); # OR
my $tube = by_format(\%args);
parse the input text according to a template format string (passed via factory argument format). This string is supposed to be composed of word and non-word sequences, where each word sequence is assumed to be the name of a field, and each non-word sequence is a separator. Example:
$format = 'foo;bar;baz';
is interpreted as follows:
@field_names = ('foo', 'bar', 'baz');
@separators = (';', ';');
Example:
$format = 'foo;bar~~~baz';
is interpreted as follows:
@field_names = ('foo', 'bar', 'baz');
@separators = (';', '~~~');
In the first case, i.e. when all separators are equal to each other, "parse_by_split" will be called, as it is (arguably) slightly more efficient. Otherwise, "parse_by_separators" will be called. The tube generated by whichever factory is called is returned as-is.
All @field_names MUST be different from one another.
The following arguments are supported:
format
-
the format to use for splitting the inputs;
input
-
name of the input field, defaults to raw;
name
-
name of the tube, useful for debugging;
output
-
name of the output field, defaults to structured;
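To illustrate how a format string given to by_format is taken apart, the following plain-Perl sketch separates word runs from non-word runs and checks whether all separators are equal. It is an approximation of the behaviour described above, not the module's code: dissect_format and the two strategy labels are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the description above: break a format
# string into field names (word runs) and separators (non-word runs).
sub dissect_format {
   my ($format) = @_;

   # the capturing group makes split() return the separators as well
   my @items = split /(\W+)/, $format;
   my (@field_names, @separators);
   for my $i (0 .. $#items) {
      if ($i % 2) { push @separators,  $items[$i] }
      else        { push @field_names, $items[$i] }
   }

   # all field names MUST be different from one another
   my %seen;
   for my $name (@field_names) {
      die "duplicate field name '$name'\n" if $seen{$name}++;
   }

   # one distinct separator means that a plain split is enough
   my %distinct = map { $_ => 1 } @separators;
   my $strategy =
     keys(%distinct) == 1 ? 'same separator' : 'mixed separators';
   return (\@field_names, \@separators, $strategy);
}

my ($names1, $seps1, $how1) = dissect_format('foo;bar;baz');
# ('foo', 'bar', 'baz'), (';', ';'), 'same separator'

my ($names2, $seps2, $how2) = dissect_format('foo;bar~~~baz');
# ('foo', 'bar', 'baz'), (';', '~~~'), 'mixed separators'
```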
- by_regex
- parse_by_regex
-
my $tube = by_regex(%args); # OR
my $tube = by_regex(\%args);
parse the input text based on a regular expression, passed as argument regex. The regular expression is supposed to have named captures, which are used to populate the output.
The following arguments are supported:
input
-
name of the input field, defaults to raw;
name
-
name of the tube, useful for debugging;
output
-
name of the output field, defaults to structured;
regex
-
the regular expression to use for parsing the inputs.
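How named captures can populate the output is sketched below in plain Perl. The helper regex_tube_sketch is hypothetical, the field names raw and structured are the documented defaults, and dying on a failed match is an assumption of this sketch rather than documented behaviour of by_regex.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a by_regex tube: named captures become the
# keys of the output hash.
sub regex_tube_sketch {
   my ($regex) = @_;
   return sub {
      my ($record) = @_;
      $record->{raw} =~ $regex
        or die "input does not match\n";    # assumption of this sketch

      # %+ holds the named captures of the last successful match; copy
      # it right away, because the next match resets it
      my %captures = %+;
      $record->{structured} = \%captures;
      return $record;
   };
}

my $regex_tube = regex_tube_sketch(qr{\A(?<name>\w+)=(?<value>.*)\z});
my $regex_record = $regex_tube->({raw => 'color=dark red'});
# structured: {name => 'color', value => 'dark red'}
```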
- by_separators
- parse_by_separators
-
my $tube = by_separators(%args); # OR
my $tube = by_separators(\%args);
parse the input according to a series of separators, that will be applied in sequence. For example, if the list of separators is the following:
@separators = (';', '~~');
the following input:
$text = 'foo;bar~~/baz/';
will be split as:
@split = ('foo', 'bar', '/baz/');
The following arguments are supported:
input
-
name of the input field, defaults to raw;
keys
-
a reference to an array containing the list of keys to be associated to the values from the split;
name
-
name of the tube, useful for debugging;
output
-
name of the output field, defaults to structured;
separators
-
a reference to an array containing the list of separators to be used for splitting the input.
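The by_separators example above can be reproduced with a short plain-Perl sketch that applies the separators in sequence and pairs the resulting values with the keys. The helper split_by_separators and the key names are hypothetical, and building one regular expression out of the separators is only one possible way to do it.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text on each separator in turn and pair
# the resulting values with the names in @$keys.
sub split_by_separators {
   my ($text, $separators, $keys) = @_;

   # one non-greedy capture per field, each separator matched literally
   my $pattern = join '', map { '(.*?)' . quotemeta($_) } @$separators;
   my @values = $text =~ m{\A$pattern(.*)\z}s
     or die "input does not match the separators\n";

   my %structured;
   @structured{@$keys} = @values;
   return \%structured;
}

my $by_seps = split_by_separators(
   'foo;bar~~/baz/',
   [';', '~~'],
   [qw< first second third >],
);
# {first => 'foo', second => 'bar', third => '/baz/'}
```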
- by_split
- parse_by_split
-
my $tube = by_split(%args); # OR
my $tube = by_split(\%args);
split the input according to a separator string.
The following arguments are supported:
input
-
name of the input field, defaults to raw;
keys
-
optional reference to an array containing a list of keys to be associated to the split data. If present, the keys are associated to the split values; if absent, a reference to an array of the split values is set as output;
name
-
name of the tube, useful for debugging;
output
-
name of the output field, defaults to structured;
separator
-
the separator string to be used for splitting the input.
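The effect of the optional keys argument of by_split can be sketched in plain Perl as follows. The helper split_sketch is hypothetical, and the separator is treated as a literal string here, which is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: split on a literal separator; return a hash
# reference when keys are given, an array reference otherwise.
sub split_sketch {
   my ($text, $separator, $keys) = @_;

   # a limit of -1 keeps trailing empty fields
   my @values = split /\Q$separator\E/, $text, -1;
   return \@values unless $keys;

   my %structured;
   @structured{@$keys} = @values;
   return \%structured;
}

my $split_list = split_sketch('foo;bar;baz', ';');
# ['foo', 'bar', 'baz']

my $split_hash = split_sketch('foo;bar;baz', ';', [qw< a b c >]);
# {a => 'foo', b => 'bar', c => 'baz'}
```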
- hashy
- parse_hashy
-
my $tube = hashy(%args); # OR
my $tube = hashy(\%args);
parse the input text as a hash. The algorithm used is the same as metadata in Data::Tubes::Util.
The following arguments are supported:
chunks_separator
-
character used to divide chunks in the input;
default_key
-
the default key to be used when a key is not present in a chunk;
input
-
name of the input field, defaults to raw;
key_value_separator
-
character used to divide the key from the value in a chunk;
name
-
name of the tube, useful for debugging;
output
-
name of the output field, defaults to structured;
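The roles of chunks_separator, key_value_separator and default_key in hashy can be illustrated with the plain-Perl sketch below. It follows the argument descriptions above and is not the code of metadata in Data::Tubes::Util; hashy_sketch is a hypothetical helper and all three arguments are passed explicitly, because their default values are not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper: divide the text into chunks, then each chunk into
# a key and a value; chunks without a key go under the default key.
sub hashy_sketch {
   my ($text, %args) = @_;
   my %structured;
   for my $chunk (split /\Q$args{chunks_separator}\E/, $text) {
      my ($key, $value) =
        split /\Q$args{key_value_separator}\E/, $chunk, 2;

      # no key/value separator in the chunk: it is a bare value
      ($key, $value) = ($args{default_key}, $key)
        unless defined $value;
      $structured{$key} = $value;
   }
   return \%structured;
}

my $hashy = hashy_sketch(
   'plain|lang=perl|kind=parser',
   chunks_separator    => '|',
   key_value_separator => '=',
   default_key         => 'title',
);
# {title => 'plain', lang => 'perl', kind => 'parser'}
```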
- single
- parse_single
-
my $tube = single(%args); # OR
my $tube = single(\%args);
consider the input text as already parsed, and generate as output a hash reference where the text is associated to a key.
The following arguments are supported:
input
-
name of the input field, defaults to raw;
key
-
key to use for associating the input text;
name
-
name of the tube, useful for debugging;
output
-
name of the output field, defaults to structured;
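What single produces can be shown with a very small plain-Perl sketch. The helper single_sketch and the key name line are hypothetical; the field names raw and structured are the documented defaults.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tube generated by single: the whole input
# text is stored under one key of the output hash.
sub single_sketch {
   my ($key) = @_;
   return sub {
      my ($record) = @_;
      $record->{structured} = {$key => $record->{raw}};
      return $record;
   };
}

my $single_record = single_sketch('line')->({raw => 'hello world'});
# structured: {line => 'hello world'}
```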
BUGS AND LIMITATIONS
Report bugs either through RT or GitHub (patches welcome).
AUTHOR
Flavio Poletti <polettix@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2016 by Flavio Poletti <polettix@cpan.org>
This module is free software. You can redistribute it and/or modify it under the terms of the Artistic License 2.0.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.