NAME
ETL::Pipeline::Input::File::List - Role for input sources with multiple files
SYNOPSIS
    # In the input source...
    ...
    sub run {
        my ($self, $etl) = @_;
        ...
        while (my $path = $self->next_path( $etl )) {
            ...
        }
    }
DESCRIPTION
This is a role used by input sources. It defines everything you need to process multiple input files of the same format. The role uses Path::Class::Rule to locate matching files.
Your input source calls the "next_path" method in a loop. That's it. The role automatically processes constructor arguments that match Path::Class::Rule criteria. It then builds a list of matching files the first time your code calls "next_path".
METHODS & ATTRIBUTES
Arguments for "input" in ETL::Pipeline
ETL::Pipeline::Input::File::List accepts any of the tests provided by Path::Iterator::Rule. The value of the argument is passed directly into the test. For boolean tests (e.g. readable, exists), pass undef as the value.
ETL::Pipeline::Input::File automatically applies the file filter. Do not pass file through "input" in ETL::Pipeline.
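The argument convention can be pictured with a small core-Perl sketch. This is not the role's implementation and it does not call Path::Iterator::Rule. The table %tests, the function build_filter, and the sample criteria are all hypothetical. The sketch only shows the idea: each argument name selects a test, the value is handed to that test, and undef is enough for a boolean test.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical test table (not the role's code). Each entry receives the
# argument value and returns a predicate over a file path.
my %tests = (
    # Valued test: the value is a pattern source, matched ignoring case.
    iname    => sub { my $pattern = shift; return sub { $_[0] =~ /$pattern/i } },
    # Boolean test: the value is ignored, so undef is enough.
    readable => sub { return sub { -r $_[0] } },
);

# Turn name => value criteria into one combined predicate.
sub build_filter {
    my %criteria = @_;
    my @checks = map { $tests{$_}->( $criteria{$_} ) } sort keys %criteria;
    return sub {
        my $path = shift;
        for my $check (@checks) {
            return 0 unless $check->($path);
        }
        return 1;
    };
}

my $filter = build_filter( iname => '\.xml$', readable => undef );

# Try it on a real temporary file and on a path that does not exist.
my ($fh, $file) = tempfile( SUFFIX => '.XML', UNLINK => 1 );
my $missing = "$file.missing.xml";
printf "%s: %s\n", $_, ( $filter->($_) ? 'match' : 'no match' ) for $file, $missing;
```

The temporary file passes both tests. The missing path passes the name test but fails the readable test, so the combined filter rejects it.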
iname is the most common one that I use. It matches the file name, supports wildcards and regular expressions, and is case insensitive.
    # Search using a regular expression...
    $etl->input( 'XmlFiles', iname => qr/\.xml$/ );

    # Search using a file glob...
    $etl->input( 'XmlFiles', iname => '*.xml' );
path
A Path::Class::File object for the currently selected file. This is the first file that matches the criteria. When you call "next_path", it finds the next match and sets path.
So path always points to the current file. It should be used by your input source class as the file name.
    # Inside the input source class...
    while ($self->next_path( $etl )) {
        open my $io, '<', $self->path;
        ...
    }

undef means no more matches.
Methods
next_path
Looks for the next match in the list and sets the "path" attribute. It also returns the matching path. Your input source class should set up a loop that calls this method and processes one file per iteration.
next_path takes one parameter - the ETL::Pipeline object. The method matches files in "data_in" in ETL::Pipeline.
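The behaviour described for next_path and path can be sketched in core Perl. This is a hypothetical stand-in, not the role itself. The class My::FileList and its root and pattern arguments are assumptions made for this example. It returns plain strings rather than Path::Class::File objects, and it searches the directory passed as root instead of taking an ETL::Pipeline object and reading "data_in". What it does show is the pattern: the list is built on the first call, each call sets path and returns it, and undef signals the end.

```perl
use strict;
use warnings;
use File::Find ();
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for an input source; not the real role.
package My::FileList;

sub new {
    my ($class, %args) = @_;
    return bless { root => $args{root}, pattern => $args{pattern}, path => undef }, $class;
}

# Current file, or undef when there are no more matches.
sub path { $_[0]{path} }

sub next_path {
    my ($self) = @_;

    # Build the list of matches once, on the first call.
    unless ($self->{matches}) {
        my @found;
        File::Find::find( {
            no_chdir => 1,    # $_ holds the full path
            wanted   => sub { push @found, $_ if -f $_ && $_ =~ $self->{pattern} },
        }, $self->{root} );
        $self->{matches} = [ sort @found ];
    }

    # shift returns undef once the list is empty.
    $self->{path} = shift @{ $self->{matches} };
    return $self->{path};
}

package main;

# Create a scratch directory with made-up files.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.xml b.XML c.txt)) {
    open my $fh, '>', File::Spec->catfile( $dir, $name )
        or die "Cannot create $name: $!";
    close $fh;
}

my $list = My::FileList->new( root => $dir, pattern => qr/\.xml$/i );
my @seen;
while (my $path = $list->next_path) {
    push @seen, $path;    # a real input source would open and parse the file here
}
print "$_\n" for @seen;
```

After the loop ends, path is undef, which matches the "no more matches" convention described for the attribute.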
SEE ALSO
ETL::Pipeline, ETL::Pipeline::Input, Path::Class::File, Path::Class::Rule, Path::Iterator::Rule
AUTHOR
Robert Wohlfarth <robert.j.wohlfarth@vumc.org>
LICENSE
Copyright 2021 (c) Vanderbilt University Medical Center
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.