
NAME
App::CSVUtils - CLI utilities related to CSV
VERSION
This document describes version 1.036 of App::CSVUtils (from Perl
distribution App-CSVUtils), released on 2025-02-04.
DESCRIPTION
This distribution contains the following CLI utilities:
1. csv-add-fields
2. csv-avg
3. csv-check-cell-values
4. csv-check-field-names
5. csv-check-field-values
6. csv-check-rows
7. csv-cmp
8. csv-concat
9. csv-convert-to-hash
10. csv-csv
11. csv-delete-fields
12. csv-dump
13. csv-each-row
14. csv-fill-cells
15. csv-fill-template
16. csv-find-values
17. csv-freqtable
18. csv-gen
19. csv-get-cells
20. csv-grep
21. csv-info
22. csv-intrange
23. csv-list-field-names
24. csv-lookup-fields
25. csv-ltrim
26. csv-map
27. csv-munge-field
28. csv-munge-rows
29. csv-pick
30. csv-pick-cells
31. csv-pick-fields
32. csv-pick-rows
33. csv-quote
34. csv-replace-newline
35. csv-rtrim
36. csv-select-fields
37. csv-select-rows
38. csv-setop
39. csv-shuf
40. csv-shuf-fields
41. csv-shuf-rows
42. csv-sort
43. csv-sort-fields
44. csv-sort-fields-by-example
45. csv-sort-fields-by-spec
46. csv-sort-rows
47. csv-sorted
48. csv-sorted-fields
49. csv-sorted-rows
50. csv-split
51. csv-sum
52. csv-transpose
53. csv-trim
54. csv-uniq
55. csv-unquote
56. csv2csv
57. csv2ltsv
58. csv2paras
59. csv2td
60. csv2tsv
61. csv2vcf
62. list-csvutils
63. paras2csv
64. tsv2csv
FUNCTIONS
gen_csv_util
Usage:
gen_csv_util(%args) -> bool
Generate a CSV utility.
This routine is used to generate a CSV utility in the form of a Rinci
function (code and metadata). You can then produce a CLI from the Rinci
function simply using Perinci::CmdLine::Gen or, if you use Dist::Zilla,
Dist::Zilla::Plugin::GenPericmdScript or, if on the command-line,
gen-pericmd-script.
Using this routine, by providing just one or a few hooks and setting
some parameters (like a couple of extra arguments), you will get a
complete CLI with a decent POD/manpage, the ability to read and write
one or multiple CSVs, and command-line options to customize how the
input CSVs should be parsed and how the output CSVs should be formatted
and named. Your CLI also gets tab completion, a usage/help message, and
other features.
To create a CSV utility, you specify a "name" (e.g. "csv_dump"; must be
a valid unqualified Perl identifier/function name) and optionally
"summary", "description", and other metadata like "links" or even
"add_meta_props". Then you specify one or more of "on_*" or "before_*"
or "after_*" arguments to supply handlers (coderefs) for your CSV
utility at various hook points.
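For example, here is a minimal sketch of a generated utility. The
package and utility name ("csv_keep_nonempty_first") are made up for
illustration, and it assumes, as the utility modules in this
distribution do, that gen_csv_util installs the generated function and
metadata into the calling package:

    package App::CSVUtils::csv_keep_nonempty_first;

    use strict;
    use warnings;
    use App::CSVUtils qw(gen_csv_util);

    gen_csv_util(
        name    => 'csv_keep_nonempty_first',
        summary => 'Output only rows whose first field is non-empty',
        on_input_data_row => sub {
            my $r = shift;
            # $r->{input_row} is the current data row as an arrayref
            $r->{code_print_row}->($r->{input_row})
                if defined $r->{input_row}[0] && length $r->{input_row}[0];
        },
    );

    1;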
*THE HOOKS*
All code for hooks should accept a single argument "r". "r" is a stash
(hashref) of various data, the keys of which will depend on which hook
point is being called. You can also add more keys to store data or for
flow control (see hook documentation below for more details).
The order of the hooks, in processing chronological order:
* on_begin
Called when utility begins, before reading CSV. You can use this
hook e.g. to process arguments, set output filenames (if you allow
custom output filenames).
* before_read_input
Called before opening any input CSV file. This hook is *still*
called even if your utility sets "reads_csv" to false.
At this point, the "input_filenames" stash key (as well as other
keys like "input_filename", "input_filenum", etc) has not been set.
You can use this hook e.g. to set a custom "input_filenames".
* before_open_input_files
Called before the first input CSV file is about to be opened, including
for stdin ("-"). You can use this hook e.g. to check/preprocess the
input files. Flow control is available by setting
"$r->{wants_skip_files}" to skip reading all the input files and go
directly to the "after_read_input" hook.
* before_open_input_file
Called before each input CSV file is about to be opened, including
for stdin ("-"). For the first file, called after
"before_open_input_file" hook. You can use this hook e.g. to
check/preprocess input file. Flow control is available by setting
"$r->{wants_skip_file}" to skip reading a single input file and go
to the next file, or "$r->{wants_skip_files}" to skip reading the
rest of the files and go directly to the "after_read_input" hook.
* on_input_header_row
Called when receiving the header row. Will be called for every input
file, and called even when the user specifies "--no-input-header", in
which case the header row will be the generated ["field1", "field2",
...]. You can use this hook e.g. to add/remove/rearrange fields.
You can set "$r->{wants_fill_rows}" to a defined false value if you do
not want "$r->{input_row}" to be filled with empty-string elements when
it contains fewer elements than the number of fields (in case of sparse
values at the end). Normally you only want to do this when you want to
do checking, e.g. in csv-check-rows.
* on_input_data_row
Called when receiving each data row. You can use this hook e.g. to
modify the row or print output (for line-by-line transformation or
filtering).
* after_close_input_file
Called after each input file is closed, including for stdin ("-")
(although for stdin, the handle is not actually closed). Flow
control is possible by setting "$r->{wants_skip_files}" to skip
reading the rest of the files and go straight to the
"after_close_input_files" hook.
* after_close_input_files
Called after the last input file is closed, after the last
"after_close_input_file" hook, including for stdin ("-") (although
for stdin, the handle is not actually closed).
* after_read_input
Called after the last row of the last CSV file is read and the last
file is closed. This hook is *still* called even if you set the
"reads_csv" option to false. At this point the stash keys related to
CSV reading
have all been cleared, including "input_filenames",
"input_filename", "input_fh", etc.
You can use this hook e.g. to print output if you buffer the output.
* on_end
Called when utility is about to exit. You can use this hook e.g. to
return the final result.
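As a sketch of flow control across these hooks, here is a hypothetical
utility that skips empty input files (it assumes that
"$r->{input_filename}" is already set when "before_open_input_file" is
called):

    gen_csv_util(
        name    => 'csv_dump_skip_empty_files',   # hypothetical name
        summary => 'Dump rows, skipping empty input files',
        reads_multiple_csv => 1,
        before_open_input_file => sub {
            my $r = shift;
            # assumption: $r->{input_filename} is set at this point
            if ($r->{input_filename} ne '-' && !-s $r->{input_filename}) {
                $r->{wants_skip_file} = 1;        # go to the next file
            }
        },
        on_input_data_row => sub {
            my $r = shift;
            $r->{code_print_row}->($r->{input_row});
        },
    );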
*THE STASH*
The common keys that "r" will contain:
* "gen_args", hash. The arguments used to generate the CSV utility.
* "util_args", hash. The arguments that your CSV utility accepts.
Parsed from command-line arguments (or configuration files, or
environment variables).
* "name", str. The name of the CSV utility. Which can also be
retrieved via "gen_args".
* "code_print", coderef. Routine provided for you to print something.
Accepts a string. Takes care of opening the output files for you.
* "code_print_row", coderef. Routine provided for you to print a data
row. You pass the row (either arrayref or hashref). Takes care of
opening the output files for you, as well as printing header row the
first time, if needed.
* "code_print_header_row", coderef. Routine provided for you to print
header row. You don't need to pass any arguments. Will only print
the header row once per output file if output header is enabled,
even if called multiple times.
If you are accepting CSV data ("reads_csv" gen argument set to true),
the following keys will also be available (in "on_input_header_row" and
"on_input_data_row" hooks):
* "input_parser", a Text::CSV_XS instance for input parsing.
* "input_fields", array of str. Input CSV's field names.
* "input_fields_idx", hash with field name as keys and field index
(0-based integer) as values.
* "input_filenames", array of str.
* "input_filename", str. The name of the current input file being read
("-" if reading from stdin).
* "input_filenum", uint. The number of the current input file, 1 being
the first file, 2 for the second, and so on.
* "input_fh", the handle to the current file being read.
* "input_rownum", uint. The number of rows that have been read (reset
after each input file). In "on_input_header_row" phase, this will be
1 since header row (including the generated one) is the first row.
Then in "on_input_data_row" phase (called the first time for a
file), it will be 2 for the first data row, even if physically it is
the first row for a CSV file that does not have a header.
* "input_data_rownum", uint. The number of data rows that have been
read (reset after each input file). This will be equal to
"input_rownum" less 1 if input file has header.
* "input_row", aos (array of str). The current input CSV row as an
arrayref.
* "input_row_as_hashref", hos (hash of str). The current input CSV row
as a hashref, with field names as hash keys and field values as hash
values. This will only be calculated if utility wants it. Utility
can express so by setting "$r->{wants_input_row_as_hashref}" to
true, e.g. in the "on_begin" hook.
* "input_header_row_count", uint. Contains the number of actual header
rows that have been read. If CLI user specifies "--no-input-header",
this will stay at zero. Will be reset for each CSV file.
* "input_data_row_count", int. Contains the number of actual data rows
that have read. Will be reset for each CSV file.
If you are outputting CSV ("writes_csv" gen argument set to true), the
following keys will be available:
* "output_emitter", a Text::CSV_XS instance for output.
* "output_fields", array of str. Should be set to list of output field
names. If unset, will be set to be the same as "input_fields".
* "output_fields_idx", hash with field names as keys and field indexes
(0-based integer) as values. Normally you do not need to set this
manually; you just need to set "output_fields" and this hash will be
computed automatically for you just before the first output row is
outputted.
* "output_filenames", array of str.
* "output_filename", str, name of current output file.
* "output_filenum", uint, the number of the current output file, 1
being the first file, 2 for the second, and so on.
* "output_fh", handle to the current output file.
* "output_rownum", uint. The number of rows that have been outputted
(reset after each output file).
* "output_data_rownum", uint. The number of data rows that have been
outputted (reset after each output file). This will be equal to
"input_rownum" less 1 if input file has header.
For other hook-specific keys, see the documentation for associated hook
point.
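As an illustration of these stash keys, here is a sketch of a hook that
works with the hashref form of each row (the utility name and the
"amount" field are hypothetical):

    gen_csv_util(
        name     => 'csv_print_amounts',
        on_begin => sub {
            my $r = shift;
            # ask for input_row_as_hashref to be computed for each data row
            $r->{wants_input_row_as_hashref} = 1;
        },
        on_input_data_row => sub {
            my $r = shift;
            my $row = $r->{input_row_as_hashref};
            $r->{code_print}->(
                "$r->{input_filename}:$r->{input_rownum}: " .
                ($row->{amount} // '') . "\n");
        },
    );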
*ACCEPTING ADDITIONAL COMMAND-LINE OPTIONS/ARGUMENTS*
As mentioned above, you will get additional command-line
options/arguments in "$r->{util_args}" hashref. Some options/arguments
are already added by "gen_csv_util", e.g. "input_filename" or
"input_filenames" along with "input_sep_char", etc (when your utility
declares "reads_csv"), "output_filename" or "output_filenames" along
with "overwrite", "output_sep_char", etc (when your utility declares
"writes_csv").
If you want to accept additional arguments/options, you specify them in
"add_args" (a hashref, with each key being an argument name and the
value the argument specification as defined in Rinci::function).
Some argument specifications have been defined in App::CSVUtils and can
be used. See existing utilities for examples.
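For example, here is a sketch of declaring an extra "--multiplier"
option; the argument name and Sah schema are made up for illustration,
following the Rinci argument specification format:

    gen_csv_util(
        name     => 'csv_multiply_cells',         # hypothetical name
        add_args => {
            multiplier => {
                summary => 'Multiply every numeric cell by this number',
                schema  => 'num*',
                default => 1,
            },
        },
        on_input_data_row => sub {
            my $r = shift;
            # the parsed option value is available in $r->{util_args}
            my $m = $r->{util_args}{multiplier};
            $r->{code_print_row}->([
                map { /\A-?\d+(?:\.\d+)?\z/ ? $_ * $m : $_ }
                    @{ $r->{input_row} }
            ]);
        },
    );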
*READING CSV DATA*
To read CSV data, normally your utility would provide handler for the
"on_input_data_row" hook and sometimes additionally
"on_input_header_row".
*OUTPUTTING STRING OR RETURNING RESULT*
To output string, usually you call the provided routine
"$r->{code_print}". This routine will open the output files for you.
You can also return enveloped result directly by setting "$r->{result}".
*OUTPUTTING CSV DATA*
To output CSV data, usually you call the provided routine
"$r->{code_print_row}". This routine accepts a row (arrayref or
hashref). This routine will open the output files for you when needed,
as well as print header row automatically.
You can also buffer rows from input to e.g. "$r->{output_rows}", then
call "$r->{code_print_row}" repeatedly in the "after_read_input" hook to
print all the rows.
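A sketch of the buffering approach (buffering into "$r->{output_rows}"
as suggested above; the utility name is hypothetical):

    gen_csv_util(
        name => 'csv_reverse_rows',
        on_input_data_row => sub {
            my $r = shift;
            # buffer a copy of each row instead of printing immediately
            push @{ $r->{output_rows} }, [ @{ $r->{input_row} } ];
        },
        after_read_input => sub {
            my $r = shift;
            # print the buffered rows in reverse order
            $r->{code_print_row}->($_)
                for reverse @{ $r->{output_rows} // [] };
        },
    );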
*READING MULTIPLE CSV FILES*
To read multiple CSV files, you first specify "reads_multiple_csv".
Then, you can supply handler for "on_input_header_row" and
"on_input_data_row" as usual. If you want to do something before/after
each input file, you can also supply handler for
"before_open_input_file" or "after_close_input_file".
*WRITING TO MULTIPLE CSV FILES*
Similarly, to write to multiple CSV files, you first specify
"writes_multiple_csv". Then, you can supply handler for
"on_input_header_row" and "on_input_data_row" as usual. To switch to the
next file, set "$r->{wants_switch_to_next_output_file}" to true, in
which case the next call to "$r->{code_print_row}" will close the
current file and open the next file.
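For example, a sketch of splitting input into a new output file every
1000 data rows (similar in spirit to csv-split; the name and chunk size
are made up for illustration):

    gen_csv_util(
        name => 'csv_split_every_1000',
        writes_multiple_csv => 1,
        on_input_data_row => sub {
            my $r = shift;
            $r->{code_print_row}->($r->{input_row});
            # after every 1000 data rows, continue in the next output file
            $r->{wants_switch_to_next_output_file} = 1
                if $r->{input_data_rownum} % 1000 == 0;
        },
    );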
*CHANGING THE OUTPUT FIELDS*
When calling "$r->{code_print_row}", you can output whatever fields you
want. By convention, you can set "$r->{output_fields}" and
"$r->{output_fields_idx}" to let other handlers know about the output
fields. For example, see the implementation of csv-concat.
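Here is a sketch of changing the output fields by appending a
hypothetical "rownum" field; "output_fields_idx" is left to be computed
automatically as described above:

    gen_csv_util(
        name => 'csv_add_rownum',                 # hypothetical name
        on_input_header_row => sub {
            my $r = shift;
            # declare the output fields: all input fields plus "rownum"
            $r->{output_fields} = [ @{ $r->{input_fields} }, 'rownum' ];
        },
        on_input_data_row => sub {
            my $r = shift;
            $r->{code_print_row}->(
                [ @{ $r->{input_row} }, $r->{input_data_rownum} ]);
        },
    );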
This function is not exported by default, but exportable.
Arguments ('*' denotes required arguments):
* add_args => *hash*
(No description)
* add_args_rels => *hash*
(No description)
* add_meta_props => *hash*
Add additional Rinci function metadata properties.
* after_close_input_file => *code*
(No description)
* after_close_input_files => *code*
(No description)
* after_read_input => *code*
(No description)
* before_open_input_file => *code*
(No description)
* before_open_input_files => *code*
(No description)
* before_read_input => *code*
(No description)
* description => *str*
(No description)
* examples => *array*
(No description)
* links => *array[hash]*
(No description)
* name* => *perl::identifier::unqualified_ascii*
(No description)
* on_begin => *code*
(No description)
* on_end => *code*
(No description)
* on_input_data_row => *code*
(No description)
* on_input_header_row => *code*
(No description)
* reads_csv => *bool* (default: 1)
Whether utility reads CSV data.
* reads_multiple_csv => *bool*
Whether utility accepts multiple CSV files as input.
Setting this option to true will implicitly set the "reads_csv"
option to true, obviously.
* summary => *str*
(No description)
* writes_csv => *bool* (default: 1)
Whether utility writes CSV data.
* writes_multiple_csv => *bool*
Whether utility outputs multiple CSV files.
Setting this option to true will implicitly set the "writes_csv"
option to true, obviously.
Return value: (bool)
compile_eval_code
Usage:
$coderef = compile_eval_code($str, $label);
Compile string code $str to coderef in 'main' package, without "use
strict" or "use warnings". Die on compile error.
eval_code
Usage:
$res = eval_code($coderef, $r, $topic_var_value, $return_topic_var);
FAQ
My CSV does not have a header?
Use the "--no-header" option. Fields will be named "field1", "field2",
and so on.
My data is TSV, not CSV?
Use the "--tsv" option.
I have a big CSV and the utilities are too slow or eat too much RAM!
These utilities are not (yet) optimized, patches welcome. If your CSV is
very big, perhaps a C-based solution is what you need.
HOMEPAGE
Please visit the project's homepage at
SOURCE
Source repository is at
SEE ALSO
Similar CLI bundles for other formats
App::TSVUtils, App::LTSVUtils, App::SerializeUtils.
Other CSV-related utilities
xls2csv and xlsx2csv from Spreadsheet::Read
import-csv-to-sqlite from App::SQLiteUtils
Query CSV with SQL using fsql from App::fsql
csvgrep from csvgrep
Other non-Perl-based CSV utilities
Python
AUTHOR
perlancar <perlancar@cpan.org>
CONTRIBUTOR
Adam Hopkins <violapiratejunky@gmail.com>
CONTRIBUTING
To contribute, you can send patches by email/via RT, or send pull
requests on GitHub.
Most of the time, you don't need to build the distribution yourself. You
can simply modify the code, then test via:
% prove -l
If you want to build the distribution (e.g. to try to install it locally
on your system), you can install Dist::Zilla,
Dist::Zilla::PluginBundle::Author::PERLANCAR,
Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two
other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps
required beyond that are considered a bug and can be reported to me.
COPYRIGHT AND LICENSE
This software is copyright (c) 2025 by perlancar <perlancar@cpan.org>.
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.
BUGS
Please report any bugs or feature requests on the bugtracker website
When submitting a bug or request, please include a test-file or a patch
to an existing test-file that illustrates the bug or desired feature.