
NAME
App::CSVUtils - CLI utilities related to CSV
VERSION
This document describes version 1.036 of App::CSVUtils (from Perl
distribution App-CSVUtils), released on 2025-02-04.
DESCRIPTION
This distribution contains the following CLI utilities:
1. csv-add-fields
2. csv-avg
3. csv-check-cell-values
4. csv-check-field-names
5. csv-check-field-values
6. csv-check-rows
7. csv-cmp
8. csv-concat
9. csv-convert-to-hash
10. csv-csv
11. csv-delete-fields
12. csv-dump
13. csv-each-row
14. csv-fill-cells
15. csv-fill-template
16. csv-find-values
17. csv-freqtable
18. csv-gen
19. csv-get-cells
20. csv-grep
21. csv-info
22. csv-intrange
23. csv-list-field-names
24. csv-lookup-fields
25. csv-ltrim
26. csv-map
27. csv-munge-field
28. csv-munge-rows
29. csv-pick
30. csv-pick-cells
31. csv-pick-fields
32. csv-pick-rows
33. csv-quote
34. csv-replace-newline
35. csv-rtrim
36. csv-select-fields
37. csv-select-rows
38. csv-setop
39. csv-shuf
40. csv-shuf-fields
41. csv-shuf-rows
42. csv-sort
43. csv-sort-fields
44. csv-sort-fields-by-example
45. csv-sort-fields-by-spec
46. csv-sort-rows
47. csv-sorted
48. csv-sorted-fields
49. csv-sorted-rows
50. csv-split
51. csv-sum
52. csv-transpose
53. csv-trim
54. csv-uniq
55. csv-unquote
56. csv2csv
57. csv2ltsv
58. csv2paras
59. csv2td
60. csv2tsv
61. csv2vcf
62. list-csvutils
63. paras2csv
64. tsv2csv
FUNCTIONS
gen_csv_util
Usage:
gen_csv_util(%args) -> bool
Generate a CSV utility.
This routine is used to generate a CSV utility in the form of a Rinci
function (code and metadata). You can then produce a CLI from the Rinci
function simply using Perinci::CmdLine::Gen or, if you use Dist::Zilla,
Dist::Zilla::Plugin::GenPericmdScript or, if on the command-line,
gen-pericmd-script.
Using this routine, by providing just one or a few hooks and setting
some parameters (like a couple of extra arguments), you will get a
complete CLI with a decent POD/manpage, the ability to read and write
one or multiple CSVs, and command-line options to customize how the
input CSVs should be parsed and how the output CSVs should be formatted
and named. Your CLI also gets tab completion, a usage/help message, and
other features.
To create a CSV utility, you specify a "name" (e.g. "csv_dump"; must be
a valid unqualified Perl identifier/function name) and optionally
"summary", "description", and other metadata like "links" or even
"add_meta_props". Then you specify one or more of "on_*" or "before_*"
or "after_*" arguments to supply handlers (coderefs) for your CSV
utility at various hook points.
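For example, here is a minimal sketch of a generated utility. The
package and utility name ("csv_keep_nonempty_first") are made up for
illustration, and it assumes, as the utility modules in this
distribution do, that gen_csv_util installs the generated function and
metadata into the calling package:

    package App::CSVUtils::csv_keep_nonempty_first;

    use strict;
    use warnings;
    use App::CSVUtils qw(gen_csv_util);

    gen_csv_util(
        name    => 'csv_keep_nonempty_first',
        summary => 'Output only rows whose first field is non-empty',
        on_input_data_row => sub {
            my $r = shift;
            # $r->{input_row} is the current data row as an arrayref
            $r->{code_print_row}->($r->{input_row})
                if defined $r->{input_row}[0] && length $r->{input_row}[0];
        },
    );

    1;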
*THE HOOKS*
All code for hooks should accept a single argument "r". "r" is a stash
(hashref) of various data, the keys of which will depend on which hook
point is being called. You can also add more keys to store data or for
flow control (see hook documentation below for more details).
The order of the hooks, in processing chronological order:
* on_begin
Called when utility begins, before reading CSV. You can use this
hook e.g. to process arguments, set output filenames (if you allow
custom output filenames).
* before_read_input
Called before opening any input CSV file. This hook is *still*
called even if your utility sets "reads_csv" to false.
At this point, the "input_filenames" stash key (as well as other
keys like "input_filename", "input_filenum", etc) has not been set.
You can use this hook e.g. to set a custom "input_filenames".
* before_open_input_files
Called before the first input CSV file is about to be opened, including
for stdin ("-"). You can use this hook e.g. to check/preprocess the
input files. Flow control is available by setting
"$r->{wants_skip_files}" to skip reading all the input files and go
directly to the "after_read_input" hook.
* before_open_input_file
Called before each input CSV file is about to be opened, including
for stdin ("-"). For the first file, called after
"before_open_input_file" hook. You can use this hook e.g. to
check/preprocess input file. Flow control is available by setting
"$r->{wants_skip_file}" to skip reading a single input file and go
to the next file, or "$r->{wants_skip_files}" to skip reading the
rest of the files and go directly to the "after_read_input" hook.
* on_input_header_row
Called when receiving the header row. Will be called for every input
file, and called even when the user specifies "--no-input-header", in
which case the header row will be the generated ["field1", "field2",
...]. You can use this hook e.g. to add/remove/rearrange fields.
You can set "$r->{wants_fill_rows}" to a defined false value if you do
not want "$r->{input_row}" to be filled with empty-string elements when
it contains fewer elements than the number of fields (in case of sparse
values at the end). Normally you only want to do this when you want to
do checking, e.g. in csv-check-rows.
* on_input_data_row
Called when receiving each data row. You can use this hook e.g. to
modify the row or print output (for line-by-line transformation or
filtering).
* after_close_input_file
Called after each input file is closed, including for stdin ("-")
(although for stdin, the handle is not actually closed). Flow
control is possible by setting "$r->{wants_skip_files}" to skip
reading the rest of the files and go straight to the
"after_close_input_files" hook.
* after_close_input_files
Called after the last input file is closed, after the last
"after_close_input_file" hook, including for stdin ("-") (although
for stdin, the handle is not actually closed).
* after_read_input
Called after the last row of the last CSV file is read and the last
file is closed. This hook is *still* called even if you set the
"reads_csv" option to false. At this point the stash keys related to
CSV reading
have all been cleared, including "input_filenames",
"input_filename", "input_fh", etc.
You can use this hook e.g. to print output if you buffer the output.
* on_end
Called when utility is about to exit. You can use this hook e.g. to
return the final result.
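As a sketch of flow control across these hooks, here is a hypothetical
utility that skips empty input files (it assumes that
"$r->{input_filename}" is already set when "before_open_input_file" is
called):

    gen_csv_util(
        name    => 'csv_dump_skip_empty_files',   # hypothetical name
        summary => 'Dump rows, skipping empty input files',
        reads_multiple_csv => 1,
        before_open_input_file => sub {
            my $r = shift;
            # assumption: $r->{input_filename} is set at this point
            if ($r->{input_filename} ne '-' && !-s $r->{input_filename}) {
                $r->{wants_skip_file} = 1;        # go to the next file
            }
        },
        on_input_data_row => sub {
            my $r = shift;
            $r->{code_print_row}->($r->{input_row});
        },
    );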
*THE STASH*
The common keys that "r" will contain:
* "gen_args", hash. The arguments used to generate the CSV utility.
* "util_args", hash. The arguments that your CSV utility accepts.
Parsed from command-line arguments (or configuration files, or
environment variables).
* "name", str. The name of the CSV utility. Which can also be
retrieved via "gen_args".
* "code_print", coderef. Routine provided for you to print something.
Accepts a string. Takes care of opening the output files for you.
* "code_print_row", coderef. Routine provided for you to print a data
row. You pass the row (either arrayref or hashref). Takes care of
opening the output files for you, as well as printing header row the
first time, if needed.
* "code_print_header_row", coderef. Routine provided for you to print
header row. You don't need to pass any arguments. Will only print
the header row once per output file if output header is enabled,
even if called multiple times.
If you are accepting CSV data ("reads_csv" gen argument set to true),
the following keys will also be available (in "on_input_header_row" and
"on_input_data_row" hooks):
* "input_parser", a Text::CSV_XS instance for input parsing.
* "input_fields", array of str. Input CSV's field names.
* "input_fields_idx", hash with field name as keys and field index
(0-based integer) as values.
* "input_filenames", array of str.
* "input_filename", str. The name of the current input file being read
("-" if reading from stdin).
* "input_filenum", uint. The number of the current input file, 1 being
the first file, 2 for the second, and so on.
* "input_fh", the handle to the current file being read.
* "input_rownum", uint. The number of rows that have been read (reset
after each input file). In "on_input_header_row" phase, this will be
1 since header row (including the generated one) is the first row.
Then in "on_input_data_row" phase (called the first time for a
file), it will be 2 for the first data row, even if physically it is
the first row for a CSV file that does not have a header.
* "input_data_rownum", uint. The number of data rows that have been
read (reset after each input file). This will be equal to
"input_rownum" less 1 if input file has header.
* "input_row", aos (array of str). The current input CSV row as an
arrayref.
* "input_row_as_hashref", hos (hash of str). The current input CSV row
as a hashref, with field names as hash keys and field values as hash
values. This will only be calculated if utility wants it. Utility
can express so by setting "$r->{wants_input_row_as_hashref}" to
true, e.g. in the "on_begin" hook.
* "input_header_row_count", uint. Contains the number of actual header
rows that have been read. If CLI user specifies "--no-input-header",
this will stay at zero. Will be reset for each CSV file.
* "input_data_row_count", int. Contains the number of actual data rows
that have read. Will be reset for each CSV file.
If you are outputting CSV ("writes_csv" gen argument set to true), the
following keys will be available:
* "output_emitter", a Text::CSV_XS instance for output.
* "output_fields", array of str. Should be set to list of output field
names. If unset, will be set to be the same as "input_fields".
* "output_fields_idx", hash with field names as keys and field indexes
(0-based integer) as values. Normally you do not need to set this
manually; you just need to set "output_fields" and this hash will be
computed automatically for you just before the first output row is
outputted.
* "output_filenames", array of str.
* "output_filename", str, name of current output file.
* "output_filenum", uint, the number of the current output file, 1
being the first file, 2 for the second, and so on.
* "output_fh", handle to the current output file.
* "output_rownum", uint. The number of rows that have been outputted
(reset after each output file).
* "output_data_rownum", uint. The number of data rows that have been
outputted (reset after each output file). This will be equal to
"input_rownum" less 1 if input file has header.
For other hook-specific keys, see the documentation for associated hook
point.
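As an illustration of these stash keys, here is a sketch of a hook that
works with the hashref form of each row (the utility name and the
"amount" field are hypothetical):

    gen_csv_util(
        name     => 'csv_print_amounts',
        on_begin => sub {
            my $r = shift;
            # ask for input_row_as_hashref to be computed for each data row
            $r->{wants_input_row_as_hashref} = 1;
        },
        on_input_data_row => sub {
            my $r = shift;
            my $row = $r->{input_row_as_hashref};
            $r->{code_print}->(
                "$r->{input_filename}:$r->{input_rownum}: " .
                ($row->{amount} // '') . "\n");
        },
    );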
*ACCEPTING ADDITIONAL COMMAND-LINE OPTIONS/ARGUMENTS*
As mentioned above, you will get additional command-line
options/arguments in "$r->{util_args}" hashref. Some options/arguments
are already added by "gen_csv_util", e.g. "input_filename" or
"input_filenames" along with "input_sep_char", etc (when your utility
declares "reads_csv"), "output_filename" or "output_filenames" along
with "overwrite", "output_sep_char", etc (when your utility declares
"writes_csv").
If you want to accept additional arguments/options, you specify them in
"add_args" (a hashref, with each key being an argument name and the
value the argument specification as defined in Rinci::function).
Some argument specifications have been defined in App::CSVUtils and can
be used. See existing utilities for examples.
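For example, here is a sketch of declaring an extra "--multiplier"
option; the argument name and Sah schema are made up for illustration,
following the Rinci argument specification format:

    gen_csv_util(
        name     => 'csv_multiply_cells',         # hypothetical name
        add_args => {
            multiplier => {
                summary => 'Multiply every numeric cell by this number',
                schema  => 'num*',
                default => 1,
            },
        },
        on_input_data_row => sub {
            my $r = shift;
            # the parsed option value is available in $r->{util_args}
            my $m = $r->{util_args}{multiplier};
            $r->{code_print_row}->([
                map { /\A-?\d+(?:\.\d+)?\z/ ? $_ * $m : $_ }
                    @{ $r->{input_row} }
            ]);
        },
    );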
*READING CSV DATA*
To read CSV data, normally your utility would provide handler for the
"on_input_data_row" hook and sometimes additionally
"on_input_header_row".
*OUTPUTTING STRING OR RETURNING RESULT*
To output string, usually you call the provided routine
"$r->{code_print}". This routine will open the output files for you.
You can also return enveloped result directly by setting "$r->{result}".
*OUTPUTTING CSV DATA*
To output CSV data, usually you call the provided routine
"$r->{code_print_row}". This routine accepts a row (arrayref or
hashref). This routine will open the output files for you when needed,
as well as print header row automatically.
You can also buffer rows from input to e.g. "$r->{output_rows}", then
call "$r->{code_print_row}" repeatedly in the "after_read_input" hook to
print all the rows.
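A sketch of the buffering approach (buffering into "$r->{output_rows}"
as suggested above; the utility name is hypothetical):

    gen_csv_util(
        name => 'csv_reverse_rows',
        on_input_data_row => sub {
            my $r = shift;
            # buffer a copy of each row instead of printing immediately
            push @{ $r->{output_rows} }, [ @{ $r->{input_row} } ];
        },
        after_read_input => sub {
            my $r = shift;
            # print the buffered rows in reverse order
            $r->{code_print_row}->($_)
                for reverse @{ $r->{output_rows} // [] };
        },
    );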
*READING MULTIPLE CSV FILES*
To read multiple CSV files, you first specify "reads_multiple_csv".
Then, you can supply handler for "on_input_header_row" and
"on_input_data_row" as usual. If you want to do something before/after
each input file, you can also supply handler for
"before_open_input_file" or "after_close_input_file".
*WRITING TO MULTIPLE CSV FILES*
Similarly, to write to multiple CSV files, you first specify
"writes_multiple_csv". Then, you can supply handler for
"on_input_header_row" and "on_input_data_row" as usual. To switch to the
next file, set "$r->{wants_switch_to_next_output_file}" to true, in
which case the next call to "$r->{code_print_row}" will close the
current file and open the next file.
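For example, a sketch of splitting input into a new output file every
1000 data rows (similar in spirit to csv-split; the name and chunk size
are made up for illustration):

    gen_csv_util(
        name => 'csv_split_every_1000',
        writes_multiple_csv => 1,
        on_input_data_row => sub {
            my $r = shift;
            $r->{code_print_row}->($r->{input_row});
            # after every 1000 data rows, continue in the next output file
            $r->{wants_switch_to_next_output_file} = 1
                if $r->{input_data_rownum} % 1000 == 0;
        },
    );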
*CHANGING THE OUTPUT FIELDS*
When calling "$r->{code_print_row}", you can output whatever fields you
want. By convention, you can set "$r->{output_fields}" and
"$r->{output_fields_idx}" to let other handlers know about the output
fields. For example, see the implementation of csv-concat.
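Here is a sketch of changing the output fields by appending a
hypothetical "rownum" field; "output_fields_idx" is left to be computed
automatically as described above:

    gen_csv_util(
        name => 'csv_add_rownum',                 # hypothetical name
        on_input_header_row => sub {
            my $r = shift;
            # declare the output fields: all input fields plus "rownum"
            $r->{output_fields} = [ @{ $r->{input_fields} }, 'rownum' ];
        },
        on_input_data_row => sub {
            my $r = shift;
            $r->{code_print_row}->(
                [ @{ $r->{input_row} }, $r->{input_data_rownum} ]);
        },
    );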
This function is not exported by default, but exportable.
Arguments ('*' denotes required arguments):
* add_args => *hash*
(No description)
* add_args_rels => *hash*
(No description)
* add_meta_props => *hash*
Add additional Rinci function metadata properties.
* after_close_input_file => *code*
(No description)
* after_close_input_files => *code*
(No description)
* after_read_input => *code*
(No description)
* before_open_input_file => *code*
(No description)
* before_open_input_files => *code*
(No description)
* before_read_input => *code*
(No description)
* description => *str*
(No description)
* examples => *array*
(No description)
* links => *array[hash]*
(No description)
* name* => *perl::identifier::unqualified_ascii*
(No description)
* on_begin => *code*
(No description)
* on_end => *code*
(No description)
* on_input_data_row => *code*
(No description)
* on_input_header_row => *code*
(No description)
* reads_csv => *bool* (default: 1)
Whether utility reads CSV data.
* reads_multiple_csv => *bool*
Whether utility accepts multiple CSV files as input.
Setting this option to true will implicitly set the "reads_csv"
option to true, obviously.
* summary => *str*
(No description)
* writes_csv => *bool* (default: 1)
Whether utility writes CSV data.
* writes_multiple_csv => *bool*
Whether utility outputs multiple CSV files.
Setting this option to true will implicitly set the "writes_csv"
option to true, obviously.
Return value: (bool)
compile_eval_code
Usage:
$coderef = compile_eval_code($str, $label);
Compile string code $str to coderef in 'main' package, without "use
strict" or "use warnings". Die on compile error.
eval_code
Usage:
$res = eval_code($coderef, $r, $topic_var_value, $return_topic_var);
FAQ
My CSV does not have a header?
Use the "--no-header" option. Fields will be named "field1", "field2",
and so on.
My data is TSV, not CSV?
Use the "--tsv" option.
I have a big CSV and the utilities are too slow or eat too much RAM!
These utilities are not (yet) optimized, patches welcome. If your CSV is
very big, perhaps a C-based solution is what you need.
HOMEPAGE
Please visit the project's homepage at
SOURCE
Source repository is at
SEE ALSO
Similar CLI bundles for other formats
App::TSVUtils, App::LTSVUtils, App::SerializeUtils.
Other CSV-related utilities
xls2csv and xlsx2csv from Spreadsheet::Read
import-csv-to-sqlite from App::SQLiteUtils
Query CSV with SQL using fsql from App::fsql
csvgrep from csvgrep
Other non-Perl-based CSV utilities
Python
AUTHOR
perlancar <perlancar@cpan.org>
CONTRIBUTOR
Adam Hopkins <violapiratejunky@gmail.com>
CONTRIBUTING
To contribute, you can send patches by email/via RT, or send pull
requests on GitHub.
Most of the time, you don't need to build the distribution yourself. You
can simply modify the code, then test via:
% prove -l
If you want to build the distribution (e.g. to try to install it locally
on your system), you can install Dist::Zilla,
Dist::Zilla::PluginBundle::Author::PERLANCAR,
Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two
other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps
required beyond that are considered a bug and can be reported to me.
COPYRIGHT AND LICENSE
This software is copyright (c) 2025 by perlancar <perlancar@cpan.org>.
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.
BUGS
Please report any bugs or feature requests on the bugtracker website
When submitting a bug or request, please include a test-file or a patch
to an existing test-file that illustrates the bug or desired feature.