NAME

freqtable - Print frequency table of lines/words/characters/bytes/numbers

VERSION

This document describes version 0.010 of freqtable (from Perl distribution App-freqtable), released on 2025-08-03.

SYNOPSIS

% freqtable [OPTIONS] < INPUT

Sample input:

% cat input-lines.txt
one
one
two
three
four
five
five
five
six
seven
eight
eight
nine

% cat input-words.txt
one one two three four five five five six seven eight eight nine

% cat input-nums.txt
9.99 cents
9.99 dollars
9 cents

Modes

Display frequency table (by default: lines):

% freqtable input-lines.txt
3       five
2       eight
2       one
1       four
1       nine
1       seven
1       six
1       three
1       two

Display frequency table (words):

% freqtable -w input-words.txt
3       five
2       eight
2       one
1       four
1       nine
1       seven
1       six
1       three
1       two

Display frequency table (bytes, via -c; the same as characters for this ASCII input):

% freqtable -c input-words.txt
12
12      e
 7      i
 5      n
 4      f
 4      o
 4      t
 4      v
 3      h
 2      g
 2      r
 2      s
 1

 1      u
 1      w
 1      x

Display frequency table (nums):

% freqtable -n input-nums.txt
2      9.99
1      9

Display frequency table (integers):

% freqtable -i input-nums.txt
3      9

Formatting the output line: omitting the frequency (-F option)

Don't display the frequencies:

% freqtable -F input-lines.txt
five
eight
one
four
nine
seven
six
three
two

Formatting the output line: showing the percentages (`--percent`, `-p` option)

The default is to show frequencies as numbers:

% freqtable input-lines.txt
        3 five
...

You can display frequencies as percent instead:

% freqtable -p input-lines.txt
 23.08% five
...

Specify another `-p` if you want to display frequencies as integers as well as percent:

% freqtable -pp input-lines.txt
        3  23.08% five
...

Formatting the output line: custom formatting (`--format` option)

% freqtable --format '%04d: %s' input-lines.txt
0003: five

Filter by rank

Only display the top 3 ranks:

% freqtable input-lines.txt -r -3
% freqtable input-lines.txt -r 1-3
        3 five
        2 eight
        2 one

Sorting

By default, items are sorted by frequency in descending order. If you specify --sort-sub (and optionally one or more --sort-arg), items are sorted by key instead, using one of the Sort::Sub::* subroutines. Examples:

# sort by keys, asciibetically
% freqtable input-lines.txt --sort-sub asciibetically
2       eight
3       five
1       four
1       nine
2       one
1       seven
1       six
1       three
1       two

# sort by keys, asciibetically (descending order)
% freqtable input-lines.txt --sort-sub 'asciibetically<r>'
1       two
1       three
1       six
1       seven
2       one
1       nine
1       four
3       five
2       eight

# sort by keys, randomly using perl code (essentially, shuffling)
% freqtable input-lines.txt --sort-sub 'by_perl_code' --sort-arg 'code=int(rand()*3)-1'
3       five
1       three
2       eight
1       seven
2       one
1       six
1       nine
1       two
1       four

Running table (`--output-every` option)

If you have streaming input, you can instruct `freqtable` to print the running result periodically, after every given number of input lines/words/characters/bytes. You can also have it clear the terminal screen before each output (`--clear-before-output`).

% perl -MArray::Sample::WeightedRandom=sample_weighted_random_with_replacement \
    -E'say sample_weighted_random_with_replacement(
         [ ["a", 1], ["b", 2], ["c", 3], ["d",5] ], 1) while 1' | \
  freqtable --output-every 10000 --clear --percent

Sample output:

45.43%  d
27.28%  c
18.20%  b
 9.10%  a

DESCRIPTION

This utility counts the occurrences of lines (or words/characters) in the input, then displays each unique line along with its number of occurrences. You can also instruct it to only show lines that have a specified number of occurrences.

You can use the following Unix command to count occurrences of lines:

% sort input-lines.txt | uniq -c | sort -nr

and with a bit more work you can also use a combination of existing Unix commands to count occurrences of words/characters, as well as filter items that have a specified number of occurrences; freqtable basically offers convenience.
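
As a rough sketch of the "bit more work" mentioned above, the following shell functions approximate word and character counting with standard tools only. The function names (`item_freq`, `word_freq`, `char_freq`) are made up for this illustration, and the output layout (count, tab, item) is a choice made here, not freqtable's own format. Unlike `freqtable -c`, this sketch drops whitespace characters instead of counting them, and it assumes single-byte input.

```shell
# Hypothetical helper: count whitespace-free items given one per line on
# stdin. Most frequent first, ties ordered by item; prints "count<TAB>item".
item_freq() {
    LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2,2 |
        awk '{ printf "%d\t%s\n", $1, $2 }'
}

# Words: put each whitespace-separated word on its own line, then count.
word_freq() {
    tr -s '[:space:]' '\n' | sed '/^$/d' | item_freq
}

# Characters (single-byte input assumed): drop all whitespace, put each
# remaining character on its own line, then count.
char_freq() {
    { tr -d '[:space:]'; echo; } | fold -w 1 | sed '/^$/d' | item_freq
}
```

For example, `word_freq < input-words.txt` lists `five` first with a count of 3, in the same order as the `freqtable -w` output in the SYNOPSIS.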

EXIT CODES

0 on success.

255 on I/O error.

99 on command-line options error.
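
A calling script can branch on these codes. The sketch below maps a status to the meanings listed above; the function name `describe_freqtable_status` is hypothetical and exists only for illustration, and any status not listed in this section is reported as undocumented.

```shell
# Hypothetical helper: translate a freqtable exit status into a message,
# using only the codes documented in this section.
describe_freqtable_status() {
    case "$1" in
        0)   echo "success" ;;
        99)  echo "command-line options error" ;;
        255) echo "I/O error" ;;
        *)   echo "undocumented status $1" ;;
    esac
}
```

A typical use is to run freqtable and then pass `"$?"` to the helper on the next line of the script.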

OPTIONS

  • --bytes, -c

  • --chars, -m

  • --words, -w

  • --lines, -l

  • --number, -n

    Treat each line as a number. A line like this:

    9.99 cents

    will be regarded as:

    9.99

  • --integer, -i

    Treat each line as an integer. A line like this:

    9.99 cents

    will be regarded as:

    9

  • --ignore-case, -f

  • --no-print-freq, -F

    Do not print the frequencies.

  • --print-total, -t

    Print the total line at the bottom.

  • --no-print-total, -T

    Do not print the total line at the bottom (the default).

  • --rank=s, -r

    Filter by rank. There are several ways you can do this:

    -N to only display the top N ranks.

    N to only display the N'th rank.

    M-N to only display the M'th to N'th rank.

    M- to only display the M'th rank and lower items.

  • --sort-sub=s

    This will cause freqtable to sort by key name instead of by frequency. You pass this option to specify a Sort::Sub routine, which is the name of a Sort::Sub::* module without the Sort::Sub:: prefix, e.g. asciibetically. The name can optionally be followed by <i>, <r>, or <ir> to mean case-insensitive sorting, reverse order, and reverse-order case-insensitive sorting, respectively. When you use one of these suffixes on the command line, remember to quote it, since < and > can be interpreted by the shell.

    Examples:

    asciibetically
    asciibetically<i>
    by_length<r>

  • --sort-arg=ARGNAME=ARGVALUE

    Pass argument(s) to the sort subroutine. Can be specified multiple times, once for every argument.

  • -a

    Shortcut for --sort-sub=asciibetically.

  • --percent, -p

    Show frequencies as percentages instead of integers. If you specify this option a second time, freqtable shows frequencies as integers as well as percentages.

  • --format=s

    Format each frequency line using a `sprintf()` template. `freqtable` will supply these arguments after the template: frequency integer, item string, and frequency as percent. For example:

    %04d: %s              # sample output: 0003: five

    If you want to display the item first, you can use something like:

    %2$-12s: %d
    # sample output:
    five        : 3
    eight       : 2

  • --output-every=i

    If set, output the "running" (current) frequency table after every specified number of input items (bytes/characters/words/lines).

  • --clear-before-output

    Emit the ANSI escape code "\033[2J" before each output to clear the screen.

FAQ

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/App-freqtable.

SOURCE

Source repository is at https://github.com/perlancar/perl-App-freqtable.

SEE ALSO

Unix commands wc, sort, uniq

wordstat from App::wordstat

csv-freqtable from App::CSVUtils

AUTHOR

perlancar <perlancar@cpan.org>

CONTRIBUTING

To contribute, you can send patches by email/via RT, or send pull requests on GitHub.

Most of the time, you don't need to build the distribution yourself. You can simply modify the code, then test via:

% prove -l

If you want to build the distribution (e.g. to try to install it locally on your system), you can install Dist::Zilla, Dist::Zilla::PluginBundle::Author::PERLANCAR, Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps required beyond that are considered a bug and can be reported to me.

COPYRIGHT AND LICENSE

This software is copyright (c) 2025 by perlancar <perlancar@cpan.org>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

BUGS

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=App-freqtable

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.