NAME

nauniq - Non-adjacent uniq

VERSION

This document describes version 0.05 of nauniq (from Perl distribution App-nauniq), released on 2014-05-05.

SYNOPSIS

nauniq [OPTION]... [INPUT [OUTPUT]]

DESCRIPTION

nauniq is similar to the Unix command uniq, but it detects repeated lines even when they are not adjacent. To do this, nauniq must remember the lines fed to it. Three options limit memory usage: --num-entries remembers only a certain number of unique lines, --check-chars remembers only a certain number of characters of each line, and --md5 remembers only the MD5 hash (instead of the content) of each line.
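
The idea can be illustrated with a minimal Python sketch. This is not nauniq's actual code (nauniq is written in Perl); the function name nauniq_lines is hypothetical, and the sketch assumes the default behavior is to print each line the first time it appears:

```python
def nauniq_lines(lines):
    """Yield each line the first time it is seen, in input order."""
    seen = set()  # every distinct line seen so far; this is the memory cost
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


# "foo" and "bar" repeat, but not on adjacent lines, so uniq would keep them.
print(list(nauniq_lines(["foo", "bar", "foo", "baz", "bar"])))
# prints ['foo', 'bar', 'baz']
```

The set grows with the number of distinct lines, which is why the memory-limiting options exist.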

OPTIONS

  • --repeated, -d

    Print only duplicate lines. The opposite of --unique.

  • --ignore-case, -i

    Ignore case.

  • --num-entries=N

    Number of unique entries to remember. The default is -1 (unlimited). This option limits memory usage; the trade-off is that a repeated line is not detected if its earlier occurrence is too far back and has already been forgotten.

  • --skip-chars=N, -s

    Number of characters from the beginning of line to skip when checking uniqueness.

  • --unique, -u

    Print only unique lines. This is the default. The opposite of --repeated.

  • --check-chars=N, -w

    The number of characters to check for uniqueness. The default is -1 (check all characters in a line).

  • --append

    Open output file in append mode. See also -a.

  • -a

    Equivalent to --append --read-output.

  • --forget-pattern=S

    This is an alternative to --num-entries. Instead of instructing nauniq to remember only a fixed number of entries, you can specify a regex pattern that, when matched by a line, makes nauniq forget the lines it has remembered so far. An example use case is a file like this:

    * entries for 2014-03-13
    foo
    bar
    baz
    * entries for 2014-03-14
    foo
    baz

    and you want unique lines for each day (in which case you would specify --forget-pattern '^\*').

  • --md5

    Remember the MD5 hash instead of the actual characters of the line. Might be useful to reduce memory usage if the lines are long.

  • --read-output

    Read the output file first, so that lines already in it are remembered. This option works only with --append and is usually used via -a, to append lines to a file only if they do not exist in it yet.
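
How the comparison and memory options could fit together is sketched below in Python. This is an illustration under stated assumptions, not nauniq's implementation: the names make_key and filter_lines are hypothetical, characters are assumed to be skipped before the --check-chars limit is applied (as in uniq), eviction for --num-entries is assumed to be least-recently-used, and a line matching the forget pattern is assumed to be processed normally after the memory is cleared.

```python
import hashlib
import re
from collections import OrderedDict


def make_key(line, ignore_case=False, skip_chars=0, check_chars=-1, md5=False):
    """Derive the string that is remembered and compared for one line."""
    key = line.lower() if ignore_case else line   # --ignore-case
    key = key[skip_chars:]                        # --skip-chars (assumed first)
    if check_chars >= 0:
        key = key[:check_chars]                   # --check-chars
    if md5:
        # --md5: remember a fixed-size digest instead of the text itself
        key = hashlib.md5(key.encode("utf-8")).hexdigest()
    return key


def filter_lines(lines, num_entries=-1, forget_pattern=None, **key_opts):
    """Yield lines whose key is not currently remembered."""
    seen = OrderedDict()  # insertion order doubles as age order
    forget = re.compile(forget_pattern) if forget_pattern else None
    for line in lines:
        if forget and forget.search(line):
            seen.clear()                  # --forget-pattern: start over
        key = make_key(line, **key_opts)
        if key in seen:
            seen.move_to_end(key)         # assumed LRU: a repeat refreshes the key
            continue
        seen[key] = True
        if num_entries >= 0 and len(seen) > num_entries:
            seen.popitem(last=False)      # --num-entries: evict the oldest key
        yield line


log = [
    "* entries for 2014-03-13", "foo", "bar", "baz",
    "* entries for 2014-03-14", "foo", "baz",
]
for line in filter_lines(log, forget_pattern=r"^\*"):
    print(line)
```

With the forget pattern, foo and baz are printed again under the second header because the memory is cleared there; without it, they would be suppressed as repeats.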

EXIT CODES

0 on success.

255 on I/O error.

99 on command-line options error.

FAQ

How do I append lines to a file only if they do not exist in the file?

You cannot do this with uniq:

% ( cat FILE ; produce-lines ) | uniq - FILE
% ( cat FILE ; produce-lines ) | uniq >> FILE

as the first command clobbers the file before it has been read, and the second appends the file's existing lines to it again. But you can do this with nauniq:

% produce-lines | nauniq -a - FILE
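
The behavior described for -a (read the output file first, then append only lines not yet in it) can be sketched in Python as follows. This is an illustration, not nauniq's code: the function name append_missing is hypothetical, a missing file is assumed to be treated as empty, and the existing file is assumed to end with a newline.

```python
import os
import tempfile


def append_missing(path, new_lines):
    """Append each line of new_lines not already in path; return the lines added."""
    seen = set()
    if os.path.exists(path):  # assumption: a missing file counts as empty
        with open(path, encoding="utf-8") as fh:
            seen.update(line.rstrip("\n") for line in fh)  # like --read-output
    added = []
    with open(path, "a", encoding="utf-8") as fh:          # like --append
        for line in new_lines:
            if line not in seen:
                seen.add(line)
                fh.write(line + "\n")
                added.append(line)
    return added


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "FILE")
    with open(target, "w", encoding="utf-8") as fh:
        fh.write("foo\nbar\n")
    print(append_missing(target, ["bar", "baz", "baz", "qux"]))
    # prints ['baz', 'qux']
```

Because the file is opened in append mode only after it has been read, its existing content is never truncated.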

TODO

  • --record-separator

  • Support more uniq options

    --skip-fields (-f), --zero-terminated (-z).

  • Specify memory limit?

    Using Tie::Cache's MaxBytes option.

  • Debugging option: print memory usage at the end of run

  • Debugging option: print whenever forget pattern matches

SEE ALSO

uniq

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/App-nauniq.

SOURCE

Source repository is at https://github.com/sharyanto/perl-App-nauniq.

BUGS

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=App-nauniq

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

AUTHOR

Steven Haryanto <stevenharyanto@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Steven Haryanto.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.