NAME
nauniq - Non-adjacent uniq
VERSION
This document describes version 0.05 of nauniq (from Perl distribution App-nauniq), released on 2014-05-05.
SYNOPSIS
nauniq [OPTION]... [INPUT [OUTPUT]]
DESCRIPTION
nauniq is similar to the Unix command uniq but detects repeated lines even if they are not adjacent. To do this, nauniq must remember the lines fed to it. Several options control memory usage: one to remember only a certain number of unique lines, one to remember only a certain number of characters of each line, and one to remember only the MD5 hash (instead of the content) of each line.
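The core idea can be illustrated with a minimal Python sketch. This is not nauniq's actual Perl implementation. It keeps every line seen so far in a dictionary and prints a line only the first time it appears. The function name nauniq_lines is hypothetical. The --repeated behaviour shown here, which prints a line once at its second occurrence, is an assumption.

```python
from typing import Dict, Iterable, Iterator


def nauniq_lines(lines: Iterable[str], repeated: bool = False) -> Iterator[str]:
    """Filter lines like a non-adjacent uniq (illustrative sketch).

    Default (unique) mode: yield the first occurrence of every distinct line.
    repeated=True: yield a line once, when it is seen for the second time
    (assumed --repeated behaviour, not confirmed by the manual).
    """
    counts: Dict[str, int] = {}  # every distinct line seen so far -> times seen
    for line in lines:
        seen_before = counts.get(line, 0)
        counts[line] = seen_before + 1
        if repeated:
            if seen_before == 1:  # second occurrence: report the duplicate once
                yield line
        elif seen_before == 0:  # first occurrence: not a repeat yet
            yield line


print(list(nauniq_lines(["a", "b", "a", "c", "b", "a"])))                 # ['a', 'b', 'c']
print(list(nauniq_lines(["a", "b", "a", "c", "b", "a"], repeated=True)))  # ['a', 'b']
```

The dictionary is what makes non-adjacent detection possible. It is also why memory grows with the number of distinct lines, which the options below are meant to limit.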
OPTIONS
--repeated, -d
Print only duplicate lines. The opposite of --unique.
--ignore-case, -i
Ignore case.
--num-entries=N
Number of unique entries to remember. The default is -1 (unlimited). This option limits memory usage. The trade-off is that lines too far apart are forgotten, so a later repeat of a forgotten line is treated as new.
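The following sketch shows how a bounded memory could work. It assumes least-recently-seen (LRU) eviction. The TODO list mentions the Perl module Tie::Cache, which is an LRU cache, but this page does not state the exact policy nauniq uses. The name nauniq_bounded is hypothetical.

```python
from collections import OrderedDict
from typing import Iterable, Iterator


def nauniq_bounded(lines: Iterable[str], num_entries: int = -1) -> Iterator[str]:
    """Unique mode remembering at most num_entries lines (-1 = unlimited).

    Assumption: the least recently seen line is forgotten first (LRU).
    """
    memory: "OrderedDict[str, None]" = OrderedDict()  # keys ordered oldest -> newest
    for line in lines:
        if line in memory:
            memory.move_to_end(line)  # seen again: now the most recent entry
            continue
        yield line
        memory[line] = None
        if 0 < num_entries < len(memory):
            memory.popitem(last=False)  # over the limit: forget the oldest entry


print(list(nauniq_bounded(["a", "b", "c", "a"], num_entries=2)))  # ['a', 'b', 'c', 'a']
print(list(nauniq_bounded(["a", "b", "c", "a"])))                 # ['a', 'b', 'c']
```

With num_entries=2, "a" has already been forgotten by the time it reappears, so it is printed a second time.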
--skip-chars=N, -s
Number of characters at the beginning of each line to skip when checking uniqueness.
--unique, -u
Print only unique lines. This is the default. The opposite of --repeated.
--check-chars=N, -w
The number of characters to check for uniqueness. The default is -1 (check all characters in a line).
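The options --ignore-case, --skip-chars and --check-chars each change which part of a line is compared. A hypothetical helper that derives the comparison key might look like the following. Applying the transformations in this order is an assumption.

```python
def comparison_key(line: str, ignore_case: bool = False,
                   skip_chars: int = 0, check_chars: int = -1) -> str:
    """Return the part of a line that is compared for uniqueness (sketch).

    The order (case folding, then skipping, then truncating) is an assumption.
    """
    key = line.lower() if ignore_case else line  # -i
    if skip_chars > 0:
        key = key[skip_chars:]  # -s N: ignore the first N characters
    if check_chars >= 0:
        key = key[:check_chars]  # -w N: compare at most N characters; -1 = all
    return key


print(comparison_key("2014-03-13 foo", skip_chars=11))                  # foo
print(comparison_key("Hello World", ignore_case=True, check_chars=5))  # hello
```

Two lines count as duplicates when their keys are equal, even if the original lines differ.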
--append
Open output file in append mode. See also -a.
-a
Equivalent to --append --read-output.
--forget-pattern=S
This is an alternative to --num-entries. Instead of instructing nauniq to remember only a fixed number of entries, you can specify a regex pattern that, when matched, makes nauniq forget the lines it has remembered. An example use case is a file like this:

    * entries for 2014-03-13
    foo
    bar
    baz
    * entries for 2014-03-14
    foo
    baz

where you want unique lines for each day (in which case you would specify --forget-pattern '^\*').
--md5
Remember the MD5 hash instead of the actual characters of the line. This can reduce memory usage when lines are long.
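The two options above can be sketched together in Python. The sketch makes two assumptions: a matching line clears the whole memory, and the pattern is checked before the line itself is processed. The function name is hypothetical.

```python
import hashlib
import re
from typing import Iterable, Iterator, Optional, Set, Union


def nauniq_forget(lines: Iterable[str], forget_pattern: Optional[str] = None,
                  use_md5: bool = False) -> Iterator[str]:
    """Unique mode with --forget-pattern and --md5 behaviour (sketch)."""
    pattern = re.compile(forget_pattern) if forget_pattern else None
    seen: Set[Union[str, bytes]] = set()
    for line in lines:
        if pattern is not None and pattern.search(line):
            seen.clear()  # assumed: a matching line forgets everything remembered
        # --md5: store a fixed-size 16-byte digest instead of the line itself.
        key = hashlib.md5(line.encode("utf-8")).digest() if use_md5 else line
        if key not in seen:
            seen.add(key)
            yield line


day_log = ["* entries for 2014-03-13", "foo", "bar", "baz",
           "* entries for 2014-03-14", "foo", "baz"]
print(list(nauniq_forget(day_log, forget_pattern=r"^\*")))  # all 7 lines
print(list(nauniq_forget(day_log)))  # second "foo" and "baz" dropped
```

Without the pattern, "foo" and "baz" under the second day are treated as repeats of the first day's lines. With it, each day is deduplicated on its own.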
--read-output
Read the output file first, so that the lines already in it are remembered. This option works only with --append and is usually used via -a, to append lines to a file only if they do not exist in the file yet.
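The following Python sketch shows what -a (--append --read-output) accomplishes. It does not call nauniq. It first remembers the lines already in the output file, then appends only the new ones. The name append_new_lines is hypothetical, and existing lines are compared exactly (no -i, -s or -w).

```python
import os
import tempfile
from typing import Iterable


def append_new_lines(path: str, new_lines: Iterable[str]) -> None:
    """Append lines to path unless they already occur in it (sketch of -a)."""
    seen = set()
    needs_newline = False
    if os.path.exists(path):
        # Like --read-output: remember what the output file already contains.
        with open(path, encoding="utf-8") as f:
            text = f.read()
        seen.update(text.splitlines())
        needs_newline = bool(text) and not text.endswith("\n")
    # Like --append: open without truncating, so existing content survives.
    with open(path, "a", encoding="utf-8") as out:
        if needs_newline:
            out.write("\n")  # terminate an unterminated last line first
        for line in new_lines:
            if line not in seen:
                seen.add(line)
                out.write(line + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "FILE")
        with open(path, "w", encoding="utf-8") as f:
            f.write("a\nb\n")
        append_new_lines(path, ["b", "c", "c", "d"])
        with open(path, encoding="utf-8") as f:
            print(f.read().splitlines())  # ['a', 'b', 'c', 'd']
```

The file is read completely before it is opened for appending. That ordering is what avoids the clobbering problem described in the FAQ.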
EXIT CODES
0 on success.
255 on I/O error.
99 on command-line options error.
FAQ
How do I append lines to a file only if they do not exist in the file?
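You cannot do this with uniq:
% ( cat FILE ; produce-lines ) | uniq - FILE
% ( cat FILE ; produce-lines ) | uniq >> FILE
The first command can clobber FILE before it has been read. The second appends the lines already in FILE to it again. In both cases uniq only removes adjacent duplicates. But you can do this with nauniq: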
% produce-lines | nauniq -a - FILE
TODO
--record-separator
Support more uniq options: --skip-fields (-f), --zero-terminated (-z).
Specify memory limit?
Using Tie::Cache's MaxBytes option.
Debugging option: print memory usage at the end of the run
Debugging option: print whenever forget pattern matches
SEE ALSO
HOMEPAGE
Please visit the project's homepage at https://metacpan.org/release/App-nauniq.
SOURCE
Source repository is at https://github.com/sharyanto/perl-App-nauniq.
BUGS
Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=App-nauniq
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
AUTHOR
Steven Haryanto <stevenharyanto@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2014 by Steven Haryanto.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.