NAME

App::UniqFiles - Report or omit duplicate file contents

VERSION

version 0.01

SYNOPSIS

# See uniq-files script

DESCRIPTION

FUNCTIONS

None are exported by default, but they are exportable.

uniq_files(%args) -> [STATUS_CODE, ERR_MSG, RESULT]

Report or omit duplicate file contents.

Given a list of filenames, checks each file's size and content to find duplicate content. The interface is loosely modeled on the `uniq` Unix command-line program.

Returns a 3-element arrayref. STATUS_CODE is 200 on success, or an error code in the 3xx-5xx range (as in HTTP). ERR_MSG is a string containing the error message, and RESULT is the actual result.
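As a rough sketch of how a caller might consume this return value, the snippet below unpacks the 3-element arrayref and dies on any non-200 status. The helper name unwrap_result and the sample envelopes (including the shape of RESULT and the error text) are hypothetical, made up for illustration; in real use the arrayref would come from a call such as App::UniqFiles::uniq_files(files => [...]).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of App::UniqFiles): unpack the
# [STATUS_CODE, ERR_MSG, RESULT] envelope and die unless the status is 200.
sub unwrap_result {
    my ($res) = @_;
    my ($status, $msg, $result) = @$res;
    die "Call failed: $status $msg\n" unless $status == 200;
    return $result;
}

# Hand-built envelopes standing in for what uniq_files() might return.
# The RESULT payload and the error message are assumptions.
my $ok  = [200, "OK", ["a.txt", "b.txt"]];
my $bad = [400, "Missing required argument: files", undef];

my $files = unwrap_result($ok);
print "Got: @$files\n";

# Trap the failure so the script can report it instead of exiting.
my $err = eval { unwrap_result($bad); 1 } ? '' : $@;
print "Error: $err";
```

Checking the status code in one place keeps the rest of the calling code free of envelope handling.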

Arguments (* denotes required arguments):

  • files* => array

  • count => bool (default 0)

    Return the number of occurrences of each file's content.

    1 means the file content is only encountered once (unique), 2 means there is one duplicate, and so on.

  • report_duplicate => bool (default 0)

    Aliases: d (Alias for --noreport-unique --report-duplicate)

    Return duplicate items.

  • report_unique => bool (default 1)

    Aliases: u (Alias for --report-unique --noreport-duplicate)

    Return unique items.
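To make the count semantics concrete, here is a minimal sketch of one way to count content occurrences: group files by size, then hash only the files whose size is shared. This is an assumed approach for illustration, not the module's actual implementation; the function name count_contents and the temporary test files are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempdir);

# Hypothetical function (not part of App::UniqFiles): return a hashref
# mapping each filename to the number of files that share its content.
sub count_contents {
    my @files = @_;

    # Group by size first; files with a unique size cannot be duplicates.
    my %by_size;
    for my $file (@files) {
        my $size = (stat $file)[7];
        die "Cannot stat $file: $!" unless defined $size;
        push @{ $by_size{$size} }, $file;
    }

    my %count;
    for my $group (values %by_size) {
        if (@$group == 1) {
            $count{ $group->[0] } = 1;
            next;
        }
        # Same size: compare MD5 digests of the full content.
        my %by_digest;
        for my $file (@$group) {
            open my $fh, '<', $file or die "Cannot open $file: $!";
            binmode $fh;
            my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
            push @{ $by_digest{$digest} }, $file;
        }
        for my $same (values %by_digest) {
            $count{$_} = scalar @$same for @$same;
        }
    }
    return \%count;
}

# Demo with throwaway files: a and b are identical, c and d are unique.
my $dir = tempdir(CLEANUP => 1);
my %content = (a => "hello", b => "hello", c => "world", d => "longer text");
for my $name (sort keys %content) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    print $fh $content{$name};
    close $fh;
}

my $count = count_contents(map { "$dir/$_" } sort keys %content);
printf "%s: %d\n", $_, $count->{"$dir/$_"} for sort keys %content;
```

In this demo, a and b get a count of 2 (one duplicate each), while c and d get 1 (unique), matching the meaning of count described above.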

TODO

  • Handle symlinks

    Provide options on how to handle symlinks: ignore them? Follow?

  • Handle special files (socket, pipe, device)

    Ignore them.

  • Check hardlinks/inodes first

    For fast checking.

  • Arguments hash_skip_bytes & hash_bytes

    For only checking uniqueness against parts of contents.

  • Arguments hash_module/hash_method/hash_sub

    For doing custom hashing instead of Digest::MD5.

AUTHOR

Steven Haryanto <stevenharyanto@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2011 by Steven Haryanto.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.