NAME
compare-code - find files with similar code
VERSION
version 0.201
SYNOPSIS
This program is developed in an education/school environment. Its purpose is to help detect similarities in the code of IT projects and thereby make assessments fairer.
The script compares files containing source code (or any plain text) to each other. The general approach for comparison is: whitespace and comments are always removed (see --help for more), then the comparison is done using the Levenshtein algorithm. Future releases may bring more sophisticated techniques.
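The approach described above can be sketched in a few lines. The following Python snippet is only an illustration and not the program's actual Perl code: the helper names normalize and levenshtein are hypothetical, and the comment stripping assumes '#' line comments (it would also cut a '#' inside a string literal).

```python
import re


def normalize(text: str) -> str:
    """Drop '#' line comments, then all whitespace (naive, assumed behaviour)."""
    without_comments = re.sub(r"#.*", "", text)
    return re.sub(r"\s+", "", without_comments)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


first = "x = 1  # set x\nprint(x)\n"
second = "x=1\nprint( x )  # show it\n"
third = "y = 2\nprint(y)\n"

# Layout and comments differ, the code does not: distance 0
print(levenshtein(normalize(first), normalize(second)))
# Three characters differ after normalisation: distance 3
print(levenshtein(normalize(first), normalize(third)))
```

A small distance relative to the file length hints at similar code. How the program turns the distance into a reported score is not covered by this sketch.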
This program is written in the Perl Programming Language.
If you are unfamiliar with GNU/Linux, you might want to read doc::Windows in the doc directory.
Example Usage
compare-code ./lib -i perl
compare-code -i cpp -f list_of_filepaths.txt -o html -p
find path/to/projects -type f -name Cow.java | compare-code -i java
Options
compare-code [DIR...] [OPTIONS...]
Arguments:
DIR analyse files in given directory
Input can alternatively be specified via:
- the option --file / -f
- STDIN, receiving filepaths (e.g. from a find command)
Options:
--all, -a show all results in output
Don't hide skipped comparisons.
Will sometimes cause a lot of output.
--basedir, -b skip comparisons within projects under base directory
Folders one level below the base directory are treated as project directories.
Files inside the same project will not be compared with each other.
(This currently does not work on Windows.)
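The effect of --basedir on the set of comparisons can be pictured as follows. This Python sketch is an assumption about the behaviour described above, not the program's implementation. The names project_of and pairs_to_compare are hypothetical, and POSIX-style paths are assumed.

```python
from itertools import combinations
from pathlib import PurePosixPath


def project_of(path: str, basedir: str) -> str:
    """Name of the folder directly below basedir that contains path."""
    return PurePosixPath(path).relative_to(basedir).parts[0]


def pairs_to_compare(paths, basedir):
    """All file pairs, except those whose files share a project folder."""
    return [
        (a, b)
        for a, b in combinations(paths, 2)
        if project_of(a, basedir) != project_of(b, basedir)
    ]


paths = [
    "projects/alice/Cow.java",
    "projects/alice/Farm.java",
    "projects/bob/Cow.java",
]

# The alice/alice pair is skipped; the two alice/bob pairs remain.
for pair in pairs_to_compare(paths, "projects"):
    print(pair)
```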
--charset, -c chars used for comparison
Define one or more subsets of chars, used to compare the files:
- visibles
all chars without whitespace
- numsignes (default)
like visibles, but words ignored in meaning (but not in position)
- signes
only special chars, no words or numbers
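To give a feeling for the three subsets, here is a rough Python sketch. The exact character classes used by the program are not documented here, so the regular expressions below are assumptions. In particular, replacing every word with the placeholder "w" is only one plausible reading of "words ignored in meaning (but not in position)".

```python
import re


def visibles(text: str) -> str:
    """Keep everything except whitespace."""
    return re.sub(r"\s+", "", text)


def numsignes(text: str) -> str:
    """Like visibles, but every word collapses to the placeholder 'w' (assumed)."""
    collapsed = re.sub(r"[A-Za-z_][A-Za-z_0-9]*", "w", text)
    return re.sub(r"\s+", "", collapsed)


def signes(text: str) -> str:
    """Keep only special characters: no letters, digits or whitespace."""
    return re.sub(r"[A-Za-z0-9_\s]+", "", text)


line = "total = count + 42;"
print(visibles(line))   # total=count+42;
print(numsignes(line))  # w=w+42;
print(signes(line))     # =+;
```

Under this reading, renaming variables does not change the numsignes form: "sum = n + 42;" also yields "w=w+42;".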
--file, -f file to read from (containing filepaths)
--help, -h show this manual
--in, -i input format, optimize for language
Comments get stripped from code.
Supported arguments:
- hashy: python, perl, bash
- slashy: php, js, java, cpp, cs, c
- html, xml
- txt (default, no effect)
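The grouping of languages into comment families can be illustrated with a small Python sketch. The patterns, the family name "markup" and the function strip_comments are hypothetical, not taken from the program. The approach is deliberately naive: comment markers inside string literals are stripped too.

```python
import re

# Assumed comment syntax per family (illustrative only)
COMMENT_PATTERNS = {
    "hashy": re.compile(r"#[^\n]*"),
    "slashy": re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL),
    "markup": re.compile(r"<!--.*?-->", re.DOTALL),
}

LANGUAGE_FAMILY = {
    "python": "hashy", "perl": "hashy", "bash": "hashy",
    "php": "slashy", "js": "slashy", "java": "slashy",
    "cpp": "slashy", "cs": "slashy", "c": "slashy",
    "html": "markup", "xml": "markup",
}


def strip_comments(text: str, language: str = "txt") -> str:
    """Remove comments for a known language; 'txt' leaves the text unchanged."""
    family = LANGUAGE_FAMILY.get(language)
    if family is None:
        return text
    return COMMENT_PATTERNS[family].sub("", text)


java = "int a = 1; // one\n/* block\n comment */int b;"
print(repr(strip_comments(java, "java")))            # 'int a = 1; \nint b;'
print(repr(strip_comments("x = 1 # one", "python")))  # 'x = 1 '
print(repr(strip_comments("# kept as is", "txt")))    # '# kept as is'
```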
--mime, -m only compare if same MIME-type
This option needs the Perl library File::LibMagic installed.
You will also need the libmagic development files on your system.
--out, -o output format
You can define an output format:
- html
- tab (default)
- csv
--persist, -p write result to file (instead of STDOUT)
Saved in local directory with name pattern:
- comparison_[year-month-day-hour-minute]_[method].[format]
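As an illustration of the name pattern above, the following Python sketch builds such a file name. The zero-padded, hyphen-separated timestamp and the use of a charset name for [method] are assumptions, and result_filename is a hypothetical helper.

```python
from datetime import datetime


def result_filename(method: str, out_format: str, now: datetime) -> str:
    """Build comparison_[timestamp]_[method].[format] (timestamp layout assumed)."""
    stamp = now.strftime("%Y-%m-%d-%H-%M")
    return f"comparison_{stamp}_{method}.{out_format}"


print(result_filename("numsignes", "html", datetime(2023, 5, 17, 9, 30)))
# comparison_2023-05-17-09-30_numsignes.html
```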
--sort, -s sort data by line before comparison
Useful to ignore order of method declaration.
See --split if you need to sort by something other than lines.
--split, -t split files on something other than newline
Use this option together with --sort.
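Why sorting helps can be shown with a short Python sketch. It assumes the separator is treated as a literal string and that empty chunks are dropped. Neither detail is confirmed by this manual, and sorted_chunks is a hypothetical helper.

```python
def sorted_chunks(text: str, separator: str = "\n") -> str:
    """Split on separator, trim and sort the chunks, then join them again."""
    chunks = [chunk.strip() for chunk in text.split(separator)]
    return separator.join(sorted(chunk for chunk in chunks if chunk))


# Same declarations, different order: identical after sorting by line
file_a = "def b(): pass\ndef a(): pass\n"
file_b = "def a(): pass\ndef b(): pass\n"
print(sorted_chunks(file_a) == sorted_chunks(file_b))  # True

# Splitting on ';' instead of newline
print(sorted_chunks("b=2; a=1;", ";"))  # a=1;b=2
```

After this step the two inputs are identical, so a subsequent distance calculation would no longer be affected by the order of declarations.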
--verbose, -v show the data actually compared on STDERR
--yes, -y don't prompt for questions
Program will start working without further confirmation.
(Answer all user prompts with [yes])
AUTHOR
Boris Däppen <bdaeppen.perl@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2023 by Boris Däppen.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.