NAME
School::Code::Compare - 'naive' metrics for code similarity
VERSION
version 0.2
SYNOPSIS
This distribution ships a script. You might want to look at the script compare-code in the bin
directory. For documentation of the libraries it uses, read on.
This module calculates the Levenshtein distance between two files, provided they meet certain criteria:
    use School::Code::Compare;

    my $comparer = School::Code::Compare->new()
                                        ->set_max_relative_difference(2)
                                        ->set_min_char_total(20)
                                        ->set_max_relative_distance(0.8);

    my $comparison = $comparer->measure( 'use v5.22; say "Hi"!',
                                         'use v5.22; say "Hello";' );

    print $comparison->{distance} if $comparison;
FUNCTIONS
set_max_char_difference
Skip the comparison entirely if the difference in character count between the two strings is higher than the set value.
set_min_char_total
Skip the comparison entirely if a file has fewer characters than this value.
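The two pre-checks above are cheap length tests that run before any distance is computed. The following is a minimal sketch, not the module's code: skip_reason is a hypothetical name, and it assumes an absolute character limit and a minimum length that applies to each string separately.

```perl
use strict;
use warnings;

# Hypothetical guard mirroring set_max_char_difference and set_min_char_total.
# Returns a reason string if the comparison should be skipped, or undef if it may proceed.
sub skip_reason {
    my ($str1, $str2, $max_char_diff, $min_char_total) = @_;

    my ($len1, $len2) = (length $str1, length $str2);

    # Assumption: the minimum applies to each string on its own.
    for my $len ($len1, $len2) {
        return "skipped: fewer than $min_char_total chars" if $len < $min_char_total;
    }

    # Skip when the lengths differ by more than the allowed number of chars.
    my $delta = abs($len1 - $len2);
    return "skipped: length differs by $delta chars" if $delta > $max_char_diff;

    return;    # both checks passed (undef in scalar context)
}

my $reason = skip_reason('use v5.22; say "Hi"!', 'use v5.22; say "Hello";', 2, 20);
print defined $reason ? "$reason\n" : "compare\n";
```

Because both tests only need string lengths, they cost almost nothing compared to the distance calculation, which is the point of rejecting pairs early.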
set_max_distance
Abort the comparison midway if the distance grows higher than the set value.
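Aborting midway works because, in the usual dynamic-programming table for the Levenshtein distance, the smallest value in a row never decreases from one row to the next. Once a whole row exceeds the limit, the final distance must exceed it too. The sketch below is pure Perl so it runs without Text::Levenshtein::XS; bounded_distance is a hypothetical helper and not necessarily how the module or that library implement the cutoff.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Levenshtein distance with an early abort: returns undef as soon as the
# distance is certain to exceed $max (the idea behind set_max_distance).
sub bounded_distance {
    my ($s, $t, $max) = @_;
    my @from = split //, $s;
    my @to   = split //, $t;

    # Row for the empty prefix of $s: turning '' into the first j chars of $t costs j inserts.
    my @prev = (0 .. @to);

    for my $i (1 .. @from) {
        my @curr = ($i);    # deleting i chars of $s to reach ''
        for my $j (1 .. @to) {
            my $cost = $from[$i - 1] eq $to[$j - 1] ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,            # deletion
                $curr[$j - 1] + 1,        # insertion
                $prev[$j - 1] + $cost,    # substitution or match
            );
        }
        # Every later cell builds on this row, so if even its smallest
        # value is above $max, the final distance must be too.
        return if min(@curr) > $max;
        @prev = @curr;
    }
    return $prev[-1] > $max ? undef : $prev[-1];
}

my $d = bounded_distance('use v5.22; say "Hi"!', 'use v5.22; say "Hello";', 10);
print defined $d ? "distance $d\n" : "aborted\n";
```

For the synopsis strings this yields a distance of 5, matching the example output of measure; with a limit of 3 the same call returns undef instead.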
measure
Compare two strings. Returns a hash reference containing the following information:
    # (example output from the synopsis)
    {
        'comment'      => 'comparison done',
        'delta_length' => 3,
        'distance'     => 5,
        'length1'      => 20,
        'length2'      => 23,
        'ratio'        => 79,
    };
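The length-related fields in this example can be reproduced directly from the two synopsis strings. The sketch below is hypothetical (length_fields is not part of the module) and assumes delta_length is the absolute difference of the two lengths.

```perl
use strict;
use warnings;

# Hypothetical helper: the length-related fields of the result hash.
sub length_fields {
    my ($str1, $str2) = @_;
    my ($len1, $len2) = (length $str1, length $str2);
    return {
        length1      => $len1,
        length2      => $len2,
        delta_length => abs($len1 - $len2),    # assumption: absolute difference
    };
}

my $fields = length_fields('use v5.22; say "Hi"!', 'use v5.22; say "Hello";');
printf "%s => %d\n", $_, $fields->{$_} for sort keys %$fields;
```

This prints delta_length => 3, length1 => 20 and length2 => 23, the same values as in the example output.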
- distance
  The Levenshtein distance. See Text::Levenshtein::XS for more information.
- ratio
  The ratio of the distance in chars to the average length of the compared strings, as a percentage. A ratio of zero means the strings are similar; a ratio of 50 means that 50% of a string is different.
  In my experience, a ratio below 30% is the point at which you should start checking whether the code was copied and altered (if your concern is finding 'cheaters' in educational or school environments). This method of measurement is by no means well established. It may even be 'naive', but it seems to work out quite well. See School::Code::Compare::Judge for how the results are currently interpreted.
- comment
  A comment on how the comparison went.
- delta_length
  Difference in length (chars) of the two compared strings.
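The ratio described above, together with the 30% rule of thumb, can be sketched as follows. This follows the prose definition only: it assumes the percentage is truncated to an integer, and worth_a_look is a hypothetical helper. The module's own interpretation lives in School::Code::Compare::Judge and may differ.

```perl
use strict;
use warnings;

# Ratio as described above: distance relative to the average length, in percent.
# Assumption: the value is truncated to an integer.
sub ratio {
    my ($distance, $len1, $len2) = @_;
    my $average = ($len1 + $len2) / 2;
    return 0 if $average == 0;    # two empty strings: nothing differs
    # Multiply before dividing to avoid floating-point surprises such as 28.999...
    return int(100 * $distance / $average);
}

# Hypothetical helper applying the 30% rule of thumb from the text.
sub worth_a_look {
    my ($ratio) = @_;
    return $ratio < 30;
}

my $r = ratio(10, 40, 60);    # average length 50, so 10 / 50 = 20%
print "ratio $r: ", (worth_a_look($r) ? "check manually" : "probably fine"), "\n";
```

Dividing by the average length rather than by one of the two lengths keeps the ratio symmetric, so swapping the two files gives the same result.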
AUTHOR
Boris Däppen <bdaeppen.perl@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2023 by Boris Däppen.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.