NAME
School::Code::Compare - 'naive' metrics for code similarity
VERSION
version 0.007
SYNOPSIS
This distribution ships a script. You might want to look at the script compare-code in the bin
directory. For documentation of the underlying libraries, keep reading.
This calculates the Levenshtein distance between two strings (for example, the contents of two files), provided they meet certain criteria:
    use School::Code::Compare;

    my $comparer = School::Code::Compare->new()
                       ->set_max_char_difference(400)
                       ->set_min_char_total     ( 20)
                       ->set_max_distance       (400);

    my $comparison = $comparer->measure( 'use strict; print "Hello\n";',
                                         'use v5.22; say "Hello";'
                                       );

    print $comparison->{distance} if $comparison; # 13
FUNCTIONS
set_max_char_difference
Skip the comparison entirely if the difference in character count between the two strings is higher than this value.
set_min_char_total
Skip the comparison entirely if a string is shorter than this character count.
set_max_distance
Abort a running comparison as soon as the distance becomes higher than this value.
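The three thresholds can be pictured as guards around a plain Levenshtein computation. The following is a minimal pure-Perl sketch of that idea, not the module's actual implementation (which relies on Text::Levenshtein::XS). The function name guarded_distance, its option names, and its convention of returning undef when a comparison is skipped or aborted are assumptions made for illustration only.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper, not part of School::Code::Compare.
# Returns the Levenshtein distance, or undef if a threshold stops the comparison.
sub guarded_distance {
    my ($s, $t, %opt) = @_;
    my ($len_s, $len_t) = (length($s), length($t));

    # Like set_min_char_total: skip if a string is too short
    # (assumption: the shorter of the two strings decides).
    return if min($len_s, $len_t) < $opt{min_char_total};

    # Like set_max_char_difference: skip if the lengths differ too much.
    return if abs($len_s - $len_t) > $opt{max_char_difference};

    my @s    = split //, $s;
    my @t    = split //, $t;
    my @prev = (0 .. $len_t);    # distances from the empty prefix of $s

    for my $i (1 .. $len_s) {
        my @cur = ($i);
        for my $j (1 .. $len_t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,             # deletion
                $cur[$j - 1] + 1,          # insertion
                $prev[$j - 1] + $cost,     # substitution or match
            );
        }
        # Like set_max_distance: the final distance can never be lower than
        # the smallest value in the current row, so abort early.
        return if min(@cur) > $opt{max_distance};
        @prev = @cur;
    }
    return $prev[$len_t];
}

my $distance = guarded_distance(
    'use strict; print "Hello\n";',
    'use v5.22; say "Hello";',
    max_char_difference => 400,
    min_char_total      => 20,
    max_distance        => 400,
);
print "$distance\n" if defined $distance;    # prints 13
```

The two length checks cost almost nothing, which is why they run before the quadratic distance computation; the row-minimum check only saves work once a comparison is already under way.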
measure
Compares two strings and returns a hash reference with the following information:
    # (example output from synopsis)
    {
      'distance'     => 13,
      'ratio'        => 50,
      'comment'      => 'comparison done',
      'delta_length' => 5
    };
- distance
  The Levenshtein distance. See Text::Levenshtein::XS for more information.
- ratio
  The ratio of the distance (in characters) to the average length of the compared strings, as a percentage. A ratio of zero means the strings are practically identical. A ratio of 50 means that the distance equals 50% of the average string length.
  In my experience, a ratio below 30% is the point where you should start checking whether the code was copied and altered (if your concern is finding 'cheaters' in educational or school environments). This method of measurement is by no means well established. It may even be 'naive', but it seems to work quite well in practice. See School::Code::Compare::Judge for how the results are currently interpreted.
- comment
  A comment on how the comparison went.
- delta_length
  Difference in length (characters) between the two compared strings.
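To make the ratio and delta_length fields concrete, here is a small sketch that recomputes them for the two strings from the synopsis. The helper name ratio_percent is hypothetical, and truncating the result to an integer is an assumption inferred from the example output (13 / 25.5 is about 50.98%, reported as 50); the module itself may compute the value differently.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the ratio described above.
sub ratio_percent {
    my ($distance, $len_a, $len_b) = @_;
    my $average = ($len_a + $len_b) / 2;
    return 0 if $average == 0;    # two empty strings: nothing differs
    # Assumption: the percentage is truncated, not rounded.
    return int($distance / $average * 100);
}

my $first  = 'use strict; print "Hello\n";';    # 28 characters
my $second = 'use v5.22; say "Hello";';         # 23 characters

my $ratio = ratio_percent(13, length($first), length($second));
my $delta = abs(length($first) - length($second));

print "ratio: $ratio, delta_length: $delta\n";  # ratio: 50, delta_length: 5
print "worth a closer look\n" if $ratio < 30;   # not printed for this pair
```

Dividing by the average length keeps the value comparable across file sizes: a distance of 13 is a large change for two one-line scripts but negligible for two files of several thousand characters.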
AUTHOR
Boris Däppen <bdaeppen.perl@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2019 by Boris Däppen.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.