NAME

School::Code::Compare - 'naive' metrics for code similarity

VERSION

version 0.007

SYNOPSIS

This distribution ships a script. You might want to look at the script compare-code in the bin directory. For documentation of the underlying libraries, keep reading.

This calculates the Levenshtein distance for two strings (typically file contents), provided they meet certain criteria:

use School::Code::Compare;

my $comparer   = School::Code::Compare->new()
                                      ->set_max_char_difference(400)
                                      ->set_min_char_total     ( 20)
                                      ->set_max_distance       (400);

my $comparison = $comparer->measure( 'use strict; print "Hello\n";',
                                     'use v5.22; say "Hello";'
                                   );

print $comparison->{distance} if $comparison;   # 13

FUNCTIONS

set_max_char_difference

Skip the comparison entirely if the two inputs differ in character count by more than the given value.

set_min_char_total

Skip the comparison entirely if one of the inputs is shorter than the given character count.
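The two pre-checks above can be sketched in plain Perl. This is only an illustration of the idea, not the module's actual code: the helper name skip_reason and its reason strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of School::Code::Compare:
# returns a reason string if the pair should be skipped, undef otherwise.
sub skip_reason {
    my ($str1, $str2, $min_char_total, $max_char_difference) = @_;

    my ($len1, $len2) = (length $str1, length $str2);

    # One of the inputs is too short to be worth comparing.
    return 'too short'
        if $len1 < $min_char_total || $len2 < $min_char_total;

    # The lengths are too far apart for the inputs to be near-copies.
    return 'lengths differ too much'
        if abs($len1 - $len2) > $max_char_difference;

    return;
}

my $reason = skip_reason('use strict; print "Hello\n";',
                         'use v5.22; say "Hello";', 20, 400);
print defined $reason ? "skipped: $reason\n" : "would be compared\n";
```

Both checks only need the string lengths, so they are cheap compared to the distance calculation they guard.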

set_max_distance

Abort a running comparison as soon as the distance becomes higher than the given value.
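To illustrate how such an early abort can work, here is a minimal pure-Perl sketch of a bounded Levenshtein distance. It is not the implementation used by this module (which refers to Text::Levenshtein::XS); the helper name bounded_distance is hypothetical, and returning undef on abort is an assumption made for this sketch.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: Levenshtein distance that gives up (returns undef)
# once the distance can no longer stay within $max_distance.
sub bounded_distance {
    my ($source, $target, $max_distance) = @_;

    my @s = split //, $source;
    my @t = split //, $target;

    # Distances between the empty prefix of $source and every prefix of $target.
    my @previous = (0 .. scalar @t);

    for my $i (1 .. scalar @s) {
        my @current = ($i);
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @current, min(
                $previous[$j] + 1,          # deletion
                $current[$j - 1] + 1,       # insertion
                $previous[$j - 1] + $cost,  # substitution or match
            );
        }
        # Every edit path crosses this row, so its minimum is a lower bound
        # for the final distance.
        return undef if min(@current) > $max_distance;
        @previous = @current;
    }

    return undef if $previous[-1] > $max_distance;
    return $previous[-1];
}

my $distance = bounded_distance('kitten', 'sitting', 10);
print defined $distance ? "$distance\n" : "aborted\n";   # prints 3
```

With a limit of 2 instead of 10, the same call returns undef, because the true distance of 3 exceeds the limit.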

measure

Compares two strings. Returns a hash reference with the following information:

# (example output from synopsis)
{
  'distance'     => 13,
  'ratio'        => 50,
  'comment'      => 'comparison done',
  'delta_length' => 5
};

distance

The Levenshtein Distance. See Text::Levenshtein::XS for more information.

ratio

The ratio of the distance in chars to the average length of the compared strings, expressed as a percentage. A ratio of zero means the strings are identical. A ratio of 50 means that the distance amounts to 50% of the average string length.
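As a worked example, the ratio from the synopsis can be reproduced by hand. The sketch below assumes the percentage is truncated to an integer, which matches the example output but is not confirmed by this documentation; the helper name ratio_percent is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: distance as a percentage of the average string length.
# Truncating with int() is an assumption based on the example output.
# Assumes at least one of the strings is non-empty.
sub ratio_percent {
    my ($distance, $length1, $length2) = @_;
    my $average_length = ($length1 + $length2) / 2;
    return int($distance / $average_length * 100);
}

# Synopsis values: distance 13, string lengths 28 and 23 (average 25.5).
print ratio_percent(13, 28, 23), "\n";   # prints 50
```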

In my experience, a ratio below 30% is the point where you should start checking whether the code was copied and altered (if your concern is to find 'cheaters' in educational/school environments). This method of measurement is by no means well established. It may even be 'naive', but it seems to work out quite well. See School::Code::Compare::Judge for how the results are currently interpreted.
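Applied to a list of results, the 30% rule of thumb could look like the following sketch. The file names and ratios are made up for illustration, the undefined ratio for a skipped comparison is an assumption, and this is not how School::Code::Compare::Judge is implemented.

```perl
use strict;
use warnings;

# Rule-of-thumb threshold from the text above.
my $threshold = 30;

# Made-up results for illustration only.
my @results = (
    { pair => 'alice.pl / bob.pl',   ratio => 12    },
    { pair => 'alice.pl / carol.pl', ratio => 45    },
    { pair => 'bob.pl / carol.pl',   ratio => undef },  # assumed: skipped comparison
    { pair => 'bob.pl / dave.pl',    ratio => 29    },
);

# Keep only the pairs that were actually compared and fall below the threshold.
my @suspicious = grep { defined $_->{ratio} && $_->{ratio} < $threshold } @results;

print "$_->{pair}: $_->{ratio}%\n" for @suspicious;
```

A flagged pair is only a hint to look at the two files by hand, not proof of copying.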

comment

A short comment on how the comparison went, for example 'comparison done'.

delta_length

Difference in length (chars) of the two compared strings.

AUTHOR

Boris Däppen <bdaeppen.perl@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2019 by Boris Däppen.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.