Changes for version 0.014 - 2022-07-08

  • isWORDCHAR_utf8_safe() / toLOWER_utf8_safe() are actually available since Perl v5.26 (Stanislaw Pusep)
  • eg/benchmark.pl improvements (Stanislaw Pusep)

Documentation

compute cosine similarity between two documents
uses MinHash & SpeedyFx to compare large text data
efficiently count unique tokens from a file

Modules

tokenize/hash a large number of strings efficiently

Examples