Changes for version 0.06
 - some changes to handle Unicode more or less properly: normalization, Unicode classes in regular expressions
 - speed optimizations
 - synced algorithm with current PHP version
 - changed tests to use empirically found threshold
 - data update
 
Documentation
download newer data for tokenizer

Modules
tokenizer for OpenCorpora project
represents a data file
download newer data for tokenizer
represents a file with vectors