Changes for version 0.26 - 2014-03-09

  • protection of bookcleaner markup "_sec:..._"
  • PLNbase is now written in UTF-8
  • try to select a locale with setlocale(LC_CTYPE, "pt_PT|pt_BR|???") to be less dependent on the current LC_CTYPE
  • fix a bug related to XML tag tokenization in "cqptokens"
  • incorporated many new rules for syllable division (thanks to João Machado, from the P-Pal project)

Documentation

Command line tool for text segmentation, tokenization and annotation

Modules

Perl extension for NLP of the Portuguese language