Changes for version 0.26 - 2014-03-09
- protection of bookcleaner markup "_sec:..._"
- PLNbase is now written in UTF-8
- try to select a locale with setlocale(LC_CTYPE, "pt_PT|pt_BR|???") to be less dependent on the current LC_CTYPE
- fix a bug in the tokenization of XML tags in "cqptokens"
- incorporated many new rules for syllable division (thanks to João Machado, from the P-Pal project)
Documentation
Command line tool for text segmentation, tokenization and annotation
Modules
Perl extension for NLP of the Portuguese language