Modules

Abstract ancestor for parallel-corpora document readers
Segments text on newlines
Language-independent, rule-based tokenizer
Base tokenizer: splits on whitespace and fills no_space_after
Rule-based, pseudo language-independent sentence segmenter
Collection of blocks, both language-parametrized and language-independent