Changes for version v0.12.012 - 2020-01-27

  • added DTA::CAB-style "find.hack" to avoid symlink-related ExtUtils::MakeMaker pancakes
  • added new class DiaColloDB::Corpus::Compiled - pre-compiled, pre-filtered corpora (JSON)
    • added dcdb-corpus-compile.perl for corpus creation & union
    • DiaColloDB::create() method now implicitly compiles temporary corpus if required
    • corpus document parsing in parallel using threads module
  • added DiaColloDB::XS sub-module - fast XS/C++ implementations for compile-time operations
    • requires OpenMP, only tested on linux/gcc, pure-perl fallbacks still in place
  • added global $DiaColloDB::NJOBS - number of parallel compile-time worker threads
    • default=-1 uses all available cores, via DiaColloDB::Utils::nJobs()
  • factored out corpus content filters (stop/go-lists & -regexes) to new module DiaColloDB::Corpus::Filters
    • use Exporter for backwards-compatibility
  • factored out DiaColloDB compile+create and import+export methods to DiaColloDB/methods/(compile|export).pm
    • use (sort TMPFILE|cut) instead of (cut|sort -) for frequency filters in methods/compile.pm
  • added APPEND option to tmparray(), tmphash() - mostly useful for debugging
  • added DiaColloDB::Document::Storable subclass
    • not really used: JSON is almost as fast, substantially smaller, and more portable
  • use threads instead of forks in Client::list (requires DDC::Any via DDC::PP or DDC::XS >= v0.23)
    • renamed sentinel variable HAVE_FORKS->HAVE_THREADS
  • use temporary sort-file in Relation::TDF::create (parallel sort)
  • added thread- and XS-related options for dcdb-create.perl, dcdb-corpus-compile.perl
    • "-jobs=NJOBS" : number of parallel jobs
    • "-xs" / "-pp" : do/don't use XS implementations
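
The switch from (cut|sort -) to (sort TMPFILE|cut) noted above for methods/compile.pm can be illustrated with a minimal sketch. The tab-separated "term<TAB>freq" layout and the sample records are assumptions made for illustration only, not the actual temporary-file format used by DiaColloDB. Under byte-wise collation (LC_ALL=C) both orderings yield the same key sequence, while the newer form lets sort read a regular file instead of a pipe.

```shell
#!/bin/sh
# Hypothetical input: tab-separated "term<TAB>freq" records
# (illustrative only; not the real DiaColloDB temp-file layout).
export LC_ALL=C   # byte-wise collation, so both pipelines order keys identically

tmpfile=$(mktemp)
printf 'b\t1\na\t2\nab\t5\na\t3\n' > "$tmpfile"

# Older form: extract the key column first, then sort from stdin.
old=$(cut -f1 "$tmpfile" | sort -)

# Newer form: sort the temporary file directly, then extract the key column.
new=$(sort "$tmpfile" | cut -f1)

rm -f "$tmpfile"

# Print the keys produced by the newer form.
printf '%s\n' "$new"
```

Sorting whole records gives the same key order here only because TAB compares lower than any printable byte in the C locale; with other locales or field separators the two pipelines may differ.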

Modules

diachronic collocation database, top-level
diachronic collocation db, top-level client API
diachronic collocation db: client: local dbdir
diachronic collocation db: client: remote http server
diachronic collocation db: client: distributed
DiaColloDB utilities: compatibility modules: top-level wrappers
DiaColloDB utilities: compatibility modules: v0.08.x
diachronic collocation db, integer->integer* multimap file, backwards-compatible (v0.08.x)
DiaColloDB utilities: compatibility modules: v0.09.x
collocation db, top-level: backwards-compatible (v0.09.x)
collocation db, relation API: backwards-compatible (v0.09.x)
collocation db, profiling relation: co-frequency database (v0.09.x)
collocation db, profiling relation: unigram database (v0.09.x)
DiaColloDB utilities: compatibility modules: v0.11.x
collocation db, profiling relation: co-frequency database (using pair of DiaColloDB::PackedFile)
collocation db, profiling relation: co-occurrence frequencies via (term x document) raw-frequency matrix, formerly DiaColloDB::Relation::Vsem.pm ("vector-space distributional semantic index")
collocation db, profiling relation: unigram database (using DiaColloDB::PackedFile)
diachronic collocation db, source corpus
collocation db, source corpus (pre-compiled)
collocation db, source corpus content filters
diachronic collocation db, source document (base class)
diachronic collocation db, source document, DDC tab-dump
diachronic collocation db, source document, raw JSON
diachronic collocation db, source document, Storable
diachronic collocation db, source document, TCF format
diachronic collocation db, source document, TEI format
diachronic collocation db, symbol<->integer enum
diachronic collocation db, symbol<->integer enum, fixed-length symbols
diachronic collocation db, symbol<->integer enum, fixed-length symbols, mmap
diachronic collocation db, symbol<->integer enum, mmap
diachronic collocation db: symbol<->integer enum: tied interface
DiaColloDB logging (using Log::Log4perl)
diachronic collocation db, integer->integer* multimap file, e.g. for expansion indices
collocation db, integer->integer* multimap file, using mmap
DiaColloDB utilities: (temporary) mmaped PDLs
diachronic collocation db: flat fixed-length record-oriented files
collocation db: flat fixed-length record-oriented files; mmap variant
diachronic collocation db, generic persistent objects
diachronic collocation db, (co-)frequency profile
diachronic collocation db, diff profiles
diachronic collocation db, (co-)frequency profile, by date-slice
diachronic collocation db, (co-)frequency profile diffs, by date
diachronic collocation db, relation API (abstract & utilities)
diachronic collocation db, profiling relation: native fixed-window co-frequency index
diachronic collocation db, profiling relation: ddc client
collocation db, profiling relation: (term x document) raw-frequency matrix
collocation db, profiling relation: PDL: query hacks
diachronic collocation db, profiling relation: native unigram index
DiaColloDB: temporary data structures: common base class
DiaColloDB: temporary arrays
DiaColloDB: temporary hashes
DiaColloDB: temporary mmaped vec() buffers
diachronic collocation db, timer
DiaColloDB utilities: auto-magic upgrades: top level
DiaColloDB utilities: auto-magic upgrade: base class / API
DiaColloDB utilities: auto-magic upgrade: v0.04: date limits
DiaColloDB utilities: auto-magic upgrade: v0.09.x: MultiMapFile format
DiaColloDB utilities: auto-magic upgrade: v0.10.x: x-tuples (+date) to t-tuples (-date)
DiaColloDB utilities: auto-magic upgrade: v0.11.x -> v0.12.x: allow slice-wise N
diachronic collocation database, generic utilities
compile-time methods for DiaColloDB
import/export methods for DiaColloDB
XS utilities for DiaColloDB
XS/C++ utilities for Cofreqs relation compilation

Provides

in DiaColloDB/methods/compile.pm
in DiaColloDB/methods/export.pm
in DiaColloDB/Relation/Cofreqs.pm
in DiaColloDB/Compat/v0_09/Relation/Cofreqs.pm
in DiaColloDB/Compat/v0_09/Relation/Unigrams.pm
in DiaColloDB/Relation/DDC.pm
in DiaColloDB/EnumFile/Tied.pm
in DiaColloDB/EnumFile/FixedMap.pm
in DiaColloDB/EnumFile/Tied.pm
in DiaColloDB/EnumFile/Tied.pm
in DiaColloDB/Profile/MultiDiff.pm
in DiaColloDB/Profile/MultiDiff.pm
in DiaColloDB/Relation/Unigrams.pm