NAME
DiaColloDB::Relation::Unigrams - diachronic collocation db, profiling relation: native unigram index
SYNOPSIS
##========================================================================
## PRELIMINARIES
use DiaColloDB::Relation::Unigrams;
##========================================================================
## Constructors etc.
$ug = $CLASS_OR_OBJECT->new(%args);
##========================================================================
## API: disk usage
@files = $obj->diskFiles();
##========================================================================
## I/O: open/close
$ug_or_undef = $ug->open($base,$flags);
$ug_or_undef = $ug->close();
$bool = $ug->opened();
##========================================================================
## I/O: header
@keys = $ug->headerKeys();
$bool = $ug->loadHeaderData($hdr);
##========================================================================
## I/O: text
$ug = $ug->loadTextFh($fh,%opts);
$ug = $ug->saveTextFh($fh,%opts);
##========================================================================
## Relation API: creation
$ug = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
$ug = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
##========================================================================
## Relation API: default
\%slice2prf = $rel->subprofile1(\@tids,\%opts);
\%qinfo = $rel->qinfo($coldb, %opts);
DESCRIPTION
DiaColloDB::Relation::Unigrams is a DiaColloDB::Relation subclass for native indices over attribute-tuple unigrams using the DiaColloDB::PackedFile API for low-level index data.
Globals & Constants
- Variable: @ISA
- 
DiaColloDB::Relation::Unigrams inherits from DiaColloDB::Relation. 
Constructors etc.
- new
- 
$ug = $CLASS_OR_OBJECT->new(%args);
%args, object structure:
    ##-- user options
    base      => $basename, ##-- file basename (default=undef:none); use files "${base}.dba1", "${base}.dba2", "${base}.hdr"
    flags     => $flags,    ##-- fcntl flags or open-mode (default='r')
    perms     => $perms,    ##-- creation permissions (default=(0666 &~umask))
    pack_i    => $pack_i,   ##-- pack-template for IDs (default='N')
    pack_f    => $pack_f,   ##-- pack-template for frequencies (default='N')
    pack_d    => $pack_d,   ##-- pack-template for dates (default='n')
    keeptmp   => $bool,     ##-- keep temporary files? (default=false)
    logCompat => $level,    ##-- log-level for compatibility warnings (default='warn')
    ##
    ##-- size info (after open() or load())
    size1     => $size1,    ##-- == $r1->size()
    size2     => $size2,    ##-- == $r2->size()
    ##
    ##-- low-level data
    r1        => $r1,       ##-- pf: [$end2] @ $i1 : constant (logical index)
    r2        => $r2,       ##-- pf: [$d1,$f1]* @ end2($i1-1)..(end2($i1+1)-1) : sorted by $d1 for each $i1
    N         => $N,        ##-- sum($f1)
    version   => $version,  ##-- file version, for compatibility checks
- DESTROY
- 
destructor implicitly calls close(). 
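The pack_i, pack_f and pack_d options of new() determine how wide each stored value is on disk. The following is a minimal sketch in core Perl, not the module's own code: it assumes that an r2 record is simply the date packed with pack_d followed by the frequency packed with pack_f, as the [$d1,$f1] notation suggests, and that an r1 record is a single value packed with pack_i. The sample values 1999 and 42 are made up for illustration.

```perl
use strict;
use warnings;

## default templates as listed for new()
my ($pack_i, $pack_f, $pack_d) = ('N', 'N', 'n');

## assumption: an r2 record is "$pack_d$pack_f" applied to ($d1,$f1)
my $template = $pack_d . $pack_f;
my $record   = pack($template, 1999, 42);
my $reclen   = length($record);             ## 2 bytes ('n') + 4 bytes ('N')

## assumption: an r1 record is a single offset packed with $pack_i
my $r1len    = length(pack($pack_i, 0));    ## 4 bytes ('N')

## round-trip to recover the original values
my ($date, $freq) = unpack($template, $record);
print "r1len=$r1len reclen=$reclen date=$date freq=$freq\n";
```

Note that 'n' is an unsigned 16-bit big-endian value, so under the default pack_d only dates in the range 0..65535 survive the round-trip; a wider template such as 'N' would be needed for anything larger.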
API: disk usage
- diskFiles
- 
@files = $obj->diskFiles();
Returns disk storage files, used by du() and timestamp().
I/O: open/close
- open
- 
$ug_or_undef = $ug->open($base,$flags);
$ug_or_undef = $ug->open($base);
$ug_or_undef = $ug->open();
Opens underlying index files.
- close
- 
$ug_or_undef = $ug->close();
Closes underlying index files. Implicitly calls flush() if index is opened for writing.
- opened
- 
$bool = $ug->opened();
Returns true iff index is opened.
I/O: header
- headerKeys
- 
@keys = $ug->headerKeys();
Returns the keys to save as header.
- loadHeaderData
- 
$bool = $ug->loadHeaderData($hdr);
Instantiates header data from $hdr; overrides DiaColloDB::Persistent implementation.
I/O: text
- loadTextFh
- 
$ug = $ug->loadTextFh($fh,%opts);
- loads from text file as saved by saveTextFh().
- input fh must be sorted numerically by ($i1,$d1).
- supports multiple lines for pairs ($i1,$d1), provided the above condition(s) hold.
- supports loading of $ug->{N} from single-component lines.
- %opts: clobber %$ug
 
- saveTextFh
- 
$bool = $ug->saveTextFh($fh,%opts);
Saves as text with lines of the form:
    N             ##-- 1 field : N
    FREQ ID1 DATE ##-- 3 fields: unigram frequency for (ID1,DATE)
%opts:
    i2s => \&CODE, ##-- code-ref for formatting indices; called as $s=CODE($i)
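To make the text format concrete, here is a small stand-alone reader written in core Perl. It is only a sketch of the documented line format and sort condition, not the loadTextFh() implementation: the TAB field separator, the sample data, and the names %f1 and @prev are assumptions introduced for illustration.

```perl
use strict;
use warnings;

## hypothetical sample data; fields are assumed to be TAB-separated
my $text = "10\n" . "3\t0\t1999\n" . "2\t0\t2000\n" . "5\t1\t1999\n";
open(my $fh, '<', \$text) or die "open failed: $!";

my ($N, %f1);             ## %f1: "$i1\t$d1" => frequency
my @prev = (-1, -1);      ## last ($i1,$d1) seen, for the sort check
while (defined(my $line = <$fh>)) {
  chomp $line;
  next if $line =~ /^\s*$/;
  my @fields = split(/\t/, $line);
  if (@fields == 1) {     ## 1 field: N
    $N = $fields[0];
    next;
  }
  my ($f, $i1, $d1) = @fields;   ## 3 fields: FREQ ID1 DATE
  die "input not sorted by (i1,d1) at line " . $.
    if ($i1 < $prev[0] || ($i1 == $prev[0] && $d1 < $prev[1]));
  @prev = ($i1, $d1);
  $f1{"$i1\t$d1"} += $f;  ## repeated (i1,d1) lines accumulate
}
close($fh);

print "N=$N pairs=", scalar(keys %f1), "\n";
```

The sort check uses strict less-than comparisons, so repeated lines for the same ($i1,$d1) pair are accepted and summed, while any out-of-order line is rejected.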
Relation API: creation
- create
- 
$ug = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
Populates unigram database from $tokdat_file, a tt-style text file with lines of the form:
    TID DATE ##-- single token
    "\n"     ##-- blank line ~ EOS (hard co-occurrence boundary)
%opts: clobber %$ug
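Conceptually, populating a unigram index from such a file amounts to counting (TID,DATE) pairs. The sketch below does this in memory with core Perl only; it is not how create() is implemented, and the TAB separator between TID and DATE as well as the sample tokens are assumptions. Blank EOS lines are skipped, since sentence boundaries do not affect unigram counts.

```perl
use strict;
use warnings;

## hypothetical tt-style input: "TID<TAB>DATE" per token, blank line = EOS
my $tokdat = "0\t1999\n" . "1\t1999\n" . "\n" . "0\t1999\n" . "0\t2000\n" . "\n";
open(my $fh, '<', \$tokdat) or die "open failed: $!";

my %f1;                    ## $f1{$tid}{$date} = frequency
my $N = 0;                 ## total token count, i.e. sum of all frequencies
while (defined(my $line = <$fh>)) {
  chomp $line;
  next if $line eq '';     ## EOS: no effect on unigram counts
  my ($tid, $date) = split(/\t/, $line, 2);
  ++$f1{$tid}{$date};
  ++$N;
}
close($fh);

## emit "FREQ ID1 DATE" lines sorted numerically by (ID1,DATE)
my @lines;
for my $tid (sort { $a <=> $b } keys %f1) {
  for my $date (sort { $a <=> $b } keys %{$f1{$tid}}) {
    push @lines, join("\t", $f1{$tid}{$date}, $tid, $date);
  }
}
print "$_\n" for ($N, @lines);
```

The output follows the line format documented for saveTextFh(): a single-field N line followed by three-field frequency lines in sorted order.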
- union
- 
$ug = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
Merges multiple unigram indices into a new object. @pairs is an array of pairs ([$argug,\@ti2u],...) of unigram relations $argug and tuple-id maps \@ti2u for $argug. Implicitly flushes the new index.
%opts: clobber %$ug
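The role of the tuple-id maps can be illustrated with plain hashes standing in for the argument indices. This is a conceptual sketch only: the real method operates on opened index objects and packed files, and it is assumed here that each \@ti2u maps a local tuple-ID (the array index) to the corresponding tuple-ID in the union. All data values are hypothetical.

```perl
use strict;
use warnings;

## stand-ins for two argument indices: "$i1\t$d1" => $f1, with local tuple-IDs
my %ug_a = ("0\t1999" => 3, "1\t1999" => 5);
my %ug_b = ("0\t1999" => 2, "1\t2000" => 1);

## assumed map semantics: $ti2u[$local_id] = $union_id
my @ti2u_a = (0, 1);
my @ti2u_b = (1, 2);

my @pairs = ([\%ug_a, \@ti2u_a], [\%ug_b, \@ti2u_b]);

my %union;                 ## "$u1\t$d1" => summed frequency
my $N = 0;                 ## total frequency over all arguments
for my $pair (@pairs) {
  my ($argug, $ti2u) = @$pair;
  for my $key (keys %$argug) {
    my ($i1, $d1) = split(/\t/, $key);
    my $u1 = $ti2u->[$i1];               ## translate local ID to union ID
    $union{"$u1\t$d1"} += $argug->{$key};
    $N += $argug->{$key};
  }
}

print "N=$N pairs=", scalar(keys %union), "\n";
```

Here local ID 1 of the first index and local ID 0 of the second both map to union ID 1, so their 1999 frequencies are summed into a single entry.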
Relation API: default
- subprofile1
- 
\%slice2prf = $ug->subprofile1(\@tids,\%opts);
Gets slice-wise unigram profile(s) for tuple-IDs @tids. $ug must be opened.
%opts: as for DiaColloDB::Relation::subprofile1().
- subextend
- 
\%slice2prf = $rel->subextend(\%slice2prf,\%opts);
Populates independent collocate frequencies in %slice2prf values. The override just returns a new empty DiaColloDB::Profile::Multi object.
- qinfo
- 
\%qinfo = $rel->qinfo($coldb, %opts);
Gets query-info hash for profile administrivia (ddc hit links).
%opts: as for profile(), additionally:
    qreqs => \@qreqs,   ##-- as returned by $coldb->parseRequest($opts{query})
    gbreq => \%groupby, ##-- as returned by $coldb->groupby($opts{groupby})
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2015-2020 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
DiaColloDB::Relation(3pm), DiaColloDB::Relation::Cofreqs(3pm), DiaColloDB::Relation::TDF(3pm), DiaColloDB::Relation::DDC(3pm), DiaColloDB(3pm), perl(1), ...