NAME
DiaColloDB::Relation::Cofreqs - diachronic collocation db, profiling relation: native fixed-window co-frequency index
SYNOPSIS
##========================================================================
## PRELIMINARIES
use DiaColloDB::Relation::Cofreqs;
##========================================================================
## Constructors etc.
$cof = $CLASS_OR_OBJECT->new(%args);
##========================================================================
## I/O: open/close
$cof_or_undef = $cof->open($base,$flags);
$cof_or_undef = $cof->close();
$bool = $cof->opened();
##========================================================================
## I/O: header
@keys = $cof->headerKeys();
$bool = $cof->loadHeaderData($hdr);
##========================================================================
## I/O: text
$cof = $cof->loadTextFh($fh,%opts);
$cof = $cof->loadTextFile_create($fh,%opts);
$bool = $cof->saveTextFh($fh,%opts);
##========================================================================
## Relation API: create
$rel = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
##========================================================================
## Relation API: union
$cof = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
##========================================================================
## Utilities: lookup
$f = $cof->f1( @ids);
$f12 = $cof->f12($id1,$id2);
##========================================================================
## Relation API: default: profiling
$prf = $cof->subprofile(\@xids, %opts);
##========================================================================
## Relation API: default: query info
\%qinfo = $rel->qinfo($coldb, %opts);
DESCRIPTION
DiaColloDB::Relation::Cofreqs is a DiaColloDB::Relation subclass for native indices over collocation frequencies within a fixed-length window of context words. It uses a pair of DiaColloDB::PackedFile objects for low-level index data.
Only simple queries expressed as a disjunction of single-term conditions (i.e. those queries which evaluate to a set of term-tuples) are supported. Likewise, only groupby conditions over literal indexed term-attributes are supported.
Globals & Constants
- Variable: @ISA
-
DiaColloDB::Relation::Cofreqs inherits from DiaColloDB::Relation.
Constructors etc.
- new
-
$cof = $CLASS_OR_OBJECT->new(%args);
%args, object structure:
##-- user options
class    => $class,      ##-- optional, useful for debugging from header file
base     => $basename,   ##-- file basename (default=undef:none); use files "${base}.dba1", "${base}.dba2", "${base}.hdr"
flags    => $flags,      ##-- fcntl flags or open-mode (default='r')
perms    => $perms,      ##-- creation permissions (default=(0666 &~umask))
dmax     => $dmax,       ##-- maximum distance for co-occurrences (default=5)
fmin     => $fmin,       ##-- minimum pair frequency (default=0)
pack_i   => $pack_i,     ##-- pack-template for IDs (default='N')
pack_f   => $pack_f,     ##-- pack-template for frequencies (default='N')
keeptmp  => $bool,       ##-- keep temporary files? (default=false)
##
##-- size info (after open() or load())
size1    => $size1,      ##-- == $r1->size()
size2    => $size2,      ##-- == $r2->size()
##
##-- low-level data
r1       => $r1,         ##-- pf: [$end2,$f1] @ $i1
r2       => $r2,         ##-- pf: [$i2,$f12]  @ end2($i1-1)..(end2($i1)-1)
N        => $N,          ##-- sum($f12)
- DESTROY
-
Destructor implicitly calls close().
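The r1/r2 layout listed under new() can be read as a two-level index: record $i1 of r1 stores the exclusive end offset of its block of [$i2,$f12] records in r2. The following is a minimal sketch of that addressing scheme only. It assumes plain Perl arrays in place of DiaColloDB::PackedFile objects; the data values and the $collocates helper are hypothetical and not part of the module.

```perl
use strict;
use warnings;

## Toy stand-ins for the two packed files (assumption: in-memory arrays).
##   @r1: record $i1 is [$end2, $f1]
##   @r2: records end2($i1-1) .. end2($i1)-1 are the [$i2, $f12] pairs for $i1
my @r1 = ([2, 7], [2, 3], [5, 9]);
my @r2 = ([1, 4], [2, 3],                   ## collocates of id 0
          [0, 2], [1, 1], [2, 6]);          ## collocates of id 2 (id 1 has none)

## hypothetical helper: return the [$i2,$f12] records stored for $i1
my $collocates = sub {
  my $i1  = shift;
  my $beg = $i1 == 0 ? 0 : $r1[$i1-1][0];   ## end2($i1-1), or 0 for the first id
  my $end = $r1[$i1][0];                    ## end2($i1), exclusive
  return @r2[$beg .. $end-1];               ## empty list if $beg == $end
};

## one r2 record under the default templates pack_i='N', pack_f='N':
## two big-endian unsigned 32-bit integers, 8 bytes in total
my $packed = pack('NN', @{$r2[0]});

for my $rec ($collocates->(2)) {
  printf "i1=2 i2=%d f12=%d\n", @$rec;
}
```

Storing only the end offset keeps r1 records fixed-width; the start of each block is recovered from the preceding record.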
I/O: open/close
- open
-
$cof_or_undef = $cof->open($base,$flags);
$cof_or_undef = $cof->open($base);
$cof_or_undef = $cof->open();
Opens underlying index files.
- close
-
$cof_or_undef = $cof->close();
Closes underlying index files. Implicitly calls flush() if index is opened for writing.
- opened
-
$bool = $cof->opened();
Returns true iff index is opened.
I/O: header
See also DiaColloDB::Persistent.
- headerKeys
-
@keys = $cof->headerKeys();
Returns the object keys to be saved in the header file.
- loadHeaderData
-
$bool = $cof->loadHeaderData($hdr);
instantiates header data from $hdr; overrides DiaColloDB::Persistent implementation.
I/O: text
- loadTextFh
-
$cof = $cof->loadTextFh($fh,%opts)
loads from text file as saved by saveTextFh()
supports semi-sorted input: input fh must be sorted by $i1, and all $i2 for each $i1 must be adjacent (i.e. no intervening $j1 != $i1)
supports multiple lines for pairs ($i1,$i2) provided the above conditions hold
supports loading of $cof->{N} from single-value lines
%opts: clobber %$cof
- loadTextFile_create
-
$cof = $cof->loadTextFile_create($fh,%opts);
old, slightly faster version of loadTextFile() which doesn't support {N}, semi-sorted input, or multiple ($i1,$i2) entries; not useable by union() method.
- saveTextFh
-
$bool = $cof->saveTextFh($fh,%opts);
save to text file with initial line "N" and subsequent lines of the form:
FREQ ID1 ID2
%opts:
i2s => \&CODE, ##-- code-ref for formatting indices; called as $s=CODE($i)
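The text format handled by loadTextFh() and saveTextFh() can be illustrated without the module itself. The sketch below reads an "N" line followed by "FREQ ID1 ID2" lines, sums repeated ($i1,$i2) pairs, rejects input in which an $i1 block is interrupted, and then writes the data back using an i2s-style code-ref. Whitespace-separated input fields, tab-separated output, the sample data, and the @sym table are assumptions of this sketch, not documented behavior.

```perl
use strict;
use warnings;

## hypothetical input: N line, then FREQ ID1 ID2 lines grouped by ID1
my $in = join("\n", "16", "4 0 1", "3 0 2", "1 0 1", "2 2 0", "6 2 2") . "\n";
open(my $ifh, '<', \$in) or die "cannot open in-memory input: $!";

my ($N, %f12, %done, $cur);
while (defined(my $line = <$ifh>)) {
  chomp $line;
  my @fields = split ' ', $line;
  next if !@fields;                               ## skip blank lines
  if (@fields == 1) { $N = $fields[0]; next; }    ## single-value line: N
  my ($f, $i1, $i2) = @fields;
  if (!defined($cur) || $i1 != $cur) {
    die "lines for ID1=$i1 are not adjacent" if $done{$i1};
    $done{$cur} = 1 if defined $cur;              ## previous block is closed
    $cur = $i1;
  }
  $f12{$i1}{$i2} += $f;                           ## repeated pairs are summed
}
close $ifh;

## write back, formatting ids with a code-ref called as $s = CODE($i)
my @sym = qw(the cat sat);                        ## hypothetical id-to-string table
my $i2s = sub { $sym[$_[0]] };
my $out = '';
open(my $ofh, '>', \$out) or die "cannot open in-memory output: $!";
print {$ofh} "$N\n";
for my $i1 (sort { $a <=> $b } keys %f12) {
  for my $i2 (sort { $a <=> $b } keys %{ $f12{$i1} }) {
    print {$ofh} join("\t", $f12{$i1}{$i2}, $i2s->($i1), $i2s->($i2)), "\n";
  }
}
close $ofh;
print $out;
```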
Relation API: create
- create
-
$rel = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
populates current index from $tokdat_file, a tt-style text file containing 1 token-id per line with optional blank lines.
%opts: clobber %$rel, also:
size=>$size, ##-- set initial size (number of types)
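As an illustration of what a fixed-window co-frequency count over such a token-id file involves, the sketch below counts ordered pairs of ids at distance 1..$dmax. It is not the module's implementation. It assumes that blank lines separate units (e.g. sentences) which co-occurrences do not cross, which the description above does not state; the input data and the $count_unit helper are hypothetical.

```perl
use strict;
use warnings;

my $dmax  = 2;                                 ## maximum distance (module default is 5)
my @lines = ('0', '1', '2', '', '1', '0');     ## hypothetical token-id lines

my (%f12, @unit);
my $N = 0;                                     ## running sum of all $f12

## hypothetical helper: count ordered pairs at distance 1..$dmax in one unit
my $count_unit = sub {
  my $ids = shift;
  for my $i (0 .. $#$ids) {
    for my $j ($i-$dmax .. $i+$dmax) {
      next if $j == $i || $j < 0 || $j > $#$ids;
      $f12{ $ids->[$i] }{ $ids->[$j] }++;
      $N++;
    }
  }
};

for my $line (@lines, '') {                    ## trailing '' flushes the last unit
  if ($line eq '') {
    $count_unit->(\@unit) if @unit;
    @unit = ();
    next;
  }
  push @unit, $line;
}

printf "N=%d f12(0,1)=%d\n", $N, $f12{0}{1};
```

A minimum pair frequency such as fmin would be applied as a filter over %f12 after counting.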
Relation API: union
- union
-
$cof = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
merge multiple co-frequency indices from \@pairs into new object. @pairs is an array of pairs ([$cof,\@xi2u],...) of DiaColloDB::Relation::Cofreqs objects $cof and tuple-id maps \@xi2u for $cof; \@xi2u may also be a mapping object supporting a toArray() method. implicitly flushes the new index.
%opts: clobber %$cof
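The role of the tuple-id maps \@xi2u can be sketched as follows: each source index uses its own local ids, and the map translates them into ids of the merged index before frequencies are summed. This sketch assumes nested hashes in place of Cofreqs objects and array-ref maps only; the data are hypothetical.

```perl
use strict;
use warnings;

## each pair: [ \%local_f12, \@xi2u ], where $xi2u[$local_id] == $union_id
my @pairs = (
  [ { 0 => { 1 => 2 } },                [10, 11] ],
  [ { 0 => { 1 => 3 }, 1 => { 0 => 1 } }, [11, 10] ],
);

my %union;                                     ## merged f12, keyed by union ids
my $N = 0;
for my $pair (@pairs) {
  my ($f12, $xi2u) = @$pair;
  for my $i1 (keys %$f12) {
    for my $i2 (keys %{ $f12->{$i1} }) {
      my $f = $f12->{$i1}{$i2};
      $union{ $xi2u->[$i1] }{ $xi2u->[$i2] } += $f;   ## translate ids, then sum
      $N += $f;
    }
  }
}

printf "f12(10,11)=%d f12(11,10)=%d N=%d\n", $union{10}{11}, $union{11}{10}, $N;
```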
Utilities: lookup
- f1
-
$f = $cof->f1( @ids);
$f = $cof->f1(\@ids);
get total marginal unigram frequency (index must be opened)
- f12
-
$f12 = $cof->f12($id1,$id2);
return joint frequency for pair ($id1,$id2)
currently UNUSED
Relation API: default: profiling
- subprofile
-
$prf = $cof->subprofile(\@xids, %opts);
get co-frequency profile for @xids (index must be opened). %opts:
groupby => \&gbsub, ##-- key-extractor $key2_or_undef = $gbsub->($i2)
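A groupby key-extractor of the shape described above might be used as in the following sketch, which sums joint frequencies over all requested ids and aggregates them by the key returned for each $i2. It assumes that an undefined key means the collocate is skipped, as the name $key2_or_undef suggests; the frequency table and the %key_of mapping are hypothetical.

```perl
use strict;
use warnings;

## hypothetical joint frequencies: $f12{$i1}{$i2}
my %f12 = (
  0 => { 3 => 4, 4 => 1 },
  1 => { 5 => 2, 6 => 7 },
);

## hypothetical key-extractor: maps a collocate id to a group key, or undef
my %key_of = (3 => 'cat', 4 => 'dog', 5 => 'cat');
my $gbsub  = sub { $key_of{ $_[0] } };

my @xids = (0, 1);
my %profile;                                   ## group key => summed f12
for my $i1 (@xids) {
  for my $i2 (keys %{ $f12{$i1} || {} }) {
    my $key2 = $gbsub->($i2);
    next if !defined $key2;                    ## assumption: undef keys are dropped
    $profile{$key2} += $f12{$i1}{$i2};
  }
}

printf "%s\t%d\n", $_, $profile{$_} for sort keys %profile;
```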
Relation API: default: query info
- qinfo
-
\%qinfo = $rel->qinfo($coldb, %opts);
get query-info hash for profile administrivia (ddc hit links).
%opts: as for profile(), additionally:
qreqs => \@qreqs, ##-- as returned by $coldb->parseRequest($opts{query})
gbreq => \%groupby, ##-- as returned by $coldb->groupby($opts{groupby})
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2015-2016 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
DiaColloDB::Relation(3pm), DiaColloDB::Relation::Unigrams(3pm), DiaColloDB::Relation::TDF(3pm), DiaColloDB::Relation::DDC(3pm), DiaColloDB(3pm), perl(1), ...