NAME
DiaColloDB - diachronic collocation database, top-level
SYNOPSIS
##========================================================================
## PRELIMINARIES
use DiaColloDB;
##========================================================================
## Constructors etc.
$coldb = CLASS_OR_OBJECT->new(%args);
##========================================================================
## I/O: open/close
$coldb_or_undef = $coldb->open($dbdir,%opts);
@dbkeys = $coldb->dbkeys();
$coldb_or_undef = $coldb->close();
$bool = $coldb->opened();
@files = $obj->diskFiles();
##========================================================================
## create: utils
$multimap = $coldb->create_multimap($base, \%ts2i, $packfmt, $label="multimap");
\@attrs = $coldb->attrs();
$atitle = $CLASS_OR_OBJECT->attrTitle($attr_or_alias);
$acbexpr = $CLASS_OR_OBJECT->attrCountBy($attr_or_alias,$matchid=0);
$aquery_or_filter_or_undef = $CLASS_OR_OBJECT->attrQuery($attr_or_alias,$cquery);
\@attrdata = $coldb->attrData();
$bool = $coldb->hasAttr($attr);
##========================================================================
## create: from corpus
$bool = $coldb->create($corpus,%opts);
##========================================================================
## create: union (aka merge)
$coldb = $CLASS_OR_OBJECT->union(\@coldbs_or_dbdirs,%opts);
##========================================================================
## I/O: header
@keys = $coldb->headerKeys();
$bool = $coldb->loadHeaderData();
##========================================================================
## Export/Import
$bool = $coldb->dbexport();
$coldb = $coldb->dbimport();
##========================================================================
## Info
\%info = $coldb->dbinfo();
##========================================================================
## Profiling: Utils
$relname = $coldb->relname($rel);
$obj_or_undef = $coldb->relation($rel);
\@ids = $coldb->enumIds($enum,$req,%opts);
($dfilter,$sliceLo,$sliceHi,$dateLo,$dateHi)
= $coldb->parseDateRequest($dateRequest='', $sliceRequest=0, $fill=0, $ddcMode=0);
$compiler = $coldb->qcompiler();
$cquery_or_undef = $coldb->qparse($ddc_query_string);
$cquery = $coldb->parseQuery([[$attr1,$val1],...], %opts) ##-- compat: ARRAY-of-ARRAYs
\@aqs = $coldb->queryAttributes($cquery,%opts);
\@aqs = $coldb->parseRequest($request, %opts);
\%groupby = $coldb->groupby($groupby_request, %opts);
$cqfilter = $coldb->query2filter($attr,$cquery,%opts);
($CQCountKeyExprs,\$CQRestrict,\@CQFilters)
= $coldb->parseGroupBy($groupby_string_or_request,%opts);
##========================================================================
## Profiling: Generic
$mprf = $coldb->profile($relation, %opts);
$mprf = $coldb->extend($relation,%opts);
\%opts = $CLASS_OR_OBJECT->profileOptions(\%opts);
##========================================================================
## Profiling: Comparison (diff)
$mprf = $coldb->compare($relation, %opts);
\%opts = $CLASS_OR_OBJECT->compareOptions(\%opts);
DESCRIPTION
The DiaColloDB package is the top-level module for the DiaColloDB diachronic collocation database package. As a Perl class, a DiaColloDB object can be used to create or query a local native database instance.
Globals & Constants
- Variable: $VERSION
-
Package version.
- Variable: @ISA
-
DiaColloDB inherits from DiaColloDB::Client, and provides the low-level basis for the DiaColloDB::Client API.
- Variable: $PGOOD_DEFAULT
-
Default positive pos regex for document parsing -- don't use qr// here, since Storable doesn't like pre-compiled Regexps. Default = q/^(?:N|TRUNC|VV|ADJ)/.
- Variable: $PBAD_DEFAULT
-
Default negative pos regex for document parsing. Default = undef (none).
- Variable: $WGOOD_DEFAULT
-
Default positive word regex for document parsing. Default = q/[[:alpha:]]/.
- Variable: $WBAD_DEFAULT
-
Default negative word regex for document parsing. Default = q/[\.]/.
- Variable: $LGOOD_DEFAULT
-
Default positive lemma regex for document parsing. Default = undef (none).
- Variable: $LBAD_DEFAULT
-
Default negative lemma regex for document parsing. Default = undef (none).
- Variable: $TDF_MGOOD_DEFAULT
-
Default positive meta-field regex for document parsing (tdf only). Default = q/^(?:author|pnd|title|basename|collection|flags|textClass|genre)$/.
- Variable: $TDF_MBAD_DEFAULT
-
Default negative meta-field regex for document parsing (tdf only). Default = q/_$/.
- Variable: $ECLASS
-
Enum class. Default = 'DiaColloDB::EnumFile::MMap'.
- Variable: $XECLASS
-
fixed-length enum class. Default = 'DiaColloDB::EnumFile::FixedLen'
- Variable: $MMCLASS
-
multimap class. Default = 'DiaColloDB::MultiMapFile'
- Variable: %TDF_OPTS
-
Default options for DiaColloDB::Relation::TDF->new(). Default:
mgood => $TDF_MGOOD_DEFAULT, ##-- positive filter regex for metadata attributes
mbad => $TDF_MBAD_DEFAULT, ##-- negative filter regex for metadata attributes
##
minFreq=>undef, ##-- minimum total term-frequency for model inclusion (default=from $coldb->{tfmin})
minDocFreq=>4, ##-- minimum "doc-frequency" (#/docs per term) for model inclusion
minDocSize=>4, ##-- minimum doc size (#/tokens per doc) for model inclusion (default=8; formerly $coldb->{vbnmin})
maxDocSize=>'inf', ##-- maximum doc size (#/tokens per doc) for model inclusion (default=inf; formerly $coldb->{vbnmax})
##
vtype=>'float', ##-- store compiled values as 32-bit floats
itype=>'long', ##-- store compiled indices as 32-bit integers
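The filter defaults above are plain pattern strings rather than qr// objects, so they can be stored with Storable and compiled only when matched. A minimal sketch of how such good/bad string patterns could be applied to (word, part-of-speech) pairs follows. The helper `keep_token()` and the sample tokens are hypothetical, and the order in which DiaColloDB itself applies its filters is an assumption here.

```perl
use strict;
use warnings;

## Default patterns copied from the documentation above, kept as plain strings
my $pgood = q/^(?:N|TRUNC|VV|ADJ)/;  ## $PGOOD_DEFAULT
my $pbad  = undef;                   ## $PBAD_DEFAULT (none)
my $wgood = q/[[:alpha:]]/;          ## $WGOOD_DEFAULT
my $wbad  = q/[\.]/;                 ## $WBAD_DEFAULT

## keep_token($word,$pos): hypothetical helper (not part of DiaColloDB)
## returns true iff the token passes every defined filter
sub keep_token {
  my ($w,$p) = @_;
  return 0 if defined($pgood) && $p !~ /$pgood/;
  return 0 if defined($pbad)  && $p =~ /$pbad/;
  return 0 if defined($wgood) && $w !~ /$wgood/;
  return 0 if defined($wbad)  && $w =~ /$wbad/;
  return 1;
}

## sample tokens as [word, pos] pairs (illustrative data only)
my @tokens = (['Haus','NN'], ['und','KON'], ['Dr.','NN'], ['gehen','VVINF'], ['1848','CARD']);
my @kept   = map { $_->[0] } grep { keep_token(@$_) } @tokens;
print join(' ', @kept), "\n";
```

An undefined pattern simply disables the corresponding test, which mirrors the "Default = undef (none)" entries above.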
Constructors etc.
- new
-
$coldb = CLASS_OR_OBJECT->new(%args);
%args, object structure:
(
  ##-- options
  dbdir => $dbdir, ##-- database directory; REQUIRED
  flags => $fcflags, ##-- fcntl flags or open()-style mode string; default='r'
  attrs => \@attrs, ##-- index attributes (input as space-separated or array; compiled to array); default=undef (==>['l'])
  ## + each attribute can be token-attribute qw(w p l) or a document metadata attribute "doc.ATTR"
  ## + document "date" attribute is always indexed
  info => \%info, ##-- additional data to return in info() method (e.g. collection, maintainer)
  pack_id => $fmt, ##-- pack-format for IDs (default='N')
  pack_f => $fmt, ##-- pack-format for frequencies (default='N')
  pack_date => $fmt, ##-- pack-format for dates (default='n')
  pack_off => $fmt, ##-- pack-format for file offsets (default='N')
  pack_len => $len, ##-- pack-format for string lengths (default='n')
  dmax => $dmax, ##-- maximum distance for collocation-frequencies and implicit ddc near() queries (default=5)
  cfmin => $cfmin, ##-- minimum co-occurrence frequency for Cofreqs and ddc queries (default=2)
  tfmin => $tfmin, ##-- minimum global term-frequency WITHOUT date component (default=2)
  fmin_${a} => $fmin, ##-- minimum independent frequency for value of attribute ${a} (default=undef:from $tfmin)
  keeptmp => $bool, ##-- keep temporary files? (default=0)
  index_xf => $bool, ##-- xf: create/use unigram index (default=1)
  index_cof => $bool, ##-- cof: create/use co-frequency index (default=1)
  index_tdf => $bool, ##-- tdf: create/use (term x document) frequency matrix index? (default=undef: if available)
  dbreak => $dbreak, ##-- tdf: use break-type $break for tdf index (default=undef: files)
  tdfopts => \%tdfopts, ##-- tdf: options for DiaColloDB::Relation::TDF->new(); default=undef (all inherited from %TDF_OPTS)
  ##
  ##-- runtime ddc relation options
  ddcServer => $server, ##-- server for ddc relation ("$host:$port")
  ddcTimeout => $secs, ##-- timeout for ddc relation
  ##
  ##-- source filtering (for create())
  pgood => $regex, ##-- positive filter regex for part-of-speech tags
  pbad => $regex, ##-- negative filter regex for part-of-speech tags
  wgood => $regex, ##-- positive filter regex for word text
  wbad => $regex, ##-- negative filter regex for word text
  lgood => $regex, ##-- positive filter regex for lemma text
  lbad => $regex, ##-- negative filter regex for lemma text
  ##
  ##-- logging
  logOpen => $level, ##-- log-level for open/close (default='info')
  logCreate => $level, ##-- log-level for create messages (default='info')
  logCorpusFile => $level, ##-- log-level for corpus file-parsing (default='trace')
  logCorpusFileN => $N, ##-- log corpus file-parsing only for every N files (0 for none; default:undef ~ $corpus->size()/100)
  logExport => $level, ##-- log-level for export messages (default='info')
  logProfile => $level, ##-- log-level for verbose profiling messages (default='trace')
  logRequest => $level, ##-- log-level for request-level profiling messages (default='debug')
  logCompat => $level, ##-- log-level for compatibility warnings (default='warn')
  ##
  ##-- runtime limits
  maxExpand => $size, ##-- maximum number of elements in query expansions (default=65535)
  ##
  ##-- administrivia
  version => $version, ##-- DiaColloDB version of stored db (==$DiaColloDB::VERSION)
  upgraded=>\@upgraded, ##-- optional administrative information about auto-magic upgrades
  ##
  ##-- attribute data
  ${a}enum => $aenum, ##-- attribute enum: $aenum : ($dbdir/${a}_enum.*) : $astr<=>$ai : A*<=>N
  ## e.g. lemmata: $lenum : ($dbdir/l_enum.* ) : $lstr<=>$li : A*<=>N
  ${a}2t => $a2t, ##-- attribute multimap: $a2t : ($dbdir/${a}_2t.*) : $ai=>@tis : N=>N*
  pack_t$a => $fmt ##-- pack format: extract attribute-id $ai from a packed tuple-string $ts ; $ai=unpack($coldb->{"pack_x$a"},$ts)
  ##
  ##-- tuple data (-dates)
  ## + as of v0.10.000, packed term tuples EXCLUDING dates ("t-tuples") are mapped by $coldb->{tenum}
  ## + prior to v0.10.000, term tuples INCLUDING dates ("x-tuples") were mapped by $coldb->{xenum}, now obsolete
  tenum => $tenum, ##-- enum: tuples ($dbdir/tenum.*) : \@ais<=>$ti : N*<=>N
  pack_t => $fmt, ##-- symbol pack-format for $tenum : "${pack_id}[Nattrs]"
  xenum => $xenum, ##-- enum: tuples ($dbdir/xenum.*) : [@ais,$di]<=>$xi : N*n<=>N
  pack_t => $fmt, ##-- symbol pack-format for $tenum : "${pack_id}[Nattrs]"
  xdmin => $xdmin, ##-- minimum date (>= v0.04)
  xdmax => $xdmax, ##-- maximum date (>= v0.04)
  ##
  ##-- relation data
  xf => $xf, ##-- ug: [$ti, $date] => f($ti, $date)
  cof => $cof, ##-- cf: [$ti1,$date,$ti2] => f($ti1,$date,$ti2)
  ddc => $ddc, ##-- ddc client relation
  tdf => $tdf, ##-- tdf: (term x document) frequency matrix relation
)
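The pack-format options above describe how attribute IDs and dates are serialized into fixed-width tuple strings. The sketch below uses only Perl's built-in pack() and unpack() with the documented defaults ('N' for IDs, 'n' for dates) to show what a two-attribute t-tuple and a dated x-tuple might look like. The variable names and the '@4N' single-attribute template are illustrative assumptions, not values read from a real database.

```perl
use strict;
use warnings;

my $pack_id   = 'N';   ## documented default: 32-bit big-endian unsigned
my $pack_date = 'n';   ## documented default: 16-bit big-endian unsigned
my $nattrs    = 2;     ## assumed number of indexed attributes

## tuple template "${pack_id}[Nattrs]", built by concatenation: "N[2]"
my $pack_t = $pack_id . '[' . $nattrs . ']';

## pack a t-tuple (attribute IDs only) and unpack it again
my @ais  = (42, 7);
my $ts   = pack($pack_t, @ais);
my @back = unpack($pack_t, $ts);

## extract only the 2nd attribute ID: jump to byte offset 4, then read one 'N'
## (the template shape is an assumption for illustration)
my ($a1) = unpack('@4' . $pack_id, $ts);

## an x-tuple-like record additionally carries a date
my $xs = pack($pack_t . $pack_date, @ais, 1848);
my @x  = unpack($pack_t . $pack_date, $xs);

printf("t-tuple: %d bytes, ids=(%s), second id=%d; x-tuple: %d bytes, date=%d\n",
       length($ts), join(',', @back), $a1, length($xs), $x[-1]);
```

Fixed-width templates like these make every tuple the same length, which is what allows enum files to address records by offset.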
- promote
-
$cli_or_undef = $cli->promote($class,%opts);
DiaColloDB::Client method override: unsupported.
I/O: open/close
- open
-
$coldb_or_undef = $coldb->open($dbdir,%opts);
$coldb_or_undef = $coldb->open();
Open the DB.
- dbkeys
-
@dbkeys = $coldb->dbkeys();
Returns list of %$coldb keys whose values are expected to be sub-objects.
- close
-
$coldb_or_undef = $coldb->close();
Close current DB, if opened.
- opened
-
$bool = $coldb->opened();
Returns true iff db is opened.
- diskFiles
-
@files = $coldb->diskFiles();
Returns list of disk files for $coldb.
create: utils
- Variables: (%ATTR_ALIAS,%ATTR_RALIAS,%ATTR_TITLE,%ATTR_CBEXPR);
-
Global attribute alias hacks.
%ATTR_ALIAS = ($name_or_alias=>$name, ...)
%ATTR_RALIAS = ($name=>\@aliases, ...)
%ATTR_CBEXPR = ($name=>$ddcCountByExpr, ...)
%ATTR_TITLE = ($name_or_alias=>$title, ...)
- create_multimap
-
$multimap = $coldb->create_multimap($base, \%ts2i, $packfmt, $label="multimap");
Create an expansion multimap, used by create().
- attrs
-
\@attrs = $coldb->attrs();
\@attrs = $coldb->attrs($attrs=$coldb->{attrs}, $default=[]);
Parses attributes in $attrs as an array.
- attrName
-
$aname = $CLASS_OR_OBJECT->attrName($attr)
Returns canonical (short) attribute name for $attr. Supports aliases in %ATTR_ALIAS = ($alias=>$name, ...).
- attrTitle
-
$atitle = $CLASS_OR_OBJECT->attrTitle($attr_or_alias);
Returns an attribute title for $attr_or_alias
- attrCountBy
-
$acbexpr = $CLASS_OR_OBJECT->attrCountBy($attr_or_alias,$matchid=0);
Returns a DDC::XS::CQCountKeyExpr object for $attr_or_alias with match-id $matchid.
- attrQuery
-
$aquery_or_filter_or_undef = $CLASS_OR_OBJECT->attrQuery($attr_or_alias,$cquery);
returns a DDC::XS::CQuery or DDC::XS::CQFilter object for condition $cquery on $attr_or_alias.
- attrData
-
\@attrdata = $coldb->attrData();
\@attrdata = $coldb->attrData(\@attrs=$coldb->attrs)
get attribute data for \@attrs; returns @attrdata = ({a=>$a, i=>$i, enum=>$aenum, pack_x=>$pack_xa, a2x=>$a2x, ...})
- hasAttr
-
$bool = $coldb->hasAttr($attr);
Returns true iff $coldb natively supports the attribute (or alias) $attr.
create: from corpus
- create
-
$bool = $coldb->create($corpus,%opts);
%opts:
$key => $val, ##-- clobbers $coldb->{$key}
create: union (aka merge)
- union
-
$coldb = $CLASS_OR_OBJECT->union(\@coldbs_or_dbdirs,%opts);
Populates $coldb as union over @coldbs_or_dbdirs. Clobbers argument dbs {_union_${a}i2u}, {_union_xi2u}, {_union_argi}
I/O: header
Largely inherited from DiaColloDB::Persistent.
- headerKeys
-
@keys = $coldb->headerKeys();
keys to save as header
- loadHeaderData
-
$bool = $coldb->loadHeaderData();
$bool = $coldb->loadHeaderData($data)
loads header data.
Export/Import
- dbexport
-
$bool = $coldb->dbexport();
$bool = $coldb->dbexport($outdir,%opts);
$outdir defaults to "$coldb->{dbdir}/export". %opts:
export_sdat => $bool, ##-- whether to export *.sdat (stringified tuple files for debugging; default=0)
export_cof => $bool, ##-- do/don't export cof.* (default=do)
- dbimport
-
$coldb = $coldb->dbimport();
$coldb = $coldb->dbimport($txtdir,%opts)
Import ColocDB data from $txtdir
TODO
Info
- dbinfo
-
\%info = $coldb->dbinfo();
get db info
Profiling: Utils
- relname
-
$relname = $coldb->relname($rel);
Returns an appropriate relation name for profile() and friends:
returns $rel if $coldb->{$rel} supports a profile() method
otherwise heuristically parses $relationName /xf|f?1|ug/ or /f1?2|c/
- relation
-
$obj_or_undef = $coldb->relation($rel);
returns an appropriate relation-like object for profile() and friends; really just wraps $coldb->{$coldb->relname($rel)}.
- relations
-
@relnames = $coldb->relations();
gets list of relation names supported by $coldb.
- enumIds
-
\@ids = $coldb->enumIds($enum,$req,%opts);
parses enum IDs for $req, which is one of:
a DDC::XS::CQTokExact, ::CQTokInfl, ::CQTokSet, ::CQTokSetInfl, or ::CQTokRegex : interpreted
an ARRAY-ref : list of literal symbol-values
a Regexp ref : regexp for target strings, passed to $enum->re2i()
a string /REGEX/ : regexp for target strings, passed to $enum->re2i()
another string : space-, comma-, or |-separated list of literal values
%opts:
logLevel => $logLevel, ##-- logging level (default=undef)
logPrefix => $prefix, ##-- logging prefix (default="enumIds(): fetch ids")
- parseDateRequest
-
($dfilter,$sliceLo,$sliceHi,$dateLo,$dateHi) = $coldb->parseDateRequest($dateRequest='', $sliceRequest=0, $fill=0, $ddcMode=0);
\%dateRequest = $coldb->parseDateRequest($dateRequest='', $sliceRequest=0, $fill=0, $ddcMode=0);
low-level parsing for date (slice) requests. Returns limit and filter information as a list if called in list context (first form) or as a HASH-ref \%dateRequest if called in scalar context (second form). Returned \%dateRequest has keys corresponding to the list-elements returned in list context:
dfilter => $dfilter, ##-- filter-sub, called as: $wanted=$dfilter->($date); undef for none
slo => $sliceLo, ##-- minimum slice (inclusive)
shi => $sliceHi, ##-- maximum slice (inclusive)
dlo => $dateLo, ##-- minimum date (inclusive); undef for none, always defined if $fill is true
dhi => $dateHi, ##-- maximum date (inclusive); undef for none, always defined if $fill is true
Accepted formats for input parameter $dateRequest:
- Empty Date
-
An empty string or a string containing only whitespace and asterisk (*) characters is ignored ($dlo=$dhi=undef); this should be interpreted by the caller as requesting the full indexed date range.
- Date Regex
-
A date request /REGEX/ enclosed in slashes is treated as a regular expression matching all and only the desired dates. Throws an error if $ddcMode is true, since DDC currently does not support date regexes.
- Date Range
-
A date request of the form MIN:MAX matches all dates in the range [MIN..MAX] (inclusive). For convenience, either or both of MIN and MAX may be an asterisk (*) to indicate the minimum (resp. maximum) date stored in the index.
- Date List
-
A whitespace-, comma-, or |-separated list of values is treated as a literal list of target dates. Throws an error if $ddcMode is true.
- Date Value
-
Any other value is treated as a literal single target date.
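The five accepted $dateRequest formats can be told apart by simple pattern tests applied in the order listed. The following is a minimal sketch of such a classifier. `classify_date_request()` is a hypothetical helper written for illustration; it only labels the request and does not reproduce what parseDateRequest() actually returns.

```perl
use strict;
use warnings;

## classify_date_request($req): hypothetical helper (not part of DiaColloDB)
## returns one of: 'empty', 'regex', 'range', 'list', 'value'
sub classify_date_request {
  my $req = shift;
  $req = '' if !defined($req);
  $req =~ s/^\s+//;                  ## trim leading whitespace
  $req =~ s/\s+$//;                  ## trim trailing whitespace
  return 'empty' if $req =~ /^[\s\*]*$/;                              ## only whitespace and asterisks
  return 'regex' if $req =~ m{^/.*/$}s;                               ## /REGEX/
  return 'range' if $req =~ /^(?:\*|[0-9]+)\s*:\s*(?:\*|[0-9]+)$/;   ## MIN:MAX, either may be '*'
  return 'list'  if $req =~ /[\s,\|]/;                                ## separated literal values
  return 'value';                                                     ## anything else: single date
}

foreach my $req ('', ' * ', '/^19/', '1900:1999', '*:1950', '1900,1910|1920', '1914') {
  printf("%-16s => %s\n", "'$req'", classify_date_request($req));
}
```

The range test must come before the list test, since a range written with spaces around the colon would otherwise be mistaken for a whitespace-separated list.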
- qcompiler
-
$compiler = $coldb->qcompiler();
get DDC::XS::CQueryCompiler for this object (cached in $coldb->{_qcompiler})
- qparse
-
$cquery_or_undef = $coldb->qparse($ddc_query_string);
wraps parse in an eval {...} block and sets $coldb->{error} on failure
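The eval-and-record-error pattern described for qparse() can be sketched in isolation. In the example below, the class `My::QueryWrapper` and its `toy_parse()` method are hypothetical stand-ins for the real query compiler; only the shape of the wrapper (return undef and set the object's {error} on failure) follows the description above.

```perl
use strict;
use warnings;

package My::QueryWrapper;   ## hypothetical class, for illustration only

sub new { return bless({ error => undef }, shift); }

## toy_parse($str): stand-in parser; dies if the query has unbalanced double quotes
sub toy_parse {
  my ($self,$str) = @_;
  die("unbalanced quotes in query\n") if (($str =~ tr/"//) % 2);
  return { query => $str };
}

## qparse($str): wraps toy_parse() in eval {...}; returns undef and sets {error} on failure
sub qparse {
  my ($self,$str) = @_;
  my $q;
  $self->{error} = undef;
  eval { $q = $self->toy_parse($str); 1 }
    or do { $self->{error} = $@ || 'unknown parse error'; return undef; };
  return $q;
}

package main;

my $w    = My::QueryWrapper->new();
my $good = $w->qparse('Haus');
my $bad  = $w->qparse('"Haus');
print defined($good) ? "ok: $good->{query}\n" : "failed: $w->{error}";
print defined($bad)  ? "ok: $bad->{query}\n"  : "failed: $w->{error}";
```

Ending the eval block with a true value and testing the result of eval itself avoids relying on $@ alone to detect failure.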
- parseQuery
-
$cquery = $coldb->parseQuery([[$attr1,$val1],...], %opts) ##-- compat: ARRAY-of-ARRAYs
$cquery = $coldb->parseQuery(["$attr1:$val1",...], %opts) ##-- compat: ARRAY-of-requests
$cquery = $coldb->parseQuery({$attr1=>$val1, ...}, %opts) ##-- compat: HASH
$cquery = $coldb->parseQuery("$attr1=$val1, ...", %opts) ##-- compat: string
$cquery = $coldb->parseQuery($ddcQueryString, %opts) ##-- ddc string (with shorthand ","->WITH, "&&"->WITH)
Guts for parsing user target and groupby requests; returns a DDC::XS::CQuery object representing the request. Index-only items "$l" are mapped to $l=*
%opts:
warn => $level, ##-- log-level for unknown attributes (default: 'warn')
logas => $reqtype, ##-- request type for warnings
default => $attr, ##-- default attribute (for query requests)
mapand => $bool, ##-- map CQAnd to CQWith? (default=true unless '&&' occurs in query string)
ddcmode => $bool, ##-- force ddc query mode? (default=false)
If the first argument is a reference, it is parsed as a native query request. Otherwise, it is assumed to be a string either in the "native" (backwards-compatible) single-token request-notation or a valid DDC query. If the request looks like a simple request, it is parsed into a DDC::XS::CQuery object using local heuristics; DDC queries are parsed directly. The query syntax for "native" DiaColloDB queries is:
q_native ::= qn_clause ((" "|",") qn_clause)*
qn_clause ::= ("$"? qn_attr "=")? qn_value
qn_attr ::= STRING
qn_value ::= qn_regex | qn_words
qn_regex ::= "/" REGEX "/" qn_regmod
qn_regmod ::= ("g"|"i"|"m"|"s"|"a"|"l"|"u"|"x")*
qn_words ::= qn_word ("|" qn_word)*
qn_word ::= STRING
Native request clauses are parsed into queries of type CQTokSet, CQTokExact, CQTokRegex, or CQTokAny, and the returned query object conjoins multiple native request clauses using CQTokWith.
DDC queries are much more flexible, but not all DiaColloDB::Relation types support the full range of the DDC query syntax. In particular, the default relation classes DiaColloDB::Relation::Cofreqs and DiaColloDB::Relation::Unigrams support only those query types accepted by the queryAttributes() method.
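To make the native request grammar more concrete, here is a small self-contained sketch that splits a native request string into clauses and labels each value as a regex, a word set, or a single exact word. `parse_native()` is a hypothetical helper, not the parseQuery() implementation: it returns plain hashes instead of DDC::XS::CQuery objects, does not handle the `*` wildcard, and assumes that regexes contain no spaces, commas, or `=` characters.

```perl
use strict;
use warnings;

## parse_native($request,$default_attr): hypothetical helper (not part of DiaColloDB)
## returns an ARRAY-ref of clause hashes following the q_native grammar
sub parse_native {
  my ($req,$default) = @_;
  my @clauses;
  ## q_native: clauses separated by spaces and/or commas
  foreach my $clause (grep { $_ ne '' } split(/[\s,]+/, $req)) {
    ## qn_clause: optional '$', attribute, '=' ; otherwise use the default attribute
    my ($attr,$val) = ($clause =~ /^\$?([^=]+)=(.*)$/s) ? ($1,$2) : ($default,$clause);
    if ($val =~ m{^/(.*)/([gimsalux]*)$}s) {
      ## qn_regex: "/" REGEX "/" qn_regmod
      push(@clauses, { attr=>$attr, type=>'regex', re=>$1, mod=>$2 });
    } else {
      ## qn_words: one or more '|'-separated words
      my @words = split(/\|/, $val);
      push(@clauses, { attr=>$attr, type=>(@words > 1 ? 'set' : 'exact'), words=>\@words });
    }
  }
  return \@clauses;
}

my $clauses = parse_native('$l=Haus|Hof, p=/^N/i Mann', 'l');
foreach my $c (@$clauses) {
  my $shown = $c->{type} eq 'regex' ? "/$c->{re}/$c->{mod}" : join('|', @{$c->{words}});
  print "$c->{attr}: $c->{type} $shown\n";
}
```

The three labels loosely correspond to the CQTokRegex, CQTokSet, and CQTokExact query types named above; a real implementation would additionally conjoin the clauses into a single query object.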
- queryAttributes
-
\@aqs = $coldb->queryAttributes($cquery,%opts);
Utility for decomposing DDC queries into attribute-wise requests; returns an ARRAY-ref [[$attr1,$val1], ...]. Each value $vali is empty or undef (all values), a CQTokSet, a CQTokExact, a CQTokRegex, or a CQTokAny. Chokes on unsupported query types or filters.
%opts:
warn => $level, ##-- log-level for unknown attributes (default: 'warn')
logas => $reqtype, ##-- request type for warnings
default => $attr, ##-- default attribute (for query requests)
allowUnknown => $bool, ##-- allow unknown attributes? (default: 0)
- parseRequest
-
\@aqs = $coldb->parseRequest($request, %opts);
Guts for parsing user target and groupby requests into attribute-wise ARRAY-ref [[$attr1,$val1], ...], used by native profiling methods. See parseQuery() method for supported $request formats and %opts. Wraps $coldb->queryAttributes($coldb->parseQuery($request,%opts)).
- groupby
-
\%groupby = $coldb->groupby($groupby_request, %opts);
\%groupby = $coldb->groupby(\%groupby, %opts);
Parse a user groupby request, used by native profiling methods. See parseRequest() for details on syntax of $groupby_request. Unlike "query" request parsing, native query-request attributes are obligatory and values are optional in "groupby" parsing mode:
q_groupby ::= qg_clause ((" "|",") qg_clause)*
qg_clause ::= "$"? qn_attr ("=" qn_value)?
Returns a HASH-ref of the form:
req => $request, ##-- save request
ti2g => \&ti2g, ##-- group-tuple extraction code ($ti => $gtuple) : $g_packed = $ti2g->($ti)
ts2g => \&ts2g, ##-- group-tuple extraction code ($ts => $gtuple) : $g_packed = $ts2g->($ts)
g2s => \&g2s, ##-- stringification object suitable for DiaColloDB::Profile::stringify() [CODE,enum, or undef]
g2txt => \&g2txt, ##-- backwards-compatible join()-string stringification sub: join("\t",unpack($pack_g,$g_packed))
tpack => \@tpack, ##-- group-attribute-wise pack-templates, given @ttuple
gpack => \@gpack, ##-- group-attribute-wise pack-templates, given @gtuple
areqs => \@areqs, ##-- parsed attribute requests ([$attr,$ahaving],...)
attrs => \@attrs, ##-- like $coldb->attrs($groupby_request), modulo "having" parts
titles => \@titles, ##-- like map {$coldb->attrTitle($_)} @attrs
Options %opts:
warn => $level, ##-- log-level for unknown attributes (default: 'warn')
relax => $bool, ##-- allow unsupported attributes (default=0)
tenum => $tenum, ##-- enum to use for \&t2g and \&t2s (default: $coldb->{tenum})
- query2filter
-
$cqfilter = $coldb->query2filter($attr,$cquery,%opts);
Converts a CQToken to a CQFilter, for ddc parsing. %opts:
logas => $logas, ##-- log-prefix for warnings
- parseGroupBy
-
($CQCountKeyExprs,\$CQRestrict,\@CQFilters) = $coldb->parseGroupBy($groupby_string_or_request,%opts);
%opts:
date => $date,
slice => $slice,
matchid => $matchid, ##-- default match-id
ddc-mode groupby parsing utility. In addition to the native groupby syntax supported by the groupby() method, ddc-mode parsing also allows specification of a literal DDC count-key list by enclosing it in square brackets:
ddc_groupby ::= q_groupby | ("#BY"? "[" l_countkeys "]")
This is mainly useful in conjunction with user-defined match-ids in the corresponding parsed query, document metadata attributes, and/or server-side regex key transformations; see http://odo.dwds.de/~moocow/software/ddc/ddc_query.html#rule_count_key for details.
Profiling: Generic
- profile
-
$mprf = $coldb->profile($relation, %opts);
Get a relation profile for selected items as a DiaColloDB::Profile::Multi object. %opts:
##-- selection parameters
query => $query, ##-- target request ATTR:REQ...
date => $date1, ##-- string or array or range "MIN-MAX" (inclusive) : default=all
##
##-- aggregation parameters
slice => $slice, ##-- date slice (default=1, 0 for global profile)
groupby => $groupby, ##-- string or array "ATTR1[:HAVING1] ...": default=$coldb->attrs; see groupby() method
##
##-- scoring and trimming parameters
eps => $eps, ##-- smoothing constant (default=0)
score => $func, ##-- scoring function (f,fm,lf,lfm,mi,ld) : default="f"
kbest => $k, ##-- return only $k best collocates per date (slice) : default=-1:all
cutoff => $cutoff, ##-- minimum score
global => $bool, ##-- trim profiles globally (vs. locally for each date-slice?) (default=0)
##
##-- profiling and debugging parameters
strings => $bool, ##-- do/don't stringify (default=do)
fill => $bool, ##-- if true, returned multi-profile will have null profiles inserted for missing slices
onepass => $bool, ##-- if true, use old, fast, incorrect 1-pass method (default=0)
Sets default %opts and wraps $coldb->relation($rel)->profile($coldb, %opts).
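The kbest and cutoff options describe per-slice trimming of scored collocates. The sketch below illustrates that idea on a plain hash of hashes. `trim_slices()`, the data layout, and the sample scores are all hypothetical, and only local (per-slice) trimming is shown; how DiaColloDB::Profile::Multi actually stores and trims profiles is not reproduced here.

```perl
use strict;
use warnings;

## trim_slices(\%slices, kbest=>$k, cutoff=>$cutoff): hypothetical helper (not part of DiaColloDB)
## \%slices maps a slice label to {item => score}; returns a trimmed copy
sub trim_slices {
  my ($slices,%opts) = @_;
  my $k      = defined($opts{kbest}) ? $opts{kbest} : -1;   ## -1: keep all
  my $cutoff = $opts{cutoff};                               ## undef: no minimum score
  my %out;
  foreach my $slice (keys %$slices) {
    my $prf   = $slices->{$slice};
    ## sort items by descending score, breaking ties alphabetically
    my @items = sort { $prf->{$b} <=> $prf->{$a} || $a cmp $b } keys %$prf;
    @items = grep { $prf->{$_} >= $cutoff } @items if defined($cutoff);
    splice(@items, $k) if $k >= 0 && @items > $k;           ## keep only the $k best
    $out{$slice} = { map { ($_ => $prf->{$_}) } @items };
  }
  return \%out;
}

## illustrative scores only
my %slices = (
  1900 => { Haus=>5.0, Hof=>3.5, Garten=>1.2 },
  1910 => { Haus=>2.0, Stadt=>4.1 },
);
my $trimmed = trim_slices(\%slices, kbest=>2, cutoff=>1.5);
foreach my $slice (sort keys %$trimmed) {
  print "$slice: ", join(' ', sort keys %{$trimmed->{$slice}}), "\n";
}
```

Trimming each slice independently corresponds to the documented default global=0; a global variant would instead rank items across all slices before selecting.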
- extend
-
$mprf = $coldb->extend($relation, %opts);
Get independent f2 frequencies for $opts{slice2keys}, which is EITHER a HASH-ref {$sliceLabel1=>\@sliceKeys1, ...}, OR a JSON-string encoding such a HASH-ref. Options %opts are as for the profile() method (mostly ignored), and also:
slice2keys => \%slice2keys, ##-- target f2-items or JSON-string (REQUIRED)
Returns a DiaColloDB::Profile::Multi object containing the appropriate f2 entries. Used by list-clients (DiaColloDB::Client::list) to ensure correct f2 counts for "missing" collocate items; see "Incorrect Independent Collocate Frequencies" in DiaColloDB::Client::list for details.
- profileOptions
-
\%opts = $CLASS_OR_OBJECT->profileOptions(\%opts);
Instantiates default options for profile() method. May be used e.g. by DiaColloDB::Client subclasses.
Profiling: Comparison (diff)
- compare
-
$mprf = $coldb->compare($relation, %opts);
Get a relation comparison profile for selected items as a DiaColloDB::Profile::MultiDiff object. %opts:
##-- selection parameters
(a|b)?query => $query, ##-- target query as for parseRequest()
(a|b)?date => $date1, ##-- string or array or range "MIN-MAX" (inclusive) : default=all
##
##-- aggregation parameters
groupby => $groupby, ##-- string or array "ATTR1[:HAVING1] ...": default=$coldb->attrs; see groupby() method
(a|b)?slice => $slice, ##-- date slice (default=1, 0 for global profile)
##
##-- scoring and trimming parameters
eps => $eps, ##-- smoothing constant (default=0)
score => $func, ##-- scoring function (f,fm,lf,lfm,mi,ld) : default="f"
kbest => $k, ##-- return only $k best collocates per date (slice) : default=-1:all
cutoff => $cutoff, ##-- minimum score (UNUSED for comparison profiles)
global => $bool, ##-- trim profiles globally (vs. locally for each date-slice?) (default=0)
diff => $diff, ##-- low-level score-diff operation (diff|adiff|sum|min|max|avg|havg|gavg|lavg); default='adiff'
##
##-- profiling and debugging parameters
strings => $bool, ##-- do/don't stringify (default=do)
Sets default %opts and wraps $coldb->relation($rel)->compare($coldb, %opts)
- compareOptions
-
\%opts = $CLASS_OR_OBJECT->compareOptions(\%opts);
Instantiates default options for compare() method. May be used e.g. by DiaColloDB::Client subclasses.
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2015-2016 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
DiaColloDB::Client(3pm), DiaColloDB::Corpus(3pm), DiaColloDB::Document(3pm), DiaColloDB::Persistent(3pm), DiaColloDB::Profile(3pm), DiaColloDB::Relation(3pm), DiaColloDB::Temp(3pm), DiaColloDB::Utils(3pm), dcdb-create.perl(1), dcdb-query.perl(1), dcdb-info.perl(1), dcdb-export.perl(1), dcdb-dump.perl(1), DiaColloDB::WWW(3pm), perl(1), ...