NAME
DiaColloDB::Relation::DDC - diachronic collocation db, profiling relation: ddc client
SYNOPSIS
##========================================================================
## PRELIMINARIES
use DiaColloDB::Relation::DDC;
##========================================================================
## Constructors etc.
$ddc = $CLASS_OR_OBJECT->new(%args);
$rel_or_undef = $CLASS_OR_OBJECT->fromDB($coldb,%opts);
##========================================================================
## Relation API: create
$rel = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
##========================================================================
## Relation API: union (SKETCHY)
$rel = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
##========================================================================
## Relation API: profile
$mprf = $rel->profile($coldb, %opts);
##========================================================================
## Relation API: compare
$mprf = $rel->compare($coldb, %opts);
##========================================================================
## Utils: profiling
$dclient = $rel->ddcClient(%opts);
$results = $rel->ddcQuery($coldb, $query_or_str, %opts);
$fcoef = $rel->fcoef($cquery);
$qcount = $rel->countQuery($coldb,\%opts);
DESCRIPTION
DiaColloDB::Relation::DDC is a DiaColloDB::Relation subclass using the DDC::Client::Distributed module for acquiring fine-grained collocation frequency profile data from a remote DDC server. It is generally much slower than the native index types DiaColloDB::Relation::Cofreqs and DiaColloDB::Relation::Unigrams, but is much more flexible regarding selection of corpus subsets, collocation targets, and aggregation parameters.
Globals & Constants
- Variable: @ISA
-
DiaColloDB::Relation::DDC inherits from DiaColloDB::Relation.
Constructors etc.
- new
-
$ddc = $CLASS_OR_OBJECT->new(%args);
%args, object structure:
##-- persistent options
base       => $basename,       ##-- configuration header basename (default=undef)
##
##-- ddc client options
ddcServer  => "$server:$port", ##-- ddc server (required; default=$coldb->{ddcServer} via fromDB() method)
ddcTimeout => $timeout,        ##-- ddc timeout; default=120
ddcLimit   => $limit,          ##-- default limit for ddc queries (default=-1)
ddcSample  => $sample,         ##-- default sample size for ddc queries (default=-1:all)
dmax       => $maxDistance,    ##-- default distance for near() queries (default=5; 1=immediate adjacency; ~ ddc CQNear.Dist+1)
cfmin      => $minFreq,        ##-- default minimum frequency for count() queries (default=2)
##
##-- low-level data
dclient    => $ddcClient,      ##-- a DDC::Client::Distributed object
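To make the documented defaults concrete, here is a minimal sketch that overlays caller-supplied arguments on them. The helper ddc_args() and the server address are hypothetical and exist only for illustration; new() is documented to apply these defaults itself, so the helper merely makes the override order explicit. The real constructor call is left commented out because it requires an installed DiaColloDB.

```perl
use strict;
use warnings;

## documented defaults for DiaColloDB::Relation::DDC->new()
my %DDC_DEFAULTS = (
  ddcTimeout => 120,  ##-- ddc timeout
  ddcLimit   => -1,   ##-- default limit for ddc queries
  ddcSample  => -1,   ##-- default sample size (-1: all)
  dmax       => 5,    ##-- default distance for near() queries
  cfmin      => 2,    ##-- default minimum frequency for count() queries
);

## %args = ddc_args(%user)
##  + hypothetical helper: user-supplied values override the defaults
sub ddc_args {
  my %user = @_;
  return (%DDC_DEFAULTS, %user);
}

## "localhost:50000" is a placeholder, not a real server
my %args = ddc_args(ddcServer=>"localhost:50000", dmax=>3);
print join(", ", map {"$_=$args{$_}"} sort keys %args), "\n";

## with DiaColloDB installed, the arguments could then be passed on:
#use DiaColloDB::Relation::DDC;
#my $ddc = DiaColloDB::Relation::DDC->new(%args);
```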
- fromDB
-
$rel_or_undef = $CLASS_OR_OBJECT->fromDB($coldb,%opts);
The default implementation clobbers $rel->headerKeys() from %$coldb and %opts.
Relation API: create
- create
-
$rel = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
Nothing really interesting happens here; the default implementation just calls fromDB() and saveHeaderFile().
Relation API: union (SKETCHY)
- union
-
$rel = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
Merges multiple co-frequency indices into a new object. @pairs is an array of pairs ([$ug,\@xi2u],...) of unigram objects $ug and tuple-id maps \@xi2u for $ug.
%opts: clobber %$rel
The default implementation just calls create(). It should probably build a list of DDC servers to query, but that is not supported yet.
Relation API: profile
- profile
-
$mprf = $rel->profile($coldb, %opts);
Get a relation profile for selected items as a DiaColloDB::Profile::Multi object.
%opts: as for DiaColloDB::Relation::profile(), also:
##-- sampling options
limit  => $limit,  ##-- maximum number of items to return from ddc; sets $qconfig{limit} (default: query "#limit[N]" or $rel->{ddcLimit})
sample => $sample, ##-- ddc sample size; sets $qconfig{qcount} Sample property (default: query "#sample[N]" or $rel->{ddcSample})
cfmin  => $cfmin,  ##-- minimum subcorpus frequency for returned items (default: query "#fmin[N]" or $rel->{cfmin})
dmax   => $dmax,   ##-- maximum distance for implicit near() queries (default: query "#dmax[N]" or $rel->{dmax})
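The fallback chain for the sampling options (explicit option, else a "#op[N]" operator in the query, else the relation's default) can be sketched as follows. The helper resolve_opt() is hypothetical, the query string is a toy example, and a simple regex stands in for real DDC query parsing; treat this as an illustration of the documented precedence, not as the module's implementation.

```perl
use strict;
use warnings;

## $val = resolve_opt(\%opts, $query_str, \%rel, $optkey, $qop, $relkey)
##  + hypothetical helper sketching the documented fallback chain:
##    explicit option, else "#qop[N]" in the query string, else relation default
##  + ASSUMPTION: a regex stands in for real DDC query parsing
sub resolve_opt {
  my ($opts, $query, $rel, $optkey, $qop, $relkey) = @_;
  return $opts->{$optkey} if defined $opts->{$optkey};
  return $1 if defined($query) && $query =~ /\#\Q$qop\E\[\s*(-?\d+)\s*\]/i;
  return $rel->{$relkey};
}

my %rel   = (ddcLimit=>-1, ddcSample=>-1, cfmin=>2, dmax=>5); ##-- documented defaults
my $query = 'Haus #dmax[3] #fmin[10]';                         ##-- toy query string
my %opts  = (cfmin=>5);                                        ##-- explicit option

my $limit  = resolve_opt(\%opts, $query, \%rel, 'limit',  'limit',  'ddcLimit');
my $sample = resolve_opt(\%opts, $query, \%rel, 'sample', 'sample', 'ddcSample');
my $cfmin  = resolve_opt(\%opts, $query, \%rel, 'cfmin',  'fmin',   'cfmin');
my $dmax   = resolve_opt(\%opts, $query, \%rel, 'dmax',   'dmax',   'dmax');

## prints: limit=-1 sample=-1 cfmin=5 dmax=3
print "limit=$limit sample=$sample cfmin=$cfmin dmax=$dmax\n";
```

Here cfmin comes from the explicit option (overriding "#fmin[10]"), dmax from the query operator, and limit and sample from the relation defaults.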
Relation API: compare
- compare
-
$mprf = $rel->compare($coldb, %opts);
Get a comparison profile for selected items as a DiaColloDB::Profile::MultiDiff object.
%opts: as for DiaColloDB::Relation::compare(), also:
##-- sampling options
(a|b)?limit  => $limit,  ##-- maximum number of items to return from ddc; sets $qconfig{limit} (default: query "#limit[N]" or $rel->{ddcLimit})
(a|b)?sample => $sample, ##-- ddc sample size; sets $qconfig{qcount} Sample property (default: query "#sample[N]" or $rel->{ddcSample})
(a|b)?cfmin  => $cfmin,  ##-- minimum subcorpus frequency for returned items (default: query "#fmin[N]" or $rel->{cfmin})
(a|b)?dmax   => $dmax,   ##-- maximum distance for implicit near() queries (default: query "#dmax[N]" or $rel->{dmax})
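The "(a|b)?" notation means each sampling option may be given in a shared form (e.g. limit) or in an operand-specific form (alimit, blimit). One plausible way to expand such options into per-operand hashes is sketched below. The helper split_ab_opts() is hypothetical, and the sketch assumes that a prefixed option takes precedence over the shared one.

```perl
use strict;
use warnings;

## (\%aopts, \%bopts) = split_ab_opts(\%opts, @keys)
##  + hypothetical helper: expands "(a|b)?KEY" options into per-operand hashes
##  + ASSUMPTION: a prefixed option ("alimit") overrides the shared one ("limit")
sub split_ab_opts {
  my ($opts, @keys) = @_;
  my (%a, %b);
  foreach my $key (@keys) {
    my $shared = $opts->{$key};
    $a{$key} = $opts->{"a$key"} // $shared;
    $b{$key} = $opts->{"b$key"} // $shared;
  }
  return (\%a, \%b);
}

my ($aopts, $bopts) = split_ab_opts({ limit=>1000, dmax=>5, bdmax=>3 },
                                    qw(limit sample cfmin dmax));
print "a: limit=$aopts->{limit} dmax=$aopts->{dmax}\n";  ##-- a: limit=1000 dmax=5
print "b: limit=$bopts->{limit} dmax=$bopts->{dmax}\n";  ##-- b: limit=1000 dmax=3
```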
Utils: profiling
- ddcClient
-
$dclient = $rel->ddcClient(%opts);
Returns the cached $rel->{dclient} if defined; otherwise creates and caches a new client. Fails with an error if ddcServer is not defined.
%opts: clobber %{$rel->{dclient}}
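The caching behavior described for ddcClient() follows a common lazy-initialization pattern, sketched here. My::StubClient is a stand-in for DDC::Client::Distributed so that the sketch runs without DDC installed, cached_client() is a hypothetical helper, and the assumption that %opts are also applied to an already cached client is an inference from the "%opts: clobber" note rather than a confirmed detail.

```perl
use strict;
use warnings;

## stand-in for DDC::Client::Distributed (hypothetical; illustration only)
package My::StubClient;
sub new { my ($class, %args) = @_; return bless({%args}, $class); }

package main;

## $dclient = cached_client(\%rel, %opts)
##  + hypothetical helper mirroring the documented behavior of ddcClient()
sub cached_client {
  my ($rel, %opts) = @_;
  if (!defined $rel->{dclient}) {
    die "cached_client(): no ddcServer defined" if !defined $rel->{ddcServer};
    $rel->{dclient} = My::StubClient->new(server  => $rel->{ddcServer},
                                          timeout => $rel->{ddcTimeout});
  }
  ## ASSUMPTION: %opts clobber fields of the (possibly cached) client
  @{$rel->{dclient}}{keys %opts} = values %opts;
  return $rel->{dclient};
}

my %rel = (ddcServer=>"localhost:50000", ddcTimeout=>120);  ##-- placeholder server
my $c1  = cached_client(\%rel);
my $c2  = cached_client(\%rel, limit=>10);
print(($c1 == $c2 ? "same" : "different"), " client object, limit=$c2->{limit}\n");
```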
- ddcQuery
-
$results = $rel->ddcQuery($coldb, $query_or_str, %opts);
Returns decoded JSON results for DDC client query $query_or_str, optionally logging the query and tracking errors.
%opts:
logas    => $prefix, ##-- log prefix (default: 'ddcQuery()')
loglevel => $level,  ##-- log level (default=$coldb->{logProfile})
limit    => $limit,  ##-- set result client limit (default: current client limit, or -1 for limit=>undef)
- fcoef
-
$fcoef = $rel->fcoef($cquery);
Get expected frequency coefficient for the DDC::XS::CQuery object $cquery. Used to estimate total independent marginal frequencies (f1,f2,N) for profile construction. The default implementation should provide reasonable guesses for common query types.
- countQuery
-
$qcount = $rel->countQuery($coldb,\%opts);
Creates a DDC::XS::CQCount object for profile() options %opts. Sets the following keys in %opts:
limit     => $limit,     ##-- hit return limit for ddc query
dslo      => $dslo,      ##-- minimum date-slice, from @opts{qw(date slice fill)}
dshi      => $dshi,      ##-- maximum date-slice, from @opts{qw(date slice fill)}
dlo       => $dlo,       ##-- minimum date request (ddc)
dhi       => $dhi,       ##-- maximum date request (ddc)
fcoef     => $fcoef,     ##-- frequency coefficient, parsed from "#coef[N]", auto-generated if not set
qtemplate => $qtemplate, ##-- query template for ddc hit link-up
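As a rough illustration of how date-slice bounds such as dslo and dshi might relate to a requested date range, the sketch below maps a numeric range onto slice labels. The helper slice_bounds() is hypothetical; it assumes that a slice is labelled by its lower bound and that slice=0 denotes a single global slice, and it ignores the fill option entirely.

```perl
use strict;
use warnings;

## ($dslo, $dshi) = slice_bounds($dlo, $dhi, $slice)
##  + hypothetical helper: maps a requested date range onto date-slice labels
##  + ASSUMPTION: a slice is labelled by its lower bound, and slice=0 means
##    a single global slice labelled 0
sub slice_bounds {
  my ($dlo, $dhi, $slice) = @_;
  return (0, 0) if !$slice;
  return (int($dlo/$slice)*$slice, int($dhi/$slice)*$slice);
}

my ($dslo, $dshi) = slice_bounds(1914, 1945, 10);
print "dslo=$dslo dshi=$dshi\n";  ##-- dslo=1910 dshi=1940
```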
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2015-2016 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
DiaColloDB::Relation(3pm), DiaColloDB::Relation::Cofreqs(3pm), DiaColloDB::Relation::Unigrams(3pm), DiaColloDB::Relation::TDF(3pm), DiaColloDB(3pm), perl(1), ...