NAME
DiaColloDB::Profile::Diff - diachronic collocation db, diff profiles
SYNOPSIS
##========================================================================
## PRELIMINARIES
use DiaColloDB::Profile::Diff;
##========================================================================
## Constructors etc.
$prf = $CLASS_OR_OBJECT->new(%args);
$dprf2 = $dprf->clone();
##========================================================================
## Basic Access
($prf1,$prf2) = $dprf->operands();
$bool = $dprf->empty();
##========================================================================
## I/O: JSON
$obj = $CLASS_OR_OBJECT->loadJsonData( $data,%opts);
##========================================================================
## I/O: Text
undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);
$bool = $prf->saveTextFh($fh, %opts);
##========================================================================
## I/O: HTML
$bool = $prf->saveHtmlFile($filename_or_handle, %opts);
##========================================================================
## Compilation
$dprf = $dprf->populate();
$dprf = $dprf->compile($func,%opts);
$dprf = $dprf->uncompile();
$opname = $CLASS_OR_OBJECT->diffop($opNameOrAlias);
$opsub = $CLASS_OR_OBJECT->diffsub($opNameOrAlias);
$bool = $CLASS_OR_OBJECT->diffpretrim($opNameOrAlias);
$key = $CLASS_OR_OBJECT->diffkbest($opNameOrAlias);
$diff = diffop_diff($ascore,$bscore);
$diff = diffop_sum($ascore,$bscore);
$diff = diffop_min($ascore,$bscore);
$diff = diffop_max($ascore,$bscore);
$diff = diffop_avg($ascore,$bscore);
$diff = diffop_havg($ascore,$bscore);
$diff = diffop_gavg($ascore,$bscore);
$diff = diffop_lavg($ascore,$bscore);
##========================================================================
## Trimming
\@keys = $dprf->which(%opts);
$dprf = $dprf->trim(%opts);
##========================================================================
## Stringification
$dprf = $dprf->stringify( $obj);
##========================================================================
## Binary operations
$dprf = $dprf->_add($dprf2,%opts);
DESCRIPTION
DiaColloDB::Profile::Diff is a DiaColloDB::Profile subclass for representing low-level collocate frequency comparison data for a single date-slice, as arising from the comparison of two DiaColloDB::Profile objects.
Globals & Constants
- @ISA
-
DiaColloDB::Profile::Diff inherits from DiaColloDB::Profile.
- %DIFFOPS
-
Canonical diff-operation names keyed by alias.
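The alias lookup can be pictured with a small stand-alone sketch. The table below is built from the alias lists documented for the diff option of new(); the names %CANON, %ALIAS2OP and resolve_op() are hypothetical and chosen for illustration only, and the real %DIFFOPS may be organised differently.

```perl
use strict;
use warnings;

# Hypothetical table: canonical operation name => documented aliases.
my %CANON = (
    adiff => [qw(absolute-difference abs-difference abs-diff adiff adifference a-)],
    diff  => [qw(difference diff d minus -)],
    sum   => [qw(sum add plus +)],
    min   => [qw(minimum min <)],
    max   => [qw(maximum max >)],
    avg   => [qw(average avg mean)],
);

# Invert the table so that each alias maps to its canonical name.
my %ALIAS2OP;
for my $op (keys %CANON) {
    $ALIAS2OP{$_} = $op for @{ $CANON{$op} };
}

# resolve_op(): hypothetical helper. Falls back to 'adiff' (the documented
# default) when no alias is given; returns undef for unknown aliases.
sub resolve_op {
    my ($alias) = @_;
    return 'adiff' unless defined $alias;
    return $ALIAS2OP{$alias};
}

print resolve_op('minus'), "\n";    # prints "diff"
```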
Constructors etc.
- new
-
$prf = $CLASS_OR_OBJECT->new(%args);
$prf = $CLASS_OR_OBJECT->new($prf1,$prf2,%args);

%args, object structure:

    ##-- DiaColloDB::Profile::Diff
    prf1 => $prf1,   ##-- 1st operand
    prf2 => $prf2,   ##-- 2nd operand
    diff => $diff,   ##-- low-level score-diff binary operation (default='adiff')
    ##-- DiaColloDB::Profile keys
    label => $label, ##-- string label (used by Multi; undef for none(default))
    #N    => $N,     ##-- OVERRIDE:unused: total marginal relation frequency
    #f1   => $f1,    ##-- OVERRIDE:unused: total marginal frequency of target word(s)
    #f2   => \%f2,   ##-- OVERRIDE:unused: total marginal frequency of collocates: ($i2=>$f2, ...)
    #f12  => \%f12,  ##-- OVERRIDE:unused: collocation frequencies, %f12 = ($i2=>$f12, ...)
    ##
    eps   => $eps,   ##-- smoothing constant (default=undef: no smoothing)
    score => $func,  ##-- selected scoring function ('f12', 'mi', or 'ld')
    mi    => \%mi12, ##-- DIFFERENCE: score: mutual information * logFreq a la Wortprofil; requires compile_mi()
    ld    => \%ld12, ##-- DIFFERENCE: score: log-dice a la Wortprofil; requires compile_ld()
    fm    => \%fm12, ##-- DIFFERENCE: score: frequency per million; requires compile_fm()

The diff option selects the function used to compute final scores from the operand profiles. The default value is 'adiff'. Currently known values are:

    adiff # $score=$a-$b                # aliases=qw(absolute-difference abs-difference abs-diff adiff adifference a-) ; select=kbesta
    diff  # $score=$a-$b                # aliases=qw(difference diff d minus -)
    sum   # $score=$a+$b                # aliases=qw(sum add plus +)
    min   # $score=min($a,$b)           # aliases=qw(minimum min <)
    max   # $score=max($a,$b)           # aliases=qw(maximum max >)
    avg   # $score=avg($a,$b)           # aliases=qw(average avg mean)
    havg  # $score~=harmonic_avg($a,$b)  # aliases=qw(harmonic-average harmonic-mean havg hmean ha h)
    gavg  # $score~=geometric_avg($a,$b) # aliases=qw(geometric-average geometric-mean gavg gmean ga g)
    lavg  # $score~=log_avg($a,$b)       # aliases=qw(logarithmic-average logarithmic-mean log-average log-mean lavg lmean la l)

To avoid singularities resulting from sparse data, the havg and gavg operations actually compute the arithmetic average of the harmonic (resp. geometric) mean and the raw arithmetic mean; e.g.

    score_havg($a,$b) = (
        ($a<0 || $b<0 ? 0 : (2*$a*$b)/($a+$b))  ##-- harmonic mean
        + ($a+$b)/2                             ##-- arithmetic mean
    )/2                                         ##-- average of harmonic- and arithmetic-means

The default diff operation is adiff, which selects those items with the greatest absolute differences among the (pre-trimmed) k-best items in its operand profiles. The sum and avg operations return equivalent rankings, but may assign undesirably high score values for non-uniform operand values (e.g. avg(0,8)=avg(4,4)=4, but only the latter configuration indicates similar collocation behavior in the operand profiles). The havg, gavg, and lavg operations attempt to address this shortcoming by penalizing non-uniform score-pairs, and tend to return similar rankings in the range [$a:$b].
- clone
-
$dprf2 = $dprf->clone();
$dprf2 = $dprf->clone($keep_compiled);

Clones %$dprf; if $keep_compiled is true, compiled data is cloned too.
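To make the avg(0,8)=avg(4,4) remark under new() concrete, here is a minimal stand-alone sketch of the plain average next to the smoothed harmonic average. The function names avg_score() and havg_score() are hypothetical, and the guard against a zero denominator is an assumption added for this sketch; it is not stated in the formula above.

```perl
use strict;
use warnings;

# Plain arithmetic mean of two scores.
sub avg_score {
    my ($x, $y) = @_;
    return ($x + $y) / 2;
}

# Smoothed harmonic mean: average of the harmonic mean and the arithmetic
# mean, following the documented score_havg() formula.
# ASSUMPTION: the harmonic part is also taken as 0 when $x+$y == 0.
sub havg_score {
    my ($x, $y) = @_;
    my $hmean = ($x < 0 || $y < 0 || $x + $y == 0)
        ? 0
        : (2 * $x * $y) / ($x + $y);
    return ($hmean + ($x + $y) / 2) / 2;
}

printf "avg(0,8)=%.1f avg(4,4)=%.1f\n",   avg_score(0, 8),  avg_score(4, 4);
printf "havg(0,8)=%.1f havg(4,4)=%.1f\n", havg_score(0, 8), havg_score(4, 4);
# prints:
#   avg(0,8)=4.0 avg(4,4)=4.0
#   havg(0,8)=2.0 havg(4,4)=4.0
```

The plain average cannot tell the two score pairs apart, while the smoothed harmonic average halves the score of the non-uniform pair.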
Basic Access
- operands
-
($prf1,$prf2) = $dprf->operands();

Gets the operand profiles.
- empty
-
$bool = $dprf->empty();

Returns true iff both operands are empty.
I/O: JSON
- loadJsonData
-
$obj = $CLASS_OR_OBJECT->loadJsonData( $data,%opts);

Guts for loadJsonString(), loadJsonFile().
I/O: Text
See also DiaColloDB::Persistent.
- saveTextHeader
-
undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);

Prints the column title header for text output.
- saveTextFh
-
$bool = $prf->saveTextFh($fh, %opts);

Saves flat TAB-separated text, format:

    Na Nb F1a F1b F2a F2b F12a F12b SCOREa SCOREb SCOREdiff LABEL ITEM2...

%opts:

    label  => $label,  ##-- override $prf->{label} (used by Profile::Multi), no tab-separators required
    format => $fmt,    ##-- printf score formatting (default="%.4f")
    header => $bool,   ##-- include header-row? (default=1)
    hlabel => $hlabel, ##-- prefix header item-cells with $hlabel (used by Profile::MultiDiff)
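As an illustration of the row layout, the following stand-alone sketch joins one row with TABs and applies the default "%.4f" score format. format_row() and its argument order are hypothetical, and the handling of an undefined label (the column is simply omitted) is an assumption of this sketch, not documented behavior.

```perl
use strict;
use warnings;

# format_row(): hypothetical helper producing one TAB-separated line.
#   $fmt    : printf format for scores (the documented default is "%.4f")
#   $label  : optional label cell (ASSUMPTION: omitted when undefined)
#   $counts : [Na, Nb, F1a, F1b, F2a, F2b, F12a, F12b]
#   $scores : [SCOREa, SCOREb, SCOREdiff]
#   @items  : collocate item cells (ITEM2...)
sub format_row {
    my ($fmt, $label, $counts, $scores, @items) = @_;
    return join("\t",
        @$counts,
        (map { sprintf($fmt, $_) } @$scores),
        (defined $label ? ($label) : ()),
        @items,
    ) . "\n";
}

print format_row('%.4f', '2000', [100, 200, 10, 20, 5, 6, 3, 1], [1.5, 0.25, 1.25], 'Haus');
```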
I/O: HTML
- saveHtmlFile
-
$bool = $prf->saveHtmlFile($filename_or_handle, %opts);

Saves flat HTML table data with rows of the form

    SCOREa SCOREb DIFF PREFIX? ITEM2...

%opts:

    table  => $bool,   ##-- include <table>..</table> ? (default=1)
    body   => $bool,   ##-- include <html><body>..</body></html> ? (default=1)
    header => $bool,   ##-- include header-row? (default=1)
    hlabel => $hlabel, ##-- prefix header item-cells with $hlabel (used by Profile::Multi), no '<th>..</th>' required
    label  => $label,  ##-- prefix item-cells with $label (used by Profile::Multi), no '<td>..</td>' required
    format => $fmt,    ##-- printf score formatting (default="%.4f")
Compilation
- populate
-
$dprf = $dprf->populate();
$dprf = $dprf->populate($prf1,$prf2);

Populates the diff-profile by subtracting $prf2 scores from $prf1.
- compile
-
$dprf = $dprf->compile($func,%opts);

Compiles for score-function $func, one of qw(f fm mi ld); default='f'.
- uncompile
-
$dprf = $dprf->uncompile();

Un-compiles all scores for $dprf.
- diffop
-
$opname = $dprf->diffop();
$opname = $CLASS_OR_OBJECT->diffop($opNameOrAlias);

Returns the canonical diff operation-name for $opNameOrAlias.
- diffsub
-
\&FUNC = $dprf->diffsub();
\&FUNC = $CLASS_OR_OBJECT->diffsub($opNameOrAlias);

Returns the low-level binary diff operation for diff-operation $opNameOrAlias (default=$dprf->{diff}).
- diffpretrim
-
$bool = $dprf->diffpretrim();
$bool = $CLASS_OR_OBJECT->diffpretrim($opNameOrAlias);

Returns true iff diff should pre-trim operand profiles (currently just for adiff and min).
- diffkbest
-
$selector = $dprf->diffkbest();
$selector = $CLASS_OR_OBJECT->diffkbest($opNameOrAlias);

Returns the 'kbest' selector appropriate for the which() or trim() methods.
- diffop_diff
- diffop_sum
- diffop_min
- diffop_max
- diffop_avg
- diffop_havg
- diffop_gavg
- diffop_lavg
-
$diff = diffop_diff($ascore,$bscore);

Low-level diff-operation subs.
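The interplay of a binary diff operation and two operand score tables can be sketched without the module itself. The dispatch table %OPS and the helper combine_scores() below are hypothetical stand-ins operating on plain hashes; treating a key missing from one operand as score 0 is an assumption of this sketch, not a statement about how populate() or compile() behave.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical dispatch table: operation name => binary sub on two scores.
my %OPS = (
    diff => sub { $_[0] - $_[1] },
    sum  => sub { $_[0] + $_[1] },
    min  => sub { min($_[0], $_[1]) },
    max  => sub { max($_[0], $_[1]) },
    avg  => sub { ($_[0] + $_[1]) / 2 },
);

# combine_scores($op, \%a, \%b): apply $op to every key found in either
# operand. ASSUMPTION: keys missing from one operand count as score 0.
sub combine_scores {
    my ($op, $sa, $sb) = @_;
    my $sub = $OPS{$op} or die "unknown operation '$op'";
    my %out;
    for my $key (keys(%$sa), keys(%$sb)) {
        next if exists $out{$key};    # key already handled via the other operand
        $out{$key} = $sub->($sa->{$key} // 0, $sb->{$key} // 0);
    }
    return \%out;
}

my $d = combine_scores('diff', { x => 3, y => 1 }, { x => 1, z => 2 });
print join(' ', map { "$_=$d->{$_}" } sort keys %$d), "\n";    # prints "x=2 y=1 z=-2"
```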
Trimming
- trim
-
$dprf = $dprf->trim(%opts);

Trims profile and operands; %opts:

    kbest  => $kbest,  ##-- retain only $kbest items (by score value)
    kbesta => $kbesta, ##-- retain only $kbesta items (by score absolute value)
    cutoff => $cutoff, ##-- retain only items with $prf->{$prf->{score}}{$item} >= $cutoff
    keep   => $keep,   ##-- retain keys @$keep (ARRAY) or keys(%$keep) (HASH)
    drop   => $drop,   ##-- drop keys @$drop (ARRAY) or keys(%$drop) (HASH)
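The difference between the kbest, kbesta and cutoff selectors is easiest to see on a small score table. The helper which_keys() below is a hypothetical stand-alone sketch on a plain hash; the order in which the options are applied, the tie-breaking by key, and the omission of keep and drop are choices made for this illustration only.

```perl
use strict;
use warnings;

# which_keys(\%scores, %opts): hypothetical selector returning an ARRAY-ref
# of retained keys. Supports only cutoff, kbest and kbesta.
sub which_keys {
    my ($scores, %opts) = @_;
    my @keys = sort keys %$scores;

    # cutoff: keep items whose score is at least $opts{cutoff}
    @keys = grep { $scores->{$_} >= $opts{cutoff} } @keys
        if defined $opts{cutoff};

    if (defined $opts{kbesta}) {
        # rank by absolute score value, largest first; ties broken by key
        @keys = sort { abs($scores->{$b}) <=> abs($scores->{$a}) || $a cmp $b } @keys;
        splice(@keys, $opts{kbesta}) if @keys > $opts{kbesta};
    }
    elsif (defined $opts{kbest}) {
        # rank by raw score value, largest first; ties broken by key
        @keys = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b } @keys;
        splice(@keys, $opts{kbest}) if @keys > $opts{kbest};
    }
    return \@keys;
}

my %scores = (a => 5, b => -7, c => 1, d => 3);
print join(',', @{ which_keys(\%scores, kbest  => 2) }), "\n";    # prints "a,d"
print join(',', @{ which_keys(\%scores, kbesta => 2) }), "\n";    # prints "b,a"
```

With kbesta, the strongly negative item b outranks every positive item, which is why the documented adiff operation is paired with the kbesta selector.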
Stringification
- stringify
-
$dprf = $dprf->stringify( $obj);
$dprf = $dprf->stringify(\@key2str);
$dprf = $dprf->stringify(\&key2str);
$dprf = $dprf->stringify(\%key2str);

Stringifies profile and operands (destructive) via $obj->i2s($key2), $key2str->($i2) or $key2str->{$i2}.
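The four accepted argument types can be normalized to a single code reference before the keys are rewritten. The sketch below shows one way to do that on a plain score hash; key2str_sub() and stringify_scores() are hypothetical names, and unlike the destructive method described above, this sketch returns a new hash.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# key2str_sub($map): hypothetical helper turning an object with an i2s()
# method, an ARRAY-ref, a CODE-ref or a HASH-ref into a CODE-ref.
sub key2str_sub {
    my ($map) = @_;
    return sub { $map->i2s($_[0]) } if blessed($map) && $map->can('i2s');
    my $type = ref($map);
    return $map                    if $type eq 'CODE';
    return sub { $map->[ $_[0] ] } if $type eq 'ARRAY';
    return sub { $map->{ $_[0] } } if $type eq 'HASH';
    die "cannot stringify via '$type'";
}

# stringify_scores(\%scores, $map): returns a NEW hash keyed by strings
# (non-destructive, for illustration only).
sub stringify_scores {
    my ($scores, $map) = @_;
    my $k2s = key2str_sub($map);
    my %out;
    $out{ $k2s->($_) } = $scores->{$_} for keys %$scores;
    return \%out;
}

my $str = stringify_scores({ 0 => 1.5, 1 => -2 }, [ 'Haus', 'Hof' ]);
print join(' ', map { "$_=$str->{$_}" } sort keys %$str), "\n";    # prints "Haus=1.5 Hof=-2"
```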
Binary operations
- _add
-
$dprf = $dprf->_add($dprf2,%opts);

Adds $dprf2 operand frequency data to $dprf operands (destructive); implicitly un-compiles $dprf. %opts:

    N  => $bool, ##-- whether to add N values (default:true)
    f1 => $bool, ##-- whether to add f1 values (default:true)
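What adding operand frequency data might look like for a single operand can be sketched on plain hashes that use the DiaColloDB::Profile keys N, f1, f2 and f12. add_freqs() is a hypothetical helper; whether the module sums f2 and f12 in exactly this way is an assumption of this sketch.

```perl
use strict;
use warnings;

# add_freqs(\%pa, \%pb, %opts): hypothetical helper adding the frequency
# data of %pb to %pa in place (destructive), honoring the N and f1 flags.
sub add_freqs {
    my ($pa, $pb, %opts) = @_;
    $pa->{N}  += ($pb->{N}  // 0) if $opts{N}  // 1;    # default: true
    $pa->{f1} += ($pb->{f1} // 0) if $opts{f1} // 1;    # default: true
    for my $field (qw(f2 f12)) {
        my $src = $pb->{$field} || {};
        $pa->{$field}{$_} += $src->{$_} for keys %$src;
    }
    return $pa;
}

my $pa = { N => 100, f1 => 10, f2 => { x => 4 }, f12 => { x => 2 } };
my $pb = { N => 50,  f1 => 5,  f2 => { x => 1, y => 3 }, f12 => { y => 1 } };
add_freqs($pa, $pb, f1 => 0);
print "N=$pa->{N} f1=$pa->{f1} f2{x}=$pa->{f2}{x} f2{y}=$pa->{f2}{y}\n";
# prints "N=150 f1=10 f2{x}=5 f2{y}=3"
```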
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2015 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
dcdb-create.perl(1), dcdb-query.perl(1), dcdb-info.perl(1), dcdb-export.perl(1), dcdb-dump.perl(1), DiaColloDB(3pm), perl(1), ...