NAME
DiaColloDB::Profile - diachronic collocation db, (co-)frequency profile
SYNOPSIS
##========================================================================
## PRELIMINARIES
use DiaColloDB::Profile;
##========================================================================
## Constructors etc.
$prf = CLASS_OR_OBJECT->new(%args);
$prf2 = $prf->clone();
$prf2 = $prf->shadow();
##========================================================================
## Basic Access
$label = $prf->label();
\@titles_or_undef = $prf->titles();
@keys = $prf->scoreKeys();
$bool = $prf->empty();
##========================================================================
## I/O: JSON
 *TO_JSON = \&TO_JSON__table;
##========================================================================
## I/O: Text
undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);
$bool = $prf->saveTextFh($fh, %opts);
##========================================================================
## I/O: HTML
$bool = $prf->saveHtmlFile($filename_or_handle, %opts);
##========================================================================
## Compilation
$prf = $prf->compile($func,%opts);
$prf = $prf->uncompile();
$prf = $prf->compile_f();
$prf = $prf->compile_lf();
$prf = $prf->compile_lfm();
$prf = $prf->compile_fm();
$prf = $prf->compile_milf(%opts);
$prf = $prf->compile_mi1(%opts);
$prf = $prf->compile_mi3(%opts);
$prf = $prf->compile_ld(%opts);
$prf = $prf->compile_ll(%opts);
##========================================================================
## Trimming
\@keys = $prf->which(%opts);
$prf   = $prf->trim(%opts);
##========================================================================
## Stringification
$i2s = $prf->stringify_map( $obj);
$prf = $prf->stringify( $obj);
##========================================================================
## Algebraic operations
$prf = $prf->_add($prf2,%opts);
$prf3 = $prf1->add($prf2,%opts);
$psum = $CLASS_OR_OBJECT->_sum(\@profiles,%opts);
$psum = $CLASS_OR_OBJECT->sum(\@profiles,%opts);
$diff = $prf1->diff($prf2,%opts);
DESCRIPTION
DiaColloDB::Profile is a class for representing low-level collocate frequency profile data for a single date-slice as retrieved e.g. from a native index or DDC back-end. It includes methods for compiling profile scores via several score functions (e.g. frequency, pointwise mi * log-frequency, log Dice), k-best trimming, stringification, basic algebraic manipulation, and serialization (text, HTML, or JSON).
Globals & Constants
- Variable: @ISA
  DiaColloDB::Profile inherits from DiaColloDB::Persistent.
Constructors etc.
- new
  $prf = CLASS_OR_OBJECT->new(%args);
  %args, object structure:
    label  => $label,     ##-- string label (used by Multi; undef for none (default))
    N      => $N,         ##-- total marginal relation frequency
    f1     => $f1,        ##-- total marginal frequency of target word(s)
    f2     => \%f2,       ##-- total marginal frequency of collocates: ($i2=>$f2, ...)
    f12    => \%f12,      ##-- collocation frequencies, %f12 = ($i2=>$f12, ...)
    titles => \@titles,   ##-- item group titles (default: undef: unknown)
    ##
    eps    => $eps,       ##-- smoothing constant (default=0)
    score  => $func,      ##-- selected scoring function qw(f fm lf lfm mi1 mi3 milf ld ll)
    milf   => \%milf_12,  ##-- score: mutual information * logFreq a la Wortprofil; requires compile_milf()
    mi1    => \%mi1_12,   ##-- score: mutual information; requires compile_mi1()
    mi3    => \%mi3_12,   ##-- score: mutual information^3 a la Rychlý (2008); requires compile_mi3()
    ld     => \%ld_12,    ##-- score: log-dice a la Wortprofil; requires compile_ld()
    ll     => \%ll_12,    ##-- score: 1-sided log-likelihood a la Evert (2008); requires compile_ll()
    fm     => \%fm_12,    ##-- frequency per million score; requires compile_fm()
    lf     => \%lf_12,    ##-- log-frequency; requires compile_lf()
    lfm    => \%lfm_12,   ##-- log-frequency per million; requires compile_lfm()
- clone
  $prf2 = $prf->clone();
  $prf2 = $prf->clone($keep_compiled);
  clones the profile $prf. If $keep_compiled is true, compiled data is cloned too.
- shadow
  $prf2 = $prf->shadow();
  $prf2 = $prf->shadow($keep_compiled);
  shadows %$prf. If $keep_compiled is true, compiled data is shadowed too (all zeroes).
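The object structure documented for new() can be pictured as an ordinary Perl hash. The following is a minimal sketch that does not load DiaColloDB at all: the counts, the label '2000', the title 'Lemma', and the helper keys_by_f12() are invented for illustration, and only the field names come from the listing above.

```perl
use strict;
use warnings;

# Plain hash mirroring the documented fields; NOT a DiaColloDB::Profile object.
# All values are made up for illustration.
my $profile = {
    label  => '2000',                                 # assumed date-slice label
    N      => 1_000_000,                              # total relation frequency
    f1     => 500,                                    # target frequency
    f2     => { 10 => 2_000, 11 => 300, 12 => 40 },   # collocate frequencies by key $i2
    f12    => { 10 => 120,   11 => 45,  12 => 8  },   # co-frequencies by key $i2
    titles => ['Lemma'],
    eps    => 0,
    score  => 'f12',
};

# Hypothetical helper: collocate keys ordered by descending co-frequency.
sub keys_by_f12 {
    my ($prf) = @_;
    my $f12 = $prf->{f12};
    return sort { $f12->{$b} <=> $f12->{$a} || $a <=> $b } keys %$f12;
}

print join(' ', keys_by_f12($profile)), "\n";
```

Note that f2 and f12 share the same collocate keys, which is what lets the score functions combine them item by item.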
Basic Access
- label
  $label = $prf->label();
  get profile label
- titles
  \@titles_or_undef = $prf->titles();
  get item titles
- scoreKeys
  @keys = $prf->scoreKeys();
  returns known score function keys
- empty
  $bool = $prf->empty();
  returns true iff profile is empty
I/O: JSON
- TO_JSON__table
  $thingy = $obj->TO_JSON__table()
  test alternative JSON format (small but slow).
- TO_JSON__flat
  $thingy = $obj->TO_JSON__flat()
  test alternative JSON format (small but slow).
I/O: Text
See also DiaColloDB::Persistent.
- saveTextHeader
  undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);
  prints column titles for text output.
- saveTextFh
  $bool = $prf->saveTextFh($fh, %opts);
  save flat TAB-separated text, format:
    N F1 F2 F12 SCORE LABEL ITEM2...
  %opts:
    label  => $label,   ##-- override $prf->{label} (used by Profile::Multi), no tab-separators required
    format => $fmt,     ##-- printf format for scores (default="%f")
    header => $bool,    ##-- include header-row? (default=1)
    hlabel => $hlabel,  ##-- prefix header item-cells with $hlabel (used by Profile::Multi)
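To make the TAB-separated row format concrete, here is a rough sketch that builds such rows from a plain hash. It is not the library's saveTextFh(): the helper name text_rows(), the row ordering (descending score), and the handling of an undefined label are assumptions, and the header row is omitted.

```perl
use strict;
use warnings;

# Hypothetical helper: render "N F1 F2 F12 SCORE LABEL ITEM2" rows as one string.
# $prf is a plain hash using the documented field names; the active score hash
# is looked up as $prf->{ $prf->{score} }.
sub text_rows {
    my ($prf, %opts) = @_;
    my $fmt   = $opts{format} // '%f';                   # documented default format
    my $label = $opts{label} // $prf->{label} // '';     # label option overrides $prf->{label}
    my $score = $prf->{ $prf->{score} };

    my @rows;
    # Assumed ordering: best score first, ties broken by item string.
    for my $item (sort { $score->{$b} <=> $score->{$a} || $a cmp $b } keys %$score) {
        push @rows, join("\t",
            $prf->{N}, $prf->{f1}, $prf->{f2}{$item}, $prf->{f12}{$item},
            sprintf($fmt, $score->{$item}),
            $label, $item);
    }
    return @rows ? join("\n", @rows) . "\n" : '';
}

my $demo = {
    label => 'L', N => 1000, f1 => 10, score => 'f12',
    f2    => { x => 5, y => 6 },
    f12   => { x => 2, y => 3 },
};
print text_rows($demo);
```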
I/O: HTML
- saveHtmlFile
  $bool = $prf->saveHtmlFile($filename_or_handle, %opts);
  save flat HTML table data with rows of the form:
    N F1 F2 F12 SCORE PREFIX? ITEM2...
  %opts:
    table  => $bool,    ##-- include <table>..</table> ? (default=1)
    body   => $bool,    ##-- include <html><body>..</body></html> ? (default=1)
    header => $bool,    ##-- include header-row? (default=1)
    hlabel => $hlabel,  ##-- prefix header item-cells with $hlabel (used by Profile::Multi), no '<th>..</th>' required
    label  => $label,   ##-- prefix item-cells with $label (used by Profile::Multi), no '<td>..</td>' required
    format => $fmt,     ##-- printf score formatting (default="%.4f")
Compilation
- compile
  $prf = $prf->compile($func,%opts);
  compile for score-function $func, one of qw(f fm lf lfm mi1 mi3 milf ld ll); default='f' (emits a warning).
- uncompile
  $prf = $prf->uncompile();
  un-compiles all scores for $prf
- compile_f
  $prf = $prf->compile_f();
  just sets $prf->{score} = 'f12'
- compile_lf
  $prf = $prf->compile_lf();
  computes log-frequency profile in $prf->{lf}; sets $prf->{score}='lf'.
- compile_fm
  $prf = $prf->compile_fm();
  computes frequency-per-million in $prf->{fm}; sets $prf->{score}='fm'.
- compile_lfm
  $prf = $prf->compile_lfm(%opts);
  computes log-frequency-per-million in $prf->{lfm}; sets $prf->{score}='lfm'.
- compile_milf
  $prf = $prf->compile_milf(%opts);
  formerly compile_mi(). Computes MI*logF-profile in $prf->{milf} a la Rychlý (2008); sets $prf->{score}='milf'.
  %opts:
    eps => $eps  #-- clobber $prf->{eps}
- compile_mi1
  $prf = $prf->compile_mi1(%opts);
  computes raw pointwise-MI profile in $prf->{mi1}; sets $prf->{score}='mi1'.
- compile_mi3
  $prf = $prf->compile_mi3(%opts);
  computes MI^3 profile in $prf->{mi3} a la Rychlý (2008); sets $prf->{score}='mi3'.
- compile_ld
  $prf = $prf->compile_ld(%opts);
  computes log-dice profile in $prf->{ld} a la Rychlý (2008); sets $prf->{score}='ld'.
  %opts:
    eps => $eps  #-- clobber $prf->{eps}
- compile_ll
  $prf = $prf->compile_ll(%opts);
  computes 1-sided log-log-likelihood ratio in $prf->{ll} a la Evert (2008); sets $prf->{score}='ll'.
  %opts:
    eps => $eps  #-- clobber $prf->{eps}
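For orientation, three of the score functions named above can be written out from their textbook definitions. This is a sketch, not the module's code: the function names are hypothetical, the formulas are the standard ones (frequency per million, pointwise MI, and Rychlý's log-Dice), and the eps smoothing that the compile_*() methods accept is deliberately left out because its exact placement is not documented here. All inputs are assumed to be positive.

```perl
use strict;
use warnings;

# base-2 logarithm
sub log2 { return log($_[0]) / log(2) }

# frequency per million: co-frequency scaled by the total relation frequency N
sub score_fm {
    my ($f12, $N) = @_;
    return 1e6 * $f12 / $N;
}

# pointwise mutual information: log2( p(1,2) / (p(1) * p(2)) )
sub score_mi1 {
    my ($f12, $f1, $f2, $N) = @_;
    return log2(($f12 * $N) / ($f1 * $f2));
}

# log-Dice (Rychlý 2008): 14 + log2( 2*f12 / (f1 + f2) )
sub score_ld {
    my ($f12, $f1, $f2) = @_;
    return 14 + log2((2 * $f12) / ($f1 + $f2));
}

printf "fm=%.2f mi1=%.2f ld=%.2f\n",
    score_fm(4, 2_000_000),
    score_mi1(50, 100, 100, 10_000),
    score_ld(50, 100, 100);
```

Log-Dice has a fixed maximum of 14 (reached when the two items always co-occur) and does not depend on N, which is why it is often preferred for comparing profiles across slices of different sizes.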
Trimming
- which
  \@keys = $prf->which(%opts);
  returns 'good' keys for trimming options %opts:
    cutoff => $cutoff,  ##-- retain only items with $prf->{$prf->{score}}{$item} >= $cutoff
    kbest  => $kbest,   ##-- retain only $kbest items
    kbesta => $kbesta,  ##-- retain only $kbesta items (absolute value)
    return => $which,   ##-- either 'good' (default) or 'bad'
    as     => $as,      ##-- 'hash' or 'array'; default='array'
- trim
  $prf = $prf->trim(%opts);
  trim profile to contain only 'good' keys.
  %opts:
    kbest  => $kbest,   ##-- retain only $kbest items (by score value)
    kbesta => $kbesta,  ##-- retain only $kbesta items (by score absolute value)
    cutoff => $cutoff,  ##-- retain only items with $prf->{$prf->{score}}{$item} >= $cutoff
    keep   => $keep,    ##-- retain keys @$keep (ARRAY) or keys(%$keep) (HASH)
    drop   => $drop,    ##-- drop keys @$drop (ARRAY) or keys(%$drop) (HASH)
  NOTE: this COULD be factored out into something like $prf->trim($prf->which(%opts)), but it's about 15% faster inline.
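The selection logic behind cutoff, kbest, and kbesta can be sketched on a plain score hash. The helpers which_keys() and trim_hashes() below are hypothetical stand-ins, not the library's which() and trim(); how the real methods combine several options, break ties, or order their result is an assumption here (keys are returned sorted so the output is deterministic, and $k is assumed to be non-negative).

```perl
use strict;
use warnings;

# Hypothetical helper: select 'good' keys from a plain score hash using the
# documented cutoff / kbest / kbesta options.
sub which_keys {
    my ($scores, %opts) = @_;
    my @keys = keys %$scores;

    # cutoff: keep items whose score is >= $cutoff
    @keys = grep { $scores->{$_} >= $opts{cutoff} } @keys
        if defined $opts{cutoff};

    # kbest ranks by score value, kbesta by absolute score value
    for my $opt (qw(kbest kbesta)) {
        my $k = $opts{$opt};
        next unless defined $k;
        my $rank = $opt eq 'kbesta'
            ? sub { abs($scores->{ $_[0] }) }
            : sub { $scores->{ $_[0] } };
        @keys = sort { $rank->($b) <=> $rank->($a) || $a cmp $b } @keys;
        splice(@keys, $k) if $k < @keys;
    }
    return [ sort @keys ];
}

# Hypothetical trim: drop every key that is not 'good' from each per-item hash.
sub trim_hashes {
    my ($good, @hashes) = @_;
    my %keep;
    @keep{@$good} = (1) x @$good;
    for my $h (@hashes) {
        delete @{$h}{ grep { !$keep{$_} } keys %$h };
    }
    return;
}

my %score = (a => 3, b => -5, c => 1, d => 2);
print "@{ which_keys(\%score, kbest  => 2) }\n";   # a d
print "@{ which_keys(\%score, kbesta => 2) }\n";   # a b
```

The kbesta variant matters for signed scores such as differences, where a strongly negative item (b above) is as informative as a strongly positive one.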
Stringification
- stringify_map
  $i2s = $prf->stringify_map( $obj);
  $i2s = $prf->stringify_map(\@key2str);
  $i2s = $prf->stringify_map(\&key2str);
  $i2s = $prf->stringify_map(\%key2str);
  guts for stringify: get a map for stringification
- stringify
  $prf = $prf->stringify( $obj);
  $prf = $prf->stringify(\@key2str);
  $prf = $prf->stringify(\&key2str);
  $prf = $prf->stringify(\%key2str);
  stringifies profile (destructive) via $obj->i2s($key2), $key2str->($i2) or $key2str->{$i2}.
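The four accepted argument types amount to four ways of looking up a string for an integer key. The sketch below shows that dispatch on plain data; stringify_map_sketch() and rekey() are hypothetical names, and treating an ARRAY argument as an index lookup ($key2str->[$i2]) is an assumption, since the description above only spells out the object, CODE, and HASH cases.

```perl
use strict;
use warnings;

# Hypothetical helper: build an { $i2 => $string } map from an object with an
# i2s() method, an ARRAY, a CODE reference, or a HASH.
sub stringify_map_sketch {
    my ($keys, $how) = @_;
    my $type   = ref $how;
    my $lookup = $type eq 'CODE'  ? sub { $how->($_[0]) }
               : $type eq 'ARRAY' ? sub { $how->[ $_[0] ] }     # assumed: index lookup
               : $type eq 'HASH'  ? sub { $how->{ $_[0] } }
               :                    sub { $how->i2s($_[0]) };   # any object providing i2s()
    my %i2s;
    $i2s{$_} = $lookup->($_) for @$keys;
    return \%i2s;
}

# Hypothetical helper: return a copy of a per-item hash keyed by strings.
sub rekey {
    my ($h, $i2s) = @_;
    my %out;
    $out{ $i2s->{$_} } = $h->{$_} for keys %$h;
    return \%out;
}

my %f12 = (0 => 5, 1 => 7);
my $i2s = stringify_map_sketch([keys %f12], ['Haus', 'Hof']);
my $str = rekey(\%f12, $i2s);
print join(', ', map { "$_=$str->{$_}" } sort keys %$str), "\n";
```

Unlike this sketch, which copies, the documented stringify() is destructive: it rewrites the profile's own hashes in place.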
Algebraic operations
- _add
  $prf = $prf->_add($prf2,%opts);
  adds $prf2 frequency data to $prf (destructive); implicitly un-compiles $prf.
  %opts:
    N  => $bool,  ##-- whether to add N values (default:true)
    f1 => $bool,  ##-- whether to add f1 values (default:true)
- add
  $prf3 = $prf1->add($prf2,%opts);
  returns sum of $prf1 and $prf2 frequency data as a new profile (non-destructive).
  %opts: as for _add().
- _sum
  $psum = $CLASS_OR_OBJECT->_sum(\@profiles,%opts);
  returns a profile representing sum of \@profiles, passing %opts to _add().
  - if called as a class method and \@profiles contains only 1 element, that element is returned
  - otherwise, \@profiles are added to the (new) object
- sum
  $psum = $CLASS_OR_OBJECT->sum(\@profiles,%opts);
  returns a new profile representing sum of \@profiles; see _sum().
- diff
  $diff = $prf1->diff($prf2,%opts);
  wraps DiaColloDB::Profile::Diff->new($prf1,$prf2,%opts).
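Adding profiles boils down to summing the raw frequency fields and discarding anything compiled from them. The following sketch works on plain hashes with the documented field names; add_into() and sum_profiles() are hypothetical, and the list of compiled fields that gets deleted is an assumption about what un-compiling involves.

```perl
use strict;
use warnings;

# Hypothetical destructive add on plain hashes: fold $src frequencies into $dst.
sub add_into {
    my ($dst, $src, %opts) = @_;
    $dst->{N}  += $src->{N}  if $opts{N}  // 1;   # N and f1 are added unless disabled
    $dst->{f1} += $src->{f1} if $opts{f1} // 1;
    for my $field (qw(f2 f12)) {
        $dst->{$field}{$_} += $src->{$field}{$_} for keys %{ $src->{$field} };
    }
    # Assumed meaning of "un-compile": compiled scores are stale after adding.
    delete @{$dst}{qw(fm lf lfm milf mi1 mi3 ld ll)};
    return $dst;
}

# Hypothetical non-destructive sum: the inputs are left untouched.
sub sum_profiles {
    my ($profiles, %opts) = @_;
    my $sum = { N => 0, f1 => 0, f2 => {}, f12 => {} };
    add_into($sum, $_, %opts) for @$profiles;
    return $sum;
}

my $p1 = { N => 100, f1 => 10, f2 => { x => 5 },         f12 => { x => 2 } };
my $p2 = { N => 50,  f1 => 4,  f2 => { x => 1, y => 3 }, f12 => { x => 1, y => 2 } };
my $sum = sum_profiles([$p1, $p2]);
print "N=$sum->{N} f1=$sum->{f1} f12(x)=$sum->{f12}{x} f12(y)=$sum->{f12}{y}\n";
```

Scores have to be recompiled after summing because functions such as log-Dice are not additive: the score of the summed counts is not the sum of the per-profile scores.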
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2015-2020 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
REFERENCES
Didakowski, J. and Geyken, A. (2013). "From DWDS corpora to a German Word Profile – methodological problems and solutions". In: Network Strategies, Access Structures and Automatic Extraction of Lexicographical Information. 2nd Work Report of the Academic Network "Internet Lexicography". Mannheim: Institut für Deutsche Sprache. (OPAL - Online publizierte Arbeiten zur Linguistik X/2012), pp. 43-52. URL http://www.dwds.de/static/website/publications/pdf/didakowski_geyken_internetlexikografie_2012_final.pdf
Evert, S. (2008). "Corpora and collocations." In A. Lüdeling and M. Kytö (eds.), Corpus Linguistics. An International Handbook, article 58, pages 1212-1248. Mouton de Gruyter, Berlin. URL (extended manuscript): http://purl.org/stefan.evert/PUB/Evert2007HSK_extended_manuscript.pdf
Kilgarriff, A. and Tugwell, D. (2002). "Sketching words". In M.-H. Corréard (ed.) Lexicography and Natural Language Processing: A Festschrift in Honour of B. T. S. Atkins. EURALEX, 125-137. URL http://www.kilgarriff.co.uk/Publications/2002-KilgTugwell-AtkinsFest.pdf
Rychlý, P. (2008). "A lexicographer-friendly association score". In P. Sojka and A. Horák (eds.) Proceedings of Recent Advances in Slavonic Natural Language Processing. RASLAN 2008, 69. URL http://www.muni.cz/research/publications/937193, http://www.fi.muni.cz/usr/sojka/download/raslan2008/13.pdf
SEE ALSO
DiaColloDB::Persistent(3pm), DiaColloDB::Profile::Diff(3pm), DiaColloDB::Profile::Multi(3pm), DiaColloDB::Profile::MultiDiff(3pm), DiaColloDB::Relation(3pm), DiaColloDB(3pm), perl(1), ...