NAME
PDL::Ngrams - N-Gram utilities for PDL
SYNOPSIS
use PDL;
use PDL::Ngrams;
##---------------------------------------------------------------------
## Basic Data
$toks = rint(10*random(10));
##---------------------------------------------------------------------
## ... stuff happens
DESCRIPTION
PDL::Ngrams provides basic utilities for tracking N-grams over PDL vectors.
FUNCTIONS
Counting N-Grams over PDLs
ng_cofreq
Signature: (toks(@adims,N,NToks); %args)
Returns: (int [o]ngramfreqs(NNgrams); [o]ngramids(@adims,N,NNgrams))
Keyword arguments (optional):
norotate => $bool,                      ##-- if true, $toks() will NOT be rotated along $N
boffsets => $boffsets(NBlocks),         ##-- block-offsets in $toks() along $NToks
delims   => $delims(@adims,N,NDelims),  ##-- delimiters to splice in at block boundaries
Count co-occurrences (especially N-grams) over a token vector $toks. This function is a thin wrapper around ng_delimit(), ng_rotate(), vv_qsortvec(), and rlevec().
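The pipeline named above (build N-gram vectors, sort them, run-length encode the sorted list) can be illustrated without PDL. The following is a minimal plain-Perl sketch, not the module's implementation: count_ngrams_sketch() and cmp_vec() are hypothetical helpers introduced only for this example, and delimiter handling is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two equal-length array refs element by element.
sub cmp_vec {
    my ($x, $y) = @_;
    for my $i (0 .. $#$x) {
        my $c = $x->[$i] <=> $y->[$i];
        return $c if $c;
    }
    return 0;
}

# Hypothetical helper (NOT part of PDL::Ngrams): count N-grams in a flat
# list of numeric token IDs.
sub count_ngrams_sketch {
    my ($n, @toks) = @_;
    # One N-gram per start position: NToks-N+1 windows in total.
    my @ngrams = map { [ @toks[$_ .. $_ + $n - 1] ] } 0 .. $#toks - $n + 1;
    # Sort so that identical N-grams become adjacent.
    my @sorted = sort { cmp_vec($a, $b) } @ngrams;
    # Run-length encode: one entry per distinct N-gram, plus its frequency.
    my (@freqs, @ids);
    for my $g (@sorted) {
        if (@ids && cmp_vec($ids[-1], $g) == 0) { $freqs[-1]++ }
        else { push @ids, $g; push @freqs, 1 }
    }
    return (\@freqs, \@ids);
}

my ($freqs, $ids) = count_ngrams_sketch(2, 1, 2, 1, 2, 3);
print "@{$ids->[$_]} => $freqs->[$_]\n" for 0 .. $#$freqs;
# prints:
#   1 2 => 2
#   2 1 => 1
#   2 3 => 1
```

The two return values play the roles of ngramfreqs(NNgrams) and ngramids(N,NNgrams) in the signature above, here as nested Perl arrays rather than PDLs.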
ng_rotate
Signature: (toks(@adims,N,NToks); [o]rtoks(@adims,N,NToks-N+1))
Create a co-occurrence matrix by rotating a (delimited) token vector $toks(). Returns a matrix $rtoks() suitable for passing to ng_cofreq().
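To make the output shape (N, NToks-N+1) concrete, here is a plain-Perl sketch of one way such a matrix can be built by rotation. It is an illustration under assumptions, not the module's code: rotate_sketch() is a hypothetical helper, the rotation direction (row i is the token vector shifted left by i positions) is assumed, and N is assumed not to exceed the number of tokens.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of PDL::Ngrams): build N rows, where row $i
# holds the tokens rotated left by $i positions, clipped to NToks-N+1 columns.
sub rotate_sketch {
    my ($n, @toks) = @_;
    my $nclipped = @toks - $n + 1;    # columns that hold complete N-grams
    my @rows;
    for my $i (0 .. $n - 1) {
        # Rotate left by $i: elements $i..end, then the first $i elements.
        my @rot = (@toks[$i .. $#toks], @toks[0 .. $i - 1]);
        # Clip the wrapped-around tail, which would mix end and start tokens.
        push @rows, [ @rot[0 .. $nclipped - 1] ];
    }
    return \@rows;
}

my $rtoks = rotate_sketch(2, 1, 2, 3, 4);
print "@$_\n" for @$rtoks;
# prints:
#   1 2 3
#   2 3 4
```

Reading the result column by column gives the N-grams in order of their start position: (1,2), (2,3), (3,4).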
Delimiter Insertion and Removal
The following functions add delimiters to, or remove them from, a PDL vector. This is useful for adding beginning- and/or end-of-word markers before constructing a vector of N-gram vectors, and for removing them again afterwards.
ng_delimit
Signature: (toks(NToks); indx boffsets(NBlocks); delims(NDelims); [o]dtoks(NDToks))
Add block-delimiters (e.g. BOS,EOS) to a vector of raw tokens.
See "ng_delimit" in PDL::Ngrams::Utils.
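As an illustration of the signature above, the following plain-Perl sketch splices a delimiter sequence into a token list at each block offset. It rests on assumptions not stated here: that each entry of boffsets is the index of a block's first token, and that the delimiters are inserted in front of that token. delimit_sketch() is a hypothetical helper, not the library function.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of PDL::Ngrams): insert @$delims before the
# first token of every block. Assumes each offset is a valid index into @$toks.
sub delimit_sketch {
    my ($toks, $boffsets, $delims) = @_;
    my %is_block_start = map { $_ => 1 } @$boffsets;
    my @dtoks;
    for my $i (0 .. $#$toks) {
        push @dtoks, @$delims if $is_block_start{$i};
        push @dtoks, $toks->[$i];
    }
    return \@dtoks;
}

# Two blocks starting at offsets 0 and 3, with a single delimiter ID 0.
my $dtoks = delimit_sketch([5, 6, 7, 8, 9], [0, 3], [0]);
print "@$dtoks\n";
# prints: 0 5 6 7 0 8 9
```

Under these assumptions the output length is NToks + NBlocks*NDelims, here 5 + 2*1 = 7.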
ng_undelimit
Signature: (dtoks(NDToks); indx boffsets(NBlocks); int NDelims(); [o]toks(NToks))
Remove block-delimiters (e.g. BOS,EOS) from a vector of delimited tokens.
See "ng_undelimit" in PDL::Ngrams::Utils.
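The inverse operation can be sketched in plain Perl as well. This sketch assumes that boffsets refers to positions in the undelimited vector and that NDelims delimiters sit directly in front of each block; undelimit_sketch() is a hypothetical helper, not the library function.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of PDL::Ngrams): remove $ndelims delimiters
# from the front of every block of a delimited token list.
sub undelimit_sketch {
    my ($dtoks, $boffsets, $ndelims) = @_;
    my @toks    = @$dtoks;
    my @offsets = sort { $a <=> $b } @$boffsets;
    # Block $k starts at $offsets[$k] + $k*$ndelims in the delimited list.
    # Work from the last block to the first so earlier positions stay valid.
    for my $k (reverse 0 .. $#offsets) {
        splice @toks, $offsets[$k] + $k * $ndelims, $ndelims;
    }
    return \@toks;
}

my $toks = undelimit_sketch([0, 5, 6, 7, 0, 8, 9], [0, 3], 1);
print "@$toks\n";
# prints: 5 6 7 8 9
```

Only the number of delimiters is needed here, not their values, which matches the int NDelims() parameter in the signature.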
Low-Level Functions
Some additional low-level functions are provided in the PDL::Ngrams::Utils package. See PDL::Ngrams::Utils for details.
ACKNOWLEDGEMENTS
perl by Larry Wall.
PDL by Karl Glazebrook, Tuomas J. Lukka, Christian Soeller, and others.
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT
Copyright (c) 2007-2022, Bryan Jurish. All rights reserved.
This package is free software. You may redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
perl(1), PDL(3perl), PDL::Ngrams::Utils(3perl)