NAME

PDL::Ngrams - N-Gram utilities for PDL

SYNOPSIS

use PDL;
use PDL::Ngrams;

##---------------------------------------------------------------------
## Basic Data
$toks = rint(10*random(10));

##---------------------------------------------------------------------
## ... stuff happens

DESCRIPTION

PDL::Ngrams provides basic utilities for tracking N-grams over PDL vectors.

FUNCTIONS

Counting N-Grams over PDLs

ng_cofreq

Signature: (toks(@adims,N,NToks); %args)

Returns: (int [o]ngramfreqs(NNgrams); [o]ngramids(@adims,N,NNgrams))

Keyword arguments (optional):

norotate => $bool,                      ##-- if true, $toks() will NOT be rotated along $N
boffsets => $boffsets(NBlocks)          ##-- block-offsets in $toks() along $NToks
delims   => $delims(@adims,N,NDelims)   ##-- delimiters to splice in at block boundaries

Count co-occurrences (esp. N-Grams) over a token vector $toks. This function is a thin wrapper around ng_delimit(), ng_rotate(), vv_qsortvec(), and rlevec().
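Conceptually, the result pairs each distinct N-gram with its frequency. The plain-Perl sketch below illustrates that idea only; it does not use PDL or call this module. The helper name count_ngrams is hypothetical, and the sorted output order is an assumption suggested by the sort-then-run-length-encode pipeline named above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL::Ngrams): count each distinct
# window of $n consecutive tokens in @toks.
sub count_ngrams {
    my ($n, @toks) = @_;
    my %freq;
    for my $j (0 .. @toks - $n) {
        # Key each window by its space-joined tokens.
        $freq{ join ' ', @toks[$j .. $j + $n - 1] }++;
    }
    return %freq;
}

my %freq = count_ngrams(2, qw(a b a b c));

# Print in sorted key order, analogous to sorting the N-gram vectors
# before run-length encoding them.
printf "%-5s %d\n", $_, $freq{$_} for sort keys %freq;
```

Here the bigram "a b" occurs twice, while "b a" and "b c" occur once each. The real function returns PDLs (ngramfreqs and ngramids) rather than a Perl hash.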

ng_rotate

Signature: (toks(@adims,N,NToks); [o]rtoks(@adims,N,NToks-N+1))

Create a co-occurrence matrix by rotating a (delimited) token vector $toks(). Returns a matrix $rtoks() suitable for passing to ng_cofreq().
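The shape rtoks(N,NToks-N+1) can be pictured as a sliding window over the token vector. The sketch below is plain Perl rather than PDL, and it is an illustration under assumptions, not the module's implementation: the helper name rotate_windows is hypothetical, and it assumes that column j of the result holds the N consecutive tokens starting at position j.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL::Ngrams): build every window of
# $n consecutive tokens, mimicking the (N, NToks-N+1) shape of $rtoks.
sub rotate_windows {
    my ($n, @toks) = @_;
    my @windows;
    for my $j (0 .. @toks - $n) {
        push @windows, [ @toks[$j .. $j + $n - 1] ];
    }
    return @windows;
}

# Five tokens with N=2 yield 5-2+1 = 4 windows.
my @windows = rotate_windows(2, qw(a b a b c));
print join(' ', @$_), "\n" for @windows;
```

Each printed line corresponds to one candidate N-gram; counting identical lines gives the frequencies that ng_cofreq() reports.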

Delimiter Insertion and Removal

The following functions add delimiters to, or remove them from, a PDL vector. This is useful for adding beginning- and/or end-of-word markers before constructing a vector of N-gram vectors, and for removing them again afterwards.

ng_delimit

Signature: (toks(NToks); indx boffsets(NBlocks); delims(NDelims); [o]dtoks(NDToks))

Add block-delimiters (e.g. BOS,EOS) to a vector of raw tokens.

See "ng_delimit" in PDL::Ngrams::Utils.
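To make the signature more concrete, here is a plain-Perl sketch of block delimiting. It is an illustration under assumptions, not the module's code: the helper name delimit_blocks is hypothetical, and it assumes that each entry of boffsets marks the start of a block in toks and that the full delims list is inserted before every block, so that NDToks = NToks + NBlocks*NDelims. See PDL::Ngrams::Utils for the authoritative behavior.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL::Ngrams): insert all delimiters
# before each block start listed in $boffsets.
sub delimit_blocks {
    my ($toks, $boffsets, $delims) = @_;
    my %is_start = map { $_ => 1 } @$boffsets;
    my @dtoks;
    for my $i (0 .. $#$toks) {
        push @dtoks, @$delims if $is_start{$i};   # block boundary
        push @dtoks, $toks->[$i];                 # raw token
    }
    return @dtoks;
}

# Two blocks, (a b c) and (d e), each preceded by the delimiter BOS.
my @dtoks = delimit_blocks([qw(a b c d e)], [0, 3], ['BOS']);
print "@dtoks\n";    # prints "BOS a b c BOS d e"
```

Delimiting before rotation keeps N-grams from silently spanning two blocks without a boundary marker between them.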

ng_undelimit

Signature: (dtoks(NDToks); indx boffsets(NBlocks); int NDelims(); [o]toks(NToks))

Remove block-delimiters (e.g. BOS,EOS) from a vector of delimited tokens.

See "ng_undelimit" in PDL::Ngrams::Utils.

Low-Level Functions

Some additional low-level functions are provided in the PDL::Ngrams::Utils package. See PDL::Ngrams::Utils for details.

ACKNOWLEDGEMENTS

Perl by Larry Wall.

PDL by Karl Glazebrook, Tuomas J. Lukka, Christian Soeller, and others.

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT

Copyright (c) 2007-2022, Bryan Jurish. All rights reserved.

This package is free software. You may redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

perl(1), PDL(3perl), PDL::Ngrams::Utils(3perl)