NAME
PDL::EditDistance - Wagner-Fischer edit distance and alignment for PDLs.
SYNOPSIS
use PDL;
use PDL::EditDistance;
##-- input PDLs
$a = pdl([map { ord($_) } qw(G U M B O)]);
$b = pdl([map { ord($_) } qw(G A M B O L)]);
$a1 = pdl([0, map { ord($_) } qw(G U M B O)]);
$b1 = pdl([0, map { ord($_) } qw(G A M B O L)]);
##-------------------------------------------------------------
## Levenshtein distance
$dist = edit_distance_static($a,$b, 0,1,1,1);
($dist,$align) = edit_align_static($a,$b, 0,1,1,1);
##-------------------------------------------------------------
## Wagner-Fischer distance
@costs = ($costMatch=0,$costInsert=1,$costDelete=1,$costSubstitute=2);
$dist = edit_distance_static($a,$b, @costs);
($dist,$align) = edit_align_static($a,$b, @costs);
##-------------------------------------------------------------
## General edit distance
$costsMatch = random($a->nelem+1, $b->nelem+1);
$costsIns = random($a->nelem+1, $b->nelem+1);
$costsDel = random($a->nelem+1, $b->nelem+1);
$costsSubst = random($a->nelem+1, $b->nelem+1);
@costs = ($costsMatch,$costsIns,$costDel,$costsSubst);
$dist = edit_distance_full($a,$b,@costs);
($dist,$align) = edit_align_full($a,$b,@costs);
##-------------------------------------------------------------
## Alignment
$op_match = align_op_match(); ##-- constant
$op_del = align_op_insert1(); ##-- constant
$op_ins = align_op_insert2(); ##-- constant
$op_subst = align_op_substitute(); ##-- constant
($apath,$bpath,$pathlen) = edit_bestpath($align);
($ai,$bi,$ops,$pathlen) = edit_pathtrace($align);
##-------------------------------------------------------------
## Longest Common Subsequence
$lcs = edit_lcs($a,$b);
($ai,$bi,$lcslen) = lcs_backtrace($a,$b,$lcs);
_edit_pdl
Signature: (a(N); [o]apdl(N+1))
Convenience method. Returns a pdl $apdl() suitable for representing $a(), which can be specified as a UTF-8 or byte-string, as an arrays of numbers, or as a PDL. $apdl(0) is always set to zero.
edit_costs
Signature: (PDL::Type type; int N; int M;
[o]costsMatch(N+1,M+1); [o]costsIns(N+1,M+1); [o]costsDel(N+1,M+1); [o]costsSubst(N+1,M+1))
Convenience method. Ensures existence and proper dimensionality of cost matrices for inputs of length N and M.
_edit_costs
Signature: (PDL::Type type; int N1; int M1;
[o]costsMatch(N1,M1); [o]costsIns(N1,M1); [o]costsDel(N1,M1); [o]costsSubst(N1,M1))
Low-level method. Ensures existence and proper dimensionality of cost matrices for inputs of length N1-1 and M1-1.
edit_costs_static
Signature: (PDL::Type type; int N; int M;
staticCostMatch(); staticCostIns(); staticCostSubst();
[o]costsMatch(N+1,M+1); [o]costsIns(N+1,M+1); [o]costsDel(N+1,M+1); [o]costsSubst(N+1,M+1))
Convenience method.
edit_distance_full
Signature: (a(N); b(M);
costsMatch(N+1,M+1); costsIns(N+1,M+1); costsDel(N+1,M+1); costsSubst(N+1,M+1);
[o]dist(N+1,M+1); [o]align(N+1,M+1))
Convenience method. Compute the edit distance matrix for inputs $a() and $b(), and cost matrices $costsMatch(), $costsIns(), $costsDel(), and $costsSubst(). $a() and $b() may be specified as PDLs, arrays of numbers, or as strings.
edit_align_full
Signature: (a(N); b(M);
costsMatch(N+1,M+1); costsIns(N+1,M+1); costsDel(N+1,N+1); costsSubst(N+1,M+1);
[o]dist(N+1,M+1); [o]align(N+1,M+1))
Convenience method. Compute the edit distance and alignment matrices for inputs $a() and $b(), and cost matrices $costsMatch(), $costsIns(), $costsDel(), and $costsSubst(). $a() and $b() may be specified as PDLs, arrays of numbers, or as strings.
edit_distance_static
Signature: (a(N); b(M);
staticCostMatch(); staticCostIns(); staticCostDel(); staticCostSubst();
[o]dist(N+1,M+1))
Convenience method. Compute the edit distance matrix for inputs $a() and $b() given a static cost schema @costs = ($staticCostMatch(), $staticCostIns(), $staticCostDel(), and $staticCostSubst()). $a() and $b() may be specified as PDLs, arrays of numbers, or as strings. Functionally equivalent to edit_distance_full($matches,@costs,$dist), but slightly faster.
edit_align_static
Signature: (a(N); b(M);
staticCostMatch(); staticCostIns(); staticCostDel(); staticCostSubst();
[o]dist(N+1,M+1); [o]align(N+1,M+1))
Convenience method. Compute the edit distance and alignment matrices for inputs $a() and $b() given a static cost schema @costs = ($staticCostMatch(), $staticCostIns(), $staticCostDel(), and $staticCostSubst()). $a() and $b() may be specified as PDLs, arrays of numbers, or as strings. Functionally equivalent to edit_align_full($matches,@costs,$dist), but slightly faster.
align_op_delete
Alias for align_op_insert1()
align_op_insert
Alias for align_op_insert2()
align_ops
Signature: ([o]ops(4))
Alignment matrix value constants 4-element pdl (match,insert1,insert2,substitute).a
edit_bestpath
Signature: (align(N+1,M+1); [o]apath(N+M+2); [o]bpath(N+M+2); [o]pathlen())
Convenience method. Compute best path through alignment matrix $align(). Stores paths for original input strings $a() and $b() in $apath() and $bpath() respectively. Negative values in $apath() and $bpath() indicate insertion/deletion operations. On completion, $pathlen() holds the actual length of the paths.
edit_pathtrace
Signature: ( align(N+1,M+1); [o]ai(L); [o]bi(L); [o]ops(L); [o]$pathlen() )
Convenience method. Compute alignment path backtrace through alignment matrix $align() from final index ($ifinal,$jfinal). Stores raw paths for (original) input strings $a() and $b() in $ai() and $bi() respectively. Unlike edit_bestpath(), null-moves for $ai() and $bi() are not stored here as negative values. Returned pdls ($ai,$bi,$ops) are trimmed to the appropriate path length.
edit_lcs
Signature: (a(N); b(M); int [o]lcs(N+1,M+1);)
Convenience method. Compute the longest common subsequence (LCS) matrix for input PDLs $a1() and $b1(). The output matrix $lcs() contains at cell ($i+1,$j+1) the length of the LCS between $a1(0..$i) and $b1(0..$j); thus $lcs($N,$M) contains the length of the LCS between $a() and $b().
lcs_backtrace
Signature: (a(N); b(M); int lcs(N+1,M+1); int ifinal(); int jfinal(); int [o]ai(L); int [o]bi(L); int [o]len())
Convenience method. Compute longest-common-subsequence backtrace through LCS matrix $lcs() for original input strings ($a(),$b()) from final index ($ifinal,$jfinal). Stores raw paths for (original) input strings $a() and $b() in $ai() and $bi() respectively.
ACKNOWLEDGEMENTS
Perl by Larry Wall.
PDL by Karl Glazebrook, Tuomas J. Lukka, Christian Soeller, and others.
KNOWN BUGS
Probably many.
AUTHOR
Bryan Jurish <moocow@cpan.org>
Copyright Policy
Copyright (C) 2006-2015, Bryan Jurish. All rights reserved.
This package is free software, and entirely without warranty. You may redistribute it and/or modify it under the same terms as Perl itself, either Perl 5.20.2, or at your option any later version of Perl 5.
SEE ALSO
perl(1), PDL(3perl).