NAME
Lingua::Word::Parser - Parse a word into known and unknown parts
VERSION
version 0.0211
SYNOPSIS
use Lingua::Word::Parser;
use Data::Dumper; # Provides Dumper(), used below
my $p = Lingua::Word::Parser->new(
    word => 'abioticaly',
    file => 'eg/lexicon.dat',
);
# Or with a localhost database source:
my $p = Lingua::Word::Parser->new(
    word => 'abioticaly',
    dbname => 'fragments',
    dbuser => 'akbar',
    dbpass => '0p3n53454m3',
);
my ($known) = $p->knowns; #warn Dumper $known;
my $combos = $p->power; #warn Dumper $combos;
my $scored = $p->score; #warn Dumper $scored;
# The best guess is the last sorted score-set:
warn Dumper $scored->{ [ sort keys %$scored ]->[-1] };
DESCRIPTION
A Lingua::Word::Parser breaks a word into known affixes.
METHODS
new()
$x = Lingua::Word::Parser->new(%arguments);
Create a new Lingua::Word::Parser object.
Arguments and defaults:
word: undef
lex: undef
fetch_lex()
Populate the lexicon of word-part regular expressions from a file.
Each line of the lexicon file holds a regular expression followed by its definition:
a(?=\w) opposite
ab(?=\w) away
(?<=\w)o(?=\w) combining
(?<=\w)tic possessing
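As an illustration of the file format above, here is a minimal sketch that turns such lines into a hash keyed by the regular expression. The helper name parse_lexicon_lines and the defn/re keys are assumptions made for this example, not necessarily what the module uses internally.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module's API): split each line into
# a regular expression and a definition at the first run of whitespace.
sub parse_lexicon_lines {
    my @lines = @_;
    my %lex;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;    # Trim surrounding whitespace
        next unless length $line;   # Skip blank lines
        my ($re, $defn) = split /\s+/, $line, 2;
        $lex{$re} = { defn => $defn, re => qr/$re/ };
    }
    return \%lex;
}

my $lex = parse_lexicon_lines(
    'a(?=\w) opposite',
    'ab(?=\w) away',
    '(?<=\w)o(?=\w) combining',
    '(?<=\w)tic possessing',
);
print "$_ => $lex->{$_}{defn}\n" for sort keys %$lex;
```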
db_fetch()
Populate the lexicon from a database source called `fragments`.
This database table has records of the form:
affix            definition
-----------------------------
a(?=\w)          opposite
ab(?=\w)         away
(?<=\w)o(?=\w)   combining
(?<=\w)tic       possessing
knowns()
Fingerprint the known word parts.
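One way to picture a fingerprint is as a string of 0s and 1s, one digit per character of the word, with 1s marking where a lexicon pattern matched. The sketch below works under that assumption. The fingerprint helper is hypothetical, and the module's own representation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: return one 0/1 mask per match of $re in $word.
sub fingerprint {
    my ($word, $re) = @_;
    my @masks;
    while ($word =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);   # Match offsets
        next if $end == $start;               # Ignore zero-length matches
        my $mask = '0' x length($word);
        substr($mask, $start, $end - $start) = '1' x ($end - $start);
        push @masks, $mask;
    }
    return @masks;
}

my @a_masks   = fingerprint('abioticaly', qr/a(?=\w)/);
my @tic_masks = fingerprint('abioticaly', qr/(?<=\w)tic/);
print "$_\n" for @a_masks, @tic_masks;
# Prints 1000000000, 0000000100 and 0000111000, one per line
```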
power()
Find the "non-overlapping powerset."
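Read literally, a "non-overlapping powerset" is every subset of the known parts in which no two parts claim the same character. The brute-force sketch below assumes equal-length 0/1 mask strings and is only practical for a small number of masks. Both helper names are hypothetical and this is not the module's algorithm.

```perl
use strict;
use warnings;

# Hypothetical helper: true if two equal-length masks share a 1 position.
sub overlaps {
    my ($x, $y) = @_;
    for my $i (0 .. length($x) - 1) {
        return 1 if substr($x, $i, 1) eq '1' && substr($y, $i, 1) eq '1';
    }
    return 0;
}

# Hypothetical helper: enumerate every non-empty subset of @masks and keep
# those whose members are pairwise non-overlapping.
sub non_overlapping_subsets {
    my @masks = @_;
    my @keep;
    SUBSET: for my $n (1 .. 2 ** @masks - 1) {
        my @subset = map { $masks[$_] } grep { $n & (1 << $_) } 0 .. $#masks;
        for my $i (0 .. $#subset) {
            for my $j ($i + 1 .. $#subset) {
                next SUBSET if overlaps($subset[$i], $subset[$j]);
            }
        }
        push @keep, \@subset;
    }
    return @keep;
}

# Masks for "a", "ab" and "tic" in 'abioticaly'
my @subsets = non_overlapping_subsets('1000000000', '1100000000', '0000111000');
print join(' + ', @$_), "\n" for @subsets;
```

Here "a" and "ab" both claim the first character, so any subset containing both is dropped, leaving five of the seven non-empty subsets.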
score()
Score each combination of known and unknown word parts as ratios of characters and of chunks, where a chunk (or part) is a "span of adjacent characters."
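To make the two ratios concrete, here is a small sketch that counts known versus unknown characters and chunks in a k/u string. The known:unknown ordering, the ku_ratios helper and its return structure are assumptions for illustration. The module's actual score format may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize a k/u string as known:unknown counts.
sub ku_ratios {
    my ($ku) = @_;
    my @chunks       = $ku =~ /(k+|u+)/g;        # Spans of adjacent characters
    my $known_chunks = grep { /^k/ } @chunks;    # Count of known spans
    my $known_chars  = ($ku =~ tr/k//);          # Count of known characters
    return {
        chars  => $known_chars . ':' . (length($ku) - $known_chars),
        chunks => $known_chunks . ':' . (@chunks - $known_chunks),
    };
}

my $ratios = ku_ratios('kkuukkkuuu');
print "chars $ratios->{chars}, chunks $ratios->{chunks}\n";
# Prints: chars 5:5, chunks 2:2
```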
grouping()
Make groups of "un-digitized" strings where k = known and u = unknown.
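Assuming masks are strings of 0s and 1s, "un-digitizing" can be sketched as a simple transliteration of 1 to k and 0 to u. The mask_to_ku helper is hypothetical, and the real method may do more than this.

```perl
use strict;
use warnings;

# Hypothetical helper: map a 0/1 mask to a k/u string (1 => k, 0 => u).
sub mask_to_ku {
    my ($mask) = @_;
    (my $ku = $mask) =~ tr/01/uk/;   # Copy first so the argument is untouched
    return $ku;
}

my $ku = mask_to_ku('1100111000');
print "$ku\n";
# Prints: kkuukkkuuu
```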
rle()
Compress k/u strings into contiguous chunks.
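The chunking idea can be sketched with a single regular expression that captures each run of identical letters. The chunk_runs helper and its list-of-strings return value are assumptions for this example, not the module's output format.

```perl
use strict;
use warnings;

# Hypothetical helper: split a k/u string into runs of the same letter.
sub chunk_runs {
    my ($ku) = @_;
    my @runs;
    push @runs, $1 while $ku =~ /(k+|u+)/g;
    return @runs;
}

my @runs = chunk_runs('kkuukkkuuu');
print join('|', @runs), "\n";
# Prints: kk|uu|kkk|uuu
```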
does_not_overlap()
Compute whether the given masks overlap.
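For masks held as equal-length strings of 0s and 1s, the check amounts to asking whether any position is 1 in both. The sketch below makes that assumption and uses a hypothetical helper name. It compares character by character rather than relying on Perl's string bitwise operators.

```perl
use strict;
use warnings;

# Hypothetical helper: true when no position is set in both masks.
# Assumes both masks have the same length.
sub no_overlap {
    my ($x, $y) = @_;
    for my $i (0 .. length($x) - 1) {
        return 0 if substr($x, $i, 1) eq '1' && substr($y, $i, 1) eq '1';
    }
    return 1;
}

print no_overlap('1100000000', '0000111000') ? "disjoint\n" : "overlap\n";
print no_overlap('1000000000', '1100000000') ? "disjoint\n" : "overlap\n";
# Prints "disjoint" then "overlap"
```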
or_together()
Combine a list of bitmasks.
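Combining masks means a position is set in the result if it is set in any input. A minimal sketch, assuming equal-length 0/1 strings and a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: OR a list of equal-length 0/1 masks position by position.
sub combine_masks {
    my @masks = @_;
    return '' unless @masks;
    my @bits = (0) x length($masks[0]);
    for my $mask (@masks) {
        my @chars = split //, $mask;
        for my $i (0 .. $#chars) {
            $bits[$i] = 1 if $chars[$i] eq '1';
        }
    }
    return join '', @bits;
}

my $combined = combine_masks('1100000000', '0000111000');
print "$combined\n";
# Prints: 1100111000
```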
reconstruct()
Reconstruct the word, with delimiters around known combinations.
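As a sketch of the idea, the helper below walks a 0/1 mask and wraps each known span of the word in delimiters. The mark_known name, the argument order and the default '<' and '>' delimiters are assumptions for this example. Adjacent known parts merge into one span here because a single combined mask cannot tell them apart.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the known (1) spans of $word in delimiters.
sub mark_known {
    my ($word, $mask, $open, $close) = @_;
    $open  = '<' unless defined $open;
    $close = '>' unless defined $close;
    my $out = '';
    while ($mask =~ /(0+|1+)/g) {
        my ($start, $end) = ($-[0], $+[0]);
        my $chunk = substr($word, $start, $end - $start);
        $out .= substr($mask, $start, 1) eq '1' ? "$open$chunk$close" : $chunk;
    }
    return $out;
}

my $marked = mark_known('abioticaly', '1100111000');
print "$marked\n";
# Prints: <ab>io<tic>aly
```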
SEE ALSO
Lingua::TokenParse - The predecessor of this module.
http://en.wikipedia.org/wiki/Affix is the tip of the iceberg...
AUTHOR
Gene Boggs <gene@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2014 by Gene Boggs.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.