NAME

Lingua::Word::Parser - Parse a word into known and unknown parts

VERSION

version 0.01

SYNOPSIS

use Data::Dumper;
use Lingua::Word::Parser;
my $p = Lingua::Word::Parser->new(
   word => shift || 'abioticaly',
   file => 'eg/lexicon.dat',
);
my ($known) = $p->knowns; #warn Dumper $known;
my $combos  = $p->power;  #warn Dumper $combos;
my $scored  = $p->score;  #warn Dumper $scored;
warn Dumper $scored->{ [ sort keys %$scored ]->[-1] };

DESCRIPTION

A Lingua::Word::Parser object breaks a word into known affixes, leaving the remainder as unknown parts.

METHODS

new()

$x = Lingua::Word::Parser->new(%arguments);

Create a new Lingua::Word::Parser object.

Arguments and defaults:

word: undef
lex:  undef

fetch_lex()

Populate the word-part => regular-expression lexicon.

knowns()

Fingerprint the known word parts.
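As a rough illustration of what "fingerprinting" a word part could look like, the sketch below marks the characters of a word that a word-part regular expression matches, producing a string of 1s and 0s. The helper name mask_for and the 0/1 string representation are assumptions made for this example; they are not the module's documented internals.

```perl
use strict;
use warnings;

# Hypothetical helper: return a 0/1 mask the same length as $word,
# with 1s over the first span that $re matches, or nothing on no match.
sub mask_for {
    my ($word, $re) = @_;
    return unless $word =~ $re;
    my ($start, $end) = ($-[0], $+[0]);   # offsets of the whole match
    return ('0' x $start)
         . ('1' x ($end - $start))
         . ('0' x (length($word) - $end));
}

print mask_for('abiotic', qr/^a(?=\w)/), "\n";  # 1000000
print mask_for('abiotic', qr/bio/), "\n";       # 0111000
```

A mask of this kind records where a part sits in the word, which is what the overlap and combination steps need.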

power()

Find the "non-overlapping powerset."
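One way to read "non-overlapping powerset" is: every non-empty subset of the known-part masks in which no two masks share a character position. The following brute-force sketch enumerates such subsets. The subroutine name and the use of 0/1 strings as masks are assumptions for illustration only, and the enumeration is exponential in the number of masks.

```perl
use strict;
use warnings;

# Hypothetical helper: given 0/1 mask strings of equal length, return
# every non-empty subset (as an array ref) whose masks are pairwise disjoint.
sub non_overlapping_subsets {
    my @masks = @_;
    my $count = @masks;
    my @subsets;
    for my $bits (1 .. 2**$count - 1) {            # each non-empty subset
        my @chosen = grep { $bits & (1 << $_) } 0 .. $#masks;
        my ($seen, $ok) = (0, 1);
        for my $i (@chosen) {
            my $n = oct("0b$masks[$i]");           # mask as an integer
            if ($seen & $n) { $ok = 0; last }      # shares a position
            $seen |= $n;
        }
        push @subsets, [ @masks[@chosen] ] if $ok;
    }
    return @subsets;
}

print join('+', @$_), "\n" for non_overlapping_subsets(qw(1100 0110 0011));
```

For the three masks shown, only the single masks and the pair 1100+0011 survive, because every other combination shares at least one position.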

score()

Score the known versus unknown word-part combinations as ratios of characters and of chunks (parts, that is, "spans of adjacent characters").
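To make the two kinds of ratio concrete, here is a minimal sketch that takes a string of k (known) and u (unknown) characters and reports the known share of characters and of chunks. The helper name, the k/u input format, and the choice of "known divided by total" as the ratio are all assumptions; the module may compute its scores differently.

```perl
use strict;
use warnings;

# Hypothetical helper: $ku holds one 'k' (known) or 'u' (unknown) per
# letter of the word. Returns the known fraction of characters and of
# chunks (runs of identical adjacent characters).
sub known_ratios {
    my ($ku) = @_;
    return unless length $ku;                 # avoid dividing by zero
    my $known_chars  = ($ku =~ tr/k//);       # count the k characters
    my @chunks       = $ku =~ /(k+|u+)/g;     # split into runs
    my $known_chunks = grep { /^k/ } @chunks;
    return (
        chars  => $known_chars / length($ku),
        chunks => $known_chunks / scalar(@chunks),
    );
}

my %r = known_ratios('kkkuukk');
printf "chars: %.2f, chunks: %.2f\n", $r{chars}, $r{chunks};
```

Here five of seven characters are known, and two of the three chunks are known.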

grouping()

Make groups of "un-digitized" strings for the known and unknown parts.

rle()

Compress k/u strings into contiguous chunks.
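Compressing a k/u string into contiguous chunks is essentially run-length encoding. The sketch below shows one possible form, returning a [character, run length] pair for each run. The helper name and the output shape are assumptions, not the module's actual return value.

```perl
use strict;
use warnings;

# Hypothetical helper: split a k/u string into runs and return
# [character, run length] pairs in order.
sub run_lengths {
    my ($ku) = @_;
    my @runs;
    while ($ku =~ /((.)\2*)/g) {          # $1 is the run, $2 its character
        push @runs, [ $2, length $1 ];
    }
    return @runs;
}

print join(' ', map { "$_->[0]$_->[1]" } run_lengths('kkkuukk')), "\n";  # k3 u2 k2
```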

does_not_overlap()

Compute whether the given masks overlap.
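An overlap test on bitmasks usually reduces to a bitwise AND: two masks overlap exactly when the AND is non-zero. A minimal sketch, assuming masks are 0/1 strings of at most 32 characters (the helper name is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: true when two 0/1 mask strings share no set position.
# Masks are converted to integers, so this assumes at most 32 characters.
sub masks_disjoint {
    my ($mask_a, $mask_b) = @_;
    return (oct("0b$mask_a") & oct("0b$mask_b")) == 0;
}

print masks_disjoint('1100', '0011') ? "disjoint\n" : "overlap\n";  # disjoint
print masks_disjoint('1100', '0110') ? "disjoint\n" : "overlap\n";  # overlap
```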

or_together()

Combine a list of bitmasks.
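Combining bitmasks is commonly done with a bitwise OR, so that a position is set in the result if it is set in any input. The sketch below assumes equal-length 0/1 strings of at most 32 characters; the helper name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: OR a list of equal-length 0/1 mask strings and
# return the result as a 0/1 string of the same length.
sub combine_masks {
    my @masks = @_;
    my $width = length $masks[0];
    my $combined = 0;
    $combined |= oct("0b$_") for @masks;
    return sprintf '%0*b', $width, $combined;   # zero-pad to original width
}

print combine_masks(qw(1000000 0111000)), "\n";  # 1111000
```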

reconstruct()

Reconstruct the word, with delimiters around known combinations.
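To illustrate the idea of delimiting known spans, the sketch below wraps each run of known characters in a pair of delimiters, given a word and a 0/1 mask of the same length. The helper name, the argument order, and the delimiter parameters are assumptions for this example. Because it works from a single combined mask, adjacent known parts are merged into one span.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap each run of known characters (1s in $mask)
# in the given delimiters; unknown characters pass through unchanged.
sub mark_known {
    my ($word, $mask, $open, $close) = @_;
    my $out = '';
    while ($mask =~ /(0+|1+)/g) {
        my $chunk = substr($word, $-[1], $+[1] - $-[1]);
        $out .= substr($1, 0, 1) eq '1' ? "$open$chunk$close" : $chunk;
    }
    return $out;
}

print mark_known('abiotic', '1111000', '<', '>'), "\n";  # <abio>tic
```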

SEE ALSO

Lingua::TokenParse - The predecessor of this module.

http://en.wikipedia.org/wiki/Affix is the tip of the iceberg...

AUTHOR

Gene Boggs <gene@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Gene Boggs.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.