NAME
Lingua::TokenParse - Parse a word into scored, familiar combinations
SYNOPSIS
use Lingua::TokenParse;
my %lexicon;
@lexicon{qw(part i tion on)} = ();
my $obj = Lingua::TokenParse->new(
    word    => 'partition',
    lexicon => \%lexicon,
);
$obj->output_knowns;
ABSTRACT
This class provides methods to parse a given word into familiar combinations, based on a lexicon of known word parts.
DESCRIPTION
A word like "partition" is actually composed of a few different word parts. Given a lexicon of known word parts, it is possible to partition this word into combinations of these (possibly overlapping) parts. Each of these combinations can be given a score, which represents a measure of familiarity.
Currently, this familiarity measure is a simple ratio of known to unknown parts.
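To make the idea of a familiarity score concrete, here is a minimal sketch. It is not the module's code: familiarity() is a hypothetical helper, and it assumes the score is taken as known parts over total parts, so that the value stays defined when every part is known. The module's actual formula may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::TokenParse.
# Scores one combination (an array ref of parts) against a lexicon hash ref.
# Assumption: score = known parts / total parts.
sub familiarity {
    my ($parts, $lexicon) = @_;
    return 0 unless @$parts;
    my $known = grep { exists $lexicon->{$_} } @$parts;
    return $known / scalar @$parts;
}

my %lexicon;
@lexicon{qw(part i tion on)} = ();

# Score three hand-picked combinations of "partition".
printf "%s => %.2f\n", join('.', @$_), familiarity($_, \%lexicon)
    for [qw(part i tion)], [qw(part it ion)], [qw(p artition)];
```

Under this assumption, a combination made entirely of lexicon entries scores 1 and a combination with no known parts scores 0.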
METHODS
new()
$obj = Lingua::TokenParse->new(
    word    => $word,
    lexicon => \%lexicon,
);
Return a new Lingua::TokenParse object.
If both a word and a lexicon are provided, this method automatically calls the partition methods (build_parts(), successors() and trim_combinations(), detailed below).
build_parts()
$obj->build_parts();
Construct an array of the word partitions.
successors()
$obj->successors();
Recursively compute the array of all possible word part combinations.
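The recursive idea can be sketched as follows. This is an illustration only, not the successors() implementation: splits() is a hypothetical helper, and it assumes a combination is a sequence of contiguous, non-empty pieces that together spell the word.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's successors() implementation.
# Returns every way to cut $word into contiguous, non-empty pieces,
# each as an array ref of parts.
sub splits {
    my ($word) = @_;
    return ([]) if $word eq '';    # one way to split nothing: no parts
    my @results;
    for my $len (1 .. length($word)) {
        my $head = substr($word, 0, $len);
        my $tail = substr($word, $len);
        # Prepend this head to every split of the remaining tail.
        push @results, [ $head, @$_ ] for splits($tail);
    }
    return @results;
}

my @combinations = splits('tion');
print join('.', @$_), "\n" for @combinations;
```

A word of n letters has 2^(n-1) such splits, so the 9-letter "partition" yields 256 candidates before any scoring or trimming.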
trim_combinations()
$obj->trim_combinations();
Compute the familiar word part combinations.
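A trimming step of this kind might look like the sketch below. It is a guess at the general shape, not the module's code: the candidate list is hand-picked, the score is assumed to be known parts over total parts, and only combinations with a non-zero score are kept.

```perl
use strict;
use warnings;

my %lexicon;
@lexicon{qw(part i tion on)} = ();

# A few hand-picked candidate combinations for "partition".
my @candidates = ([qw(part i tion)], [qw(parti ti on)], [qw(par tit ion)]);

# Hypothetical trimming step: score each combination, keep non-zero scores.
# Assumption: score = known parts / total parts.
my %knowns;
for my $combo (@candidates) {
    my $known = grep { exists $lexicon{$_} } @$combo;
    my $score = $known / scalar @$combo;
    $knowns{ join('.', @$combo) } = $score if $score > 0;
}

# Most familiar first; ties broken alphabetically.
for my $key (sort { $knowns{$b} <=> $knowns{$a} || $a cmp $b } keys %knowns) {
    printf "%-12s %.2f\n", $key, $knowns{$key};
}
```

Here "par.tit.ion" is dropped because none of its parts appear in the lexicon, while "parti.ti.on" survives with a low score because only "on" is known.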
output_knowns()
$obj->output_knowns();
Convenience method to output the familiar word part combinations.
ACCESSORS
These accessors both get and set their respective values. If you set any of them after construction, you must rerun the partition methods manually. Setting the parts, combinations and knowns lists yourself is of little use, since the partition methods compute them.
word()
$word = $obj->word($word);
The actual word to partition.
lexicon()
$lexicon = $obj->lexicon($lexicon);
The hash reference of word parts (keys) with their (optional) definitions (values).
parts()
$parts = $obj->parts();
The array reference of word partitions.
Note that this method is only useful for fetching, since the parts are computed by the build_parts() method.
combinations()
$combinations = $obj->combinations();
The array reference of all possible word part combinations.
Note that this method is only useful for fetching, since the combinations are computed by the successors() method.
knowns()
$knowns = $obj->knowns();
The hash reference of known combinations (keys) with their familiarity scores (values). Only combinations with a non-zero score are kept.
Note that this method is only useful for fetching, since the knowns are computed by the trim_combinations() method.
DEPENDENCIES
None
TO DO
Handle the successors() method and globals correctly.
Return word part definitions.
Synthesize a term list based on word part (thesaurus) definitions. (That is, go in reverse! Non-trivial!)
DEDICATION
My Grandmother and English teacher - Frances Jones
AUTHOR
Gene Boggs <cpan@ology.net>
COPYRIGHT AND LICENSE
Copyright 2003 by Gene Boggs
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.