NAME

Lingua::EN::Infinitive - Determine the infinitive form of a conjugated word

Synopsis

use Lingua::EN::Infinitive;

my($spell) = Lingua::EN::Infinitive -> new();
my($word)  = 'rove';

# Method 1:

my($word1, $word2, $suffix, $rule) = $spell -> stem($word);

# Method 2 (an alternative to Method 1; use one or the other):

$spell -> stem($word);

my($word1, $word2, $suffix, $rule) =
(
	$spell -> word1,  # A possibility.
	$spell -> word2,  # A possibility, or ''.
	$spell -> suffix,
	$spell -> rule,
);

print "Conjugated: $word. Infinitive: $word1. \n";

# Now, adjective to noun conversion.

my($noun);

for (qw/Turkish amateurish cuttlefish vixenish whitish/)
{
	$noun = $spell -> adjective2noun($_);

	print "$_ => ", (length($noun) ? $noun : $_), ". \n";
}

# Adjective to noun conversion again, this time checked against expected results.

my(%expected) =
(
	Turkish		=> 'Turkey',
	amateurish	=> 'amateur',
	cuttlefish	=> '', # I.e. No change because 'cuttlef' can't be an adjective.
	demolish	=> '', # Ditto.
	radish		=> '', # Ditto.
	swish		=> '', # Ditto.
	vixenish	=> 'vixen',
	whitish		=> 'white',
);

for (qw/Turkish amateurish cuttlefish demolish radish swish vixenish whitish/)
{
	$noun = $spell -> adjective2noun($_);

	print "$_ => ", ($noun ? $noun : $_), '. OK: ', ( ($noun eq $expected{$_}) ? 'Yes' : 'No'), ". \n";
}

See scripts/demo.pl and t/test.t for sample code.

Description

Generic Code

This section discusses the results from calling the method "stem($word)".

stem($word) determines the infinitive form of a conjugated word. It also reports the suffix that was used to select the rule which transforms the conjugated word into its infinitive form.

Either 1 or 2 candidate infinitives are returned. You must check whether the first is really an English word. If it is, then it is the result. If it is not, then check the second.

This module does not provide a way of determining whether or not a candidate solution is in fact an English word.
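
Since validation is left to the caller, one possible approach is to test each candidate against a word list of your own. The following is only a sketch: the helper pick_infinitive() and the tiny %dictionary are hypothetical and are not part of Lingua::EN::Infinitive. The module itself is only called if it is installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real word list (for example, one loaded
# from /usr/share/dict/words). Not part of Lingua::EN::Infinitive.
my(%dictionary) = map { $_ => 1 } qw/rove run walk/;

# Hypothetical helper: return the first candidate found in the word
# list, or '' if none of the candidates is found.
sub pick_infinitive
{
	my($dictionary, @candidate) = @_;

	for my $candidate (@candidate)
	{
		next if (! defined $candidate || $candidate eq '');

		return $candidate if ($$dictionary{lc $candidate});
	}

	return '';
}

# Only call the module if it is installed, so the sketch runs either way.
if (eval { require Lingua::EN::Infinitive; 1 })
{
	my($spell)         = Lingua::EN::Infinitive -> new();
	my($word1, $word2) = $spell -> stem('roving');
	my($infinitive)    = pick_infinitive(\%dictionary, $word1, $word2);

	print 'roving => ', ($infinitive eq '' ? '(no match)' : $infinitive), ". \n";
}
```

A hash lookup keeps each check cheap. In practice you would load the word list once and reuse it for every call.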

Failure to deconjugate is indicated by $word1 eq ''.

In general, you can ignore the 3rd and 4th values returned from "stem($word)".

The algorithm is based on the McIlroy article (see below), after first checking for irregular words.

In the hash 'suffix2rule', you will see the key 'order'. This specifies the sort order in which to check the McIlroy rules. I have changed his ordering in a number of places, based on my interpretation of which order produces the better result.

Adjectival Code

This section discusses the results from calling the method adjective2noun().

The source contains a list of (adjective => noun) pairs taken from /usr/share/dict/words, and so it can be used to convert adjectives to nouns.

I suggest calling adjective2noun() if "stem($word)" does not provide a suitable candidate.
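
The fallback suggested above might be sketched as follows. The helper base_form() and the $is_word callback are hypothetical names used only for illustration. Only stem() and adjective2noun() come from the module, and the module is only called if it is installed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module. $spell is any object
# which provides stem() and adjective2noun(). $is_word is a code
# reference supplied by the caller, which decides whether or not a
# string is a real English word.
sub base_form
{
	my($spell, $word, $is_word) = @_;
	my($word1, $word2)          = $spell -> stem($word);

	for my $candidate ($word1, $word2)
	{
		next if (! defined $candidate || $candidate eq '');

		return $candidate if ($is_word -> ($candidate) );
	}

	# No usable infinitive, so try the adjective table.

	my($noun) = $spell -> adjective2noun($word);

	return (defined $noun && $noun ne '') ? $noun : $word;
}

if (eval { require Lingua::EN::Infinitive; 1 })
{
	my($spell) = Lingua::EN::Infinitive -> new();

	# This toy check accepts nothing, which forces the fallback.

	print "$_ => ", base_form($spell, $_, sub { 0 }), ". \n" for (qw/vixenish whitish/);
}
```

If neither method yields a result, the sketch returns the input word unchanged, which mirrors the way the Synopsis prints the original word when no noun is found.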

Installation

Install Lingua::EN::Infinitive as you would any Perl module, by running these commands:

perl Makefile.PL
make
make test
make install

Warning

Do not assume that either of the following holds:

"$word1$suffix" eq $word
	or
"$word2$suffix" eq $word

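To see this for yourself, a small check along the following lines could be used. It is a sketch only: rebuilds() is a hypothetical helper, the sample words are arbitrary, and no particular output is claimed. The module is only called if it is installed.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the candidate plus the suffix spells
# the original word exactly.
sub rebuilds
{
	my($word, $candidate, $suffix) = @_;

	return 0 if (! defined $candidate || ! defined $suffix || $candidate eq '');

	return ("$candidate$suffix" eq $word) ? 1 : 0;
}

if (eval { require Lingua::EN::Infinitive; 1 })
{
	my($spell) = Lingua::EN::Infinitive -> new();

	for my $word (qw/roving walked/)
	{
		my($word1, $word2, $suffix) = $spell -> stem($word);

		print "$word: word1 ", (rebuilds($word, $word1, $suffix) ? 'rebuilds' : 'does not rebuild'), ' the input, ',
			'word2 ', (rebuilds($word, $word2, $suffix) ? 'rebuilds' : 'does not rebuild'), " the input. \n";
	}
}
```
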
Methods

adjective2noun($adjective)

Returns either the noun which was used to generate the adjective in the first place, or the empty string.

There are 99 (adjective => noun) pairs in the source code.

rule()

Must only be called after calling "stem($word)".

Returns the same string as the 4th value returned by "stem($word)".

stem($word)

Returns a 4-element array:

o A candidate word, or the empty string

If a non-empty string is returned, check that it is a real word. If it is, then that is the candidate you want.

If it is not a real word, or it is empty, check the next word returned.

o Another candidate, or the empty string

As before, check that it is a real word.

o A suffix, or the empty string

If one of the first two words is real, this is the suffix removed from the input word.

Normally, you would ignore this value.

o A rule number, or the empty string

The arbitrary rule number used to determine the candidate. Rules are applied in the order given by these rule numbers.

Normally, you would ignore this value.

suffix()

Must only be called after calling "stem($word)".

Returns the same string as the 3rd value returned by "stem($word)".

word1()

Must only be called after calling "stem($word)".

Returns the same string as the 1st value returned by "stem($word)".

word2()

Must only be called after calling "stem($word)".

Returns the same string as the 2nd value returned by "stem($word)".

Reference

Title:   Development of a Spelling List
Author:  M. Douglas McIlroy
Journal: IEEE Transactions on Communications
Issue:   Vol COM-30, No 1, January 1982

Machine-Readable Change Log

The file Changes was converted into Changelog.ini by Module::Metadata::Changes.

Version Numbers

Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.

Repository

https://github.com/ronsavage/Lingua-EN-Infinitive

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=Lingua::EN::Infinitive

Author

Lingua::EN::Infinitive was written by Ron Savage <ron@savage.net.au> in 1998.

License

Australian copyright (c) 1999-2002 Ron Savage.

All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Perl License, a copy of which is available at:
http://www.opensource.org/licenses/index.html