NAME

Lingua::Phonology::Rules - a module for defining and applying phonological rules.

SYNOPSIS

use Lingua::Phonology;
$phono = new Lingua::Phonology;

$rules = $phono->rules;

# Adding and manipulating rules is discussed in the "WRITING RULES"
# section

DESCRIPTION

This module allows for the creation of linguistic rules, and the application of those rules to "words" of Segment objects. You, the user, add rules to a Rules object, defining various parameters and code references that actually perform the action of the rule. Lingua::Phonology::Rules will take care of the guts of applying and creating rules.

The rules you create may have the following parameters. This is just a brief description of the parameters--a more detailed discussion of their effect is in the "WRITING RULES" section.

  • domain - defines the domain within which the rule applies. This should be the name of a feature in the featureset of the segments which the rule is applied to.

  • tier - defines the tier on which the rule applies. Must be the name of a feature in the feature set for the segments of the word you pass in.

  • direction - defines the direction that the rule applies in. Must be either 'leftward' or 'rightward'. If no direction is given, defaults to 'rightward'.

  • filter - defines a filter for the segments that the rule applies on. Must be a code reference that returns a truth value.

  • where - defines the condition or conditions where the rule applies. Must be a coderef that returns a truth value. If no value is given, defaults to always true.

  • do - defines the action to take when the where condition is met. Must be a code reference. If no value is given, does nothing.

  • result - EXPERIMENTAL. Defines a condition that must be true after the do code has applied. Must be a code reference that returns a truth value. NOTE: This parameter depends on the module Whatif (available from CPAN), and will behave differently if this module is not present. See "Using result".

Lingua::Phonology::Rules is flexible and powerful enough to handle any sequential type of rule system. It cannot handle Optimality Theory-style processes, because those require a fundamentally different kind of algorithm.

METHODS

new

Returns a new Lingua::Phonology::Rules object. This method accepts no arguments.

add_rule

Adds one or more rules to the list. Takes a series of key-value pairs, where the keys are the names of rules to be added, and the values are hashrefs. Any of the parameters mentioned above may be used, so a single rule has the following maximal structure:

'Name of Rule' => {
	domain => 'some_feature',
	tier => 'some_feature',
	direction => 'rightward', # Can only be 'rightward' or 'leftward'
	filter => \&qux,
	where => \&foo,
	do => \&bar,
	result => \&baz
}

A detailed explanation of how to use these to make useful rules is in "WRITING RULES". A typical call to add_rule might look like what follows. Assume that 'nasal' and 'SYLLABLE' are defined in the feature set you're using, and that nasalized() and denasalize() are subroutines defined elsewhere.

$rules->add_rule(
	Denasalization => {
		tier => 'nasal',
		domain => 'SYLLABLE',
		direction => 'rightward',
		where => \&nasalized,
		do => \&denasalize
	}
);

This method returns true if all rules were added successfully, otherwise false. If a rule already exists with the name you're attempting to add, it is first dropped.

drop_rule

$rules->drop_rule('Rule');

Takes one argument, the name of a rule, and removes that rule. Returns the hash reference of the properties of that rule, or undef if no such rule actually existed.

change_rule

$rules->change_rule(
    Denasalization => {
        tier => undef,
        where => undef,
        filter => \&nasalized
    }
);

This method is exactly like add_rule(), except that it may be used to change parameters on an existing rule. If the method call given above were used after the one shown for add_rule(), then the 'Denasalization' rule would be changed to have no tier or 'where' condition, but to have a filter defined by the subroutine nasalized. The other properties of the rule would be unchanged. If you attempt to use change_rule() with a rule that does not yet exist, you will get an error.

Returns true if all changes succeed, otherwise false.

loadfile

$rules->loadfile('phono.xml');

Loads rule definitions from a file. Returns true on success, false on failure. This feature is new as of v0.3, and comes with new capability for reading rules in a readable linguistic format. This is far too complex to describe here: please consult Lingua::Phonology::FileFormatPOD for details.

clear

$rules->clear;

Resets the Lingua::Phonology::Rules object by deleting all rules and all rule ordering.

tier

See below.

domain

See below.

direction

See below.

filter

See below.

where

See below.

do

See below.

result

All of the above methods behave identically. They may take one or two arguments. The first argument is the name of a rule. If only one argument is given, then these return the property of the rule that they name. If two arguments are given, then they set that property to the second argument. For example:

$rules->tier('Rule');				# Returns the tier
$rules->tier('Rule', 'feature');	# Sets the tier to 'feature'
$rules->domain('Rule');				# Returns the domain
$rules->domain('Rule', 'feature');	# Sets the domain to 'feature'
# Etc., etc.

apply

$rules->apply('Denasalization', \@word);

Applies a rule to a "word". The first argument to this function is the name of a rule, and the second argument is a reference to an array of Lingua::Phonology::Segment objects. apply() will take the rule named and apply it to each segment in the array, after doing some appropriate magic with the tiers and the domains, if specified. For a full explanation of how apply() works and how to exploit it, see below in "WRITING RULES".

As of v0.2, the return value of apply() is an array with the modified contents of the array that was passed as a reference in the call to apply(). Thus, the return value of the rule above, if it were captured, would be the same as the contents of @word after apply() was called.

This method will set count, clobbering any earlier value. See "count" below.

Applying rules by name

You may also call rule names themselves as methods, in which case the only needed argument is an array reference to the word. Thus, the following is identical to the preceding example:

$rules->Denasalization(\@word);

apply_all

$rules->apply_all(\@word);

When used with persist() and order(), this method can be used to apply all rules to a word with one call. The argument to this method should be a reference to an array of Segment objects, just as with apply().

Calling apply_all() applies the rules in the order specified by order(), applying the rules in persist() before and after every one. Rules that are part of the current object but which aren't specified in order() or persist() are not applied. See "order" and "persist" for details on those methods.

For example, say you had the following code:

$rules->persist('Persist 1', 'Persist 2');
$rules->order(['A-1', 'A-2', 'A-3'], 'B', ['C-1', 'C-2']);
$rules->apply_all(\@word);

When you call apply_all, the rules would be applied in this order:

Persist 1
Persist 2
A-1
A-2
A-3
Persist 1
Persist 2
B
Persist 1
Persist 2
C-1
C-2
Persist 1
Persist 2

As of v0.2, apply_all() always returns a hash reference whose keys are the names of rules and whose values are the number of times that those rules were applied. This is the same thing that count() returns after a call to apply_all(). See "count" below.
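
The interleaving of ordered and persistent rules shown in the example above can be reproduced in plain Perl. This is only a sketch of the documented sequence, not the module's own code: application_sequence() is a hypothetical helper invented for illustration, and it works on rule names alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::Phonology::Rules): given the
# persistent rules and the ordered blocks, list rule names in the sequence
# that the apply_all() documentation describes.
sub application_sequence {
    my ($persist, @blocks) = @_;
    my @sequence = @$persist;                # persistent rules run first
    for my $block (@blocks) {
        # A bare string counts as a block containing one rule
        my @names = ref $block eq 'ARRAY' ? @$block : ($block);
        push @sequence, @names, @$persist;   # the block, then persistent rules
    }
    return @sequence;
}

my @sequence = application_sequence(
    ['Persist 1', 'Persist 2'],
    ['A-1', 'A-2', 'A-3'], 'B', ['C-1', 'C-2'],
);
print "$_\n" for @sequence;
```

Run as is, this prints the same fourteen rule names, in the same order, as the list given above.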

order

$rules->order(['A-1', 'A-2', 'A-3'], 'B', ['C-1', 'C-2']);

If called with no arguments, returns an array of the current order in which rules apply when calling apply_all(). If called with one or more arguments, this sets the order in which rules apply.

The arguments to order() should be array references or strings. If you pass an array reference, the elements in the array should be strings that are the names of rules. A string is interpreted as an array reference of one element. When "apply_all" is called, all rules that are bundled together in one array will be applied, then the persistent rules will be applied, as described above.

Any strings that you pass will be converted to single-element array references when they are returned. Calling this:

$rules->order(1, 2, 3);

actually returns this:

([1], [2], [3]);

persist

$rules->persist('Persist 1', 'Persist 2');

If called with no arguments, returns an array of the current order in which persistent rules apply when calling apply_all(). Persistent rules are applied at the beginning and end of rule processing and between every rule in the middle. Calling this with one or more arguments assigns the list of persistent rules (and knocks out the existing list). You should not call persist() with array reference arguments, unlike order().

count

After a call to apply() or apply_all(), this method can be used to find out how many times the rule was applied. After apply(), the return value of this function will be an integer. After apply_all(), the return value of this method will be a hash reference, the keys of which are the rules that were applied, and the values of which are the times that those rules applied. Whatever value is there will be clobbered in the next call to apply() or apply_all(), so get it while you can.

WRITING RULES

Overview of the rule algorithm

The details of the algorithm, of course, are the module's business. But here's a general overview of what goes on in the execution of a rule:

  • The segments of the input word are broken up into domains, if a domain is specified. This is discussed in "Using domains".

  • The segments of each domain are taken and the tier, if there is one, is applied to them. This generally reduces the number of segments being evaluated. Details of this process are discussed below in "Using tiers".

  • The segments remaining after the tier is applied are passed through the filter. Segments for which the filter evaluates to true are passed on to the executer.

  • Executing the rule involves examining every segment in turn and deciding if the criteria for applying the rule, defined by the where property, are met. If so, the action defined by do is performed. If the direction of the rule is specified as "rightward", then the criterion-checking and rule execution begin with the leftmost segment and proceed to the right. If the direction is "leftward", the opposite occurs: focus begins on the rightmost segment and proceeds to the left.

  • If a result is specified, after each potential application of the do code, the result condition will be checked. If that condition is true, the rule application goes on to the next segment. If the result condition is false, then the rule is "undone", leaving the input word exactly the way that it was before.

The crucial point is that the rule mechanism has focus on one segment at a time, and that this focus proceeds across each available segment in turn. Criterion checking and execution are done for every segment. According to the order given above, where and do are almost the last things to be executed, but they're the most fundamental, so we'll examine them first.

Using 'where' and 'do'

Of course, the actual criteria and execution are done by the coderefs that you supply. So you have to know how to write reasonable criteria and actions.

Lingua::Phonology::Rules will pass an array of segments to both of the coderefs that you give it. This array of segments will be arranged so that the segment that currently has focus will be at index 0, the following segment will be at 1, and the preceding segment at -1, etc. The ends of the "word" (or domain, if you're using domains) are indicated by special segments that have the feature BOUNDARY, and no other features.

For example, let's say we had applied a rule to a simple four-segment word as in the following example:

$rules->apply('MyRule', [$b, $a, $n, $d]);

If MyRule applies rightward and there are no tiers or domains, then the contents of @_ will be as follows on each of the four turns. Boundary segments are indicated by '_B_':

         $_[-2]   $_[-1]   $_[0]   $_[1]   $_[2]   $_[3]

turn 1    _B_      _B_      $b      $a      $n      $d
turn 2    _B_      $b       $a      $n      $d      _B_
turn 3    $b       $a       $n      $d      _B_     _B_
turn 4    $a       $n       $d      _B_     _B_     $b

This makes it easy and intuitive to refer to things like 'current segment' and 'preceding segment'. The current segment is $_[0], the preceding one is $_[-1], the following segment is $_[1], etc.

Yes, it's true that if the focus is on the first segment of the word, $_[-3] refers to the last segment of the word. So be careful. Besides, you should rarely, if ever, need to refer to something that far away. If you think you do, then you're probably better off using a tier or filter.
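
The table above is consistent with rotating the word, padded with two boundary markers, so that the focused segment lands at index 0. The following self-contained sketch reproduces the table under that assumption. The function window_for_turn() and the '_B_' strings are illustrative stand-ins, not part of the module, and the real implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's internals: build the argument list
# seen on a given turn by rotating the word plus two boundary markers so
# that the focused segment sits at index 0.
sub window_for_turn {
    my ($focus, @word) = @_;
    my @padded = (@word, '_B_', '_B_');   # two boundaries close the circle
    return (@padded[$focus .. $#padded], @padded[0 .. $focus - 1]);
}

my @word = qw(b a n d);
for my $turn (0 .. $#word) {
    my @args = window_for_turn($turn, @word);
    printf "turn %d: [-1]=%s [0]=%s [1]=%s\n",
        $turn + 1, $args[-1], $args[0], $args[1];
}
```

Because the list is circular, negative indices wrap around past the boundaries, which is why $_[-3] on the first turn reaches the last segment of the word.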

Also, you should know that the boundary segments themselves are impervious to any attempt to alter or delete them. However, there is nothing that prevents you from setting some other segment to be a boundary, which will do very strange and probably undesirable things. Don't say I didn't warn you.

Using our same example, then, we could write a rule that devoices final consonants very easily.

# Create the rule with two simple code references
$final = sub { $_[1]->BOUNDARY };
$devoice = sub { $_[0]->delink('voice') };
$rules->add_rule(FinalDevoicing => { where => $final,
                                     do    => $devoice });

@word = ($b, $a, $n, $d);
$rules->FinalDevoicing(\@word);
print $symbols->spell(@word); # Prints 'bant'

It is recommended that you follow the intent of the design, and only use the 'where' property to check conditions, and use the 'do' property to actually effect changes. We have no way of enforcing this, however.

Note that, since the code in 'where' and 'do' simply operates on a local subset of the segments that you provided as the word, doing something like delete($_[0]) doesn't really have any effect. Yes, the local reference to the segment at $_[0] is deleted, but the segment still exists outside of the subroutine. Instead, write $_[0]->clear, which removes all feature settings from the segment. Lingua::Phonology::Rules will later clear out any segments that have no features on them for you.

As a corollary, if you give segments that have no feature values set as input, they will be silently dropped from the output.

Using domains

Domains change the segments that are visible to your rules by splitting the word given into parts.

The value for a domain is the name of a feature. If the domain property is specified for a rule, the input word given to the rule will be broken into groups of segments whose value for that feature are references to the same value. For the execution of the rule, those groups of segments act as complete words with their own boundaries. For example:

	@word = $symbols->segment('b','a','r','d','a','m');

	# We make two groups of segments whose SYLLABLE features are all references
	# to the same value
	#
	# Syllable 1
	$word[0]->SYLLABLE(1);
	$word[1]->SYLLABLE($word[0]->value_ref('SYLLABLE'));
	$word[2]->SYLLABLE($word[0]->value_ref('SYLLABLE'));

	# Syllable 2
	$word[3]->SYLLABLE(1);
	$word[4]->SYLLABLE($word[3]->value_ref('SYLLABLE'));
	$word[5]->SYLLABLE($word[3]->value_ref('SYLLABLE'));

	# The preceding can be done a lot more easily with the Syllable module.

	# Now we make a rule to drop the last consonant in any syllable
	$rules->add_rule(
		'Drop Final C' => {
			domain => 'SYLLABLE',
			where => sub { $_[1]->BOUNDARY },
			do => sub { $_[0]->clear }
		}
	);

	$rules->apply('Drop Final C', \@word);
	# Now both the /r/ and the /m/ have been dropped

In this example, if we hadn't specified the domain 'SYLLABLE', only the /m/ would have been dropped, because only the /m/ would have been at a boundary. With the SYLLABLE domain, however, the input word is broken up into the two syllables, which act as their own words with respect to boundaries.
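
The grouping step can be pictured with a small model in plain Perl. This is a sketch under stated assumptions, not the module's implementation: segments are plain hashes, split_into_domains() is a hypothetical helper, and a segment with no value for the feature is assumed to start a domain of its own.

```perl
use strict;
use warnings;

# Hypothetical model: each segment is a hash whose SYLLABLE entry is a
# reference; segments sharing one reference belong to the same domain.
sub split_into_domains {
    my ($feature, @segments) = @_;
    my @domains;
    my $previous;
    for my $segment (@segments) {
        my $ref  = $segment->{$feature};
        # References compare equal only if they point at the same value
        my $same = defined $ref && defined $previous && $ref == $previous;
        push @domains, [] unless $same && @domains;   # start a new domain
        push @{ $domains[-1] }, $segment;
        $previous = $ref;
    }
    return @domains;
}

# Two syllables: both hold the value 1, but through different references
my ($one, $two) = (1, 1);
my ($syll1, $syll2) = (\$one, \$two);

my @word = map { +{ symbol => $_ } } qw(b a r d a m);
$word[$_]{SYLLABLE} = $syll1 for 0 .. 2;
$word[$_]{SYLLABLE} = $syll2 for 3 .. 5;

for my $domain (split_into_domains('SYLLABLE', @word)) {
    print join('', map { $_->{symbol} } @$domain), "\n";
}
```

The sketch prints the two groups 'bar' and 'dam'. Note that both syllables carry the value 1. What separates them is that the references differ, which matches the description of domains as groups whose values are "references to the same value".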

Using tiers

Many linguistic rules behave transparently with respect to some segments or classes of segments. Within the Rules class, this is accomplished by setting the "tier" property of a rule.

The argument given to a tier is the name of a feature. When you specify a tier for a rule and then apply that rule to an array of segments, the rule will only apply to those segments that are defined for that feature. Note that I said 'defined'--binary or scalar features that are set to 0 will still appear on the tier.
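
The difference between 'defined' and 'true' is easy to see in a small model. The sketch below uses plain hashes in place of Segment objects and an invented feature assignment, so it only illustrates the distinction and does not call the module.

```perl
use strict;
use warnings;

# Hypothetical model: segments are plain hashes of feature values.
my @word = (
    { symbol => 'b', nasal => 0 },   # set to 0, but still defined
    { symbol => 'a' },               # 'nasal' never set
    { symbol => 'n', nasal => 1 },
    { symbol => 'd', nasal => 0 },
);

# A tier keeps every segment for which the feature is defined ...
my @on_tier = grep { defined $_->{nasal} } @word;

# ... which is not the same as keeping segments with a true value.
my @true_only = grep { $_->{nasal} } @word;

print join('', map { $_->{symbol} } @on_tier), "\n";     # bnd
print join('', map { $_->{symbol} } @true_only), "\n";   # n
```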

This is primarily useful for defining rules that apply across many intervening segments. For example, let's say that you have a vowel harmony rule that applies across any number of intervening consonants. The best solution is to specify that the rule has the tier 'vocoid'. This will cause the rule to completely ignore all non-vocoids: non-vocoids won't even appear in the array that the rule works on. For example:

# Make a rather contrived word
@word = $symbols->segment('b','u','l','k','t','r','i');

Note that if we were doing this without tiers, we would have to specify $_[5] to see the final /i/ from the /u/. No such nonsense is necessary when using the 'vocoid' tier, because the only segments that the rule "sees" are ('u','i'). Thus, the following rule spreads frontness backwards (though why it does so may not be obvious to non-linguists).

	# Make the rule, being sure to specify the tier
	$rules->add_rule(
		VowelHarmony => {
			tier => 'vocoid',
			direction => 'rightward',
			# We specify that the last vowel in a word should never change
			where => sub { not $_[1]->BOUNDARY },

			# All vowels before the last copy the front/backness of the vowel
			# after them (front/back position is dominated by the 'Lingual'
			# node, so we just copy the whole node).
			do => sub { $_[0]->Lingual( $_[1]->value_ref('Lingual') ) }
		}
	);

	# Apply the rule and print out the result
	$rules->VowelHarmony(\@word);
	print $symbols->spell(@word); # prints 'bylktri'

Tiers include one more bit of magic. When you define a tier, if consecutive segments have references to the same value for that tier, Lingua::Phonology::Rules will combine them into one segment. Once such a segment is constructed, you can assign or test values for the tier feature itself, or any features that are children of the tier (if the tier is a node). Assigning or testing other values will generally fail and return undef, but it may succeed if the return values of the assignment or test are the same for every segment. Be careful.

This (hopefully) makes linguistic sense--if you're using the tier 'SYLLABLE', what you're really interested in are interactions between whole syllables. So that's what you see in your rule: "segments" that are really syllables and include all of the true segments inside them.

When using domains and tiers together, the word is broken up into domains before the tier is applied. Thus, two segments which might otherwise have been combined into a single pseudo-segment on a tier will not be combined if they fall into different domains.

Using filters

Filters are a more flexible, but less magical, way of doing the same thing that a tier does. You define a filter as a code reference, and all of the segments in the input word are put through that code before going on to the rule execution. Your code reference should accept a single Lingua::Phonology::Segment object as an argument and return some sort of truth value that determines whether the segment should be included.

A filter is a little like a tier and a little like a where, so here's how it differs from both of those:

  • Unlike a tier, the filter property is a code reference. That means that your test can be arbitrarily complex, and is not limited to simply testing for whether a property is defined, which is what a tier does. On the other hand, a filter does not perform the magical combination of segments that a tier does.

  • Also, the rule algorithm takes the filter and goes over the whole word with it once, picking out those segments that pass through the filter. It then hands the filtered list of segments to be evaluated by where and do. A where property, on the other hand, is evaluated for each segment in turn, and if the where evaluates to true, the do code is immediately executed.

Filters are primarily useful when you want to only see segments that meet a certain binary or scalar feature value, or when you want to avoid the magical segment-joining of a tier.
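
The two-step behaviour described above (filter once over the whole word, then where and do segment by segment) can be sketched as follows. Plain hashes stand in for Segment objects, the feature values are invented, and the positional arguments that real where and do code receives are left out to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical model: plain hashes stand in for Segment objects.
my @word = map { +{ symbol => $_->[0], voice => $_->[1] } }
    (['b', 1], ['a', 1], ['t', 0], ['d', 1]);

my $filter = sub { $_[0]{voice} };           # keep voiced segments only
my $where  = sub { $_[0]{symbol} ne 'a' };   # checked per visible segment
my $do     = sub { $_[0]{seen} = 1 };        # runs right after a true 'where'

# Step 1: the filter runs once over the whole word.
my @visible = grep { $filter->($_) } @word;

# Step 2: 'where' and 'do' alternate, one visible segment at a time.
for my $segment (@visible) {
    $do->($segment) if $where->($segment);
}

print join(' ', map { $_->{symbol} . ($_->{seen} ? '*' : '') } @word), "\n";
```

The output marks /b/ and /d/ with an asterisk: /t/ never reaches where because the filter removed it, while /a/ passes the filter but fails the where test.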

Using result

The result parameter is currently EXPERIMENTAL, and depends on the Whatif module, available from CPAN (but not for all architectures). You can do interesting things with it, but it's not yet guaranteed to always do those things.

There are many linguistic processes where it is more accurate or convenient to stipulate a certain result, rather than certain preconditions. The result parameter accomplishes this. You provide a code reference for the result property, and after the do is executed, the result is evaluated. If the result evaluates to true, the change is considered successful and life continues as normal. If the result evaluates to false, the change is "undone", and the word that you're operating on reverts to its previous state. (This undoing is devilishly hard to do by normal means. I tried to implement it without the Whatif module and nearly went crazy.)

Some notes: The result code is only evaluated if the where condition has already been evaluated as true. It is also only evaluated in the immediate context, with the segments in the same order as they were in the most recent where/do. If the result fails, both the code in the do and the result will be rolled back, but not the code in the where.

Using a result condition imposes a mild change on the way that insertion and deletion are handled--but see the next section for that.
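
The keep-or-undo behaviour of result can be imitated on simple data by taking a deep copy before the change and restoring it if the check fails. The sketch below does exactly that with the core Storable module. It is emphatically not how the Whatif module works, and apply_with_result() is a hypothetical helper operating on plain hashes rather than Segment objects.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical model: a word is an array of plain hashes. Copying and
# restoring data is an assumption of this sketch, not the Whatif mechanism.
sub apply_with_result {
    my ($word, $do, $result) = @_;
    my $snapshot = dclone($word);     # deep copy before the change
    $do->($word);
    return 1 if $result->($word);     # result holds: keep the change
    @$word = @$snapshot;              # result fails: restore the old state
    return 0;
}

my @word    = ({ symbol => 'd', voice => 1 });
my $devoice = sub { $_[0][0]{voice} = 0 };

# The result demands that the segment stay voiced, so the change is undone
my $undone = apply_with_result(\@word, $devoice, sub { $_[0][0]{voice} });
print "after failed result: voice=$word[0]{voice}\n";

# A result that always succeeds lets the change stand
my $kept = apply_with_result(\@word, $devoice, sub { 1 });
print "after passing result: voice=$word[0]{voice}\n";
```

One limitation of this approach is that restoring replaces the segments with copies, so any outside reference to an original segment no longer points into the word.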

Writing insertion and deletion rules

The arguments provided to the coderefs in where and do are in a simple list, which means that it's not really possible to insert and delete segments in the word from the coderef. Segments added or deleted in @_ will disappear once the subroutine exits. Lingua::Phonology::Rules provides a workaround for both of these cases.

Deletion can be accomplished by setting a segment to have no features set. This is easily done with the clear() method for Segment objects. When the coderef for where or do exits, any segments with no values will be automatically deleted. A rule deleting coda consonants can be written thus:

# Assume that we have already assigned coda consonants to have the
# feature 'coda'
$rules->add_rule(
	DeleteCodaC => {
		where => sub { $_[0]->coda },
		do => sub { $_[0]->clear }
	}
);

As a side effect of this, if you provide input segments that have no features set, they will be silently deleted from output.

Insertion can be accomplished using the special methods INSERT_RIGHT() and INSERT_LEFT() on a segment. The argument to INSERT_RIGHT() or INSERT_LEFT() must be a Lingua::Phonology::Segment object, which will be added to the right or the left of the segment on which the method is called. For example, the following rule inserts a schwa to the left of a segment that is unsyllabified (does not have its SYLLABLE feature set):

$rules->add_rule(
	Epenthesize => {
		where => sub { not $_[0]->SYLLABLE },
		do => sub { $_[0]->INSERT_LEFT($symbols->segment('@')) }
	}
);

Note that the methods INSERT_RIGHT() and INSERT_LEFT() don't exist except inside the code reference for a rule.

Note that the segments you insert or delete don't immediately (dis)appear. Instead, they wait in segmental limbo until iteration over the current word is complete, and then are inserted/deleted all at once. Exception: when a result is specified, segment deletion/insertion occurs right before the result code is evaluated. This is done purely to accommodate the most likely usage of result: deleting a segment and then checking that the resulting consonant clusters are still valid.
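
The deferred deletion described above amounts to "mark now, sweep later". The sketch below models it with plain hashes: emptying the features hash plays the role of clear(), and is_empty() is a hypothetical helper. It illustrates the idea only and makes no claim about the module's internals.

```perl
use strict;
use warnings;

# Hypothetical model: an empty 'features' hash plays the role of a segment
# after clear().
sub is_empty {
    my ($segment) = @_;
    return !%{ $segment->{features} };
}

my @word = map { +{ symbol => $_, features => { some => 1 } } } qw(b a r d);

# During iteration, "deleting" only empties the segment. The array keeps
# its length, so later turns still see a stable word.
for my $segment (@word) {
    %{ $segment->{features} } = () if $segment->{symbol} eq 'r';
}
my $length_during = @word;

# After iteration, featureless segments are swept out in one pass.
@word = grep { !is_empty($_) } @word;

print "length during iteration: $length_during\n";
print join('', map { $_->{symbol} } @word), "\n";
```

The same sweep explains why input segments that arrive with no features set are silently dropped from the output.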

Developer goodies

There are a couple of things here that are probably of no use to the average user, but have come in handy when developing code for other modules or scripts to use. And who knows, you may have a use for them.

All segments have the method _RULE available during the execution of a rule. This method returns a hash reference that has keys corresponding to the properties of the currently executing rule. These properties include do, where, domain, tier, direction, etc. If for some reason you need to know or change one of these during the execution of a rule, you can use this to do so. Note that altering the hash reference will alter the actual properties of the current rule--although you won't notice it until the next time the rule is executed.

Here's a silly example:

sub print_direction {
	print $_[0]->_RULE->{direction}, "\n";
}

# Assume that we have $rules and @word lying around
$rules->add_rule(
	PrintLeft => {
		direction => 'leftward',
		do => \&print_direction
	},
	PrintRight => {
		direction => 'rightward',
		do => \&print_direction
	});

$rules->PrintLeft(\@word);    # Prints 'leftward' several times
$rules->PrintRight(\@word);   # Prints 'rightward' several times

TO DO

The handling of insertion and deletion is very ad-hoc. Better suggestions are welcome.

AUTHOR

Jesse S. Bangs <jaspax@cpan.org>

LICENSE

This module is free software. You can distribute and/or modify it under the same terms as Perl itself.