NAME

Lingua::SoundChange - Create regular sound changes

SYNOPSIS

use Lingua::SoundChange;

my $lat2port = Lingua::SoundChange->new($variables, $rules);
# or
my $lat2port = Lingua::SoundChange->new($variables, $rules, $options);

my $translations = $lat2port->change(\@words);   # array ref in, array ref out

DESCRIPTION

Introduction

This module is a sound change applier. With it, you can construct objects which will generate consistent sound changes. One way to use this is, for example, to simulate the sound change from one language to another (such as from Latin to Portuguese). It was inspired by Mark Rosenfelder's sound change applier program; see "SEE ALSO" for more information and a URL.

This module has an object-oriented interface. To use it, construct a Lingua::SoundChange object, which you can then use to apply sound changes to words. You can also have several different sound change objects around simultaneously, for example, to show sound change from a parent language to several different daughter languages, each with different sound change rules.

Methods

new(HASHREF, ARRAYREF [, HASHREF])

The constructor new creates a new Lingua::SoundChange object. It takes two or three parameters: a hash ref, an array ref, and another (optional) hash ref.

variables

The first parameter is a hash ref listing zero or more "variables". These are one-character short cuts for character classes. For example, you could define S to be any stop, or F to be any front vowel. These are useful in the ruleset, described below. If you do not wish to use any variables, pass in a reference to an empty hash as the first parameter of the constructor.

Variables are often given capital letters to distinguish them from the "data" letters used in the rules, which are usually lowercase. This is not a requirement; however, note that if you have a source letter with the same name as a variable, the behaviour is undefined.

The keys of this hash ref are the names of the variables; each value is a string of the letters that make up the variable. This is similar to a character class in a Perl regular expression (e.g. [aeiou] for a vowel); however, you should not include the brackets in the value.

For example, to make V a list of voiced consonants and U a list of corresponding unvoiced consonants, you could pass something like this to new:

{ V => 'bdg', U => 'ptk' }

rules

The second parameter is an array ref listing zero or more "rules". These rules describe which sound changes to apply in which environments. The sound changes will be applied in the order in which these rules are presented.

For more information on the format of these rules, see "Format of sound change rules".

NOTE: Do not use characters in the rules or variable names which are special to regular expressions. This includes the following characters: . * + ? [ ] { } ( ). (Exception: the use of parentheses to mark something as optional in an environment.)

options

The third, optional, parameter is a hash ref of options which control what data is output or in which format the translated words are returned. Each key in the hash takes a Boolean value (true or false).

Possible options are:

printRules

Whether to print out (to STDOUT) which rule applies to each word, and at which character position, during matching.

The output will look like this:

s-> /_# applies to secundus at 7 

This will use $`, which will incur a slight time penalty for all regular expressions in your script.

Default: false.

keep

If this option is set to a true value, then the list of returned items will be a list of array refs, each containing two elements: first the original word as passed in to the change method, and second the (possibly transformed) word. Otherwise, the result list will contain only the (possibly transformed) word.

Default: false.

The constructor returns a new Lingua::SoundChange object on success. On failure, the constructor will croak.

change(ARRAYREF)

Once you have constructed a Lingua::SoundChange object, you can use it to apply the sound changes you have described on words.

Pass in an array ref with one word per array element. The sound changes specified in the constructor will be applied to each word in turn. The result will be an arrayref containing the transformed words.

Note that this method does not do any splitting of text into words for you; this is left up to you. The reason for this is that the concept of a word is left up to the user of the module. A simple case would be "a sequence of \w characters" or "a sequence of non-space characters".
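
As a minimal sketch of that splitting step, the following shows the two simple word definitions mentioned above applied to one line of text. The sample text and variable names are made up for illustration; the module itself does none of this.

```perl
use strict;
use warnings;

my $text = 'lector doctor, focus; jocus';

# Definition 1: a word is a run of \w characters (punctuation is dropped).
my @by_word_chars = $text =~ /(\w+)/g;

# Definition 2: a word is a run of non-space characters
# (punctuation stays attached to the word).
my @by_non_space = split ' ', $text;

print join('|', @by_word_chars), "\n";
print join('|', @by_non_space), "\n";
```

A reference to either array (for example \@by_word_chars) is then in the form that change expects.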

EXPORT

None.

This module only has an object-oriented interface and does not export anything.

LONG EXPLANATION

The following explanation is largely taken from Mark Rosenfelder's own description of his sound change applier program sounds, and modified as appropriate for this module. The I in the following narrative is Mark's, not mine.

Basic operation

Lingua::SoundChange takes words as input, applies a set of sound changes described in variables and rules, and returns a set of modified words.

For instance, Lingua::SoundChange will take the input data, variables, and rules on the left and produce the output on the right:

Input         Variables               Output

lector        V => 'aeiou'            leitor
doctor        C => 'ptcqbdgmnlrhs'    doutor
focus         F => 'ie'               fogo
jocus         B => 'ou'               jogo
districtus    S => 'ptc'              distrito
civitatem     Z => 'bdg'              cidade
adoptare                              adotar
opera         Rules                   obra
secundus                              segundo
              s//_#
              m//_#
              e//Vr_#
              v//V_V
              u/o/_#
              gn/nh/_
              S/Z/V_V
              c/i/F_t
              c/u/B_t
              p//V_t
              ii/i/_
              e//C_rV

Format of sound change rules

Hopefully, the format of the rules will be familiar to any linguist. For instance, here's one sound change:

c/g/V_V

This rule says to change c to g between vowels. (We'll see how to generalize this rule below.)

More generally, a sound change looks like this:

x/y/z

where x is the thing to be changed, y is what it changes to, and z is the environment.

The z part must always contain an underline _, representing the part that changes. That can be all there is, as in

gn/nh/_

which tells the module to replace gn with nh unconditionally.

The character # represents the beginning or end of the word. So

u/o/_#

means to replace u with o, but only at the end of the word.

The middle (y) part can be blank, as in

s//_#

This means that s is deleted when it ends a word.
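
To see what these two word-final rules amount to, here is a small stand-alone sketch that writes them out by hand as Perl substitutions. It only illustrates the meaning of the notation and is not how the module is implemented; the variable names are chosen for this example.

```perl
use strict;
use warnings;

my $word = 'secundus';

# s//_#  : delete s at the end of the word
(my $step1 = $word) =~ s/s\z//;

# u/o/_# : replace u with o at the end of the word
(my $step2 = $step1) =~ s/u\z/o/;

print "$word -> $step1 -> $step2\n";
```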

Variables

The environment (the z part) can contain variables, like V above. These are defined in the first parameter to the constructor. I use capital letters for this, though this is not a requirement. Variables can only be one character long. You can define any variables needed to state your sound changes. E.g. you could define S to be any stop, or K for any coronal, or whatever.

So the variable definition and rule

F => 'ie'

c/i/F_t

means that c changes to i after a front vowel and before a t.

You can use variables in the first two parts as well. For instance, suppose you've defined the following variables and rule:

S => 'ptc',
Z => 'bdg'

S/Z/V_V

This means that the stops ptc change to their voiced equivalents bdg between vowels. In this usage, the variables must correspond one for one--p goes to b, t goes to d, etc. Each character in the replacement variable (here Z) gives the transformed value of each character in the input variable (here S). Make sure the two variable definitions are the same length!
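
The one-for-one correspondence can be made concrete with a short stand-alone sketch. It spells out S/Z/V_V with an ordinary Perl substitution and a lookup table; this is an illustration of the idea under that assumption, not the module's actual code, and the helper name voice_between_vowels is invented for the example.

```perl
use strict;
use warnings;

my %vars = (V => 'aeiou', S => 'ptc', Z => 'bdg');

# The two variables must line up position by position.
length($vars{S}) == length($vars{Z})
  or die "S and Z must have the same length";

# Build the mapping p => b, t => d, c => g.
my %map;
@map{ split //, $vars{S} } = split //, $vars{Z};

my ($vowels, $stops) = @vars{qw(V S)};

# Hypothetical helper: apply S/Z/V_V to a single word.
sub voice_between_vowels {
  my ($word) = @_;
  $word =~ s/(?<=[$vowels])([$stops])(?=[$vowels])/$map{$1}/g;
  return $word;
}

# Prints "fogus" and "cividadem".
print voice_between_vowels($_), "\n" for qw(focus civitatem);
```

If Z were shorter than S, some stops would have no counterpart to change into, which is why the length check comes first.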

A variable can also be set to a fixed value, or deleted. E.g.

Z//V_V

says to delete voiced stops between vowels, and

Z/?/V_V

would translate all voiced stops between vowels to a glottal stop ?.

Rule order

Rules apply in the order they're listed. So, with the word opera and the rules

p/b/V_V
e//C_rV

the first rule voices the p, resulting in obera; the second deletes an e between a consonant and an intervocalic r, resulting in obra.

The printRules option can assist in debugging rules, since it causes the output to show exactly what rules applied to each word.
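
The effect of rule order can be checked with a small stand-alone sketch in which the two rules above are written by hand as Perl substitutions. This is an illustration only, not the module's implementation; the helper apply_in_order and the %rule table are invented for the example.

```perl
use strict;
use warnings;

my $V = 'aeiou';
my $C = 'ptcqbdgmnlrhs';

# Hand-written regex equivalents of the two rules.
my %rule = (
  'p/b/V_V' => sub { my $w = shift; $w =~ s/(?<=[$V])p(?=[$V])/b/g; $w },
  'e//C_rV' => sub { my $w = shift; $w =~ s/(?<=[$C])e(?=r[$V])//g; $w },
);

# Hypothetical helper: apply the named rules strictly in the order given.
sub apply_in_order {
  my ($word, @names) = @_;
  $word = $rule{$_}->($word) for @names;
  return $word;
}

my $as_listed = apply_in_order('opera', 'p/b/V_V', 'e//C_rV');
my $reversed  = apply_in_order('opera', 'e//C_rV', 'p/b/V_V');

# Prints "obra opra".
print "$as_listed $reversed\n";
```

With the order reversed, the e is deleted first, so the p is no longer between vowels and never voices.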

Optional elements in the environment

One or more elements in the environment can be marked as optional with parentheses. E.g.

u/ü/_C(C)F

says to change u to ü when it's followed by one or two consonants and then a front vowel.
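
In regular-expression terms, an optional element behaves like a "zero or one" quantifier. The following stand-alone sketch writes the rule out by hand to show which words it would and would not touch; it is an illustration under that reading, not the module's code, and the test words are invented.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $C = 'ptcqbdgmnlrhs';
my $F = 'ie';

# Regex reading of u/ü/_C(C)F: the second consonant is optional.
sub umlaut {
  my ($word) = @_;
  $word =~ s/u(?=[$C][$C]?[$F])/ü/g;
  return $word;
}

# muri and mundi change; mura (no front vowel) and
# munstri (more than two consonants) do not.
print umlaut($_), "\n" for qw(muri mundi mura munstri);
```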

How to use it

The module is simple-minded and yet powerful... in fact it's powerful in part because it's simple-minded. You can do a lot with these basic pieces.

Input orthography

For instance, you may wonder whether the input data should be based on spellings or phonemes. It doesn't matter: the program applies its changes to whatever you give it. In my example I used conventional spellings, but I could just as easily have used a phonemic rendering. Similarly, I wrote the rules to output orthographic Portuguese, simply to make for an easy example. It would be better to output a phonetic representation. This would help us realize that we really need a sound change

k/s/_F

that would handle the change from civitatem with /k/ to cidade with /s/.

The module will handle whatever you put into it, including accented characters. If the language you're working with requires a special font, simply edit the source and output data with an editor, using that font. This would allow you to use (say) an IPA font.

To improve my Latin-to-Portuguese rules, for instance, I would certainly want to handle vowel length and stress. I might use accented vowels for this. Of course the program knows nothing about phonetics, so you have to remember to define the variables to match how you've set up the input data. If you use accented vowels, you will want to change the definition of V.

Using digraphs

Though sound changes can refer to digraphs, variables can't include them. So, for instance, the following rule is intended to delete an i onset following an intervocalic consonant:

i//VC_V

However, it won't affect (say) achior, because the C will not match the digraph ch. You could write extra rules to handle the digraphs; but it's often more convenient to use an orthography where every phoneme corresponds to a single character.

You can write transformation rules at the beginning of your sound change rules to transform digraphs in the input data:

ph/f/_

Using Lingua::SoundChange for conlang development

To create a child language from a parent, create some input data containing the vocabulary of the parent, then a list of variables and rules containing the sound changes you want to apply. Now use Lingua::SoundChange to generate the child language's vocabulary.

For example, you can download a vocabulary of Methaiun (ftp://ftp.enteract.com/users/markrose/metaiun.lex) and the sound changes for Kebreni (ftp://ftp.enteract.com/users/markrose/kebreni.sc). You can compare this to the Kebreni grammar (http://www.zompist.com/kebreni.htm) in Virtual Verduria (http://www.zompist.com/virtuver.htm).

For me, there is a peculiar, intense pleasure in creating a daughter language with a particular feel to it, merely by altering the set of sound changes. All I can think of to compare it to is creating new animals indirectly, by mutating their DNA.

What sort of sound changes should you use? You can examine the history of any language family for ideas. Some common changes that can form part of your repertoire (with some sample Lingua::SoundChange rules):

Lenition

Stops become fricatives; unvoiced consonants become voiced; stops erode into glottal stops, or h, or disappear. The intervocalic position is especially prone to change.

S/Z/V_V

Palatalization

Consonants can palatalize before or after a front vowel i e, perhaps ending up as an affricate or fricative.

k/ç/_F

Monophthongization

Diphthongs tend to simplify. This rule is fun to apply after letting the vanished sounds affect adjoining consonants.

i//CV_C

Assimilation

Consonants change to match the place or type of articulation of an adjoining consonant.

D => 'td'

m/n/_D

Nasalization

A nasal consonant can disappear, after nasalizing the previous vowel.

'Â' => 'âêîôû',
N => 'mn'

V/Â/_N
N//Â_

Umlaut

A vowel changes to match the rounding of the next vowel in the word.

u/ü/_C(C)i

Vowel shifts

One vowel can migrate into a free area of the vowel space, perhaps dragging others behind it.

a/&/_
o/a/_
u/o/_

Tonogenesis

One way tones can originate is for voiced consonants to induce the next vowel to be pronounced in a low pitch.

Z => 'bdgzvmnlr',
V => 'aiu',
L => 'áíú'

V/L/Z_

Loss of unstressed syllables

A => 'áéíóú'

V//AC(C)_

Loss of final sounds

This can really mess up your carefully worked out inflectional system.

V//_#

The beauty part of using Lingua::SoundChange is that your language will illustrate the Neo-Grammarian principle: sound changes apply uniformly whenever their conditions are met. You may choose to edit the results by hand, however, to simulate the complications of real languages. Analogy can regularize the grammar; words may be borrowed from another dialect where different changes applied; words may be reborrowed from the parent language by scholars.

I pay particular attention to the havoc the sound changes are likely to wreak on the inflectional system. E.g. if a case distinction is maintained in some words and lost in others, it may spread to the second category by analogy.

Sound changes can also result in homonyms. For instance, if you voice intervocalic consonants, meta and meda will merge. You can simply live with this, but if the merger is particularly awkward, the users of the language are likely to invent a new word to replace one of the homonyms. E.g. Latin American Spanish has innovated cocinar "to cook", since the original cocer has merged with coser "to sew".

Using Lingua::SoundChange to find spelling rules

I've also used sounds to model the spelling rules of English. Here the input file lists the spellings of several thousand English words, and the "sound changes" are rules for turning those spellings into a phonetic representation of how the words sound.

Most people think English spelling hopeless; but in fact the rules predict the correct pronunciation of the word 60% of the time, and make only minor errors (e.g. insufficient vowel reduction) another 35% of the time.

A discussion of the rules, including the input and output files, is at http://www.zompist.com/spell.html .

DIFFERENCE

This section lists the differences between Mark Rosenfelder's sounds program and Lingua::SoundChange, and how to convert from sounds input and instructions to Lingua::SoundChange.

Form of input

sounds takes two input files (xxx.lex and yyy.sc) and produces output on standard output (unless the -f option is given) and to a file yyy.out. xxx.lex is the lexicon of the input language, and yyy.sc contains the variables and sound changes and possibly comments.

Lingua::SoundChange splits these two up; the sound change file yyy.sc is passed to the constructor new while the lexicon xxx.lex is passed to change. Also, variables and rules are passed to new separately.

Variables and rules

yyy.sc, the sound change file accepted by sounds, may contain a mixture of variables (which must precede all rules), rules, and comments. Comments are marked by an asterisk * at the beginning of the line.

Lingua::SoundChange requires these two to be split up, and does not accept comments explicitly. However, if the list of sound changes is inside a Perl script, Perl comments can, of course, be used.

Converting a sound change file on-the-fly

Here's a simple way to convert a yyy.sc file on-the-fly into something which is suitable as input to new.

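my (%vars, @rules);
open my $sc, '<', 'port.sc' or die "Can't open port.sc: $!";
while (my $line = <$sc>) {
  next if $line =~ /^\*/;      # skip comment lines
  next unless $line =~ /\S/;   # skip blank lines
  chomp $line;
  if ($line =~ /^(.)=(.+)$/) {
    $vars{$1} = $2;            # variable definition, e.g. V=aeiou
  } elsif ($line =~ m{^[^/]+/[^/]*/.+$}) {
    push @rules, $line;        # sound change rule, e.g. c/g/V_V
  }
}
close $sc;

Now \%vars and \@rules can be passed in to new as the variables and rules.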

Specifying variables and rules in-line

If you specify variables and rules inside your script, rather than reading them in from some external source, you can use Perl comments in appropriate places if you wish. For example, you could translate

* Vowels
V=aeiou
* Consonants
C=bcdfghjklmnpqrstvwxyz

to

{
  # Vowels
  V => 'aeiou',
  # Consonants
  C => 'bcdfghjklmnpqrstvwxyz',
}

and

* Lenition
S/Z/V_V
* Palatalization
k/ç/_F

to

[
  # Lenition
  'S/Z/V_V',
  # Palatalization
  'k/ç/_F',
]


Splitting up words

sounds assumes that xxx.lex will contain one word per line. It does not attempt to split words according to any rules; everything in one line is treated as one word. Therefore, converting a sounds .lex file to input for Lingua::SoundChange is simple; it could be done like this, for example:

open my $lex, '<', 'latin.lex' or die "Can't read latin.lex: $!";
my @words = <$lex>;   # one word per line
chomp(@words);        # strip the trailing newlines
close $lex;

Now \@words can be passed in to change as a list of words to transform.

Format of output

sounds outputs results like this:

lector --> leitor

or like this:

leitor [lector]

if the -b switch was passed. Lingua::SoundChange normally outputs nothing, instead simply returning 'leitor' or, if the keep option was specified, [ 'lector', 'leitor' ]. It is up to the caller to format the output if this is desired.
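
As a sketch of such caller-side formatting, the following reproduces both sounds output styles from a list of pairs. The data here is written out by hand in the shape described for the keep option (original word first, transformed word second) rather than obtained from the module, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Hand-written sample in the shape described for the keep option:
# a list of [ original, transformed ] pairs.
my $pairs = [ [ 'lector', 'leitor' ], [ 'focus', 'fogo' ] ];

# Default sounds style: "lector --> leitor"
my @arrow = map { sprintf '%s --> %s', $_->[0], $_->[1] } @$pairs;

# sounds -b style: "leitor [lector]"
my @bracket = map { sprintf '%s [%s]', $_->[1], $_->[0] } @$pairs;

print "$_\n" for @arrow, @bracket;
```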

Command-line switches

sounds takes several command-line switches:

-p

This tells sounds to print out which rules apply to each word. Use the printRules option in Lingua::SoundChange for this.

-b

This causes sounds to print the original word in brackets behind the changed word, rather than before the changed word and an arrow.

This switch is not supported directly by Lingua::SoundChange; format the output as you desire.

-l

This switch causes sounds to omit the original word from the output, leaving only transformed words. In effect, Lingua::SoundChange behaves as if this is always on, unless you specify the keep option.

-f

This switch causes sounds to write its output only to yyy.out and not also to the screen.

This switch is not supported directly by Lingua::SoundChange, since it doesn't output anything either to a file or to the screen (unless the printRules option is specified); instead, it returns the transformed words from change.

SEE ALSO

This module was inspired by Mark Rosenfelder's sound change applier, documented at http://www.zompist.com/sounds.html , and by the sample code he provides there. The interface is slightly similar.

AUTHOR

Philip Newton, <pne@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2001 Philip Newton. Based on original code, copyright (C) 2001 Mark Rosenfelder.

This software, along with its associated documentation and example files, may be freely used, distributed, and modified, for non-commercial purposes only, provided that the above copyright notice and this permission notice are included in all copies or substantial portions of the software.

To request a licence for commercial use of software based on Mark Rosenfelder's sounds.c code, write to him at markrose@zompist.com.

NOTE

Please note the restriction on non-commercial use. Selling CPAN CDs, for example, is fine as long as the cost is nominal, but using this code to make money is not allowed.

This restriction may be removed in the future if the code is modified so as not to be based on Mark's code any longer. Most of it is original anyway, simply because

  • Perl lends itself to a different approach than C, and

  • all the code for reading and parsing config files is basically not here.
