NAME
Lingua::PT::Conjugate.pm - Module for Conjugating Portuguese verbs
DESCRIPTION
This module contains various routines for conjugating Portuguese verbs.
USAGE
use Lingua::PT::Conjugate;
This module pollutes your namespace with the function conjug
, that takes just the same arguments as the program conjug
(which is just a wrapper).
perl -e 'use Lingua::PT::Conjugate; print conjug("programar","perfeito")'
programar :
perf programei programaste programou programamos programaram
See the conjug
manpage for more examples.
DATA STRUCTURES :
Variable $Portuguese::Conjugate::vlist
The database is a single big string, defined at the end of Conjugate.pm (the comments and end-of-line's are taken out).
It is in the variable $Portuguese::Conjugate::vlist
.
To understand the format, have a look at the code, and read this informal description :
Definition of the conjugation of a verb
<verb_name> ":" <tense>? <conjugated_form> <conjugated_form>...
The <tense> is optional : By default, the "parser" expects to find them in the order pres (i.e. present), perf (perfeito), imp (imperfeito), fut (futuro), mdp (mais-que-perfeito), cpres (conjuntivo presente), cimp (conjuntivo imperfeito), cfut (conjuntivo futuro), cond (condicional), ivo (imperativo), pp (particípio passado) and grd (gerundivo).
The conjugated forms must correspond to 1st person, second, third, first plural, (optionally, second plural) and third plural; except for the imperative tense (shortname : "ivo") which starts at 2nd; and past participle ("pp") and gerundivo ("grd") that have a single form only. The conjugated forms may be regexes that match a correct form (and only a correct form), when more than one form is ok (this is the case of the "verbos abundivos"). The second person plural is omitted, because it is little used. It should be added, eventually.
Shorthand notation may be used :
An "etc" means that the rest of that tense is regular.
A "." means that the current person is regular.
A "x" means that this verb is defective, and that the current
person is missing.
A "acc" means that circumflex and acute accents are toggled for the
current tense.
Saying what verbs follows the model of a given verb
<model_name> = <verb_name> <verb_name> ...
Mean that all the verbs after the "=" have the verb on the lhs as model.
Saying what verbs is defective
defectivos1= <verb_name> ... defectivos2= <verb_name> ... defectivos3= <verb_name> ...
entries specifies defective verbs for which the function conjug has a hard-wired behavior.
%verb
is a hash in the format :
$verb->{$verb_name}->{$tense_name}->[$person] == "whole word".
This is used for irregular verbs. All the $verb->{$verb_name}->{$tense_name}->[$person] and $verb->{$verb_name}->{$tense_name} need not be defined.
If a verb follows a model (i.e. is conjugated like a verb which is
present in %verb's keys, and more completely defined), it has a
"model" key :
$verb->{$verb_name}->{"model"} == "model name".
If a verb is "defectivo" (has missing forms), but otherwise regular,
it will have an entry
$verb->{"defectivo"}->{$verb_name} == number.
where the number (1,2, or 3) defines how it behaves.
%reg
is a hash describing the ending of regular verbs :
$reg->{$verb_ending}->{$tense_name}->[$person] == "ending".
The verb ending may be [aeio]r.
%endg
contains regexes that match verb endings, and is, right now, a little obsolete in the code.
codify($string)
reads a string in the $vlist format, and modifies the internal %verb variable so that it reflect the information if $string
.
&locate($verbname)
If $verbname is not a key of "%verb", finds the data relevant to that verb in the string $vlist , and codes it so that %verb reflects that data.
ALGORITHM of the conjug
sub :
Pick up options and arguments.
FOREACH verb $v,
Find the $root and ending ($edg).
Check if $v will require a special treatment, e.g. if it ends in
/g[ei]r$/, /c[ei]r$/, etc... and if it is the case, define a
"modifier" function (called &$modif), that will be called to
"final-polish" the output.
IF $v is IRREGULAR, e.g. if %verb{$v} is defined.
IF $v follows a model, find that model, and "see what it looks
like" (e.g. does the model's root ($rm) end with consonnants,
etc).
END
FOREACH tense $t and person $p ($b is in 1-4,6),
IF this form (e.g. $verb->{$v}->{$t}->[$p-1]) is explicitely
defined,
Great! That's it.
ELSE IF the model is explicitly defined, for this form,
"See how the model relates to the form, and how the verb $v
relates to the model, and from that find what should be $v's
corresponding form" (quite heuristic).
ELSE
Find the appropriate form for regular-irregular verbs (another
series of heuristics).
END
Check for "defective" verbs.
END
ELSE (verb is REGULAR)
It's trivial.
END
END That's it!
The actual code is much, much hairier.
WHAT MAKES ME BELIEVE IT WORKS?
The file reference
reference contains conjugation tables for plenty of verbs. Each time I modify Conjugate.pm
, I run the program ckconj
("ckconj all") that checks that the output is still correct for all these verbs.
This program outputs two lines, one starting with "OK", containing the verbs that checked ok, and a line "NOT IN REFERENCE FILE" containing the verbs that "Conjugate.pm" knows about, but that are not in the "reference" file.
Any other output reports found errors.
Since reference
is produced by conjug
, it may be inadvertely contaminated. When I discover that, I hack Conjugate.pm
until it conjugates correctly the faulty verb (by checking myself), *and all the other ones too* (with "ckconj all")(like that, I check that my fix has not broken anything). And then I run the program assert (syntax is "assert faulty_verb"), which modifies the reference table.
SEE ALSO : treinar, conjug.
VERSION 0.01
AUTHOR Etienne Grossmann, Jan 27th 1998 [etienne@isr.ist.utl.pt]
CREDITS
Thanks to all people on usenet and here at ISR with whom I have discussed about this module, who have provided advice on conjugation, programming, on naming and on all relevant points.
1 POD Error
The following errors were encountered while parsing the POD:
- Around line 47:
Non-ASCII character seen before =encoding in '(particípio'. Assuming CP1252