NAME

Lingua::PT::Conjugate.pm - Module for Conjugating Portuguese verbs

DESCRIPTION

This module contains various routines for conjugating Portuguese verbs.

USAGE

use Lingua::PT::Conjugate;

This module exports the function C<conjug> into your namespace. It
takes exactly the same arguments as the program C<conjug> (which is
just a wrapper around it). For examples, see the C<conjug> manpage.

DATA STRUCTURES :

Filehandle __END__, and variable $Portuguese::Conjugate::vl

The database in verb.pt is loaded as a single big string, $Portuguese::Conjugate::vl, with comments stripped and line breaks removed so that it forms a single line.

Informally its syntax is :

Definition of the conjugation of a verb

<verb_name> ":" <tense>? <conjugated_form> <conjugated_form>...

The <tense> is optional: by default, the "parser" expects to find the tenses in the order pres, perf, imp, fut, mdp, cpres, cimp, cfut, cond, ivo, pp, grd.

The conjugated forms must be given in the order first, second and third person singular, first person plural, (optionally) second person plural, and third person plural. There are two exceptions: the imperative tense (short name "ivo") starts at the second person, and the past participle ("pp") and gerundivo ("grd") have a single form only. When more than one form is acceptable (this is the case of the "verbos abundivos"), a conjugated form may be a regex that matches the correct forms, and only the correct forms. The second person plural is currently omitted from the database because it is little used. It should be added eventually.

Shorthand notation may be used:

An "etc" means that the rest of that tense is regular. A "." means that the current person is regular. An "x" means that this verb is defective, and that the current person is missing. An "acc" means that circumflex and acute accents are toggled for the current tense.
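To make the entry syntax concrete, here is a minimal sketch of a parser for a single verb definition. It is not the module's actual parser. The names parse_entry and slots_for, the sample entry for "poder", and the number of slots per tense (5 for ordinary tenses, 4 for "ivo", 1 for "pp" and "grd") are assumptions made for the illustration. The sketch handles ".", "etc" and explicit tense labels, keeps "x" verbatim, and ignores "acc".

```perl
use strict;
use warnings;

# Default tense order, as listed above.
my @TENSES = qw(pres perf imp fut mdp cpres cimp cfut cond ivo pp grd);
my %INDEX;
@INDEX{@TENSES} = 0 .. $#TENSES;

# Assumed number of forms per tense (second person plural omitted).
sub slots_for {
    my ($tense) = @_;
    return 1 if $tense eq 'pp' || $tense eq 'grd';
    return 4 if $tense eq 'ivo';    # assumption: starts at the 2nd person
    return 5;
}

# Parse "<verb_name> : ..." into (name, { tense => [forms] }).
# undef marks a regular form ("." or a slot covered by "etc").
sub parse_entry {
    my ($entry) = @_;
    my ($name, $rest) = $entry =~ /^\s*(\S+?)\s*:\s*(.*)$/s
        or die "not a verb definition: $entry";
    my %forms;
    my $idx = 0;                              # index into @TENSES
    for my $tok (split ' ', $rest) {
        if (exists $INDEX{$tok}) {            # explicit tense label
            $idx = $INDEX{$tok};
            next;
        }
        die "too many forms in entry: $entry" if $idx > $#TENSES;
        my $tense = $TENSES[$idx];
        my $slot  = $forms{$tense} ||= [];
        if ($tok eq 'etc') {                  # rest of the tense is regular
            push @$slot, (undef) x (slots_for($tense) - @$slot);
        } else {
            push @$slot, $tok eq '.' ? undef : $tok;
        }
        $idx++ if @$slot >= slots_for($tense);    # tense full: next one
    }
    return ($name, \%forms);
}

# Hypothetical entry: only the first person of the present is irregular.
my ($name, $forms) = parse_entry('poder : posso etc');
print "$name pres: ",
    join(' ', map { defined $_ ? $_ : '(regular)' } @{ $forms->{pres} }), "\n";
```

Storing undef for regular slots keeps the parsed entry sparse, which matches the remark further down that not every form needs to be defined.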

Saying which verbs follow the model of a given verb

<model_name> = <verb_name> <verb_name> ...

This means that all the verbs after the "=" have the verb on the left-hand side as their model.

Saying which verbs are defective

defectivos1= <verb_name> ...
defectivos2= <verb_name> ...
defectivos3= <verb_name> ...

These entries specify defective verbs for which the function conjug has hard-wired behavior.
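Both kinds of "=" lines can be read by the same small routine. The following is a sketch only: the name add_assignment_line, the sample lines, and the class number given to "abolir" and "falir" are made up for the illustration, and the real parser works on the single-line database string rather than on separate lines.

```perl
use strict;
use warnings;

# Record one "<lhs> = <verb_name> ..." line in the hash ref $verb.
# Returns the number of verbs listed on the right-hand side.
sub add_assignment_line {
    my ($verb, $line) = @_;
    my ($lhs, $list) = $line =~ /^\s*(\S+?)\s*=\s*(.*)$/s
        or die "not an assignment line: $line";
    my @names = split ' ', $list;
    if ($lhs =~ /^defectivos([123])$/) {
        my $class = $1;                        # 1, 2 or 3
        $verb->{defectivo}{$_} = $class for @names;
    } else {
        $verb->{$_}{model} = $lhs for @names;  # lhs is the model verb
    }
    return scalar @names;
}

my %verb;
add_assignment_line(\%verb, 'ter = conter manter obter');
add_assignment_line(\%verb, 'defectivos2= abolir falir');   # class is hypothetical
print "manter follows $verb{manter}{model}\n";
print "falir is defective, class $verb{defectivo}{falir}\n";
```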

%verb is a hash in the format :

$verb->{$verb_name}->{$tense_name}->[$person] == "whole word".

This is used for irregular verbs. Not all of the $verb->{$verb_name}->{$tense_name}->[$person] and $verb->{$verb_name}->{$tense_name} entries need to be defined.

If a verb follows a model (i.e. is conjugated like a verb which is
present in %verb's keys, and more completely defined), it has a
"model" key :

$verb->{$verb_name}->{"model"} == "model name".

If a verb is "defectivo" (has missing forms), but otherwise regular,
it will have an entry 

$verb->{"defectivo"}->{$verb_name} == number.

where the number (1,2, or 3) defines how it behaves.
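As an illustration of this layout, here is a small hand-written table and a lookup helper. The contents are hypothetical (in particular the class number for "abolir"), and explicit_form is not a function of the module. The helper tests each level before descending, so that looking up a missing form does not autovivify empty entries in the table.

```perl
use strict;
use warnings;

# A hand-written miniature of the structure described above.
my $verb = {
    poder     => { pres => ['posso'] },                    # only 1st person given
    ter       => { pres => [qw(tenho tens tem temos)] },
    conter    => { model => 'ter' },                       # follows a model
    defectivo => { abolir => 1 },                          # class is hypothetical
};

# Return the explicitly defined form for person $p (1-based), or undef.
sub explicit_form {
    my ($verb, $v, $t, $p) = @_;
    my $tenses = $verb->{$v} or return undef;
    my $forms  = $tenses->{$t} or return undef;
    return $forms->[$p - 1];
}

for my $p (1, 2) {
    my $f = explicit_form($verb, 'poder', 'pres', $p);
    print "poder pres $p: ", (defined $f ? $f : '(not defined, so regular)'), "\n";
}
```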

%reg is a hash describing the endings of regular verbs :

$reg->{$verb_ending}->{$tense_name}->[$person] == "ending".

The verb ending may be [aeio]r.
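For example, the present tense of regular verbs could be stored and used as sketched below. Only the present tense is filled in, the helper name regular_form is invented, and leaving an undefined slot for the second person plural is an assumption that follows the [$p-1] indexing used in the algorithm section.

```perl
use strict;
use warnings;

# Present-tense endings only; index 4 (second person plural) is left empty.
my $reg = {
    ar => { pres => [qw(o as a amos), undef, 'am'] },
    er => { pres => [qw(o es e emos), undef, 'em'] },
    ir => { pres => [qw(o es e imos), undef, 'em'] },
};

# Conjugate a regular verb: root plus the ending for tense $t, person $p.
sub regular_form {
    my ($reg, $v, $t, $p) = @_;
    my ($root, $edg) = $v =~ /^(.+)([aeio]r)$/ or return undef;
    my $endings = $reg->{$edg} && $reg->{$edg}{$t} or return undef;
    my $e = $endings->[$p - 1];
    return defined $e ? $root . $e : undef;
}

for my $v (qw(falar comer partir)) {
    print "$v: ", join(' ', map { regular_form($reg, $v, 'pres', $_) } 1 .. 4, 6), "\n";
}
```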

%endg contains regexes that match verb endings. It is, right now, somewhat obsolete in the code.

codify($string) reads a string in the verb.pt format, and modifies the internal %verb variable.

&locate($verbname) does nothing if $verbname is already a key of %verb. Otherwise, it finds the data relevant to that verb in the string $vlist and codes it, so that %verb reflects that data.
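The lazy lookup can be pictured as follows. This is a rough sketch and not the code of &locate: the name locate_sketch and the sample string are invented, the entry is stored as raw text where the real routine would code it, and it assumes that an entry runs up to the next "<word> :" or "<word> =" and that no conjugated form contains ":" or "=" (which regex forms could well violate).

```perl
use strict;
use warnings;

# Find the definition of $name in the one-line string $vlist, on demand.
sub locate_sketch {
    my ($verb, $vlist, $name) = @_;
    return $verb->{$name} if exists $verb->{$name};    # already coded
    # Assumption: the entry ends where the next "<word> :" or "<word> =" begins.
    if ($vlist =~ /(?:^|\s)\Q$name\E\s*:\s*(.*?)(?=\s+\S+\s*[:=]|\s*$)/s) {
        $verb->{$name} = { raw => $1 };    # the real code would parse this text
    }
    return $verb->{$name};                 # undef if the verb is not listed
}

my %verb;
my $vlist = 'poder : posso etc ter : tenho tens tem temos ter = conter manter';
my $entry = locate_sketch(\%verb, $vlist, 'ter');
print "ter: $entry->{raw}\n";
```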

ALGORITHM of the conjug sub :

Pick up options and arguments.


FOREACH verb $v,

  Find the $root and ending ($edg).

  Check if $v will require a special treatment, e.g. if it ends in
    /g[ei]r$/, /c[ei]r$/, etc... and if it is the case, define a
    "modifier" function (called &$modif), that will be called to
    "final-polish" the output.

  IF $v is IRREGULAR, i.e. if $verb->{$v} is defined.

    IF $v follows a model, find that model, and "see what it looks
      like" (e.g. does the model's root ($rm) end with consonants,
      etc.).
    END


    FOREACH tense $t and person $p ($p is in 1-4,6),
    
      IF this form (i.e. $verb->{$v}->{$t}->[$p-1]) is explicitly
        defined,

        Great! That's it.

      ELSE IF the model is explicitly defined, for this form,
  
        "See how the model relates to the form, and how the verb $v
         relates to the model, and from that find what should be $v's
         corresponding form" (quite heuristic).

      ELSE 

        Find the appropriate form for regular-irregular verbs (another
        series of heuristics).

      END
    
      Check for "defective" verbs.

    END

  ELSE (verb is REGULAR)
    
      It's trivial. 

  END
END  That's it!

The actual code is much, much hairier.
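To give a feel for the order of the lookups, here is a toy version for a single form. It is emphatically not the code of conjug. The tables are hypothetical miniatures; the "model" step is a naive prefix swap (the form of "ter" with the prefix of "conter" put in front); the modifier only covers verbs in -ger/-gir (g becomes j before a or o), which is my own choice of example rule; and class 1 is assumed, for the illustration only, to mean "third person singular only".

```perl
use strict;
use warnings;

# Hypothetical miniature tables (present tense only).
my $verb = {
    ter       => { pres => [qw(tenho tens tem temos), undef, 'têm'] },
    conter    => { model => 'ter' },
    defectivo => { chover => 1 },
};
my $reg = {
    ar => { pres => [qw(o as a amos), undef, 'am'] },
    er => { pres => [qw(o es e emos), undef, 'em'] },
};

# Explicitly defined form, or undef (without autovivifying anything).
sub explicit_form {
    my ($v, $t, $p) = @_;
    my $tenses = $verb->{$v} or return undef;
    my $forms  = $tenses->{$t} or return undef;
    return $forms->[$p - 1];
}

# "Final polish": for -ger/-gir verbs, g becomes j before a or o.
sub modifier_for {
    my ($v) = @_;
    if ($v =~ /^(.*)g[ei]r$/) {
        my $stem = $1;
        return sub { my ($f) = @_; $f =~ s/^\Q$stem\Eg(?=[ao])/${stem}j/; $f };
    }
    return sub { $_[0] };    # nothing to polish
}

sub conjug_sketch {
    my ($v, $t, $p) = @_;
    my $modif = modifier_for($v);

    # Hypothetical meaning of class 1: only the third person singular exists.
    my $class = $verb->{defectivo}{$v};
    return undef if $class && $class == 1 && $p != 3;

    my $form  = explicit_form($v, $t, $p);                  # 1. explicit form
    my $model = $verb->{$v} ? $verb->{$v}{model} : undef;
    if (!defined $form && defined $model && $v =~ /^(.*)\Q$model\E$/) {
        my $prefix = $1;                                    # 2. naive model step
        my $mf = explicit_form($model, $t, $p);
        $form = $prefix . $mf if defined $mf;
    }
    if (!defined $form && $v =~ /^(.+)([aeio]r)$/) {        # 3. regular ending
        my ($root, $edg) = ($1, $2);
        my $e = $reg->{$edg} && $reg->{$edg}{$t} && $reg->{$edg}{$t}[$p - 1];
        $form = $root . $e if defined $e;
    }
    return defined $form ? $modif->($form) : undef;
}

print join(' ', map { conjug_sketch($_, 'pres', 1) } qw(conter proteger falar)), "\n";
```

The prefix swap also shows why real heuristics are needed: for the second person it would produce "contens", whereas the correct form is "conténs", with an accent that the model form "tens" does not carry.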

WHAT MAKES ME BELIEVE IT WORKS?

The file C<reference> contains conjugation tables for
plenty of verbs. Each time I modify C<Verb.pm>, I run the program
C<ckconj> ("ckconj all"), which checks that the output is still
correct for all these verbs.

This program outputs two lines: one starting with "OK", listing
the verbs that checked ok, and one starting with "NOT IN REFERENCE
FILE", listing the verbs that "Verb.pm" knows about but that are not
in the "reference" file.

Any other output reports errors that were found.
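The check itself amounts to a comparison between what is produced now and what the reference says. The sketch below only illustrates that idea: it is not the code of ckconj, the function name is invented, and both tables are assumed to be plain hashes mapping a verb to its conjugation table held as one string.

```perl
use strict;
use warnings;

# Compare freshly produced tables with the reference ones.
# Returns the "OK" line, the "NOT IN REFERENCE FILE" line, then any errors.
sub check_against_reference {
    my ($produced, $reference) = @_;
    my (@ok, @missing, @errors);
    for my $v (sort keys %$produced) {
        if (!exists $reference->{$v}) {
            push @missing, $v;
        } elsif ($produced->{$v} eq $reference->{$v}) {
            push @ok, $v;
        } else {
            push @errors, "$v: got '$produced->{$v}', expected '$reference->{$v}'";
        }
    }
    return (join(' ', 'OK', @ok),
            join(' ', 'NOT IN REFERENCE FILE', @missing),
            @errors);
}

# Hypothetical data: "comer" has been broken, "partir" is not in the reference.
my %produced  = (falar => 'falo falas fala', comer => 'como comas come',
                 partir => 'parto partes parte');
my %reference = (falar => 'falo falas fala', comer => 'como comes come');
print "$_\n" for check_against_reference(\%produced, \%reference);
```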


Since F<reference> is produced by L<conjug>, it may be inadvertently
contaminated with wrong forms. When I discover that, I hack F<Verb.pm>
until it conjugates the faulty verb correctly (checking it myself)
*and all the other ones too* (with "ckconj all", which confirms that
my fix has not broken anything). Then I run the program F<assert>
(syntax: "assert faulty_verb"), which updates the reference table.

SEE ALSO : treinar, conjug.

VERSION 0.01

AUTHOR Etienne Grossmann, Jan 27th 1998 [etienne@isr.ist.utl.pt]

CREDITS

Thanks to all the people on usenet and here at ISR with whom I have
discussed this module, and who have provided advice on conjugation,
programming, naming and all other relevant points.