NAME
Lingua::Interset::FeatureStructure - The main class of DZ Interset 2.0.
VERSION
version 2.002
SYNOPSIS
use Lingua::Interset::FeatureStructure;
print(join(' ', Lingua::Interset::FeatureStructure->known_features()), "\n");
DESCRIPTION
DZ Interset is a universal framework for reading, writing, converting and interpreting part-of-speech and morphosyntactic tags from multiple tagsets of many different natural languages.
The FeatureStructure class defines all morphosyntactic features and their values used in DZ Interset. An object of this class represents a morphosyntactic tag for a natural-language word.
More information is given at the DZ Interset project page, https://wiki.ufal.ms.mff.cuni.cz/user:zeman:interset:features.
METHODS
is_noun()
is_adjective()
is_pronoun()
is_numeral()
is_verb()
is_adverb()
is_adposition()
is_conjunction()
is_coordinator()
is_subordinator()
is_particle()
is_interjection()
is_punctuation()
is_foreign()
is_typo()
set()
A generic setter for any feature. These two statements do the same thing:
$fs->set ('pos', 'noun');
$fs->set_pos ('noun');
If you want to set multiple values of a feature, there are several ways to do it:
$fs->set ('tense', ['pres', 'fut']);
$fs->set ('tense', 'pres', 'fut');
$fs->set ('tense', 'pres|fut');
All of the above mean that the word is either in the present or in the future tense.
Note that the 'other' feature behaves differently: its value can be structured, and set() will keep the structure without trying to interpret it.
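The three calling conventions above are meant to be equivalent. The plain-Perl sketch below shows one way such arguments could be reduced to a single list of values; normalize_values() is a hypothetical helper written for illustration, not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::Interset): reduce the three
# set() argument forms to one plain list of values.
sub normalize_values {
    my @args = @_;
    # Form 1: a single array reference, e.g. ['pres', 'fut'].
    if (@args == 1 && ref($args[0]) eq 'ARRAY') {
        return @{$args[0]};
    }
    # Form 2: a list of values, e.g. ('pres', 'fut').
    # Form 3: one string with '|' separators, e.g. 'pres|fut'.
    return map { split(/\|/, $_) } @args;
}

my @a = normalize_values(['pres', 'fut']);
my @b = normalize_values('pres', 'fut');
my @c = normalize_values('pres|fut');
print join(',', @a), ' ', join(',', @b), ' ', join(',', @c), "\n";   # prints "pres,fut pres,fut pres,fut"
```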
multiset()
$fs->multiset ('pos' => 'conj', 'conjtype' => 'coor');
Sets several features at once. Takes a list of value assignments, i.e. an array with an even number of elements (feature1, value1, feature2, value2, ...). This is useful when defining decoders from physical tagsets. Typically, one wants to define a table of assignments for each part of speech or input feature:
'CC' => ['pos' => 'conj', 'conjtype' => 'coor']
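A decoder built around such a table might look like the following sketch. The tag table is hypothetical and only covers two tags, and a plain hash stands in for the feature structure so that the example runs without the module; with the real class, the same pair list would be passed to $fs->multiset(@{$decode_table{$tag}}).

```perl
use strict;
use warnings;

# Hypothetical decoder table: each entry is an even-length list of
# feature/value pairs, the shape that multiset() accepts.
my %decode_table = (
    'CC'  => ['pos' => 'conj', 'conjtype' => 'coor'],
    'NNS' => ['pos' => 'noun', 'number'   => 'plu'],
);

# Stand-in for a feature structure: a plain hash built from the pairs.
sub decode_tag {
    my ($tag) = @_;
    my $pairs = $decode_table{$tag}
        or die "Unknown tag '$tag'";
    my %features = @{$pairs};   # pairs become key/value entries
    return \%features;
}

my $fs = decode_tag('CC');
print "$fs->{pos} $fs->{conjtype}\n";   # prints "conj coor"
```

Keeping the table as data rather than code makes it easy to add tags without touching the decoding logic.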
set_hash()
my %hash = ('pos' => 'noun', 'number' => 'plu');
$fs->set_hash (\%hash);
Takes a reference to a hash of features and their values. Sets the values of the features in this FeatureStructure. Unknown features are ignored. Known features that are not set in the hash will be (re-)set to empty values.
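The reset behaviour is easy to overlook, so here is a plain-Perl sketch of the documented semantics applied to an ordinary hash. The three-element feature list is an illustrative assumption; the real list comes from known_features() and is much longer.

```perl
use strict;
use warnings;

# Illustrative subset of feature names (assumption, not the real list).
my @known = qw(pos number gender);

# Sketch of set_hash() semantics: unknown keys are ignored, and known
# features missing from the input are reset to the empty value ''.
sub apply_hash {
    my ($fs, $input) = @_;
    foreach my $feature (@known) {
        $fs->{$feature} = exists $input->{$feature} ? $input->{$feature} : '';
    }
    return $fs;
}

my %fs = ('pos' => 'adj', 'gender' => 'fem');
apply_hash(\%fs, {'pos' => 'noun', 'number' => 'plu', 'bogus' => 'x'});
print join(' ', map { "$_=$fs{$_}" } @known), "\n";   # prints "pos=noun number=plu gender="
```

Note that 'gender' was cleared even though the input hash did not mention it.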
get()
A generic getter for any feature. These two statements do the same thing:
$pos = $fs->get ('pos');
$pos = $fs->pos();
Be warned that get() returns an array reference if the feature has multiple values. It is probably better to use one of the alternative get_...() functions, such as get_joined() or get_list(), whose return values are better defined.
get_joined()
Similar to get() but always returns a scalar. If there is an array of disjoint values, it sorts them alphabetically and joins them using the vertical bar, e.g. 'fem|masc'. The sorting makes comparisons easier; it is assumed that the actual ordering is not significant and that 'fem|masc' is identical to 'masc|fem'.
get_list()
Similar to get() but always returns a list of values. If there is an array of disjoint values, this is the list. If there is a single value (empty or not), this value will be the only member of the list. Unlike get_joined(), this method does not sort the list before returning it.
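Because a raw value from get() may be either a scalar or an array reference, the list conversion amounts to a single ref() check. The sketch below reproduces the documented get_list() rules on such raw values; value_to_list() is a hypothetical name used only here.

```perl
use strict;
use warnings;

# Sketch of get_list() semantics for a raw value as get() may return it:
# either a plain scalar or an array reference.
sub value_to_list {
    my ($value) = @_;
    return ref($value) eq 'ARRAY' ? @{$value} : ($value);
}

my @one  = value_to_list('fem');            # ('fem')
my @none = value_to_list('');               # (''): the empty value is still one member
my @many = value_to_list(['masc', 'fem']);  # ('masc', 'fem'): order kept, not sorted
print scalar(@one), scalar(@none), scalar(@many), "\n";   # prints "112"
```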
get_hash()
my $hashref = $fs->get_hash();
Creates a hash of all features and their values. Returns a reference to the hash.
as_string()
Generates a textual representation of the feature structure so it can be printed.
enforce_permitted_values()
$fs->enforce_permitted_values ($permitted_trie);
Makes sure that a feature structure complies with the permitted combinations recorded in a trie. Takes a Lingua::Interset::Trie object as a parameter. Replaces feature values if needed. (Note that even the empty value may or may not be permitted.)
duplicate()
Returns a new Lingua::Interset::FeatureStructure object that is a duplicate of the current structure. Makes sure that a deep copy is constructed if there are any complex feature values.
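The difference between a shallow and a deep copy matters as soon as a feature holds an array reference. The following sketch, a generic recursive copy rather than the module's implementation, shows what goes wrong with a shallow hash copy and why duplicate() has to recurse.

```perl
use strict;
use warnings;

# Recursive deep copy of a structure built from hashes, arrays and scalars.
sub deep_copy {
    my ($x) = @_;
    if (ref($x) eq 'HASH') {
        my %copy = map { ($_, deep_copy($x->{$_})) } keys %{$x};
        return \%copy;
    }
    if (ref($x) eq 'ARRAY') {
        return [ map { deep_copy($_) } @{$x} ];
    }
    return $x;   # plain scalar: copied by value
}

my %orig    = ('pos' => 'noun', 'gender' => ['masc', 'fem']);
my %shallow = %orig;             # copies the array *reference*
my $deep    = deep_copy(\%orig); # copies the array itself
push @{$orig{gender}}, 'neut';
print scalar(@{$shallow{gender}}), ' ', scalar(@{$deep->{gender}}), "\n";   # prints "3 2"
```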
FUNCTIONS
known_features()
Returns the list of known feature names in print order.
priority_features()
Returns the list of known features ordered according to their default priority. The priority is used in Lingua::Interset::Trie when one looks for the closest matching permitted structure.
known_values()
Returns the list of known values of a feature, in print order. Dies if asked about an unknown feature.
feature_valid()
Takes a string and returns a nonzero value if the string is the name of a known feature.
value_valid()
Takes two scalars, $feature and $value, and tells whether they are a valid (known) pair of feature name and value. A reference to a list of valid values is also a valid value. This function does not die when the feature is not valid.
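A minimal sketch of these rules follows. The value table is an illustrative, incomplete assumption (the real inventory is available through known_features() and known_values()), and reading "a reference to a list of valid values" as "every member must be valid" is an interpretation, not something the text spells out.

```perl
use strict;
use warnings;

# Illustrative, incomplete value table (assumption for this sketch).
my %known_values = (
    'tense'  => ['pres', 'fut'],
    'gender' => ['masc', 'fem'],
);

# Returns 1 for a known feature/value pair, accepts an array reference
# whose members are all valid, and returns 0 instead of dying for an
# unknown feature.
sub is_valid_pair {
    my ($feature, $value) = @_;
    my $values = $known_values{$feature} or return 0;
    my %ok = map { ($_, 1) } @{$values};
    my @check = ref($value) eq 'ARRAY' ? @{$value} : ($value);
    foreach my $v (@check) {
        return 0 unless $ok{$v};
    }
    return 1;
}

print is_valid_pair('gender', ['masc', 'fem']), is_valid_pair('bogus', 'x'), "\n";   # prints "10"
```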
structure_to_string()
Recursively converts a structure to a string. The string uses Perl syntax for constant structures, so it can be used in eval.
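The idea of emitting Perl source that eval() can read back can be sketched as below. This is a simplified illustration (it sorts hash keys and single-quotes every scalar), not the module's own serializer, and its output format may differ from what structure_to_string() produces.

```perl
use strict;
use warnings;

# Render hashes, arrays and scalars as Perl source text.
sub to_perl_source {
    my ($x) = @_;
    if (ref($x) eq 'HASH') {
        return '{' . join(', ', map { quote($_) . ' => ' . to_perl_source($x->{$_}) } sort keys %{$x}) . '}';
    }
    if (ref($x) eq 'ARRAY') {
        return '[' . join(', ', map { to_perl_source($_) } @{$x}) . ']';
    }
    return quote($x);
}

# Single-quote a scalar, escaping backslashes and single quotes.
sub quote {
    my ($s) = @_;
    $s =~ s/([\\'])/\\$1/g;
    return "'$s'";
}

my $s = to_perl_source({'other' => {'x' => ['a', 'b']}, 'pos' => 'noun'});
print "$s\n";   # prints {'other' => {'x' => ['a', 'b']}, 'pos' => 'noun'}

# The leading '+' makes eval parse '{' as an anonymous hash, not a block.
my $back = eval "+$s" or die $@;
```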
get_replacements()
my $replacements = Lingua::Interset::FeatureStructure->get_replacements();
my $rep_adverb = $replacements->{pos}{adverb};
foreach my $r (@{$rep_adverb})
{
    if(...)
    {
        # This replacement matches our constraints, let's use it.
        return $r;
    }
}
Returns the set of replacement values for the case that a feature value is not permitted in a given context. It is a hash{feature}{value0} leading to a list of values that can be used to replace value0, ordered by priority.
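A self-contained version of the loop above, with the elided condition replaced by a concrete but hypothetical constraint, might look like this. The replacement table is mock data in the documented shape; its values are made up and are not what get_replacements() actually returns.

```perl
use strict;
use warnings;

# Mock data in the documented shape $replacements->{feature}{value0}:
# a priority-ordered list of substitutes (values made up for illustration).
my $replacements = {
    'pos' => { 'adverb' => ['adj', 'noun', 'part'] },
};

# Hypothetical constraint: the pos values a target tagset permits.
my %permitted_pos = map { ($_, 1) } qw(noun adj verb);

# Return the first (highest-priority) permitted replacement, or undef.
sub first_permitted {
    my ($feature, $value, $permitted) = @_;
    my $candidates = $replacements->{$feature}{$value} or return;
    foreach my $r (@{$candidates}) {
        return $r if $permitted->{$r};
    }
    return;
}

my $choice = first_permitted('pos', 'adverb', \%permitted_pos);
print defined $choice ? "$choice\n" : "none\n";   # prints "adj"
```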
iseq()
if (Lingua::Interset::FeatureStructure->iseq ($a, $b)) { ... }
Compares two values, scalars or arrays, and tells whether they are equal. Takes two parameters, each of which can be a scalar or an array reference.
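One possible equality test over such mixed values is sketched below. Two choices here are assumptions not stated for iseq(): that a scalar never equals an array reference, and that, as for get_joined(), the order of disjoint values is insignificant.

```perl
use strict;
use warnings;

sub values_equal {
    my ($x, $y) = @_;
    my $x_is_array = ref($x) eq 'ARRAY' ? 1 : 0;
    my $y_is_array = ref($y) eq 'ARRAY' ? 1 : 0;
    # A scalar never equals an array reference in this sketch.
    return 0 if $x_is_array != $y_is_array;
    # Two scalars: plain string comparison.
    return ($x eq $y ? 1 : 0) if !$x_is_array;
    # Two arrays: compare element by element after sorting.
    my @sx = sort @{$x};
    my @sy = sort @{$y};
    return 0 if @sx != @sy;
    foreach my $i (0 .. $#sx) {
        return 0 if $sx[$i] ne $sy[$i];
    }
    return 1;
}

print values_equal(['fem', 'masc'], ['masc', 'fem']), "\n";   # prints "1"
```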
array_to_scalar_value()
Converts array values to scalars. Sorts the array and combines all elements into one string, using the vertical bar as the delimiter. Does not care about occurrences of vertical bars inside the elements (there should be none anyway).
Takes an array reference as a parameter. If the parameter turns out to be a plain scalar, the function just returns it.
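The conversion described above fits in a few lines of plain Perl. This sketch follows the documented rules (default string sort, '|' delimiter, no escaping, scalars passed through) and is not the module's own source.

```perl
use strict;
use warnings;

# Sort an array of values and join them with '|';
# a plain scalar is returned unchanged.
sub join_values {
    my ($value) = @_;
    return $value unless ref($value) eq 'ARRAY';
    return join('|', sort @{$value});
}

print join_values(['masc', 'fem']), "\n";   # prints "fem|masc"
print join_values('fem'), "\n";             # prints "fem"
```

Sorting before joining is what lets two differently ordered arrays compare equal as strings.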
AUTHOR
Dan Zeman <zeman@ufal.mff.cuni.cz>
COPYRIGHT AND LICENSE
This software is copyright (c) 2014 by Univerzita Karlova v Praze (Charles University in Prague).
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.