NAME
Lingua::Interset::Tagset::Multext - Common code for drivers of tagsets of the Multext-EAST project.
VERSION
version 3.016
SYNOPSIS
package Lingua::Interset::Tagset::HR::Multext;
use Moose;
extends 'Lingua::Interset::Tagset::Multext';
# We must redefine the method that returns tagset identification, used by the
# decode() method for the 'tagset' feature.
sub get_tagset_id
{
    # It should correspond to the last two parts in package name, lowercased.
    # Specifically, it should be the ISO 639-1 language code, followed by '::multext'.
    return 'hr::multext';
}
# We may add or redefine atoms for individual surface features.
sub _create_atoms
{
    my $self = shift;
    # Most atoms can be inherited but some have to be redefined.
    my $atoms = $self->SUPER::_create_atoms();
    $atoms->{verbform} = $self->create_atom (...);
    return $atoms;
}
# We must define the lists of surface features for all surface parts of speech!
sub _create_feature_map
{
    my $self = shift;
    my %features =
    (
        'N' => ['pos', 'nountype', 'gender', 'number', 'case', 'animacy'],
        ...
    );
    return \%features;
}
# We must define the list() method.
sub list
{
    my $self = shift;
    my $list = <<end_of_list
Ncmsn
Ncmsg
Ncmsd
...
end_of_list
    ;
    my @list = split(/\r?\n/, $list);
    return \@list;
}
DESCRIPTION
Common code for drivers of tagsets of the Multext-EAST project. All Multext-EAST tagsets share one inventory of parts of speech and one inventory of features, although not every feature is used in every language. Each feature value is a single alphanumeric character, and the values are unified across languages: if a feature value occurs in several languages, it is always encoded by the same character. The tagsets are positional, i.e. the position of a character in the tag determines which feature it is a value of. The interpretation of the positions is defined separately for every language and for every part of speech. An empty value (for an unknown or irrelevant feature) is encoded by a dash ("-") if at least one of the following features has a non-empty value; otherwise it is simply omitted from the end of the tag.
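The positional encoding described above can be illustrated with a minimal, standalone sketch. It is not the driver's actual implementation: the functions split_positional_tag() and join_positional_tag() are hypothetical names invented for this example, and the feature map holds only the noun entry shown in the SYNOPSIS. The sketch maps positions to feature names only and does not interpret what the individual characters mean.

```perl
use strict;
use warnings;

# Illustrative feature map (assumption: only the noun list from the SYNOPSIS).
# Real drivers define one list per part of speech in _create_feature_map().
my %feature_map =
(
    'N' => ['pos', 'nountype', 'gender', 'number', 'case', 'animacy'],
);

# Hypothetical helper: split a positional tag into feature => character pairs.
# A dash, or a position missing at the end of the tag, means an empty value.
sub split_positional_tag
{
    my $tag = shift;
    return {} if(!defined($tag) || $tag eq '');
    my @chars = split(//, $tag);
    my $features = $feature_map{$chars[0]};
    return {} unless(defined($features));
    my %values;
    for(my $i = 0; $i <= $#{$features}; $i++)
    {
        my $char = $i <= $#chars ? $chars[$i] : '-';
        next if($char eq '-');
        $values{$features->[$i]} = $char;
    }
    return \%values;
}

# Hypothetical helper: rebuild the tag. Empty values become dashes and
# dashes at the end of the tag are dropped.
sub join_positional_tag
{
    my ($pos, $values) = @_;
    my $features = $feature_map{$pos};
    return '' unless(defined($features));
    my $tag = join('', map {defined($values->{$_}) ? $values->{$_} : '-'} @{$features});
    $tag =~ s/-+$//;
    return $tag;
}

my $full = split_positional_tag('Ncmsn'); # animacy omitted at the end of the tag
my $gap  = split_positional_tag('Nc-sn'); # gender empty, encoded by a dash
print(join(' ', map {"$_=$full->{$_}"} sort(keys(%{$full}))), "\n");
print(join_positional_tag('N', $gap), "\n");
```

In this sketch, 'Ncmsn' yields five features because the trailing animacy position is omitted, while 'Nc-sn' needs a dash for gender because non-empty values follow it.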
SEE ALSO
Lingua::Interset, Lingua::Interset::Tagset, Lingua::Interset::Tagset::CS::Multext, Lingua::Interset::Tagset::HR::Multext, Lingua::Interset::FeatureStructure
AUTHOR
Dan Zeman <zeman@ufal.mff.cuni.cz>
COPYRIGHT AND LICENSE
This software is copyright (c) 2019 by Univerzita Karlova (Charles University).
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.