NAME

Treex::Core::Phrase

VERSION

version 2.20201228

DESCRIPTION

A Phrase is a concept defined on top of dependency trees and subtrees (where a subtree contains a node and all its descendants, not just any arbitrary subset of nodes). Similarly to Chomsky's hierarchy of formal grammars, there are two main types of phrases: terminal and nonterminal. Furthermore, there may be subtypes of the nonterminal type with special behavior.

A terminal phrase contains just one Node (which typically corresponds to a surface token).

A nonterminal phrase does not directly contain any Node but it contains one or more (usually at least two) sub-phrases. The hierarchy of phrases and their sub-phrases is also a tree structure. In the typical case there is a relation between the tree of phrases and the underlying dependency tree, but the rules governing this relation are not fixed.

Phrases help us model situations that are difficult to model in the dependency tree alone. We can encode multiple levels of “tightness” of relations between governors and dependents. In particular we can distinguish between dependents that modify the whole phrase (shared modifiers) and those that modify only the head of the phrase (private modifiers).

This is particularly useful for various tree transformations and conversions between annotation styles (such as in the HamleDT blocks). The idea is that we will first construct a phrase tree based on the existing dependency tree, then we will perform transformations on the phrase tree, and finally we will create new dependency relations based on the phrase tree and on the rules defined by the desired annotation style. Phrase is a temporary internal structure that is not saved to disk in the Treex format.

Every phrase knows its parent (superphrase) and, if it is nonterminal, its children (subphrases). It also knows which of the children is the head (as long as there are children, there is always one and only one head child). The phrase can also return its head node. For terminal phrases, this is the node they wrap. For nonterminal phrases, this is defined recursively as the head node of their head child phrase.
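The recursive definition of the head node can be illustrated with a minimal sketch. It does not use the Treex classes: the plain hash representation (a terminal is { node => ... }, a nonterminal is { head => ..., dependents => [...] }) and the helper head_node() are hypothetical, chosen only to show the recursion.

```perl
use strict;
use warnings;

# Hypothetical representation (not the Treex API):
# a terminal phrase is { node => $node },
# a nonterminal phrase is { head => $head_child, dependents => [...] }.
sub head_node {
    my ($phrase) = @_;
    # Terminal phrase: the head node is the node it wraps.
    return $phrase->{node} if exists $phrase->{node};
    # Nonterminal phrase: recurse into the head child.
    return head_node($phrase->{head});
}

my $dog   = { node => { ord => 2, form => 'dog' } };
my $the   = { node => { ord => 1, form => 'the' } };
my $np    = { head => $dog, dependents => [$the] };
my $barks = { node => { ord => 3, form => 'barks' } };
my $s     = { head => $barks, dependents => [$np] };

print head_node($np)->{form}, "\n";   # prints "dog"
print head_node($s)->{form}, "\n";    # prints "barks"
```

However deep the nesting is, the recursion always ends at a terminal phrase, because every nonterminal has exactly one head child.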

Every phrase also has a dependency relation label (deprel). These labels are analogous to deprels of nodes in dependency trees. Most of them are just taken from the underlying dependency tree and they are propagated back when a new dependency structure is built from the phrases; however, some labels may have a special meaning even for the Phrase objects. They help recognize special types of nonterminal phrases, such as coordinations. If the phrase is the head of its parent phrase, its deprel is identical to the deprel of its parent. Otherwise, the deprel represents the dependency relation between the phrase and the head of its parent.

ATTRIBUTES

parent

Refers to the parent Phrase, if any.

is_member

Is this phrase a member of a paratactic structure such as coordination (where members are known as conjuncts) or apposition? We need this attribute because of the Prague-style dependency trees. We need it only during the building phase of the phrase tree.

We could encode this attribute in deprel, but it would not be practical because it acts independently of deprel. Unlike deprel, is_member is less tied to the underlying nodes; it is really an attribute of the whole phrase. If we decide to change the deprel of the phrase (which is propagated to selected core children), we do not necessarily want to change is_member too. And we do not want to decode is_member from deprel, shuffle it around, and encode it elsewhere again.

When a terminal phrase is created around a Node, it takes its is_member value from the node. When the phrase receives a parent, the is_member flag will typically be moved to the parent (and erased at the child). However, this does not happen automatically and the Builder has to do it when desired. Similarly, when the type of the phrase is changed (e.g. a new Phrase::PP is created, the contents of the old Phrase::NTerm are moved to it and the old phrase is destroyed), the surrounding code should make sure that the is_member flag is carried over, too. Finally, the value will be used when a Phrase::Coordination is recognized. At that point the is_member flag can be erased for all newly identified conjuncts because now they can be recognized without the flag. However, if the Phrase::Coordination itself (or its Phrase::NTerm predecessor) is a member of a larger paratactic structure, then it must keep the flag for its parent to see and use.
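The manual hand-over of the flag can be sketched as follows. This is an illustration only: the hash-based phrases and the builder step wrap_in_nonterminal() are hypothetical and do not reproduce the actual Builder code.

```perl
use strict;
use warnings;

# Hypothetical builder step (not the Treex API): wrap $child in a new
# nonterminal phrase and carry the is_member flag over to the new parent,
# erasing it at the child. Nothing does this automatically; the builder must.
sub wrap_in_nonterminal {
    my ($child) = @_;
    my $parent = {
        head       => $child,
        dependents => [],
        is_member  => $child->{is_member},   # flag moves up ...
    };
    $child->{parent}    = $parent;
    $child->{is_member} = 0;                 # ... and is erased below
    return $parent;
}

my $conjunct = { node => 'apples', is_member => 1 };
my $np = wrap_in_nonterminal($conjunct);
print "$np->{is_member} $conjunct->{is_member}\n";   # prints "1 0"
```

After the step, only the outermost phrase carries the flag, which is the phrase the enclosing coordination will later have to recognize as its conjunct.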

METHODS

$phrase->set_parent ($nonterminal_phrase);

Sets a new parent for this phrase. The parent phrase must be a nonterminal. This phrase will become its new non-head child. The new parent may also be undefined, which means that the current phrase will be disconnected from the phrase structure (but it will keep its own children, if any). The method returns the old parent.
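The described contract (attach as a non-head child, undef disconnects, the old parent is returned) can be sketched with plain hashes. The representation is hypothetical, not the Treex API, and the sketch assumes that the phrase being moved is not the head child of its old parent; it also omits the cycle check that the real method performs.

```perl
use strict;
use warnings;

# Hypothetical hash-based phrases (not the Treex API). A nonterminal keeps
# its non-head children in 'dependents'; every phrase may have a 'parent'.
# Assumes $phrase is not the head child of its old parent.
sub set_parent {
    my ($phrase, $new_parent) = @_;
    my $old_parent = $phrase->{parent};
    # Detach from the old parent's list of dependents.
    if (defined $old_parent) {
        $old_parent->{dependents} =
            [ grep { $_ != $phrase } @{ $old_parent->{dependents} } ];
    }
    # Attach as a new non-head child; undef just disconnects the phrase.
    push @{ $new_parent->{dependents} }, $phrase if defined $new_parent;
    $phrase->{parent} = $new_parent;
    return $old_parent;
}

my $np1 = { head => { node => 'dog' }, dependents => [] };
my $np2 = { head => { node => 'cat' }, dependents => [] };
my $adj = { node => 'big' };

set_parent($adj, $np1);
my $old = set_parent($adj, $np2);   # $old is $np1
print scalar(@{ $np1->{dependents} }), ' ', scalar(@{ $np2->{dependents} }), "\n";   # prints "0 1"
```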

my @dependents = $phrase->dependents();

Returns the list of dependents of the phrase. This is an abstract method that must be implemented in every derived class. Nonterminal phrases have a list of dependents (possibly empty) as their attribute. Terminal phrases return an empty list by definition.

my @children = $phrase->children();

Returns the list of children of the phrase. This is an abstract method that must be implemented in every derived class. Nonterminal phrases distinguish between core children and dependents, and this method should return both. Terminal phrases return an empty list by definition.
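The relation between dependents() and children() can be sketched as follows. The 'core' and 'dependents' hash keys are a hypothetical stand-in for the attributes of the derived classes; a terminal phrase simply has neither.

```perl
use strict;
use warnings;

# Hypothetical hash-based phrases (not the Treex API). A nonterminal keeps
# its core children (e.g. the head) separately from its dependents; a
# terminal phrase has neither, so both lists come out empty.
sub dependents {
    my ($phrase) = @_;
    return @{ $phrase->{dependents} // [] };
}

sub children {
    my ($phrase) = @_;
    # Core children first, then the dependents.
    return ( @{ $phrase->{core} // [] }, dependents($phrase) );
}

my $noun = { node => 'dog' };
my $det  = { node => 'the' };
my $np   = { core => [$noun], dependents => [$det] };

my @deps      = dependents($np);
my @kids      = children($np);
my @leaf_kids = children($noun);
print scalar(@deps), ' ', scalar(@kids), ' ', scalar(@leaf_kids), "\n";   # prints "1 2 0"
```

The results are assigned to arrays before counting because children() returns a list; calling it directly in scalar context would not yield the number of children.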

if( $phrase->is_descendant_of ($another_phrase) ) {...}

Tests whether this phrase is a descendant of another phrase, i.e. whether the other phrase can be reached by following the parent links upwards. This method is used to prevent cycles when setting a new parent.
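A minimal sketch of the upward walk and of the cycle guard it enables. The hash-based phrases and the helper would_create_cycle() are hypothetical; the sketch only shows why the test is sufficient: a new parent is unsafe exactly when it is the phrase itself or one of its descendants.

```perl
use strict;
use warnings;

# Hypothetical hash-based phrases (not the Treex API): follow the 'parent'
# links upwards and report whether $other is reached.
sub is_descendant_of {
    my ($phrase, $other) = @_;
    for (my $p = $phrase->{parent}; defined $p; $p = $p->{parent}) {
        return 1 if $p == $other;
    }
    return 0;
}

# Hypothetical guard: attaching $phrase under $new_parent creates a cycle
# if $new_parent is $phrase itself or lies somewhere below it.
sub would_create_cycle {
    my ($phrase, $new_parent) = @_;
    return $new_parent == $phrase || is_descendant_of($new_parent, $phrase);
}

my $root = {};
my $mid  = { parent => $root };
my $leaf = { parent => $mid };

print is_descendant_of($leaf, $root), "\n";                     # prints "1"
print would_create_cycle($root, $leaf) ? "cycle\n" : "ok\n";    # prints "cycle"
```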

my $ist = $phrase->is_terminal();

Tells whether this phrase is terminal, that is, it does not have children (subphrases).

my $isc = $phrase->is_coordination();

Tells whether this phrase is a Treex::Core::Phrase::Coordination or an instance of a class derived from it.
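The "or a derived class" part corresponds to Perl's standard isa() test. The following sketch uses a hypothetical class hierarchy (My::Phrase and its subclasses) as a stand-in for the Treex classes; whether the real method is implemented this way is an assumption.

```perl
use strict;
use warnings;

# Hypothetical class hierarchy standing in for the Treex phrase classes.
package My::Phrase;
sub new { my ($class) = @_; return bless {}, $class }
# isa() is true for the class itself and for every class derived from it.
sub is_coordination { my ($self) = @_; return $self->isa('My::Phrase::Coordination') ? 1 : 0 }

package My::Phrase::NTerm;
our @ISA = ('My::Phrase');

package My::Phrase::Coordination;
our @ISA = ('My::Phrase::NTerm');

package My::Phrase::Coordination::Special;   # a further derived class
our @ISA = ('My::Phrase::Coordination');

package main;
printf "%d %d\n",
    My::Phrase::NTerm->new->is_coordination,
    My::Phrase::Coordination::Special->new->is_coordination;   # prints "0 1"
```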

my $iscc = $phrase->is_core_child();

Tells whether this phrase is a core child of another phrase. That is sometimes important to know because core children cannot be easily moved around.

my $node = $phrase->node();

Returns the head node of the phrase. For terminal phrases this should just return their node attribute. For nonterminal phrases this should return the node of their head child. This is an abstract method that must be defined in every derived class.

my @nodes = $phrase->nodes();

Returns the list of all nodes covered by the phrase, i.e. the head nodes of this phrase and of all its descendant phrases.

my @phrases = $phrase->terminals();

Returns the list of all terminal descendants of this phrase. Similar to nodes(), but instead of the Node objects themselves, it returns the Phrase::Term objects in which the nodes are wrapped.
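Both methods amount to a recursive collection over the phrase tree, and nodes() can be derived from terminals(). The sketch below is hypothetical (plain hashes, not the Treex API) and returns the items in tree order (head first, then dependents), not in surface word order.

```perl
use strict;
use warnings;

# Hypothetical hash-based phrases (not the Treex API): a terminal is
# { node => ... }, a nonterminal is { head => ..., dependents => [...] }.
sub terminals {
    my ($phrase) = @_;
    return ($phrase) if exists $phrase->{node};
    # Collect terminals of the head child first, then of each dependent.
    return map { terminals($_) } ( $phrase->{head}, @{ $phrase->{dependents} } );
}

# The nodes covered by a phrase are the nodes wrapped by its terminals.
sub nodes {
    my ($phrase) = @_;
    return map { $_->{node} } terminals($phrase);
}

my $np = {
    head       => { node => 'dog' },
    dependents => [ { node => 'the' }, { node => 'big' } ],
};
my @nodes = nodes($np);
print "@nodes\n";   # prints "dog the big"
```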

my $deprel = $phrase->deprel();

Returns the type of the dependency relation of the phrase to the governing phrase. This is an abstract method that must be defined in every derived class. When the phrase structure is built around a dependency tree, the relations will probably be taken from (or based on) the deprels of the underlying nodes. When the phrase tree is transformed to the desired style, the relations may be modified; at the end, they can be projected to the dependency tree again. A general nonterminal phrase typically has the same deprel as its head child. Terminal phrases store deprels as attributes.
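The typical rule (terminals store the deprel, a general nonterminal reports the deprel of its head child) can be sketched as follows. The hash representation and the free function deprel() are hypothetical; special nonterminal types may override this behavior.

```perl
use strict;
use warnings;

# Hypothetical hash-based phrases (not the Treex API).
sub deprel {
    my ($phrase) = @_;
    # General nonterminal: same deprel as its head child (recursively).
    return deprel($phrase->{head}) if exists $phrase->{head};
    # Terminal: the deprel is stored directly as an attribute.
    return $phrase->{deprel};
}

my $verb   = { node => 'barks', deprel => 'Pred' };
my $subj   = { node => 'dog',   deprel => 'Sb' };
my $clause = { head => $verb, dependents => [$subj] };

print deprel($clause), "\n";   # prints "Pred"
print deprel($subj), "\n";     # prints "Sb"
```

This also illustrates why a head child and its parent phrase always share the same deprel.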

my $deprel = $phrase->project_deprel();

Returns the deprel that should be used when the phrase tree is projected back to a dependency tree (see the method project_dependencies()). In most cases this is identical to what deprel() returns. However, prepositional phrases in Prague treebanks, for instance, are attached using AuxP: the preposition receives the label AuxP, while the relation of the phrase to its parent (returned by deprel()) is projected as the label of the dependency between the preposition and its argument.
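The Prague-style projection of a prepositional phrase can be sketched like this. The function project_pp() and the 'prep' and 'arg' keys are hypothetical and serve only to show where the two labels end up.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the Treex API) of projecting a prepositional
# phrase in the Prague style: the preposition is attached with AuxP, and the
# phrase's own deprel moves down to the preposition-argument dependency.
sub project_pp {
    my ($pp) = @_;   # { prep => $prep_node, arg => $arg_node, deprel => ... }
    $pp->{prep}{deprel} = 'AuxP';          # the label project_deprel() yields
    $pp->{arg}{deprel}  = $pp->{deprel};   # the label deprel() yields
    $pp->{arg}{parent}  = $pp->{prep};     # argument depends on the preposition
    return $pp->{prep};                    # the preposition heads the structure
}

my $in    = { form => 'in' };
my $house = { form => 'house' };
my $head  = project_pp({ prep => $in, arg => $house, deprel => 'Adv' });
print "$head->{form}/$in->{deprel} $house->{form}/$house->{deprel}\n";   # prints "in/AuxP house/Adv"
```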

my $ord = $phrase->ord();

Returns the head node's ord attribute. This means that nodes that do not implement the Treex::Core::Node::Ordered role cannot be wrapped in phrases. We sometimes need to order child phrases according to the word order of their head nodes.
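Ordering child phrases by the word order of their head nodes then reduces to a numeric sort. In this hypothetical sketch (plain hashes, not the Treex API), ord_of() plays the role of ord() and follows head children down to a terminal; it is named differently to avoid clashing with Perl's built-in ord.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the Treex API): the ord of a phrase is the ord of
# its head node; a terminal is { node => { ord => ... } }, a nonterminal
# is { head => ... }.
sub ord_of {
    my ($phrase) = @_;
    return exists $phrase->{node} ? $phrase->{node}{ord} : ord_of($phrase->{head});
}

my @children = (
    { node => { ord => 3, form => 'dog' } },
    { head => { node => { ord => 1, form => 'the' } } },
    { node => { ord => 2, form => 'big' } },
);

# Sort child phrases by the word order of their head nodes.
my @sorted = sort { ord_of($a) <=> ord_of($b) } @children;
my @ords   = map { ord_of($_) } @sorted;
print "@ords\n";   # prints "1 2 3"
```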

my ($left, $right) = $phrase->span();

Returns the lowest and the highest ord values of the nodes covered by this phrase (always a pair of scalar values; they will be identical for terminal phrases). Note that there is no guarantee that all nodes within the span are covered by this phrase. There may be gaps!
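To see how a span can contain gaps, here is a small hypothetical sketch that takes the ords of the covered nodes, computes the span, and lists the positions inside it that the phrase does not cover. Only min and max from the core module List::Util are used; span_and_gaps() itself is not part of Treex.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical sketch (not the Treex API): given the ords of the nodes
# covered by a phrase, compute the span and list any gaps inside it.
sub span_and_gaps {
    my @ords = @_;
    my ($left, $right) = (min(@ords), max(@ords));
    my %covered = map { $_ => 1 } @ords;
    my @gaps = grep { !$covered{$_} } $left .. $right;
    return ($left, $right, \@gaps);
}

# A non-projective phrase covering words 2, 3 and 5, but not word 4.
my ($left, $right, $gaps) = span_and_gaps(2, 5, 3);
print "$left-$right gaps: @$gaps\n";   # prints "2-5 gaps: 4"
```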

$phrase->project_dependencies();

Recursively projects dependencies between the head and the dependents back to the underlying dependency structure.
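The core of the projection can be sketched as a recursive walk that attaches the head node of every dependent phrase to the head node of its parent phrase. This is a hypothetical simplification (plain hashes, no deprel handling, no special phrase types), not the actual method.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the Treex API).
sub head_node {
    my ($phrase) = @_;
    return exists $phrase->{node} ? $phrase->{node} : head_node($phrase->{head});
}

sub project_dependencies {
    my ($phrase) = @_;
    return if exists $phrase->{node};           # terminals have nothing to project
    project_dependencies($phrase->{head});
    my $governor = head_node($phrase);
    for my $dep (@{ $phrase->{dependents} }) {
        project_dependencies($dep);
        head_node($dep)->{parent} = $governor;   # new dependency edge
    }
}

my %n  = map { $_ => { form => $_ } } qw(the big dog barks);
my $np = { head => { node => $n{dog} }, dependents => [ { node => $n{the} }, { node => $n{big} } ] };
my $s  = { head => { node => $n{barks} }, dependents => [$np] };

project_dependencies($s);
print join(' ', map { $_->{form} . '<-' . $_->{parent}{form} } @n{qw(the big dog)}), "\n";
# prints "the<-dog big<-dog dog<-barks"
```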

my $phrase_string = $phrase->as_string();

Returns a textual representation of the phrase and all subphrases. Useful for debugging.
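One possible shape of such a debugging dump is a bracketed string. The format below (head child first, marked with '*', followed by the dependents) is an assumption made for illustration and need not match the output of the real method.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the Treex API) of a bracketed debugging dump:
# terminals print their word form, nonterminals print the head child first
# (marked with '*') followed by the dependents.
sub as_string {
    my ($phrase) = @_;
    return $phrase->{node}{form} if exists $phrase->{node};
    my @parts = ( '*' . as_string($phrase->{head}),
                  map { as_string($_) } @{ $phrase->{dependents} } );
    return '(' . join(' ', @parts) . ')';
}

my $np = { head       => { node => { form => 'dog' } },
           dependents => [ { node => { form => 'the' } } ] };
my $s  = { head => { node => { form => 'barks' } }, dependents => [$np] };
print as_string($s), "\n";   # prints "(*barks (*dog the))"
```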

AUTHORS

Daniel Zeman <zeman@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

Copyright © 2013, 2015 by Institute of Formal and Applied Linguistics, Charles University in Prague.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.