NAME
Treex::Core::Phrase
VERSION
version 2.20201228
DESCRIPTION
A Phrase is a concept defined on top of dependency trees and subtrees (where a subtree contains a node and all its descendants, not just an arbitrary subset of nodes). As with the symbols of a formal grammar, there are two main types of phrases: terminal and nonterminal. Furthermore, there may be subtypes of the nonterminal type with special behavior.
A terminal phrase contains just one Node (which typically corresponds to a surface token).
A nonterminal phrase does not directly contain any Node, but it contains one or more (usually at least two) sub-phrases. The hierarchy of phrases and their sub-phrases is also a tree structure. In the typical case there is a relation between the tree of phrases and the underlying dependency tree, but the rules governing this relation are not fixed.
Phrases help us model situations that are difficult to model in the dependency tree alone. We can encode multiple levels of “tightness” of relations between governors and dependents. In particular we can distinguish between dependents that modify the whole phrase (shared modifiers) and those that modify only the head of the phrase (private modifiers).
This is particularly useful for various tree transformations and conversions between annotation styles (such as in the HamleDT blocks). The idea is that we first construct a phrase tree based on the existing dependency tree, then perform transformations on the phrase tree, and finally create new dependency relations based on the phrase tree and on the rules defined by the desired annotation style. A Phrase is a temporary internal structure that is not saved in the Treex format on disk.
Every phrase knows its parent (superphrase) and, if it is nonterminal, its children (subphrases). It also knows which of the children is the head (as long as there are children, there is always one and only one head child). The phrase can also return its head node. For terminal phrases, this is the node they enwrap. For nonterminal phrases, this is defined recursively as the head node of their head child phrase.
Every phrase also has a dependency relation label (deprel). These labels are analogous to deprels of nodes in dependency trees. Most of them are just taken from the underlying dependency tree and they are propagated back when new dependency structure is shaped after the phrases; however, some labels may have special meaning even for the Phrase objects. They help recognize special types of nonterminal phrases, such as coordinations. If the phrase is the head of its parent phrase, its deprel is identical to the deprel of its parent. Otherwise, the deprel represents the dependency relation between the phrase and the head of its parent.
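The recursive definitions of the head node and of the deprel can be illustrated with a minimal, self-contained sketch. The packages Sketch::Term and Sketch::NTerm are hypothetical stand-ins invented for this example (plain blessed hashes); they are not the Treex classes, and the node is represented by a plain string rather than a Node object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a terminal phrase: it wraps one node
# (here just a string) and stores its own deprel.
package Sketch::Term;
sub new    { my ($class, %arg) = @_; return bless { %arg }, $class; }
sub node   { my ($self) = @_; return $self->{node}; }
sub deprel { my ($self) = @_; return $self->{deprel}; }

# Hypothetical stand-in for a nonterminal phrase: it has a head child
# and delegates both node() and deprel() to that child.
package Sketch::NTerm;
sub new    { my ($class, %arg) = @_; return bless { %arg }, $class; }
sub node   { my ($self) = @_; return $self->{head}->node(); }
sub deprel { my ($self) = @_; return $self->{head}->deprel(); }

package main;

my $noun = Sketch::Term->new(node => 'house', deprel => 'Obj');
my $adj  = Sketch::Term->new(node => 'old',   deprel => 'Atr');
my $np   = Sketch::NTerm->new(head => $noun, dependents => [$adj]);

# The nonterminal reports the head node and the deprel of its head child.
printf "%s %s\n", $np->node(),  $np->deprel();
# The dependent keeps its own relation to the head of its parent.
printf "%s %s\n", $adj->node(), $adj->deprel();
```

In this sketch the head child and its parent always agree on the deprel because the parent simply asks the head child; a non-head child keeps a label of its own.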
ATTRIBUTES
- parent
-
Refers to the parent Phrase, if any.
- is_member
-
Is this phrase a member of a paratactic structure such as coordination (where members are known as conjuncts) or apposition? We need this attribute because of the Prague-style dependency trees. We need it only during the building phase of the phrase tree.
We could encode this attribute in deprel but it would not be practical because it acts independently of deprel. Unlike deprel, is_member is less tied to the underlying nodes; it is really an attribute of the whole phrase. If we decide to change the deprel of the phrase (which is propagated to selected core children), we do not necessarily want to change is_member too. And we do not want to decode is_member from deprel, shuffle and encode elsewhere again.
When a terminal phrase is created around a Node, it takes its is_member value from the node. When the phrase receives a parent, the is_member flag will be typically moved to the parent (and erased at the child). However, this does not happen automatically and the Builder has to do that when desired. Similarly, when the type of the phrase is changed (e.g. a new Phrase::PP is created, the contents of the old Phrase::NTerm is moved to it and the old phrase is destroyed), the surrounding code should make sure that the is_member flag is carried over, too. Finally, the value will be used when a Phrase::Coordination is recognized. At that point the is_member flag can be erased for all newly identified conjuncts because now they can be recognized without the flag. However, if the Phrase::Coordination itself (or its Phrase::NTerm predecessor) is a member of a larger paratactic structure, then it must keep the flag for its parent to see and use.
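Since the handover of the is_member flag is left to the builder, the step might look roughly like the following sketch. Phrases are modelled as plain hash references, and hand_over_is_member is a hypothetical helper invented for this example; it is not part of Treex.

```perl
use strict;
use warnings;

# Hypothetical helper: moves the is_member flag from a child phrase
# to its newly assigned parent and erases it at the child.
# Phrases are plain hash references in this sketch.
sub hand_over_is_member {
    my ($child, $parent) = @_;
    if ($child->{is_member}) {
        $parent->{is_member} = 1;
        $child->{is_member}  = 0;
    }
    return $parent;
}

# A terminal phrase that took is_member = 1 from its node ...
my $term  = { name => 'apples', is_member => 1 };
# ... and a new nonterminal phrase that becomes its parent.
my $nterm = { name => 'NTerm',  is_member => 0 };

hand_over_is_member($term, $nterm);
printf "%s=%d %s=%d\n",
    $term->{name},  $term->{is_member},
    $nterm->{name}, $nterm->{is_member};
```

The same helper could be called when the contents of one phrase are moved to a phrase of another type, so that the flag survives the replacement.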
METHODS
- $phrase->set_parent ($nonterminal_phrase);
-
Sets a new parent for this phrase. The parent phrase must be a nonterminal. This phrase will become its new non-head child. The new parent may also be undefined, which means that the current phrase will be disconnected from the phrase structure (but it will keep its own children, if any). The method returns the old parent.
- my @dependents = $phrase->dependents();
-
Returns the list of dependents of the phrase. This is an abstract method that must be implemented in every derived class. Nonterminal phrases have a list of dependents (possibly empty) as their attribute. Terminal phrases return an empty list by definition.
- my @children = $phrase->children();
-
Returns the list of children of the phrase. This is an abstract method that must be implemented in every derived class. Nonterminal phrases distinguish between core children and dependents, and this method should return both. Terminal phrases return an empty list by definition.
- if( $phrase->is_descendant_of ($another_phrase) ) {...}
-
Tests whether this phrase depends on another phrase via the parent links. This method is used to prevent cycles when setting a new parent.
- my $ist = $phrase->is_terminal();
-
Tells whether this phrase is terminal, that is, it does not have children (subphrases).
- my $isc = $phrase->is_coordination();
-
Tells whether this phrase is Treex::Core::Phrase::Coordination or its descendant.
- my $iscc = $phrase->is_core_child();
-
Tells whether this phrase is a core child of another phrase. That is sometimes important to know because core children cannot be easily moved around.
- my $node = $phrase->node();
-
Returns the head node of the phrase. For terminal phrases this should just return their node attribute. For nonterminal phrases this should return the node of their head child. This is an abstract method that must be defined in every derived class.
- my @nodes = $phrase->nodes();
-
Returns the list of all nodes covered by the phrase, i.e. the head node of this phrase and of all its descendants.
- my @phrases = $phrase->terminals();
-
Returns the list of all terminal descendants of this phrase. Similar to nodes(), but instead of Node objects returns the Phrase::Term objects, in which the nodes are wrapped.
- my $deprel = $phrase->deprel();
-
Returns the type of the dependency relation of the phrase to the governing phrase. This is an abstract method that must be defined in every derived class. When the phrase structure is built around a dependency tree, the relations will probably be taken from (or based on) the deprels of the underlying nodes. When the phrase tree is transformed to the desired style, the relations may be modified; at the end, they can be projected to the dependency tree again. A general nonterminal phrase typically has the same deprel as its head child. Terminal phrases store deprels as attributes.
- my $deprel = $phrase->project_deprel();
-
Returns the deprel that should be used when the phrase tree is projected back to a dependency tree (see the method project_dependencies()). In most cases this is identical to what deprel() returns. However, for instance prepositional phrases in Prague treebanks are attached using AuxP. Their relation to the parent (returned by deprel()) is projected as the label of the dependency between the preposition and its argument.
- my $ord = $phrase->ord();
-
Returns the head node's ord attribute. This means that nodes that do not implement the Treex::Core::Node::Ordered role cannot be wrapped in phrases. We sometimes need to order child phrases according to the word order of their head nodes.
- my ($left, $right) = $phrase->span();
-
Returns the lowest and the highest ord values of the nodes covered by this phrase (always a pair of scalar values; they will be identical for terminal phrases). Note that there is no guarantee that all nodes within the span are covered by this phrase. There may be gaps!
- $phrase->project_dependencies();
-
Recursively projects dependencies between the head and the dependents back to the underlying dependency structure.
- my $phrase_string = $phrase->as_string();
-
Returns a textual representation of the phrase and all subphrases. Useful for debugging.
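Two of the behaviors described above, the cycle check behind set_parent() and the possibly gapped result of span(), can be illustrated with a small self-contained sketch. Sketch::Phrase is a hypothetical class invented for this example; it is not the Treex implementation. In particular, storing ord values directly in the phrase and dying on a cycle are assumptions of the sketch.

```perl
use strict;
use warnings;

package Sketch::Phrase;
use List::Util qw(min max);

# Hypothetical constructor: 'ords' holds the ord values of the nodes
# that this phrase covers directly (empty for nonterminals).
sub new {
    my ($class, %arg) = @_;
    return bless { ords => $arg{ords} // [], parent => undef, children => [] }, $class;
}

# Follows the parent links upwards and looks for $other.
sub is_descendant_of {
    my ($self, $other) = @_;
    for (my $p = $self->{parent}; defined $p; $p = $p->{parent}) {
        return 1 if $p == $other;
    }
    return 0;
}

# Attaches the phrase to a new parent (or detaches it if the parent
# is undef) and returns the old parent. Refuses to create a cycle.
sub set_parent {
    my ($self, $new) = @_;
    die "Cannot attach a phrase to itself or to its own descendant\n"
        if defined $new && ($new == $self || $new->is_descendant_of($self));
    my $old = $self->{parent};
    @{ $old->{children} } = grep { $_ != $self } @{ $old->{children} } if defined $old;
    push @{ $new->{children} }, $self if defined $new;
    $self->{parent} = $new;
    return $old;
}

# All ord values covered by this phrase and its descendants.
sub ords {
    my ($self) = @_;
    return (@{ $self->{ords} }, map { $_->ords() } @{ $self->{children} });
}

# Lowest and highest covered ord; values in between may be missing.
sub span {
    my ($self) = @_;
    my @ords = $self->ords();
    return (min(@ords), max(@ords));
}

package main;

my $pp   = Sketch::Phrase->new();              # nonterminal
my $prep = Sketch::Phrase->new(ords => [2]);   # terminal at ord 2
my $noun = Sketch::Phrase->new(ords => [5]);   # terminal at ord 5
$prep->set_parent($pp);
$noun->set_parent($pp);

# Ords 3 and 4 lie inside the span but are not covered by $pp.
my ($left, $right) = $pp->span();
print "span: $left-$right\n";

# Attaching $pp under its own child would create a cycle.
my $ok = eval { $pp->set_parent($noun); 1 };
print "cycle rejected\n" unless $ok;
```

The sketch keeps strong references in both directions between parent and children, which is acceptable for a short-lived example but would normally call for weak parent references in long-running code.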
AUTHORS
Daniel Zeman <zeman@ufal.mff.cuni.cz>
COPYRIGHT AND LICENSE
Copyright © 2013, 2015 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.