NAME
Treex::PML - Perl implementation for the Prague Markup Language (PML).
SYNOPSIS
  use Treex::PML;

  my $file = "trees.pml";
  my $document = Treex::PML::Factory->createDocumentFromFile($file);
  foreach my $tree ($document->trees) {
    my $node = $tree;
    while ($node) {
      ... # do something on node
      $node = $node->following; # depth-first traversal
    }
  }
  $document->save();
INTRODUCTION
This package provides an API for manipulating linguistically annotated treebanks. The module implements a generic data model of an XML-based format called PML (http://ufal.mff.cuni.cz/jazz/PML/) and features pluggable I/O backends and on-the-fly XSLT transformation to support other data formats.
About PML
Prague Markup Language (PML) is an XML-based, universally applicable data format based on abstract data types, intended primarily for interchange of linguistic annotations. It is completely independent of a particular annotation schema. It can capture simple linear annotations as well as annotations with one or more richly structured interconnected annotation layers, dependency or constituency trees. A concrete PML-based format for a specific annotation is defined by describing the data layout and XML vocabulary in a special file called PML Schema and referring to this schema file from individual data files (instances). The schema can be used to validate the instances. It is also used by applications to "understand" the structure of the data and to choose an optimal in-memory representation. The generic nature of PML makes it very easy to convert data from other formats to PML without loss of information.
History
PML was developed at the Institute of Formal and Applied Linguistics of the Charles University in Prague. It was first used in the Prague Dependency Treebank 2.0 and in several other treebanks since. Conversion tools for various existing treebank formats are available, too.
This library was originally developed for the TrEd framework (http://ufal.mff.cuni.cz/tred). It evolved gradually from an older library called Fslib, which implemented an older data format called FS format (http://ufal.mff.cuni.cz/pdt2.0/doc/data-formats/fs/index.html); this format is still fully supported by the current implementation.
DESCRIPTION
Treex::PML provides, among others, the following classes:
- Treex::PML::Factory
-
a factory class which delegates object creation to a default factory class, which can be specified by the user (defaults to Treex::PML::StandardFactory). It is important that both user and library code use the create methods from Treex::PML::Factory to create new objects rather than calling constructors of an explicit object class.
This classical Factory Pattern allows the user to replace the standard family of Treex::PML classes with customized versions by setting up a customized factory as the default. Then, all objects created by the Treex::PML library and applications will be from the customized family.
- Treex::PML::StandardFactory
-
the standard factory class.
- Treex::PML::Document
-
representing a PML document consisting of a set of trees.
- Treex::PML::Node
-
representing a node of a tree (including the root node, which also represents the whole tree), see "Representation of trees" in Treex::PML::Node for details.
- Treex::PML::Schema
-
representing a PML schema.
- Treex::PML::Instance
-
implementing a PML instance.
- Treex::PML::List
-
implementing a PML list.
- Treex::PML::Alt
-
implementing a PML alternative.
- Treex::PML::Seq
-
implementing a PML sequence.
- Treex::PML::Container
-
implementing a PML container.
- Treex::PML::Struct
-
implementing a PML attribute-value structure.
- Treex::PML::FSFormat
-
representing an old-style document format for documents in the FS format.
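The delegation described for Treex::PML::Factory can be illustrated with a minimal, self-contained sketch. It does not use Treex::PML at all: the packages My::Factory, My::StandardFactory, My::CustomFactory, My::Node and My::CustomNode, as well as the method set_default, are hypothetical names invented for this illustration. The sketch only shows why creating objects through a single front-end factory lets a customized family of classes replace the standard one.

```perl
use 5.014;
use strict;
use warnings;

# All packages below are hypothetical stand-ins, not Treex::PML classes.
package My::Node {
    sub new { my ($class, %args) = @_; return bless { %args }, $class }
}
package My::CustomNode {
    our @ISA = ('My::Node');
    sub label { my ($self) = @_; return "custom:$self->{name}" }
}

# Two concrete factories, each producing one family of objects.
package My::StandardFactory {
    sub createNode { my $class = shift; return My::Node->new(@_) }
}
package My::CustomFactory {
    sub createNode { my $class = shift; return My::CustomNode->new(@_) }
}

# The front-end factory forwards every create call to the current default.
package My::Factory {
    my $default = 'My::StandardFactory';
    sub set_default { my ($class, $factory) = @_; $default = $factory; return }
    sub createNode  { my $class = shift; return $default->createNode(@_) }
}

package main;

my $plain = My::Factory->createNode(name => 'a');
My::Factory->set_default('My::CustomFactory');
my $custom = My::Factory->createNode(name => 'b');

print ref($plain), "\n";     # My::Node
print ref($custom), "\n";    # My::CustomNode
print $custom->label, "\n";  # custom:b
```

Code that calls a constructor such as My::Node->new directly would keep producing standard objects after the default factory changes, which is why the create methods are preferred.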
Resource paths
Some I/O backends require additional resources (such as schemas, DTDs, configuration files, XSLT stylesheets, dictionaries, etc.). For this purpose, Treex::PML maintains a list of so-called "resource paths", which I/O backends may conveniently search for their resources.
See "PACKAGE FUNCTIONS" for a description of the functions related to pluggable I/O backends and to the list of resource paths.
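A short sketch of how the list of resource paths might be managed with the functions documented under "PACKAGE FUNCTIONS". The directory names are placeholders made up for this example and need not exist. The printed order assumes that ResourcePaths() returns the directories in search order when called in list context.

```perl
use strict;
use warnings;
use Treex::PML ();

# Placeholder directories (hypothetical); they need not exist.
Treex::PML::SetResourcePaths('/opt/pml/resources', '/opt/pml/legacy');

# Appended: searched last.
Treex::PML::AddResourcePath('/usr/share/my-app/resources');

# Prepended: searched first.
Treex::PML::AddResourcePathAsFirst('/home/me/pml-resources');

# Removed: no longer searched.
Treex::PML::RemoveResourcePath('/opt/pml/legacy');

my @paths = Treex::PML::ResourcePaths();
print "$_\n" for @paths;
```

Note that the list is global to the process, so a call to SetResourcePaths affects every backend that looks up resources afterwards.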
PACKAGE FUNCTIONS
- Treex::PML::does ($thing,$role)
-
- Parameters
-
$thing
- any Perl scalar (an object, a reference or a non-reference)
$role
- the name of a role: a class name, or a reference type such as ARRAY or HASH
- Description
-
This function is an alias for the function UNIVERSAL::DOES::does(), which checks whether $thing performs the interface (role) $role. If the thing is an object or class, it simply checks $thing->DOES($role) (see UNIVERSAL::DOES, or UNIVERSAL in Perl >= 5.10.1). Otherwise it tells whether the thing can be dereferenced as an array/hash/etc.
Unlike UNIVERSAL::isa(), it is semantically correct to use does for something unknown and to use it for reftype.
This function also handles overloading. For example, does($thing, 'ARRAY') returns true if the thing is an array reference, or if the thing is an object with overloaded @{}.
Using this function (or UNIVERSAL::DOES::does()) is the recommended method for testing types of objects in the Treex::PML hierarchy (Treex::PML::Node, Treex::PML::Document, etc.)
- Returns
-
A true value if $thing performs the role $role, a false value otherwise.
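The following sketch exercises the cases described for does(). My::Bag is a hypothetical class defined only for this example; it overloads @{} so that its objects can be dereferenced as arrays.

```perl
use 5.014;
use strict;
use warnings;
use Treex::PML ();

# Hypothetical class whose objects can be dereferenced as arrays.
package My::Bag {
    use overload '@{}' => sub { $_[0]->{items} }, fallback => 1;
    sub new { my ($class, @items) = @_; return bless { items => [@items] }, $class }
}

package main;

my $bag = My::Bag->new(qw(a b c));

# Each case: label, thing to test, role to test it against.
my @cases = (
    [ 'array ref as ARRAY',  [1, 2], 'ARRAY'   ],
    [ 'hash ref as HASH',    {},     'HASH'    ],
    [ 'hash ref as ARRAY',   {},     'ARRAY'   ],
    [ 'object as its class', $bag,   'My::Bag' ],
    [ 'object as ARRAY',     $bag,   'ARRAY'   ],
);

for my $case (@cases) {
    my ($label, $thing, $role) = @$case;
    printf "%-20s %s\n", $label,
        Treex::PML::does($thing, $role) ? 'yes' : 'no';
}
```

The same test works on a plain reference and on an object, so callers do not need to know in advance which of the two they were handed.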
- Treex::PML::UseBackends (@backends)
-
- Parameters
-
@backends
- a list of backend names
- Description
-
Load the given modules on demand and use them as the initial set of I/O backends. The initial set of backends is returned by Backends(). This set is used as the default set of backends by Treex::PML::Document->load (unless a different list of backends was specified in a parameter).
- Returns
-
In a list context, the list of backends successfully loaded; in scalar context, a true value if and only if all requested backends were successfully loaded.
- Treex::PML::AddBackends (@backends)
-
- Parameters
-
@backends
- a list of backend names
- Description
-
Load those of the given modules that are not yet available as I/O backends and add them to the set of backends returned by Backends().
- Returns
-
In a list context, the list of backends already available or successfully loaded; in scalar context, a true value if and only if all requested backends were already available or successfully loaded.
- Treex::PML::Backends ()
-
- Description
-
Returns the initial set of backends. This set is used as the default set of backends by Treex::PML::Document->load.
- Returns
-
A list of backends already available or successfully loaded.
- Treex::PML::BackendCanRead ($backend)
-
- Parameters
-
$backend
- a name of an I/O backend
- Returns
-
Returns true if the backend provides all methods required for reading.
- Treex::PML::BackendCanWrite ($backend)
-
- Parameters
-
$backend
- a name of an I/O backend
- Returns
-
Returns true if the backend provides all methods required for writing.
- Treex::PML::ImportBackends (@backends)
-
- Parameters
-
@backends
- a list of backend names
- Description
-
Load the given modules on demand as I/O backends and return a list of the backend names successfully loaded. This list may then be passed to Treex::PML::Document I/O calls.
- Returns
-
List of names of successfully loaded I/O backends.
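A sketch of how the backend functions above might be combined. The backend names PML and Storable are assumptions about which backends are installed; substitute the names available in your installation.

```perl
use strict;
use warnings;
use Treex::PML ();

# Backend names are assumptions; adjust them to your installation.
my @wanted = qw(PML Storable);

# Replace the initial set of backends with those that could be loaded.
my @loaded = Treex::PML::UseBackends(@wanted);
warn "Some backends could not be loaded\n" if @loaded < @wanted;

# Report what each backend in the initial set can do.
for my $backend (Treex::PML::Backends()) {
    printf "%-10s read: %-3s write: %s\n",
        $backend,
        Treex::PML::BackendCanRead($backend)  ? 'yes' : 'no',
        Treex::PML::BackendCanWrite($backend) ? 'yes' : 'no';
}
```

Calling UseBackends in list context, as here, makes it possible to see which of the requested backends are missing rather than only whether all of them loaded.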
- Treex::PML::CloneValue ($scalar,$old_values?, $new_values?)
-
- Parameters
-
$scalar
- arbitrary Perl scalar
$old_values
- array reference (optional)
$new_values
- array reference (optional)
- Description
-
Returns a deep copy of the Perl structures contained in a given scalar.
The optional argument $old_values can be an array reference consisting of values (references) that are either to be preserved (if $new_values is undefined) or mapped to the corresponding values in the array $new_values. This means that if $scalar contains (possibly deeply nested) reference to an object $A, and $old_values is [$A], then if $new_values is undefined, the resulting copy of $scalar will also refer to the object $A rather than to a deep copy of $A; if $new_values is [$B], all references to $A will be replaced by $B in the resulting copy. Note also that the effect of using [$A] as both $old_values and $new_values is the same as leaving $new_values undefined.
- Returns
-
a deep copy of $scalar as described above
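The three calling conventions described above can be sketched as follows. The data structure and the variable names are made up for this example.

```perl
use strict;
use warnings;
use Treex::PML ();

my $lexicon  = { name => 'shared lexicon' };
my $original = { words => [qw(a b)], lexicon => $lexicon };

# 1. Plain deep copy: the nested lexicon is copied as well.
my $deep = Treex::PML::CloneValue($original);

# 2. Preserve $lexicon: the copy refers to the very same object.
my $sharing = Treex::PML::CloneValue($original, [$lexicon]);

# 3. Map $lexicon to a different object in the copy.
my $other  = { name => 'other lexicon' };
my $mapped = Treex::PML::CloneValue($original, [$lexicon], [$other]);

# Comparing references numerically compares their addresses.
print "deep copy shares the lexicon:  ",
    ($deep->{lexicon} == $lexicon ? 'yes' : 'no'), "\n";
print "second copy shares the lexicon: ",
    ($sharing->{lexicon} == $lexicon ? 'yes' : 'no'), "\n";
print "third copy uses the new object: ",
    ($mapped->{lexicon} == $other ? 'yes' : 'no'), "\n";
```

Preserving selected references is useful when a structure points to large shared data that should not be duplicated with every copy.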
- Treex::PML::ResourcePaths ()
-
Returns the current list of directories used by Treex::PML to search for resources.
- Treex::PML::SetResourcePaths (@paths)
-
- Parameters
-
@paths
- a list of directory paths
- Description
-
Specify the complete set of directories to be used by Treex::PML when looking up resources.
- Treex::PML::AddResourcePath (@paths)
-
- Parameters
-
@paths
- a list of directory paths
- Description
-
Add given paths to the end of the list of directories searched by Treex::PML for resources.
- Treex::PML::AddResourcePathAsFirst (@paths)
-
- Parameters
-
@paths
- a list of directory paths
- Description
-
Add given paths to the beginning of the list of directories searched for resources.
- Treex::PML::RemoveResourcePath (@paths)
-
- Parameters
-
@paths
- a list of directory paths
- Description
-
Remove given paths from the list of directories searched for resources.
- Treex::PML::FindInResourcePaths ($filename, \%options?)
-
- Parameters
-
$filename
- a relative path to a file
\%options
- a HASH reference with options (optional)
- Description
-
If a given filename is a relative forward path (e.g. containing no up-dir '..' directory parts) of a file found in the resource paths, return:
If the option 'all' is true, a list of absolute paths to all occurrences found (may be empty).
If the option 'strict' is true, an absolute path to the first occurrence or an empty list if there is no occurrence of the file in the resource paths.
Otherwise act as with 'strict', but return unmodified $filename if no occurrence is found.
If $filename is an absolute path, it is always returned unmodified as a single return value.
Options are passed in an optional second argument as key-value pairs of a HASH reference:
  FindInResources($filename, {
    # 'strict' => 0 or 1
    # 'all' => 0 or 1
  });
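The effect of the options can be sketched with two temporary directories. The file names example.xml and missing.xml are made up for this example, and the calls are made in list context so that an empty result shows up as an empty list. Note that SetResourcePaths replaces the global list of resource paths.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Treex::PML ();

# Two temporary resource directories, each holding a file example.xml.
my @dirs = map { tempdir(CLEANUP => 1) } 1 .. 2;
for my $dir (@dirs) {
    my $path = File::Spec->catfile($dir, 'example.xml');
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} "<example/>\n";
    close $fh or die "Cannot close $path: $!";
}
Treex::PML::SetResourcePaths(@dirs);

# Default: first occurrence, or the unmodified name if nothing is found.
my ($first)   = Treex::PML::FindInResourcePaths('example.xml');
my ($lenient) = Treex::PML::FindInResourcePaths('missing.xml');

# 'all': every occurrence; 'strict': empty list if nothing is found.
my @all    = Treex::PML::FindInResourcePaths('example.xml', { all    => 1 });
my @strict = Treex::PML::FindInResourcePaths('missing.xml', { strict => 1 });

print "first:   $first\n";
print "lenient: $lenient\n";
print "all:     ", scalar(@all), " occurrence(s)\n";
print "strict:  ", (@strict ? $strict[0] : '(nothing found)'), "\n";
```

The 'strict' option is the one to use when a missing resource should be detected by the caller instead of being passed on as a relative name.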
- Treex::PML::FindInResources ($filename)
-
Alias for FindInResourcePaths($filename).
- Treex::PML::FindDirInResourcePaths ($dirname)
-
- Parameters
-
$dirname
- a relative path to a directory
- Description
-
If a given directory name is a relative path of a sub-directory located in one of the resource directories, return an absolute path for that subdirectory. Otherwise return $dirname.
- Treex::PML::FindDirInResources ($filename)
-
Alias for FindDirInResourcePaths($filename).
- Treex::PML::ResolvePath ($ref_path,$filename,$search_resource_paths?)
-
- Parameters
-
$ref_path
- a reference filename
$filename
- a relative path to a file
$search_resource_paths
- 0 or 1
- Description
-
If the $filename is an absolute path or an absolute URL, it is returned unmodified. If it is a relative path and $ref_path is a local path or a file:// URL, the function tries to locate the file relative to $ref_path and, if such a file exists, returns an absolute filename or file:// URL to the file. Otherwise, it returns the value of FindInResourcePaths($filename) if the $search_resource_paths argument was true, or an absolute path or URL resolved relative to $ref_path otherwise.
The rationale behind this function is as follows: paths that are relative to remote resources are to be preferably located in ResourcePaths; paths that are relative to a local resource are preferably located in the actual location and then in ResourcePaths.
EXPORTED SYMBOLS
For backward compatibility reasons only, Treex::PML exports by default the following function symbol:
ImportBackends
For this reason, it is recommended to load Treex::PML as:
use Treex::PML ();
The following function symbols can be imported on demand:
ImportBackends, CloneValue, ResourcePaths, FindInResources, FindDirInResources, FindDirInResourcePaths, ResolvePath, AddResourcePath, AddResourcePathAsFirst, SetResourcePaths, RemoveResourcePath
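Importing on demand might look like the sketch below. It assumes the usual Exporter behaviour, where an explicit import list suppresses the default export; the directory name is a placeholder.

```perl
use strict;
use warnings;

# Import only the listed functions. Assuming standard Exporter
# semantics, ImportBackends is then not imported by default.
use Treex::PML qw(CloneValue AddResourcePath ResourcePaths);

my $copy = CloneValue({ layers => [qw(m a t)] });

AddResourcePath('/opt/pml/resources');    # placeholder directory
print "$_\n" for ResourcePaths();
```

Functions that were not imported remain available under their fully qualified names, for example Treex::PML::ResolvePath.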
SEE ALSO
Tree editor TrEd: http://ufal.mff.cuni.cz/tred
Prague Markup Language (PML) format: http://ufal.mff.cuni.cz/jazz/PML/
Description of FS format: http://ufal.mff.cuni.cz/pdt/Corpora/PDT_1.0/Doc/fs.html
Related packages: Treex::PML::Schema, Treex::PML::Instance, Treex::PML::Document, Treex::PML::Node, Treex::PML::Factory
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Treex::PML
You can also look for information at:
RT: CPAN's request tracker
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
COPYRIGHT AND LICENSE
Copyright (C) 2006-2013 by Petr Pajas, Jan Stepanek
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.