NAME

Treex::PML::Document - Treex::PML class representing a document consisting of a set of trees.

DESCRIPTION

This class implements a document consisting of a set of trees. The document may be associated with a FS format and a PML schema and can contain additional meta data, application data, and user data (implemented as name/value pairs).

For backward compatibility, the document may also contain data related to the FS format, e.g. patterns and tail.

METHODS

Treex::PML::Document->load (filename,\%opts ?)

NOTE: Don't call this method as a constructor directly, use Treex::PML::Factory->createDocumentFromFile() instead!

Load a Treex::PML::Document object from a given file. If called as a class method, a new instance is created, otherwise the current instance is reinitialized and reused. The method returns the instance or dies (using Carp::croak) if loading fails (unless option recover is set, see below).

Loading options can be passed as a HASH reference in the second argument. The following keys are supported:

backends

An ARRAY reference of IO backend names (previously imported using ImportBackends). These backends are tried in addition to Treex::PML::Backend::FS. If not given, the backends previously selected using UseBackends or AddBackends are used instead.

encoding

The name of the character set (encoding) to be used by text-based I/O backends such as Treex::PML::Backend::FS.

recover

If true, the method returns normally in case of a loading failure, but sets the global variable $Treex::PML::FSError to the return value of readFile, indicating the error.
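The interplay of the recover option and $Treex::PML::FSError can be sketched as follows. This is a minimal sketch, not the module's own code: the helper name load_or_report is hypothetical, and it assumes that Treex::PML::Factory->createDocumentFromFile() accepts the same (filename, \%opts) arguments as load. The status messages are taken from the return values documented for readFile.

```perl
use strict;
use warnings;

# Hypothetical helper. $factory is expected to be the string
# 'Treex::PML::Factory' (after `use Treex::PML;`) or any object that
# provides createDocumentFromFile(); passing the options hash through
# the factory is an assumption of this sketch.
sub load_or_report {
    my ($factory, $filename, $encoding) = @_;

    # Clear any stale error value left over from an earlier call.
    $Treex::PML::FSError = 0;

    # With recover set, a failed load does not die.
    my $document = $factory->createDocumentFromFile(
        $filename,
        { encoding => $encoding, recover => 1 },
    );

    # The status codes mirror the documented return values of readFile.
    my %message = (
        0  => 'success',
        1  => 'no suitable backend',
        -1 => 'backend failed',
    );
    my $status = $Treex::PML::FSError;
    my $text = exists $message{$status} ? $message{$status} : 'unknown status';
    return ($document, $status, $text);
}
```

A caller would typically check the returned status before touching the document, for example: my ($doc, $status, $why) = load_or_report('Treex::PML::Factory', $file, 'utf-8');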

Treex::PML::Document->new (name?, file_format?, FS?, hint_pattern?, attribs_patterns?, unparsed_tail?, trees?, save_status?, backend?, encoding?, user_data?, meta_data?, app_data?)

Creates and returns a new FS file object based on the given values (optional). For use with arguments, it is more convenient to use the method create() instead.

NOTE: Don't call this constructor directly, use Treex::PML::Factory->createDocument() instead!

Treex::PML::Document->new({ argument => value, ... })

or

Treex::PML::Document->create({ argument => value, ... })

NOTE: Don't call this constructor directly, use Treex::PML::Factory->createDocument() instead!

Creates and returns a new empty Treex::PML::Document object based on the given parameters. This method accepts argument => value pairs as arguments. The following arguments are available:

name, format, FS, hint, patterns, tail, trees, save_status, backend

See initialize for more details.

$document->clone ($clone_trees)

Create a new Treex::PML::Document object with the same file name, file format, meta data, FSFormat, backend, encoding, patterns, hint and tail as the current Treex::PML::Document. If $clone_trees is true, populate the new Treex::PML::Document object with clones of all trees from the current Treex::PML::Document.

$document->initialize (name?, file_format?, FS?, hint_pattern?, attribs_patterns?, unparsed_tail?, trees?, save_status?, backend?, encoding?, user_data?, meta_data?, app_data?)

Initialize an FS file object. Argument description:

name (scalar)

File name

file_format (scalar)

File format identifier (user-defined string). TrEd, for example, uses FS format, gzipped FS format and any non-specific format strings as identifiers.

FS (FSFormat)

FSFormat object associated with the file

hint_pattern (scalar)

hint pattern definition (used by TrEd)

attribs_patterns (list reference)

embedded stylesheet patterns (used by TrEd)

unparsed_tail (list reference)

The rest of the file, which is not parsed by Treex::PML, i.e. Graph's embedded macros

trees (list reference)

List of FSNode objects representing root nodes of all trees in the Treex::PML::Document.

save_status (scalar)

File save status indicator, 0=file is saved, 1=file is not saved (TrEd uses this field).

backend (scalar)

IO Backend used to open/save the file.

encoding (scalar)

IO character encoding for perl 5.8 I/O filters

user_data (arbitrary scalar type)

Reserved for the user. Content of this slot is not persistent.

meta_data (hashref)

Meta data (usually used by IO Backends to store additional information about the file - i.e. other than encoding, trees, patterns, etc).

app_data (hashref)

Non-persistent application specific data associated with the file (by default this is an empty hash reference). Applications may store temporary data associated with the file into this hash.

$document->readFile ($filename, \@backends)

NOTE: Don't call this method directly, use Treex::PML::Factory->createDocumentFromFile() instead!

Read a document from a given file. The first argument must be a file name. The second argument may be a list reference consisting of names of I/O backends. If no backends are given, only Treex::PML::Backend::FS is used. For each I/O backend, readFile tries to execute the test function from the appropriate class in the order in which the backends were specified, passing it the filename as an argument. The first I/O backend whose test() function returns 1 is then used to read the file.

Note: this function sets notSaved to zero.

Return values:

0 - success

1 - no suitable backend

-1 - backend failed
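The selection procedure and return codes described above can be modelled in plain Perl. The following is only an illustrative sketch: the function read_with_backends and the hash-based backends with test and read code references are hypothetical stand-ins, not the interface of real Treex::PML backend classes.

```perl
use strict;
use warnings;

# Each backend here is a hypothetical hash with two code references:
#   test => sub { ... }  returns 1 if the backend accepts the file name
#   read => sub { ... }  returns true on success, may die on failure
sub read_with_backends {
    my ($filename, $backends) = @_;

    for my $backend (@$backends) {
        # Backends are probed in the order in which they were given.
        next unless $backend->{test}->($filename);

        # The first accepting backend is used; later ones are not tried.
        my $ok = eval { $backend->{read}->($filename) };
        return $ok ? 0 : -1;    # 0 = success, -1 = backend failed
    }

    return 1;                   # 1 = no suitable backend
}
```

Note that in this sketch a failing backend ends the search: once a backend's test accepts the file, no further backends are consulted.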

$document->save ($filename?)

Save the Treex::PML::Document object to a given file using the corresponding I/O backend (see $document->changeBackend) and set notSaved to zero.

$document->writeFile ($filename?)

This is just an alias for $document->save($filename).

$document->writeTo (glob_ref)

Write the FS declaration, trees and unparsed tail to a given file (a file handle open for writing must be passed as a GLOB reference). Sets notSaved to zero.

$document->filename

Return the FS file's file name. If the actual file name is a file:// URL, convert it to a system path and return it. If it is a different type of URL, return the corresponding URI object.

$document->URL

Return the FS file's URL as a URI object.

$document->changeFilename (new_filename)

Change the FS file's file name.

$document->changeURL (uri)

Like changeFilename, but does not attempt to absolutize the filename. The argument must be an absolute URL (preferably a URI object).

$document->fileFormat

Return file format identifier (user-defined string). TrEd, for example, uses FS format, gzipped FS format and any non-specific format strings as identifiers.

$document->changeFileFormat (string)

Change file format identifier.

$document->backend

Return IO backend module name. The default backend is Treex::PML::Backend::FS, used to save files in the FS format.

$document->changeBackend (string)

Change file backend.

$document->encoding

Return file character encoding (used by Perl 5.8 input/output filters).

$document->changeEncoding (string)

Change file character encoding (used by Perl 5.8 input/output filters).

$document->userData

Return user data associated with the file (by default this is an empty hash reference). User data are not supposed to be persistent and IO backends should ignore it.

$document->changeUserData (value)

Change user data associated with the file. User data are not supposed to be persistent and IO backends should ignore it.

$document->metaData (name)

Return meta data stored into the object usually by IO backends. Meta data are supposed to be persistent, i.e. they are saved together with the file (at least by some IO backends).

$document->changeMetaData (name,value)

Change meta information (usually used by IO backends). Meta data are supposed to be persistent, i.e. they are saved together with the file (at least by some IO backends).

$document->listMetaData (name)

In array context, return the list of metaData keys. In scalar context return the hash reference where metaData are stored.

$document->appData (name)

Return application specific information associated with the file. Application data are not persistent, i.e. they are not saved together with the file by IO backends.

$document->changeAppData (name,value)

Change application specific information associated with the file. Application data are not persistent, i.e. they are not saved together with the file by IO backends.

$document->listAppData (name)

In array context, return the list of appData keys. In scalar context return the hash reference where appData are stored.
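Because application data are not persistent, they suit per-document caches of derived values. The following is a minimal sketch using only the appData(name) and changeAppData(name,value) methods described above. The helper name cached_app_value and the example key are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value stored under $key in the
# document's application data, computing and storing it on first use.
# $document may be any object providing appData() and changeAppData().
sub cached_app_value {
    my ($document, $key, $compute) = @_;

    my $value = $document->appData($key);
    unless (defined $value) {
        $value = $compute->($document);           # run the expensive step once
        $document->changeAppData($key, $value);   # remember it for this session
    }
    return $value;
}
```

For example, cached_app_value($document, 'my-app-tree-count', sub { $_[0]->lastTreeNo + 1 }) would compute the tree count only once per loaded document. Since IO backends do not save application data, the cache disappears when the document is reloaded.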

$document->schema

Return a reference to the associated PML schema (if any). Note: The pointer to the schema is stored in the metaData field 'schema'.

$document->schemaURL

Return URL of the PML schema the document is associated with (if any). Note that unlike $document->schema->get_url, the URL is not resolved and is returned exactly as referenced in the document PML header.

Note: The URL is stored in the metaData field 'schema-url'.

$document->changeSchemaURL($newURL)

Change the URL of the PML schema the document is associated with. Note: The URL is stored in the metaData field 'schema-url'.

$document->documentRootData()

Return the root data structure of the PML instance (with trees, prolog and epilog taken out). Note: The root data structure is stored in the metaData field 'pml_root'.

$document->treesProlog()

Return a sequence of non-tree elements preceding trees in the PML sequence (with role #TREES) from which trees were extracted (if any). Note: The prolog is stored in the metaData field 'pml_prolog'.

$document->treesEpilog()

Return a sequence of non-tree elements following trees in the PML sequence (with role #TREES) from which trees were extracted (if any). Note: The epilog is stored in the metaData field 'pml_epilog'.

$document->lookupNodeByID($id)

Lookup a node by its #ID. Note that the ID-hash is created when the document is loaded (and if not, when first queried), but is not maintained by this class. It must therefore be maintained by the application.

$document->deleteNodeIDHashEntry($node)

Remove a given node from the ID-hash. Returns the value removed from the ID hash (note: the function does not check if the entry for the given node's ID actually was mapped to the given node) or undef if the node's ID was not hashed.

$document->deleteIDHashEntry($id)

Remove a given ID from the ID-hash. Returns the removed hash entry (or undef if ID was not hashed).

$document->hashNodeByID($node)

Hash a node by its #ID. Note that the ID-hash is created when the document is loaded (and if not, when first queried), but is not maintained by this class. It must therefore be maintained by the application.

$document->nodeIDHash()

Return a hash reference mapping node IDs to node objects. If the ID hash did not exist, it is rebuilt. Note: the ID hash, if it exists, is stored in the 'id-hash' appData entry.

$document->hasIDHash()

Returns 1 if the document has an ID-to-node hash map, 0 otherwise.

$document->rebuildIDHash()

Empty and rebuild document's ID-to-node hash.
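Since the ID hash is not maintained by this class, an application that changes a node's ID has to update the hash itself. The following is a minimal sketch of that bookkeeping using only the methods described above. The helper name rename_node_id and the $set_id code reference are hypothetical, since how an ID is stored on a node depends on the application and schema.

```perl
use strict;
use warnings;

# Hypothetical helper: change a node's ID while keeping the document's
# ID hash consistent. $set_id is an application-supplied code reference
# that stores the new ID on the node.
sub rename_node_id {
    my ($document, $node, $new_id, $set_id) = @_;

    # Remove the mapping for the old ID first, while the node still
    # carries it.
    $document->deleteNodeIDHashEntry($node);

    # Application-specific step: write the new ID into the node.
    $set_id->($node, $new_id);

    # Register the node under its new ID.
    $document->hashNodeByID($node);

    return $document->lookupNodeByID($new_id);
}
```

When many IDs change at once, calling $document->rebuildIDHash() afterwards may be simpler than updating the entries one by one.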

$document->referenceURLHash

Returns a HASHref mapping file reference IDs to URLs.

$document->referenceNameHash

Returns a HASHref mapping file reference names to reference IDs. Each value of the hash is either an ID string (if there is just one reference with a given name) or a Treex::PML::Alt containing all IDs associated with a given name.

$document->referenceObjectHash()

Returns a HASH whose keys are reference IDs and whose values are either DOM or Treex::PML::Instance representations of the corresponding related resources. Unless related tree documents were loaded with loadRelatedDocuments(), this hash only contains resources declared as readas='dom' or readas='pml' in the PML schema.

Note: the hash is stored in the document's appData entry 'ref'.

$document->relatedDocuments()

Returns a list of [id, URL] pairs of related tree documents declared in the PML schema of this document as readas='trees' (if any). Note that Treex::PML::Document does not load related tree documents automatically.

Note: the list is stored in the document's metaData entry 'fs-require'.

$document->loadRelatedDocuments($recurse,$callback)

Loads related tree documents declared in the PML schema of this document as readas='trees' (if any), unless already loaded.

Both arguments are optional:

the $recurse argument is a boolean flag indicating whether loadRelatedDocuments() should be called on the loaded related documents as well.

the $callback argument may contain a callback (anonymous subroutine) which will then be invoked before retrieving a related tree document. The callback will receive two arguments: the current $document and a URL of the related tree document to retrieve.

If the callback returns undef or an empty list, the related document will be retrieved in a standard way (using Treex::PML::Factory->createDocumentFromFile). If it returns a defined but false value (e.g. 0), the related document will not be retrieved at all. If it returns a defined value which is either a string or a URI object, the related document will be retrieved from that address. Finally, if the callback returns an object implementing the Treex::PML::Document interface, the object will be associated with the current document.
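The callback protocol described above can be illustrated with a small callback factory. This is a minimal sketch: the name make_related_callback and its redirect and skip parameters are hypothetical, and only the return-value conventions are taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical factory for a loadRelatedDocuments() callback.
#   redirect => { original URL => replacement address }
#   skip     => { URL => 1 } for documents that should not be loaded
sub make_related_callback {
    my (%arg) = @_;
    my $redirect = $arg{redirect} || {};
    my $skip     = $arg{skip}     || {};

    return sub {
        my ($document, $url) = @_;
        my $key = "$url";    # stringify, in case $url is a URI object

        # Defined but false: do not retrieve this document at all.
        return 0 if $skip->{$key};

        # A string: retrieve the document from this address instead.
        return $redirect->{$key} if exists $redirect->{$key};

        # Empty list (undef in scalar context): retrieve the standard way.
        return;
    };
}
```

Such a callback would be passed as the second argument, for example $document->loadRelatedDocuments(1, make_related_callback(skip => { 'big.pml' => 1 })), where the URL shown is only a placeholder.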

$document->relatedSuperDocuments()

Returns a list of Treex::PML::Document objects representing related superior documents (i.e. documents that loaded the current document using loadRelatedDocuments()).

Note: these documents are stored in the document's appData entry 'fs-part-of'.

$document->FS

Return a reference to the associated FSFormat object.

$document->changeFS (FSFormat_object)

Associate FS file with a new FSFormat object.

$document->hint

Return the TrEd hint pattern declared in the Treex::PML::Document.

$document->changeHint (string)

Change the TrEd hint pattern associated with this Treex::PML::Document.

$document->pattern_count

Return the number of display attribute patterns associated with this Treex::PML::Document.

$document->pattern (n)

Return the n'th display pattern associated with this Treex::PML::Document.

$document->patterns

Return a list of display attribute patterns associated with this Treex::PML::Document.

$document->changePatterns (list)

Change the list of display attribute patterns associated with this Treex::PML::Document.

$document->tail

Return the unparsed tail of the FS file (i.e. Graph's embedded macros).

$document->changeTail (list)

Modify the unparsed tail of the FS file (i.e. Graph's embedded macros).

$document->trees

Return a list of all trees (i.e. their roots represented by FSNode objects).

$document->changeTrees (list)

Assign a new list of trees.

$document->treeList

Return a reference to the internal array of all trees (i.e. their roots represented by FSNode objects).

$document->tree (n)

Return a reference to the tree number n.

$document->lastTreeNo

Return number of associated trees minus one.
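Since lastTreeNo is the tree count minus one, it can serve directly as the upper bound of a loop over tree(n). The following is a minimal sketch. The helper name each_tree is hypothetical, and only tree() and lastTreeNo from this class are used.

```perl
use strict;
use warnings;

# Hypothetical helper: call $visit->($index, $root) for every tree of
# the document and return the number of trees visited.
sub each_tree {
    my ($document, $visit) = @_;

    my $count = 0;
    # For a document without trees lastTreeNo is -1 (count minus one),
    # so the range 0 .. -1 is empty and the loop body never runs.
    for my $n (0 .. $document->lastTreeNo) {
        $visit->($n, $document->tree($n));
        $count++;
    }
    return $count;
}
```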

$document->notSaved (value?)

Return/assign file saving status (this is completely user-driven).

$document->currentTreeNo (value?)

Return/assign index of current tree (this is completely user-driven).

$document->currentNode (value?)

Return/assign current node (this is completely user-driven).

$document->nodes (tree_no, prev_current, include_hidden)

Get the list of nodes for a given tree. Returns a two-element list ($nodes,$current), where $nodes is a reference to a list of nodes for the tree and $current is either the root of the tree or the same node as prev_current if prev_current belongs to the tree. The list is sorted according to the ordering attribute (obtained from FS->order), and inclusion of hidden nodes (in the sense of FSFormat's hiding attribute FS->hide) depends on the boolean value of include_hidden.

$document->value_line (tree_no, no_tree_numbers?)

Return a sentence string for the given tree. The sentence string is a string of chained value attributes (FS->value), ordered according to FS->sentord, or according to FS->order if the FS->sentord attribute is not defined.

Unless no_tree_numbers is non-zero, prepend the resulting string with a "tree number/tree count: " prefix.

$document->value_line_list (tree_no)

Return a list of value (FS->value) attributes for the given tree, ordered according to FS->sentord, or according to FS->order if the FS->sentord attribute is not defined.

$document->insert_tree (root,position)

Insert new tree at given position.

$document->set_tree (root,pos)

Set tree at given position.

$document->append_tree (root)

Append tree at the end of the tree list.

$document->new_tree (position)

Create a new tree at given position and return pointer to its root.

$document->delete_tree (position)

Delete the tree at given position and return pointer to its root.

$document->destroy_tree (position)

Delete the tree at a given position and destroy its content (the root and all its descendant nodes).

$document->swap_trees (position1,position2)

Swap the trees at the given positions in the tree list. The positions must be between 0 and lastTreeNo inclusive.

$document->move_tree_to (position1,position2)

Move the tree at position1 in the tree list so that its position after the move is position2. The positions must be between 0 and lastTreeNo inclusive.
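The difference between swapping and moving can be modelled on a plain Perl array. The following sketch is not the module's implementation: the function names are hypothetical, and returning 0 for out-of-range positions is an assumption made here for illustration.

```perl
use strict;
use warnings;

# True if every position lies between 0 and the last index inclusive.
sub positions_valid {
    my ($trees, @positions) = @_;
    for my $pos (@positions) {
        return 0 if $pos < 0 || $pos > $#$trees;
    }
    return 1;
}

# Exchange two elements; all other elements keep their positions.
sub swap_in_list {
    my ($trees, $pos1, $pos2) = @_;
    return 0 unless positions_valid($trees, $pos1, $pos2);
    @{$trees}[$pos1, $pos2] = @{$trees}[$pos2, $pos1];
    return 1;
}

# Remove the element at $from and re-insert it so that it ends up at
# index $to; the elements in between shift by one position.
sub move_in_list {
    my ($trees, $from, $to) = @_;
    return 0 unless positions_valid($trees, $from, $to);
    my ($moved) = splice(@$trees, $from, 1);
    splice(@$trees, $to, 0, $moved);
    return 1;
}
```

For the list (a, b, c, d), swapping positions 0 and 2 gives (c, b, a, d), whereas moving position 0 to position 2 gives (b, c, a, d).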

$document->test_tree_type ( root_type )

This method can be used before an insert_tree or a similar operation to test if the root node provided as an argument is of a type valid for this Treex::PML::Document. More specifically, return 1 if the current file is not associated with a PML schema or if the tree list, represented by a PML list or sequence with the role #TREES, permits members of the type of root. Otherwise return 0.

A type-declaration object can be passed directly instead of root_type.

$document->determine_node_type ( node, { choose_command => sub{...} } )

If the node passed already has a PML type, the type is returned.

Otherwise this method tries to determine and set the PML type of the current node based on the type of its parent and possibly the node's '#name' attribute.

If the node type cannot be determined, the method dies.

If more than one type is possible for the node, the method first tries to run a callback routine passed in the choose_command option (if available), passing it three arguments: the $document, the $node and an ARRAY reference of possible types. If the callback returns one of the types, it is assigned to the node. Otherwise no type is assigned and the method returns a list of possible node types.
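A choose_command callback following the protocol above might look like the sketch below. The factory name make_type_chooser and its $prefer predicate are hypothetical. How a candidate type is inspected is left to the caller, because it depends on the schema in use.

```perl
use strict;
use warnings;

# Hypothetical factory for a choose_command callback. $prefer is a code
# reference that receives one candidate type and returns true if that
# type should be selected.
sub make_type_chooser {
    my ($prefer) = @_;

    return sub {
        my ($document, $node, $types) = @_;

        # Return the first candidate accepted by the predicate.
        for my $type (@$types) {
            return $type if $prefer->($type);
        }

        # Returning nothing leaves the node untyped, so the caller of
        # determine_node_type gets the list of candidates back.
        return;
    };
}
```

The result would be passed in the options hash, as in $document->determine_node_type($node, { choose_command => make_type_chooser($predicate) }).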

SEE ALSO

Treex::PML, Treex::PML::Factory, Treex::PML::Node, Treex::PML::Instance

COPYRIGHT AND LICENSE

Copyright (C) 2006-2010 by Petr Pajas

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.