NAME

Treex::Block::Read::BaseAlignedReader - abstract ancestor for parallel-corpora document readers

VERSION

version 0.08055

SYNOPSIS

# in scenarios
Read::MyAlignedFormat en=english.txt de=german.txt

# Zones can also differ in selectors; any number of zones can be read
Read::MyAlignedFormat en_ref=ref1,ref2 en_moses=mos1,mos2 en_tectomt=tmt1,tmt2

DESCRIPTION

This class serves as a common ancestor for document readers that read several zones at once, usually parallel sentences in two (or more) languages. The readers take parameters named after the zones; the value of each parameter is a space- or comma-separated list of filenames to be loaded into the given zone. The class is designed to implement the Treex::Core::DocumentReader interface.
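The parameter convention described above can be illustrated with a small standalone sketch. It is not the module's actual code: the helpers split_filenames and split_zone_label are hypothetical names introduced here, and the sketch assumes a zone label is a language code optionally followed by an underscore and a selector, as in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Treex): split one parameter value
# into a list of filenames, accepting spaces and/or commas as separators.
sub split_filenames {
    my ($value) = @_;
    $value =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return [ split /[ ,]+/, $value ];
}

# Hypothetical helper (not part of Treex): split a zone label such as
# "en_ref" into its language and selector; the selector may be empty.
sub split_zone_label {
    my ($label) = @_;
    my ( $language, $selector ) = split /_/, $label, 2;
    return ( $language, $selector // '' );
}

# Example parameters, as they might be given to a reader in a scenario.
my %params = ( en_ref => 'ref1,ref2', en_moses => 'mos1 mos2' );

for my $label ( sort keys %params ) {
    my ( $language, $selector ) = split_zone_label($label);
    my $files = split_filenames( $params{$label} );
    print "$label: language=$language selector=$selector files=@{$files}\n";
}
```

Each zone ends up with its own ordered list of files; the n-th file of every zone is expected to belong to the same document.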

In derived classes you need to define the next_document method, and you can use next_filenames and new_document methods.

ATTRIBUTES

any parameter in the form of a valid zone_label

A space- or comma-separated list of filenames, or - for STDIN.

file_stem (optional)

How to name the loaded documents. This attribute will be saved to the same-named attribute in documents and it will be used in document writers to decide where to save the files.

METHODS

next_document

This method must be overridden in derived classes. (The implementation in this class just issues a fatal error.)

next_filenames

Returns a hashref of filenames (full paths) to be loaded. The keys of the hash are zone labels, the values are the filenames.
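The shape of the returned value can be sketched with a standalone iterator. This is only an illustration under assumptions: make_filename_iterator is a hypothetical name, the real method keeps its state in the reader object, and this sketch does not resolve filenames to full paths.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the reader's state (not part of Treex):
# each zone label maps to a list of files, all lists of equal length.
sub make_filename_iterator {
    my (%files_for_zone) = @_;
    my $index = 0;
    return sub {
        return if !%files_for_zone;    # nothing to read at all
        my %current;
        for my $label ( keys %files_for_zone ) {
            my $files = $files_for_zone{$label};
            return if $index >= @{$files};    # no more documents
            $current{$label} = $files->[$index];
        }
        $index++;
        return \%current;    # hashref: zone label => filename
    };
}

my $next = make_filename_iterator(
    en => [ 'en1.txt', 'en2.txt' ],
    de => [ 'de1.txt', 'de2.txt' ],
);

# Each call yields the filenames for the next document, one per zone.
while ( my $filenames = $next->() ) {
    print join( ' ', map {"$_=$filenames->{$_}"} sort keys %{$filenames} ), "\n";
}
```

A derived reader's next_document would typically call next_filenames once per document and stop when nothing is returned.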

new_document($load_from?)

Returns a new empty document with pre-filled attributes loaded_from, file_stem, file_number and path which are guessed based on current_filenames.

current_filenames

Returns the filenames most recently returned by next_filenames.

number_of_documents

Returns the number of documents that will be read by this reader.

SEE ALSO

Treex::Block::Read::BaseReader, Treex::Block::Read::BaseAlignedTextReader

AUTHOR

Martin Popel

COPYRIGHT AND LICENSE

Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
