NAME

Treex::Core::DocumentReader - interface for all document readers

VERSION

version 0.05222

DESCRIPTION

Document readers are the Treex mechanism for loading documents to be processed by Treex. The documents can be stored in files (in various formats), read from STDIN, retrieved from a socket, etc.

METHODS

To be implemented

These methods must be implemented in classes that consume this role.

next_document

Returns the next document (Treex::Core::Document).

number_of_documents

Total number of documents that will be produced by this reader. If the number is unknown in advance, undef should be returned.

Already implemented

is_current_document_for_this_job

Is the document most recently returned by $self->next_document() supposed to be processed by this job? Job indices and document numbers are 1-based. For example, with jobs = 5 and jobindex = 3 we want to load documents number 3, 8, 13, 18, ...; with jobs = 5 and jobindex = 5 we want to load documents number 5, 10, 15, 20, ... In general, these are the documents for which (doc_number-1) % jobs == (jobindex-1).
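The selection rule can be checked with a few lines of plain Perl. This is only a sketch of the arithmetic; is_document_for_job is a hypothetical helper written for this illustration and is not part of Treex.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Treex: returns 1 if the 1-based
# document number belongs to the job with the given 1-based jobindex.
sub is_document_for_job {
    my ( $doc_number, $jobs, $jobindex ) = @_;
    return ( ( $doc_number - 1 ) % $jobs ) == ( $jobindex - 1 ) ? 1 : 0;
}

# Documents 1..20 split among 5 jobs; list those belonging to job 3.
my @selected = grep { is_document_for_job( $_, 5, 3 ) } 1 .. 20;
print "@selected\n";    # prints "3 8 13 18"
```

Subtracting 1 on both sides is what makes the 1-based numbering work: without it, the job with jobindex equal to jobs would never match, because a remainder is always smaller than jobs.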

next_document_for_this_job

Returns the next document that should be processed by this job. If jobindex is set, documents are selected modulo the number of jobs; see is_current_document_for_this_job.
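One way to picture this behavior is a loop that keeps fetching documents and skips those belonging to other jobs. The sketch below uses plain closures instead of the real role: make_job_reader is a hypothetical name, the code ref stands in for $self->next_document(), and it is an assumption of this example that the source returns undef once no documents remain.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the Treex implementation.
# $next_document stands in for $self->next_document(); this example
# assumes it returns undef when there are no more documents.
sub make_job_reader {
    my ( $next_document, $jobs, $jobindex ) = @_;
    my $doc_number = 0;    # 1-based after the first fetch
    return sub {
        while ( defined( my $doc = $next_document->() ) ) {
            $doc_number++;
            # Without a jobindex every document belongs to this job.
            return $doc if !defined $jobindex;
            return $doc if ( $doc_number - 1 ) % $jobs == $jobindex - 1;
        }
        return;
    };
}

my @docs   = map {"doc$_"} 1 .. 7;
my $source = sub { return shift @docs };
my $reader = make_job_reader( $source, 3, 2 );

my @mine;
while ( defined( my $doc = $reader->() ) ) {
    push @mine, $doc;
}
print "@mine\n";    # prints "doc2 doc5"
```

Note that in this sketch the skipped documents are still fetched from the source; only the ones matching the job are handed back to the caller.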

number_of_documents_per_this_job

Total number of documents that will be produced by this reader for this job. It is computed from number_of_documents, jobindex and jobs.
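The count follows from the modulo rule described under is_current_document_for_this_job. The function below is a sketch of that arithmetic under stated assumptions, not the module's actual code: documents_for_job is a hypothetical name, an unknown total is passed through as undef, and an unset jobindex is taken to mean that the job gets every document.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Treex: how many of the documents
# numbered 1..$total satisfy (n-1) % $jobs == ($jobindex-1).
sub documents_for_job {
    my ( $total, $jobs, $jobindex ) = @_;
    return        if !defined $total;       # total unknown in advance
    return $total if !defined $jobindex;    # assumption: no splitting
    return 0      if $total < $jobindex;    # this job gets nothing
    # The first matching document is number $jobindex, then every
    # $jobs-th document after it.
    return int( ( $total - $jobindex ) / $jobs ) + 1;
}

print documents_for_job( 20, 5, 3 ), "\n";    # prints "4"
```

For 20 documents, 5 jobs and jobindex 3 the matching documents are 3, 8, 13 and 18, which agrees with the result of 4.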

restart

Start reading again from the first document. This implementation just sets the attribute doc_number to zero. You can add additional behavior using the Moose after 'restart' construct.
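As an illustration of the after 'restart' construct, the sketch below resets a reader's own position whenever restart is called. To keep it runnable without Treex, it uses a small stand-in role instead of Treex::Core::DocumentReader; My::DocumentReaderStandIn, My::ListReader, items, _position and next_item are all hypothetical names invented for this example. It requires Moose to be installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real role, so that the sketch runs
# without Treex. It mimics only what the text above describes:
# restart sets doc_number to zero.
{
    package My::DocumentReaderStandIn;
    use Moose::Role;

    has doc_number => ( is => 'rw', isa => 'Int', default => 0 );

    sub restart {
        my ($self) = @_;
        $self->doc_number(0);
        return;
    }
}

# Hypothetical reader that keeps its own read position.
{
    package My::ListReader;
    use Moose;
    with 'My::DocumentReaderStandIn';

    has items     => ( is => 'ro', isa => 'ArrayRef', default => sub { [] } );
    has _position => ( is => 'rw', isa => 'Int',      default => 0 );

    sub next_item {
        my ($self) = @_;
        return if $self->_position >= @{ $self->items };
        my $item = $self->items->[ $self->_position ];
        $self->_position( $self->_position + 1 );
        $self->doc_number( $self->doc_number + 1 );
        return $item;
    }

    # Runs after the role's restart: also rewind our own position.
    after 'restart' => sub {
        my ($self) = @_;
        $self->_position(0);
    };
}

my $reader = My::ListReader->new( items => [ 'a', 'b' ] );
$reader->next_item;
$reader->restart;
print $reader->next_item, "\n";    # prints "a"
```

Using a method modifier rather than overriding restart keeps the role's own reset of doc_number intact and only adds the reader-specific cleanup on top.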

SEE

Treex::Block::Read::Sentences, Treex::Block::Read::Text, Treex::Block::Read::Treex

AUTHOR

Martin Popel <popel@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.