NAME
Treex::Core::DocumentReader - interface for all document readers
VERSION
version 0.05222
DESCRIPTION
Document readers are the Treex mechanism for loading documents to be processed. The documents can be stored in files (in various formats), read from STDIN, retrieved from a socket, and so on.
METHODS
To be implemented
These methods must be implemented in classes that consume this role.
- next_document
Returns the next document (Treex::Core::Document).
- number_of_documents
Total number of documents that will be produced by this reader. If the number is unknown in advance, undef should be returned.
Already implemented
- is_current_document_for_this_job
Is the document that was most recently returned by $self->next_document() supposed to be processed by this job? Job indices and document numbers are 1-based. For example, with jobs = 5 and jobindex = 3 we want to load documents with numbers 3, 8, 13, 18, ...; with jobs = 5 and jobindex = 5 we want to load documents with numbers 5, 10, 15, 20, ...; that is, those documents where (doc_number-1) % jobs == (jobindex-1).
- next_document_for_this_job
Returns the next document that should be processed by this job. If jobindex is set, documents are selected "modulo number of jobs". See is_current_document_for_this_job.
- number_of_documents_per_this_job
Total number of documents that will be produced by this reader for this job. It is computed from number_of_documents, jobindex and jobs.
- restart
Start reading again from the first document. This implementation just sets the attribute doc_number to zero. You can add additional behavior using the Moose after 'restart' construct.
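The job-selection rule described under is_current_document_for_this_job can be illustrated with a small standalone sketch. It uses plain Perl and does not load Treex; the helper names is_doc_for_job and docs_for_job are hypothetical and exist only for this illustration. Counting the selected numbers is one way to arrive at a per-job total, and it is not necessarily how number_of_documents_per_this_job computes it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Treex): true if the 1-based document
# number $doc_number falls to the 1-based job $jobindex out of $jobs jobs,
# following the rule (doc_number-1) % jobs == (jobindex-1).
sub is_doc_for_job {
    my ( $doc_number, $jobs, $jobindex ) = @_;
    my $selected = ( ( $doc_number - 1 ) % $jobs ) == ( $jobindex - 1 );
    return $selected;
}

# Hypothetical helper: all document numbers in 1..$total that the given
# job would process.
sub docs_for_job {
    my ( $total, $jobs, $jobindex ) = @_;
    return grep { is_doc_for_job( $_, $jobs, $jobindex ) } 1 .. $total;
}

my @docs = docs_for_job( 20, 5, 3 );
print "job 3 of 5: @docs\n";               # job 3 of 5: 3 8 13 18
print "count: ", scalar(@docs), "\n";      # count: 4
```

With 20 documents and 5 jobs every job gets 4 documents. When the total is not a multiple of jobs, the jobs with the lowest indices get one document more than the rest.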
SEE
Treex::Block::Read::Sentences Treex::Block::Read::Text Treex::Block::Read::Treex
AUTHOR
Martin Popel <popel@ufal.mff.cuni.cz>
COPYRIGHT AND LICENSE
Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.