NAME
Treex::Block::Read::BaseReader - abstract ancestor for document readers
VERSION
version 0.07190
DESCRIPTION
This class serves as a common ancestor for document readers that have a parameter from
with a space- or comma-separated list of filenames to be loaded. It is designed to implement the Treex::Core::DocumentReader interface.
In derived classes you need to define the next_document method, and you can use the next_filename and new_document methods.
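The division of labour between the base class and a derived reader can be sketched as follows. This is a minimal illustration, not the Treex implementation: StandIn::BaseReader and StandIn::ToyReader are hypothetical names, the stand-in base class mimics only next_filename and new_document so that the sketch runs without Treex installed, and a plain hash stands in for a Treex document. It also assumes that next_document signals the end of input by returning undef. In real code the derived class would extend Treex::Block::Read::BaseReader instead.

```perl
use strict;
use warnings;

# Stand-in for Treex::Block::Read::BaseReader (hypothetical, NOT the real class).
# It mimics only the two helpers a derived reader is expected to call.
package StandIn::BaseReader;

sub new {
    my ( $class, %args ) = @_;
    my $self = { filenames => [ @{ $args{filenames} } ], file_number => 0 };
    return bless $self, $class;
}

# Return the next filename, or undef when the list is exhausted.
sub next_filename {
    my ($self) = @_;
    return if $self->{file_number} >= @{ $self->{filenames} };
    return $self->{filenames}[ $self->{file_number}++ ];
}

# The real new_document returns a Treex document; a plain hash stands in here.
sub new_document {
    my ($self) = @_;
    return {
        loaded_from => $self->{filenames}[ $self->{file_number} - 1 ],
        file_number => $self->{file_number},
    };
}

# Derived reader: the only method it has to supply is next_document.
package StandIn::ToyReader;
our @ISA = ('StandIn::BaseReader');

sub next_document {
    my ($self) = @_;
    my $filename = $self->next_filename();
    return if !defined $filename;    # assumption: undef means "no more documents"
    my $document = $self->new_document();

    # A real reader would parse $filename here and fill the document.
    $document->{note} = "would parse $filename";
    return $document;
}

package main;

my $reader = StandIn::ToyReader->new( filenames => [ 'a.txt', 'b.txt' ] );
my @loaded;
while ( my $doc = $reader->next_document() ) {
    push @loaded, $doc->{loaded_from};
}
print join( ',', @loaded ), "\n";
```

The loop at the end plays the role of the caller: it keeps asking for documents until the reader has no more filenames to hand out.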
ATTRIBUTES
- from (required if filelist is not set)
space- or comma-separated list of filenames, or - for STDIN. (If you use this method via API, you can specify filenames instead.)
- filelist (required if from is not set)
path to a file that contains a list of files to be read (one per line)
- file_stem (optional)
How to name the loaded documents. This attribute will be saved to the same-named attribute in documents, and document writers will use it to decide where to save the files.
- filenames (internal)
array of filenames to be loaded, automatically initialized from the attribute from
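As an illustration of how a from value maps to the filenames array, the sketch below splits a string on spaces and commas. The helper name split_from is hypothetical and the exact splitting rules used by this class may differ; the sketch only shows the general idea in plain Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Treex): split a "from" value
# on any run of commas and/or whitespace, dropping empty fields.
sub split_from {
    my ($from) = @_;
    return grep { length } split /[\s,]+/, $from;
}

my @filenames = split_from('a.txt, b.txt c.txt');
print "$_\n" for @filenames;
```

When constructing a reader via the API, passing an array of filenames directly avoids this parsing step altogether.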
METHODS
- next_document
This method must be overridden in derived classes. (The implementation in this class just issues a fatal error.)
- next_filename
Returns the next filename (full path) to be loaded (from the list specified in the attribute from).
- new_document($load_from?)
Returns a new empty document with pre-filled attributes loaded_from, file_stem, file_number and path, which are guessed based on current_filename.
- current_filename
Returns the last filename returned by next_filename.
- is_next_document_for_this_job
Is the document that will be returned by next_document supposed to be processed by this job? This is relevant only in parallel processing, where each job has a different $jobnumber assigned.
- number_of_documents
Returns the number of documents that will be read by this reader. If is_one_doc_per_file returns true, then the number of documents equals the number of files given in from. Otherwise, this method returns undef.
SEE
Treex::Block::Read::BaseTextReader, Treex::Block::Read::Text
AUTHOR
Martin Popel
COPYRIGHT AND LICENSE
Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.