NAME
DTA::TokWrap::Processor::mkbx0 - DTA tokenizer wrappers: sxfile -> bx0doc
SYNOPSIS
use DTA::TokWrap::Processor::mkbx0;
$mbx0 = DTA::TokWrap::Processor::mkbx0->new(%opts);
$doc_or_undef = $mbx0->mkbx0($doc);
##-- debugging
$mbx0_or_undef = $mbx0->ensure_stylesheets();
$mbx0->dump_chain_stylesheet($filename_or_fh);
$mbx0->dump_hint_stylesheet($filename_or_fh);
$mbx0->dump_sort_stylesheet($filename_or_fh);
DESCRIPTION
DTA::TokWrap::Processor::mkbx0 provides an object-oriented DTA::TokWrap::Processor wrapper for hint insertion and serialization sort-key generation on a text-free "structure index" (.sx) XML file.
Most users should use the high-level DTA::TokWrap wrapper class instead of using this module directly.
Constants
- @ISA
-
DTA::TokWrap::Processor::mkbx0 inherits from DTA::TokWrap::Processor.
Constructors etc.
- new
-
$mbx0 = $CLASS_OR_OBJ->new(%opts)
Constructor.
%opts, %$mbx0:
##-- Programs
rmns => $path_to_xml_rm_namespaces, ##-- default: search
inplace => $bool, ##-- prefer in-place programs for search?
auto_xmlid => $bool, ##-- if true (default), @id attributes will be mapped to @xml:id
auto_prevnext => $bool, ##-- if true (default), @prev|@next chains will be auto-sanitized
##
##-- Stylesheet: chain-serialization
chain_stylestr => $stylestr, ##-- xsl stylesheet string for chain-serialization
chain_styleheet => $stylesheet, ##-- compiled xsl stylesheet for chain-serialization
##
##-- Stylesheet: insert-hints (<seg> elements and their children are handled implicitly)
hint_sb_xpaths => \@xpaths, ##-- add sentence-break hint (<s/>) for @xpath element open & close
hint_wb_xpaths => \@xpaths, ##-- add word-break hint (<w/>) for @xpath element open & close
##
hint_stylestr => $stylestr, ##-- xsl stylesheet string
hint_styleheet => $stylesheet, ##-- compiled xsl stylesheet
##
##-- Stylesheet: mark-sortkeys (<seg> elements and their children are handled implicitly)
sortkey_attr => $attr, ##-- sort-key attribute (default: 'dta.tw.key')
sort_ignore_xpaths => \@xpaths, ##-- ignore these xpaths
sort_addkey_xpaths => \@xpaths, ##-- add new sort key for @xpaths
##
sort_stylestr => $stylestr, ##-- xsl stylesheet string
sort_styleheet => $stylesheet, ##-- compiled xsl stylesheet
- defaults
-
%defaults = CLASS->defaults();
Static class-dependent defaults.
- init
-
$mbx0 = $mbx0->init();
Dynamic object-dependent defaults.
Methods: XSL stylesheets
- ensure_stylesheets
-
$mbx0_or_undef = $mbx0->ensure_stylesheets();
Ensures that required XSL stylesheets have been compiled.
- hint_stylestr
-
$xsl_str = $mbx0->hint_stylestr();
Returns XSL stylesheet string for the 'insert-hints' transformation, which is responsible for inserting sentence- and token-break hints into the input *.sx document.
- sort_stylestr
-
$xsl_str = $mbx0->sort_stylestr();
Returns XSL stylesheet string for the 'generate-sort-keys' transformation, which is responsible for inserting top-level serialization-segment keys into the input *.sx document.
- dump_chain_stylesheet
-
$mbx0->dump_chain_stylesheet($filename_or_fh);
Dumps the generated 'serialize-chains' stylesheet to $filename_or_fh.
- dump_hint_stylesheet
-
$mbx0->dump_hint_stylesheet($filename_or_fh);
Dumps the generated 'insert-hints' stylesheet to $filename_or_fh.
- dump_sort_stylesheet
-
$mbx0->dump_sort_stylesheet($filename_or_fh);
Dumps the generated 'generate-sortkeys' stylesheet to $filename_or_fh.
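For debugging, the three dump methods above can be combined into one call. The following is a minimal sketch only: dump_stylesheets() and the output filenames are hypothetical and not part of DTA::TokWrap, and it assumes that an undef return from ensure_stylesheets() indicates failure, as the $mbx0_or_undef return value suggests.

```perl
use strict;
use warnings;

## dump_stylesheets($mbx0, $prefix): hypothetical helper, not part of DTA::TokWrap.
## $mbx0 is expected to be a DTA::TokWrap::Processor::mkbx0 object.
sub dump_stylesheets {
  my ($mbx0, $prefix) = @_;

  ## assumption: undef from ensure_stylesheets() means compilation failed
  defined($mbx0->ensure_stylesheets())
    or die "dump_stylesheets(): could not compile XSL stylesheets\n";

  ## output filenames are illustrative only
  my %files = (
    chain => "$prefix.chain.xsl",
    hint  => "$prefix.hint.xsl",
    sort  => "$prefix.sort.xsl",
  );

  ## each dump method accepts a filename or a filehandle
  $mbx0->dump_chain_stylesheet($files{chain});
  $mbx0->dump_hint_stylesheet($files{hint});
  $mbx0->dump_sort_stylesheet($files{sort});

  return \%files;
}
```

With the module installed, this might be called as dump_stylesheets(DTA::TokWrap::Processor::mkbx0->new(), 'debug'), assuming the constructor's defaults are acceptable.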
Methods: top-level
- mkbx0
-
$doc_or_undef = $CLASS_OR_OBJECT->mkbx0($doc);
Applies the XSL pipeline for hint insertion and sort-key generation to the "structure index" (*.sx) document of the DTA::TokWrap::Document object $doc.
Relevant %$doc keys:
sxfile => $sxfile, ##-- (input) structure index filename
bx0doc => $bx0doc, ##-- (output) preliminary block-index data (XML::LibXML::Document)
##
mkbx0_stamp0 => $f, ##-- (output) timestamp of operation begin
mkbx0_stamp => $f, ##-- (output) timestamp of operation end
bx0doc_stamp => $f, ##-- (output) timestamp of operation end
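The return value and the document keys listed above can be checked after a call to mkbx0(). The following is a minimal sketch under stated assumptions: run_mkbx0() is a hypothetical helper and not part of DTA::TokWrap, the document's sxfile key is assumed to be set already, and the two timestamps are assumed to be numeric so that their difference gives the elapsed time.

```perl
use strict;
use warnings;

## run_mkbx0($mbx0, $doc): hypothetical helper, not part of DTA::TokWrap.
## $mbx0 is expected to be a DTA::TokWrap::Processor::mkbx0 object and
## $doc a DTA::TokWrap::Document object (a hash reference).
sub run_mkbx0 {
  my ($mbx0, $doc) = @_;

  ## assumption: the caller has already set the (input) sxfile key
  die "run_mkbx0(): no sxfile key in document\n"
    if !defined $doc->{sxfile};

  ## mkbx0() returns the document on success, undef otherwise
  my $rc = $mbx0->mkbx0($doc);
  die "run_mkbx0(): mkbx0() failed for '$doc->{sxfile}'\n"
    if !defined $rc;

  ## the (output) keys should now be populated
  die "run_mkbx0(): no bx0doc produced\n"
    if !defined $doc->{bx0doc};

  ## assumption: both timestamps are plain numbers
  my $elapsed = $doc->{mkbx0_stamp} - $doc->{mkbx0_stamp0};
  return ($doc->{bx0doc}, $elapsed);
}
```

The helper returns the bx0doc value together with the elapsed time, which may be useful when profiling individual processing stages.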
SEE ALSO
DTA::TokWrap::Intro(3pm), dta-tokwrap.perl(1), ...
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2009-2018 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.