NAME
Text::Fab - A powerful, general-purpose document expansion framework (currently exists mostly as incomplete documentation and the beginnings of a stub implementation)
SYNOPSIS
# -- File: config.fab -- Example of merging multiple files (with almost no content!)
# Declare which configuration keys will be treated as lists
#append Fab/list_keys chapters, css_files
# Set some configuration variables
#set site_title=My Awesome Book
#append css_files global.css, theme.css
# Define the inheritance graph for our namespaces
#set_parents ChapterLayout BaseLayout
#set_parents Book _main # The final book inherits from _main's structure
# -- File: main.fab --
# The main assembly script. The default output target is '_main:body'.
#append chapters chapter1, chapter2, chapter3 # Build the chapter list
#emb title
#emb table_of_contents
#emb all_chapters
#emb footer
# -- File: components.fab --
# Define reusable components in namespaces
#target_section title in ChapterLayout
<h1>#emb _main:title</h1> # Note: Fully qualified name
#end_section
#target_section all_chapters in ChapterLayout
#= FOREACH chapter IN C('chapters') ... # (Chunk Preprocessor syntax)
#emb @chapter:body
#= END #
#end_section
#target_section footer in BaseLayout
<footer>Copyright 2025</footer>
#end_section
DESCRIPTION
Text::Fab is not a templating engine; it is a framework for building your own. At its core, it is a minimal "dumb" state machine that exposes a stable API of primitive operations. Most of the "smart" logic for parsing syntax, processing text, and creating complex control structures is delegated to user-configurable components. (However, a certain minimal level of smartness is enabled by default, and can be switched off, to handle the most common cases.) As a result, the core knows nothing about the syntax or semantics of the processed “language”.
The main design goal is that all state of Fab is explicitly introspectable: there is no hidden magic. The entire state of the system is contained in the Configuration hash and the collection of already constructed Output Sections. What you see is what you get.
Fab’s purpose is to factor out the common needs of all configurable-on-the-fly document processing engines; more precisely, it focuses on the needs orthogonal to the particular syntax of a particular problem domain. (This architecture aims to overcome the problems experienced when designing “simple” configurable systems, which turn out not to be scalable, so that all approaches to keeping them operable crumble under their own weight.)
In short: this attempts to abstract out all the complexity of scalable configurability. The resulting “tools” support an architecture with a top→down approach to recognizing the “reconfiguration directives”.
————————————————————————————————————
By default, Text::Fab reads input and filters it through a specified preprocessing engine (which defaults to pass-through-unchanged) to the standard output. Its power comes from executing the interleaved-in-the-input directives that are tools for controlling the processing engine, reshuffling content, and templating.
The process is straightforward: Text::Fab asks a De-Interleaver callback "Where is the next control directive?". It then (optionally, see below) passes the text chunk before that location to a Chunk Preprocessor (the "stomach"). Finally, it allows the Parser to execute the directive (typically, the Parser would convert the details of the directive into calls to zero or more of Text::Fab's primitive API methods to alter its internal state). This cycle repeats until the input is exhausted.
Furthermore, as an extra configurable step, the De-Interleaver may report that a particular directive “is a comment-to-ignore”. In this case, Fab postpones processing the text chunk before that location, merging it with the following chunk(s) before passing it to the Chunk Preprocessor. So the full cycle is: the De-Interleaver finds the next directive. The Comment Recognizer may then optionally identify it as a comment, allowing the preceding and succeeding text chunks to be fused. Depending on this, the resulting chunk is passed to the Chunk Preprocessor either now or later. Finally, the Parser executes the directive found above, which may alter the system's state for the next cycle.
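The cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions, not Text::Fab's implementation: the callback names (find_directive, is_comment, preprocess, execute), the convention that the De-Interleaver returns an offset and a length, and the toy "#up" and "#!" directives are all invented for the example.

```perl
use strict;
use warnings;

# Hypothetical driver loop; all callback names and signatures are assumptions.
sub run_cycle {
    my ($input, %cb) = @_;
    my ($pos, $pending, @out) = (0, '');
    while (1) {
        # De-Interleaver: (offset, length) of the next directive, or () if none.
        my ($at, $len) = $cb{find_directive}->($input, $pos);
        if (!defined $at) {                       # input exhausted
            $pending .= substr($input, $pos);
            push @out, $cb{preprocess}->($pending) if length $pending;
            return join '', @out;
        }
        $pending .= substr($input, $pos, $at - $pos);
        my $directive = substr($input, $at, $len);
        $pos = $at + $len;
        # A comment fuses the chunks around it: keep accumulating.
        next if $cb{is_comment}->($directive);
        push @out, $cb{preprocess}->($pending) if length $pending;
        $pending = '';
        $cb{execute}->($directive);               # the Parser may change state
    }
}

# Toy language: "#up" switches the preprocessor to upper case, "#!..." is a comment.
my %state = (upper => 0);
my $result = run_cycle(
    "ab#!skip\ncd#up\nef",
    find_directive => sub {
        my ($text, $from) = @_;
        pos($text) = $from;
        return $text =~ /(#\S+\n)/g ? ($-[1], $+[1] - $-[1]) : ();
    },
    is_comment => sub { $_[0] =~ /^#!/ },
    preprocess => sub { $state{upper} ? uc $_[0] : $_[0] },
    execute    => sub { $state{upper} = 1 if $_[0] =~ /^#up/ },
);
```

Here the chunks "ab" and "cd" around the comment reach the preprocessor as one chunk, while "ef" is preprocessed separately, after the "#up" directive has changed the state.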
Finally, this “structured” (see below) pre-processed data created this way is “merged” into the final output text stream following flexible “hierarchical” rules.
THE PROCESSING MODEL
First of all, the inclusion mechanism handles a stack of currently opened-for-processing files; these files are “combined as usual” into one input stream. In addition to this, the Fab uses three more data collections.
The configuration directives may affect how the preprocessor handles the chunks of its input, whether “the comments interleaved into the stream” split these chunks, etc., and may modify how the “directives interleaved into the input stream” are recognized (and even what these directives mean). The grouping directives control which changes to configuration are undone and when. The output directives may insert into the output section "an order to inline another section" (a promise, which is not executed until much later). Another kind of output directive allows switching the target section for the output of the preprocessor.
(It makes sense to also focus on specific parts of the Configuration Hash: the Control Stack which manages conditional handling and looping, Grouping Stack which handles undoing changes to configuration, and Caching Engine which helps to avoid the penalties of constantly recalculating data dependent only on the configuration stack.)
After all the sections have been constructed, they may be “joined arbitrarily” into the final document. This assembly process is controlled by the rules for choosing a named section when “multiple flavors” have been defined. This allows a flexible hierarchical system of “overridable templates”. Such templates together with the rules of “namespace resolution” build the logic of processing that ultimately generates the final document(s).
The Configuration Hash
The heart of Text::Fab is a single Configuration hash. It is a live data structure that directives can modify during processing. It controls everything, from user-defined variables to the very components (parsers, etc.) that define the language's behavior. By default, all internal configuration used by the framework itself is stored under the Fab/ top-level key. (Later we are going to discuss how to modify this default, and how this prefix changes during the input processing. However, in examples below we assume that this hasn't been changed.)
Configuration values can be scalars or lists. A special list, Fab/list_keys, is used to declare which keys should be treated as lists by the primitive operations.
The Output Sections
A section is a named buffer that holds already preprocessed content. A section's name is a pair of strings: a namespace and a basename (e.g., Chapter1:body). The default namespace is _main.
It makes sense to imagine that a section consists of two interleaved lists, each holding its own type of content:
Plain text chunks, which have been processed by the Chunk Preprocessor.
embed placeholders, which are promises to inline another section during the final Assembly Phase. This other section is described by its basename as well as some extra data (these data are used in the calculation of its namespace, which is performed later).
Grouping: Scoped/Undoable Configuration Changes
(Below, we assume a particular #-based format for directives. The actual operation is agnostic of this format.)
#start_group <flavor_name>-
Begins a scope. A certain set of keys in the Configuration hash is snapshotted. The flavor_name is used to look up the list of Configuration keys to memorize/restore.
#end_group [<flavor_name>]-
If a flavor_name is provided, it must be the flavor of an already opened group. All changes made to the observed configuration keys are undone. If the optional name is not given, the most recently opened scope is closed.
Note: The nesting behavior of groups (e.g., whether flavor A must be nested within flavor B) is itself controlled by the Configuration hash.
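The snapshot-and-restore behaviour of groups can be sketched with a plain hash and a stack. The sketch makes several assumptions that the text above does not fix: the key Fab/group_keys/<flavor> used to look up the observed keys is hypothetical, the snapshot is shallow, and a named #end_group is required to match the innermost open group.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the Configuration hash and the group stack.
my %config = (
    'Fab/group_keys/style' => [qw(font color)],   # assumed key layout
    font => 'serif', color => 'black', title => 'Intro',
);
my @open_groups;    # each frame: { flavor => ..., saved => { key => [existed, value] } }

sub start_group {
    my ($flavor) = @_;
    my $keys = $config{"Fab/group_keys/$flavor"}
        or die "unknown group flavor '$flavor'";
    # Remember whether each key existed, so that restoring can delete new keys.
    my %saved = map { $_ => [exists $config{$_}, $config{$_}] } @$keys;
    push @open_groups, { flavor => $flavor, saved => \%saved };
}

sub end_group {
    my ($flavor) = @_;
    die "E_DANGLING_END_GROUP" unless @open_groups;
    die "E_MISMATCHED_END_GROUP"
        if defined $flavor && $open_groups[-1]{flavor} ne $flavor;
    my $frame = pop @open_groups;
    for my $key (keys %{ $frame->{saved} }) {
        my ($existed, $value) = @{ $frame->{saved}{$key} };
        if ($existed) { $config{$key} = $value } else { delete $config{$key} }
    }
}

start_group('style');
@config{qw(font title)} = ('mono', 'Chapter 1');
end_group('style');
# font is restored; title is not an observed key, so its change survives.
```

Only the keys registered for the flavor are undone, which is why one flavor can protect formatting settings while another protects, say, parser settings.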
PRIMITIVE API REFERENCE
These are the low-level methods on the Text::Fab object that user-defined Parsers can call. The user-facing directives (like #set) are defined by the Parser, whose role is to convert these directives into calls to the Text::Fab methods listed below.
Output Control Primitives
out__target_section( $basename, $namespace )-
Redirects subsequent output to the specified section. Creates the section if it does not exist.
out__create_section( $basename, $namespace )-
Creates a section, clearing its content if it already exists.
out__embed( $basename, $namespace, \%options )-
Places an embed placeholder into the current target section. The %options hash can contain advanced features like asiffrom or with_blinder.
Configuration Primitives
cfg__set( $key, @values )-
Sets the value of a key.
cfg__append( $key, @values )-
Appends one or more items to a list key, or appends a string value to a scalar key.
cfg__prepend( $key, @values )-
Prepends one or more items to a list key, or prepends a string value to a scalar key.
cfg__get( $key, [$offset] )-
Retrieves the value of a scalar key, or of an element of a list key.
cfg__get_joined( $key, \@joiners, \%options )-
A powerful utility that retrieves a list-type key and joins its elements into a single string. The @joiners array provides one or more separators that are cycled through when joining. The %options can control complex formatting, for example by specifying the permitted numbers of joiners modulo scalar @joiners. (E.g., if “joiners come in pairs”, then the number of emitted joiners must be even!)
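The cycling of joiners can be illustrated with a few lines of plain Perl. The helper name join_cycled is hypothetical; it models only the cycling itself, not the %options handling or the lookup of the key in the Configuration hash.

```perl
use strict;
use warnings;

# Illustrative only: joins @$items, cycling through @$joiners as separators.
sub join_cycled {
    my ($items, $joiners) = @_;
    return '' unless @$items;
    my @j   = @$joiners ? @$joiners : ('');   # no joiners: plain concatenation
    my $out = $items->[0];
    for my $i (1 .. $#$items) {
        $out .= $j[ ($i - 1) % @j ] . $items->[$i];
    }
    return $out;
}

my $joined = join_cycled([qw(a 1 b 2 c 3)], ['=', '; ']);
```

With two joiners the list is rendered as key/value pairs. In this example five joiners are emitted, so an option requiring an even number of joiners (as in the “joiners come in pairs” case) would have to reject or pad this list.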
An analogous out__get_joined exists for converting a section (representable as a list of text-parts and embed-promises) into a single string.
Additionally, one can use cfg__pop($key, [$count]), cfg__prepend_elt( $key, $offset, $value ) etc. on a list key. (The parser for directives may implement only the flavors with $offset equal to 0 or -1, e.g., “as if” in cfg__prepend_last( $key, $value ) etc.) cfg__pop() can also be used to delete() one or more subkeys from a hash value (then the $count is replaced by the list of these keys).
Working with Hashes
In addition to scalars and lists, the framework supports hashes as a first-class configuration data type. This is controlled by two special configuration keys:
Fab/hash_keys-
Analogous to Fab/list_keys, this is a hash whose keys are the names of configuration variables that should be treated as hashes by the primitive operations.
# In a config file:
#append Fab/hash_keys my_parameters
Fab/hash_key_sort_order-
This key holds a subroutine reference that defines how the keys of a hash should be sorted when the hash needs to be treated as an ordered list (e.g., for cfg__get_joined). The subroutine receives two keys as arguments. The default is standard string comparison.
# In Perl code, to set numeric sorting:
$fab->cfg__set('Fab/hash_key_sort_order', sub { $_[0] <=> $_[1] });
Grouping and uplevel Primitives
group__start( $flavor_name )-
Starts a scoped group.
group__end( [$flavor_name] )-
Ends a scoped group.
group__postpone( $depth, $method_name, @args )-
Schedules a Text::Fab method call to be executed just before a group at a specific relative depth is closed. (Since the stack of group names is available in the configuration hash keyed as opened_groups, the parser for directives may allow a more user-friendly UI, such as specifying “a path” to this group, as in “the last-but-one x group before the last two y groups”.)
Configuration Prefix API
For recursive processing, the Configuration contains a stack of prefixes at Fab/call_stack_prefixes. Primitives are provided to safely interact with keys relative to this stack. (We list only the most basic retrieval primitives; the rest follow the same principle.)
cfg__get_prefixed( $key )-
Retrieves the value of $key using only the current, most recent prefix (i.e., CURRENT_PREFIX/$key).
cfg__get_prefixed_scan( $key )-
Retrieves the value of $key by searching for it under each prefix in the call stack, from most recent to least recent.
cfg__get_uplevel_scan( $level, $key )-
Retrieves the value of $key by first popping $level prefixes from the call stack and then performing a prefixed search.
cfg__set_prefixed( $key, @values )-
Sets the value of $key using the current (most recent) prefix.
For setting things “uplevel on this stack”, there are two methods matching two types of access:
cfg__set_prefixed_uplevel( $level, $key, @values )-
As cfg__set_prefixed(), but goes back $level steps on the stack of prefixes. This matches retrieval via cfg__get_prefixed().
cfg__set_prefixed_uplevel_fixup( $level, $key, @values )-
As cfg__set_prefixed_uplevel(), but first runs a “fixup”: the preceding value at $level is copied to the levels between this one and the current level, stopping as soon as a value is found on a particular level. This matches retrieval via cfg__get_prefixed_scan(): this call won’t change the results on the levels between $level and the current one.
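The interplay between scanning retrieval and the “fixup” can be modelled with a flat hash and an array of prefixes. Everything in this sketch is an assumption made for illustration: the helper names, the sample prefixes Fab/filterA/ and Fab/filterB/, and the choice to copy the value up to and including the current level. A key that is defined nowhere before the call cannot be "frozen" this way, so the sketch skips the fixup in that case.

```perl
use strict;
use warnings;

# Hypothetical model: a flat hash plus a stack of prefixes (most recent last).
my %config   = ('Fab/sep' => ',');
my @prefixes = ('Fab/', 'Fab/filterA/', 'Fab/filterB/');

# Index of the innermost level (scanning from $from down to 0) defining $key.
sub defining_level {
    my ($key, $from) = @_;
    for (my $i = $from; $i >= 0; $i--) {
        return $i if exists $config{ $prefixes[$i] . $key };
    }
    return;
}

sub get_prefixed_scan {
    my ($key, $from) = @_;
    my $i = defining_level($key, $from // $#prefixes);
    return defined $i ? $config{ $prefixes[$i] . $key } : undef;
}

sub set_prefixed_uplevel_fixup {
    my ($level, $key, $value) = @_;
    my $target = $#prefixes - $level;
    die "E_UPLEVEL_TOO_DEEP" if $target < 0;
    # Fixup: freeze what the levels above $target currently see, stopping at
    # the first level that has its own value (it shadows $target anyway).
    my $src = defining_level($key, $target);
    if (defined $src) {
        my $old = $config{ $prefixes[$src] . $key };
        for my $i ($target + 1 .. $#prefixes) {
            last if exists $config{ $prefixes[$i] . $key };
            $config{ $prefixes[$i] . $key } = $old;
        }
    }
    $config{ $prefixes[$target] . $key } = $value;
}

my $before = get_prefixed_scan('sep');          # inherited from the Fab/ level
set_prefixed_uplevel_fixup(2, 'sep', ';');      # write at the Fab/ level
my $after  = get_prefixed_scan('sep');          # unchanged for the current level
my $outer  = get_prefixed_scan('sep', 0);       # the Fab/ level sees the new value
```

Without the fixup step, the same write would silently change what the two inner levels see through scanning retrieval.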
ADVANCED TOPICS
Input Processing and Recursion
The primitive to support the #include directive is include__text($how, $data). If $how is filename, the file $data is searched for and slurped; if not, $data is the input string to be processed. A special form #include_scoped ensures that configuration changes made by the included file are temporary by wrapping the call in a group named by the last argument.
Recursive processing (e.g., using one Text::Fab “process” to filter data for another) is managed entirely within the Configuration hash. A stack of configuration prefixes is maintained in Fab__call_stack_prefixes (which usually starts as just Fab/). When a named recursive filter is invoked, its name is pushed to the filter name stack Fab__call_stack_filter_name. This also pushes a certain prefix onto Fab__call_stack_prefixes. Since it is the last prefix on this stack which is actually used by Text::Fab for its operation, this allows a special predefined configuration (e.g., Fab/myList2ScalarFilter) to take precedence without conflicting with the “normal” operation of the Fab.
The uplevel family of configuration directives operates with respect to both the group stack and this call stack, allowing for powerful, controlled communication between different layers of processing.
Namespace Resolution and Inheritance of output sections
The inheritance system is a layer of logic built on top of the primitive operations. It is controlled entirely by the Configuration hash. Here we illustrate the API by showing what the corresponding directives may look like. The API has an extra prefix NS__, and omits the UI verbiage.
#set_parents <namespace> <parent1> <parent2> ...-
A user-facing directive that calls cfg__set on a key within the Configuration that stores the inheritance DAG. For example, it might set Fab/inheritance_graph/MyNamespace/parents to [Parent1, Parent2].
The Assembly Phase uses this graph to resolve embed placeholders in the output sections; below we denote them as #emb. The resolution can be controlled with extra data:
#emb <basename> in <namespace> asiffrom <StartNamespace>-
This performs a "static" embed. It resolves the namespace of the section by starting the search directly from StartNamespace's context. When this context is not the root, this makes it "blind" to any “override sections” up the tree/DAG. StartNamespace defaults to the root namespace in the DAG, which typically contains the most specific overrides.
#emb <basename> ... with_blinder <SubName>-
Here SubName provides a completely customizable analog of the previous call. Instead of (or in addition to) obscuring the part of the DAG which is not reachable from StartNamespace, the output of this user-defined subroutine obscures an arbitrary subset of the DAG. It receives the entire current embedding stack (each entry is a currently executed embedding promise) together with the start namespace, the name of the blinder used, and the output of the blinder for each level. It returns a dynamic list of namespaces to "blind" for this specific resolution and any sub-resolutions (which may happen during processing of this embedded section).
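A possible reading of this resolution process is sketched below. The search order (the start namespace first, then its parents depth-first in declaration order), the data layout, and the helper name resolve_namespace are assumptions for the sake of the example; the actual rules are controlled by the Configuration hash. The %$blind argument plays the role of the list returned by a blinder.

```perl
use strict;
use warnings;

# Hypothetical data: parents per namespace, and the sections defined so far.
my %parents  = (Book => ['ChapterLayout'], ChapterLayout => ['BaseLayout']);
my %sections = (
    'BaseLayout:footer'   => '<footer>base</footer>',
    'ChapterLayout:title' => '<h1>chapter</h1>',
    'Book:title'          => '<h1>book</h1>',
);

# Returns the first namespace defining section $basename, searching $start
# and then its ancestors depth-first in declaration order.
sub resolve_namespace {
    my ($basename, $start, $blind, $seen) = @_;
    $blind ||= {};
    $seen  ||= {};
    return if $seen->{$start}++;                # visit each namespace once
    return $start
        if !$blind->{$start} && exists $sections{"$start:$basename"};
    for my $parent (@{ $parents{$start} || [] }) {
        my $found = resolve_namespace($basename, $parent, $blind, $seen);
        return $found if defined $found;
    }
    return;
}

my $dynamic = resolve_namespace('title',  'Book');                 # search from the root
my $static  = resolve_namespace('title',  'ChapterLayout');        # like asiffrom ChapterLayout
my $blinded = resolve_namespace('title',  'Book', { Book => 1 });  # like a blinder hiding Book
my $footer  = resolve_namespace('footer', 'Book');                 # inherited from BaseLayout
```

Starting the search below the root and blinding the root give the same answer here, which is the sense in which with_blinder generalizes asiffrom.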
Support of Control-Flow-Like constructs (conditionals and/or loops)
To simplify support of matched nested control-constructs in the user-defined parser, Fab maintains 6 stacks of matched lengths. Because of this match, they should be modified only via the corresponding API. E.g., the last elements of these lists describe:
The type of the last encountered starter of the construct. (Set by control__start().)
The (optional) text label for the construct. (Likewise. Defined for “loop-like” constructs only.)
Whether we are in a skipping or not (“live”) branch/flavor of the construct. (With the value 2 if the processing was skipping even before meeting the start of the construct. Set likewise at start, then may be changed.)
The offset of the start of the body (for “loop-like” constructs only). (This may be the file offset for seek()able files, otherwise the offset in a suitable buffer.)
Other useful offsets (for support of various loop-related targets for analogues of goto) packed into a hash.
The loop_counter (defined for “loop-like” constructs only). Changed (in parser-specified ways) on “jump” calls.
Which targetToSeek we are trying to find in the “skipping” mode now.
There is also the list of the indices Fab/control__loop_indices of “loop-like” constructs in this stack.
Essentially, when encountering the start of an analogue of if/elsif/else/endif, the parser may register its type, as well as whether the start is in the “skipping” mode, via ctrl__start(TYPE,IS_skipping,LABEL). Likewise, every flip of the skipping mode is registered with ctrl__skipping(IS_skipping) (this is going to be ignored if the current skipping state is 2; likewise, Fab knows when to set the state 2 automatically at start). When the end of the construct is encountered, these data may be popped by ctrl__end(TYPE) (with TYPE given for error-checking only).
The latest “skipping” state determines whether the preprocessor calls are skipped, and it is also given as an argument to the De-Interleaver and Parser callbacks. It also affects the state of loops nested inside “skipped” input.
These Control Stacks are reset when a new input file is included and restored when processing of that file completes.
The API to deal with these stacks is:
control__start( $type, $is_skipping, $label)-
Pushes a new frame onto the Control Stack. $type and $label are stored directly.
The initial skipping_state for the new frame is computed as follows: if the parent frame (if one exists) is already in any skipping state, the new frame's state is set to "skipping-on-entry" (state 2). Otherwise, the new frame's skipping state is set based on the boolean $is_skipping (true -> "skipping", false -> "live").
control__set_skipping( 'flip' | $boolean )-
Modifies the skipping_state of the current control block.
If the argument is 'flip', the state toggles between "live" and "skipping", but only if the current state is not "skipping-on-entry" (state 2).
If a boolean is provided, the state is set accordingly, again respecting the immutability of the "skipping-on-entry" state. This is called by the Parser for constructs like #elif.
control__end( $expected_type )-
Pops the top frame from the Control Stack. For error-checking, the Parser must provide the $expected_type. (Typically, this is used for loop-like constructs only when they are in “skipping mode”; see the next API.) Fab/control__loop_indices is changed accordingly.
control__define_invalidation_pointer( $expected_type )-
Marks the next position as the “first position not in the loop”. When this position “is activated”, or a jump is performed past this position (or before the starting position), the effect should be the same as control__end( $expected_type ).
control__set_pointer( $pointer_name, [$before], [$force] )-
Records the current position in the input stream (file or buffer) and stores it under $pointer_name in the "Custom Pointers" hash of the current control block (if $before is set, the recorded position is the one before the directive). This is the sole mechanism for defining loop start points, continue targets, or any other labeled position. For all loop-like constructs, the Parser is expected to call this at the beginning of the loop body to define a 'body_start' pointer.
These pointers form the structural geometry of the loop. One must keep in mind that Fab may assume that this geometry is not modified (only enhanced by the definition of new pointers) during “each pass” of a particular “instance of processing” this loop. (Here our terminology is a bit dense. If the current loop is nested inside another loop, it can be entered many times, leading to different instances of processing this loop. In each instance the body may be looped over in several passes.)
By default, this condition is auto-enforced: this call is ignored on the “replay passes” of “loop-like” constructs (i.e., if the current portion of the loop’s body “has already been read” inside the current instance). This filtering may be disabled by the $force flag. (A similar flag $is_replay is passed to the parser, just in case.)
control__jump_to( $pointer_name, [$block_label], [$incr_loop_counter] )-
This is the primitive for all control flow transfers. It instructs the Fab core's main processing loop to change its read position.
If $block_label is given, it walks up the Control Stack to find the frame with that primary label. If not, it uses the current (top-most) frame.
It looks up $pointer_name in that frame's pointers hash.
If found, it commands the main loop to immediately stop scanning at the current location and resume reading from the retrieved pointer's location (either by seeking in a file or switching to a buffer at a specific offset).
If not found, it adds a new targetToSeek name and sets the current execution state to "skipping" (this essentially initiates the forward scan).
The loop_counter is incremented by the number in $incr_loop_counter (when the target is found).
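The rules for the skipping state can be condensed into a small model. This is a sketch of the three-state logic only (no pointers, jumps, or loop counters), written as free functions over a hypothetical @control_stack rather than as the actual Text::Fab methods.

```perl
use strict;
use warnings;

my ($LIVE, $SKIPPING, $ON_ENTRY) = (0, 1, 2);
my @control_stack;   # hypothetical frames: { type, label, skipping }

sub control_start {
    my ($type, $is_skipping, $label) = @_;
    my $parent_skips = @control_stack && $control_stack[-1]{skipping} != $LIVE;
    push @control_stack, {
        type     => $type,
        label    => $label,
        skipping => $parent_skips ? $ON_ENTRY
                  : $is_skipping  ? $SKIPPING
                  :                 $LIVE,
    };
}

sub control_set_skipping {
    my ($arg) = @_;
    my $frame = $control_stack[-1] or die "no open construct";
    return if $frame->{skipping} == $ON_ENTRY;        # state 2 is immutable
    if (defined $arg && $arg eq 'flip') {
        $frame->{skipping} = $frame->{skipping} == $LIVE ? $SKIPPING : $LIVE;
    } else {
        $frame->{skipping} = $arg ? $SKIPPING : $LIVE;
    }
}

sub control_end {
    my ($expected_type) = @_;
    my $frame = pop @control_stack or die "no open construct";
    die "expected $expected_type, found $frame->{type}"
        unless $frame->{type} eq $expected_type;
}

control_start('if', 1);            # condition false: skip the first branch
control_start('if', 0);            # nested construct inside skipped text
control_set_skipping('flip');      # ignored: the nested frame is in state 2
my $nested = $control_stack[-1]{skipping};
control_end('if');
control_set_skipping('flip');      # the "else" of the outer construct
my $outer = $control_stack[-1]{skipping};
control_end('if');
```

The nested construct stays in state 2 whatever its own branches do, while the outer construct becomes live at its "else"; this is what lets a parser stay simple while skipping over nested constructs.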
Support of Caching configuration-related data
Sometimes a callback may need a computationally expensive combination of individual entries from a configuration hash. (One may think of a regular expression depending on several values in this hash.) To avoid recalculating it every time, the callback may use logic like this:
# Recalculate only when the dependent counter has changed since the last call.
unless ($cache_at_count == (my $new_cnt = $fab->{config}{cnt__parser_REx})) {
    $cache = my_recalc_cache($fab);
    $cache_at_count = $new_cnt;
}
This assumes that $fab->{config}{cnt__parser_REx} changes every time one of the variables used by my_recalc_cache() is updated (or restored). To support this, one registers each of these variables with the “dependent counter” cnt__parser_REx using the API:
config__append_dependent_counter($counter_key, $KEY1, ...)
(Warning: the implementation is free to change the counter back when a group is undone if it knows all the relevant-and-changed variables were properly undone. So (theoretically) one cannot rely on the counter to increase on every modification.)
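The following self-contained model shows how a dependent counter keeps such a cache valid. The helper names append_dependent_counter and cfg_set and the sample keys word_chars and digits are assumptions made for the illustration; only the counter name cnt__parser_REx comes from the text above.

```perl
use strict;
use warnings;

# Hypothetical model of a configuration with dependent counters.
my %config = (word_chars => 'a-z', digits => '0-9', cnt__parser_REx => 0);
my %counters_of;   # key => [counter keys to bump when that key changes]

sub append_dependent_counter {
    my ($counter_key, @keys) = @_;
    push @{ $counters_of{$_} }, $counter_key for @keys;
}

sub cfg_set {
    my ($key, $value) = @_;
    $config{$key} = $value;
    $config{$_}++ for @{ $counters_of{$key} || [] };
}

my ($cache, $cache_at_count, $recalcs) = (undef, -1, 0);

sub parser_regex {
    my $new_cnt = $config{cnt__parser_REx};
    unless ($cache_at_count == $new_cnt) {
        $recalcs++;                                   # the "expensive" step
        my $class = $config{word_chars} . $config{digits};
        $cache = qr/[$class]+/;
        $cache_at_count = $new_cnt;
    }
    return $cache;
}

append_dependent_counter('cnt__parser_REx', qw(word_chars digits));
parser_regex() for 1 .. 3;            # one recalculation, two cache hits
cfg_set(title => 'unrelated');        # does not touch the counter
parser_regex();                       # still a cache hit
cfg_set(word_chars => 'a-zA-Z');      # bumps the counter
my $re      = parser_regex();         # recalculated
my $matches = 'Fab' =~ /\A$re\z/ ? 1 : 0;
```

The cache compares counters for equality rather than for "greater than", so it stays correct even if the counter is moved back when a group is undone.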
A tacit assumption — caveat usor
Summary: Before starting a design using Text::Fab, check that inside your preprocessor, the internal representation of the partially processed input can be easily enough kept in the Configuration Hash. We assume that this is going to hold as far as the preprocessor “proceeds with a well-defined processing pipeline”.
---
In many cases one needs to allow the configuration driving the logic of work of a preprocessor to be changeable by a directive at any time. (For example, a text in one programming language may inline a text in another programming language!)
The design of Fab’s API can make this (seem to be?) hard: due to its top→down approach to the search for directives, the directive above would interrupt the input to the preprocessor; but sometimes the preprocessor may need a non-local view of the input to decide how to deal with it. “Just seeing what comes before the directive”, then “just seeing what comes after the directive” (in a separate invocation!) may not be enough for such a non-local view.
On the other hand, one should be ready for the situation when the input after this directive should be handled differently than the input before this directive. So if the target domain requires handling such situations, then any implementation of such a preprocessor must be ready to deal with such hiccups when processing the input (either by handling the update of its config, or by bailing out with an error message). While doing this with a bottom→up design may seem easier, experience shows that it is often prohibitively hard to implement. (E.g., when nested constructions are dealt with by recursive calls, the calls to reconfigure_me() would happen deep inside the stack of recursive calls!)
In short: to implement such a design through Text::Fab the only possible complication is: how to preserve the internal representation of the “preceding not-yet-fully-digested input” within the preprocessor “when the preprocessor is reentered” after handling such a directive?
With the bottom→up approach (when the preprocessor may “call some API” to handle the directive) this is easy: it is easy to make sure that such a call may update “the configuration” of the preprocessor while keeping “the internal representation of the already-read-input-to-preprocess” fully intact. However, when the preprocessor is controlled from a Fab, what happens instead of “calling the API” is: “exit the preprocessor”, then somebody else (the Fab!) calling the API, then “reenter the preprocessor again”. Therefore when the preprocessor can see that the chunk it obtained is a part of “some larger construct” (so it is not ready to be flush()'ed to the output section yet), it must preserve its internal state “somewhere”, then read it back when it is reentered (if it is ever reentered). This is the price to pay for (a lot of) work offloaded to Fab.
CONCLUSION:
In such situations the preprocessor needs a guarantee that it is going to be reentered;
Its internal data should be easily translatable to/from formats supported by the configuration hash.
(Indeed, this hash is the only supported storage provided by Fab’s API.)
In other words: the preprocessor cannot be stateless now, but as long as the state may be (de)serialized easily, it is not a big deal. So one possible tacit assumption about designs using Text::Fab may be:
The preprocessing is hierarchical, each level of hierarchy of not-yet-fully-processed-input may be put into a value suitable for the Configuration Hash, and the steps remaining to finish processing these data are not affected by the directive above.
————————————————————————————————————
Next, let us inspect what not-stateless but still “easy to implement” preprocessors may look like.
Example B: We have a text which may contain nested “blocks”, each block made of one or more paragraphs. The nesting is defined by the indent level (as in Python). The “class” of each block is determined by inspecting the immediate start of its first paragraph, as well as the immediate end of the last paragraph of the block (provided this last paragraph is “on the same indent level as the start”, so it is not in a nested subblock).
In addition to this we allow #define directives which take a whole line — but are not considered as interrupting the logic of indentation-which-defines-nested-depth.
Implementation: To implement “reentrancy”, the stored data consists of the stack of already-preprocessed text of not-yet-finished blocks. When preprocessing a chunk of input finishes a block, the preprocessor can inspect its start and end, determine its class, pop() the known “intermediate textual representation” from the stack, and convert it to the final representation in the way suitable for this class. When this was the outermost block, the result is put into the output section. Otherwise it is appended to the preceding entry on the stack (the containing block).
————————————————————————————————————
Example H: As above, but the rules for massaging a block depend not only on its class, but also on the classes of its parents. This requires “the most general” hierarchical model of the pipeline of not-yet-fully-processed input.
In this case again the handled data may be flush()'ed to an output section only when the outermost encountered block is terminated. Until then, one should maintain the stack of “live blocks” (not yet terminated), as well as a forest of “dead” blocks (already terminated, but awaiting the final processing). Each terminated block has its class and a list of content whose elements are either “a literal text” (preprocessed-except-the-final-step) or another (nested) terminated sub-block. The live blocks are similar, but the class is not yet fully known, and the last nested sub-block may be live.
How to preserve this in the Configuration Hash? We have a planar rooted tree with vertices assigned the class (or undef for live vertices), and leaves containing “a plain text”. When flush()'ing to an output section, we essentially scan depth-first through the vertices, maintaining the stack of classes (they correspond to the path from the root to the current vertex). So the only bookkeeping data in addition to the stack of classes is the (parallel) stack of “where the currently-not-yet-finished sub-block ends”. (So we can pop() the stack of classes when this end is reached.)
So if we store the leaves in the depth-first order, we need to interlace this list with markers “start sub-block of the class CLASS ending at position POS” (or introduce nested start and end markers). To design an easy representation, suppose that the semantics of “massaging the blocks” allows the introduction of empty "plain text leaves" between every two adjacent sub-vertices; then we may assume that “end-subblock” entries are represented by out-of-bound elements (e.g., undef), and the plain text leaves alternate with “start subblock” entries in the stored list, as in a flattened list of the following form:
0thLvlTxt1
Blk1
1stLvl1Txt11
Blk11
2ndLvl1Txt111
undef
1stLvl1Txt12
Blk12
2ndLvl1Txt121
undef
1stLvl1Txt13
undef
0thLvlTxt2
Blk2
1stLvl1Txt21
Blk21
2ndLvl1Txt211
undef
1stLvl1Txt22
Blk22
2ndLvl1Txt221
undef
1stLvl1Txt23
undef
0thLvlTxt3
Observe the alternation Txt/BlkClass/Txt/BlkClass/Txt/… after every undef. It is trivial to process such a list when flush()'ing. When filling the list, one should additionally maintain a stack of positions in this list where live blocks start, so one can change a placeholder-for-its-class to the calculated class when the end-of-block is found. (Other than this, all the results of pre-processing the input land at the end of the list.)
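Processing such a flattened list at flush() time needs nothing beyond a stack of classes. The walker below is a sketch under the representation described above (text leaves alternating with markers, undef closing a block); the function name and the shortened sample labels are made up for the example.

```perl
use strict;
use warnings;

# Walks the flattened list: text leaves alternate with markers, where a
# defined marker opens a sub-block of that class and undef closes one.
sub walk_flat_blocks {
    my ($list, $on_text) = @_;
    my @classes;                       # path from the root to the current leaf
    my $i = 0;
    while ($i < @$list) {
        $on_text->($list->[$i++], @classes);
        last if $i == @$list;
        my $marker = $list->[$i++];
        if (defined $marker) {
            push @classes, $marker;
        } else {
            die "end-of-block marker without an open block" unless @classes;
            pop @classes;
        }
    }
    die "unterminated block(s): @classes" if @classes;
}

my @trace;
walk_flat_blocks(
    [ 'T1', 'Blk1', 'T11', 'Blk11', 'T111', undef, 'T12', undef, 'T2' ],
    sub { my ($text, @path) = @_; push @trace, join('/', @path, $text) },
);
```

Each text leaf is reported together with the classes of all its enclosing blocks, which is exactly the information Example H needs for its class-dependent massaging.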
————————————————————————————————————
Example A: Consider a macroprocessor for macros with arguments; assume that the argument-separators and argument-terminators for a macro on a certain level of nesting may be contained in the output of more deeply nested macros. So one needs to expand the deeper macros before the handling of enclosing macros is finished. Can one support directives occurring deep inside the nested arguments (assuming that they cannot be contained in the macro-expansions)?
Here the data may be even simpler than in the preceding example. Since the output produced up to the start of the outermost not-yet-terminated macro may be flush()'ed, the stack of unprocessed data may contain the ID of the outermost not-yet-terminated macro, the list of its completed arguments, then the already expanded prefix of the current argument, then the ID of the next not-yet-terminated macro, the list of its completed arguments, etc. In addition to this list, it is enough to have the list of offsets at which the IDs of the macros live.
When a terminator of a macro is found, one pops the macro ID and the completed arguments from the list above, and performs the macro substitution. There are two ways to proceed with the resulting string. If the expansion should not be macro-re-expanded, it is appended to “the already expanded prefix of the current argument” (as above) of the enclosing macro, and optionally scanned for a macro-argument-delimiter or -terminator. Otherwise, it is prepended to the buffer containing the not-yet-processed input. The desired choice is determined by the semantics of the macro-expansion.
————————————————————————————————————
Example E: As above, but the directives may appear as the result of macro-expansion.
This case is much more involved, since it seems to be too late for Fab to inspect the internal state of the macro-processor. So the preprocessor should scan its output itself: it has full access to the De-Interlacer/Parser/etc. callbacks designed to detect the directives.
WARNING: However, the directives are not designed to work with not-yet-completed buffers. For best results, the De-Interlacer should return “the high-water mark”: the offset before which directives cannot appear, so that there is no need to rescan this part when the buffer is extended.
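The idea of the high-water mark can be illustrated as follows. This is a sketch under assumptions, not the De-Interlacer interface: scan_from() is a hypothetical function, and a "directive" is taken to be any complete line starting with '#'. An incomplete last line is never reported, and the returned offset is the start of that line, so the next call resumes there after the buffer grows.

```perl
use strict;
use warnings;

# Hypothetical scanner.  $buf is a reference to the growing buffer, $from is
# the previous high-water mark.  Returns the directives found in complete
# lines at or after $from, and the new high-water mark.
sub scan_from {
    my ($buf, $from) = @_;
    my @found;
    pos($$buf) = $from;                     # resume where the last scan stopped
    while ($$buf =~ /\G([^\n]*)\n/gc) {     # only newline-terminated lines
        my $line = $1;
        push @found, $line if $line =~ /\A#/;
    }
    # With /c a failed match keeps pos(): the start of the incomplete line.
    return (\@found, pos($$buf));
}
```

A caller would keep the returned offset and pass it back as $from after appending more text to the buffer; the part before the offset is never scanned twice.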
ERROR REFERENCE
The preprocessor will halt with one of the following error codes if a fatal condition is met.
Parsing and Input Errors (E_Input)
E_SYNTAX_ERROR: A directive does not conform to the grammar.
E_FILE_READ_ERROR: An #included file cannot be opened or read.
E_CYCLIC_INCLUDE: An #include chain references a file that is already being processed.
Grouping Errors (E_Group)
E_MISMATCHED_END_GROUP: An #end_group <flavor> does not match the currently open group.
E_DANGLING_END_GROUP: An #end_group is found with no matching #start_group.
E_INVALID_GROUP_NESTING: An attempt to nest groups in a way forbidden by the Configuration.
E_UPLEVEL_TOO_DEEP: An uplevel directive targets a group or call stack level that does not exist.
Configuration Errors (E_Config)
E_TYPE_UNDECLARED: A type-specific operation is used on a key whose type has not been declared.
E_TYPE_MISMATCH: A list-specific operation is attempted on a scalar key, or vice-versa.
E_LONG_POP: Popping too many elements.
Namespace and Assembly Errors (E_Assembly)
E_CYCLIC_INHERITANCE: A #set_parents directive creates a loop in the inheritance DAG.
E_UNDEFINED_PARENT: A #set_parents directive refers to a non-existent namespace.
E_ROOT_NOT_FOUND: A section specified via #set_root does not exist.
E_NO_ROOTS_SPECIFIED: The assembly phase is triggered, but no roots were ever defined.
E_EMBED_NOT_FOUND: A section referenced by an #emb placeholder cannot be resolved.
E_CIRCULAR_EMBED: An #emb chain results in a loop during the final assembly.
Control Flow Errors (E_Control)
E_MISMATCHED_CONTROL_END: An end directive (e.g., #endif) was encountered, but the currently open control block is of a different type (e.g., a #for loop).
E_DANGLING_CONTROL_END: An end directive was encountered when the Control Stack was empty.
E_UNCLOSED_CONTROL_BLOCK: The end of an input file was reached while one or more control blocks were still open.
E_POINTER_NOT_FOUND: A control__jump_to call referenced a pointer name that has not been defined in the target control block.
E_BLOCK_LABEL_NOT_FOUND: A control__jump_to call referenced a block label that does not exist on the Control Stack, and was not found before the end of the enclosing block.
Payload: (target_pointer_name, block_label)
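One possible way for calling code to dispatch on the categories above is sketched below. Everything here is an assumption made for illustration: the helpers fab_error() and try_fab() and the hash layout (code, category, payload) are hypothetical and are not the format used by Text::Fab; only the codes and categories are taken from the lists above (a subset).

```perl
use strict;
use warnings;

# Categories of a few error codes, transcribed from the lists above.
my %CATEGORY = (
    E_SYNTAX_ERROR          => 'E_Input',
    E_DANGLING_END_GROUP    => 'E_Group',
    E_TYPE_MISMATCH         => 'E_Config',
    E_EMBED_NOT_FOUND       => 'E_Assembly',
    E_BLOCK_LABEL_NOT_FOUND => 'E_Control',
);

# Hypothetical helper: die with a hash holding code, category and payload.
sub fab_error {
    my ($code, @payload) = @_;
    my $category = $CATEGORY{$code} or die "unknown error code: $code\n";
    die +{ code => $code, category => $category, payload => [@payload] };
}

# Hypothetical wrapper: run $action; return undef on success, else the error.
sub try_fab {
    my ($action) = @_;
    my $ok = eval { $action->(); 1 };
    return $ok ? undef : $@;
}
```

A caller could then test ref($err) and compare $err->{category} with, say, 'E_Control' to handle all control-flow errors in one place, reading the details from $err->{payload}.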
[PLACEHOLDERS]
Future sections will detail the full APIs and default implementations for:
The Parser: How it interacts with the main loop and calls primitives.
Chunk Preprocessors: The interface for the "stomach".
Blinder Subroutines: The data structures passed to them.
The default configuration schema under the Fab/ key.
EXPORT
None by default.
TODO
The error messages are not yet in the described format.
Docs are not cleaned up (mark these by “???”!); and parts are still missing.
The implementation (when it exists) has not been cleaned up yet.
MISS: input__end($n) with $n==0 ending the current input stream, or also $n enclosing streams. Or maybe better end up to a given name of a group?
MISS: need to specify what happens with a group on input__end(): survives / should be closed before / close-if-still-open. (May not be needed if closing is group-controlled.)
MISS: input__end_to($group_type): end input of the files opened up to the opened-latest-group of the given type.
Likewise for “closing enclosed groups”??? Join together into 1 API by introducing suitable groups???
May run forever! Even the language of group__start, group__end, group__postpone (with the arguments to the last command restricted to these 3 calls) is Turing complete: one can implement a two-counter machine with this.
MISS: _-prefixed versions of config__*() which check the prefix _ __ ___ of the key and make it read/write only.
MISS: pipelining preprocessor with “filters”: a filter consists of 3 components: preprocessor, massaging of promises, and the concatenator of the resulting candidates-for-sections.
When undoing, undo data should be guarded, at least up to the exit.
Uplevel w.r.t. the recursion stack should be recalculated into the usual uplevels; a special do-nothing group may be used.
allow not 1 active preprocessor engine, but a stack of them, chained, and have a cfg__pop() primitive method.
Macros are templates marked as OK-to-be-not-executed.
Hash to mark macros as “types”, with per-type-warning configurable? Which way to hash, types to macros, or macros to types, or pairs???
An extra parameter $is_replay for the syntax callbacks for the non-first pass. (Probably, the boundary of the preceding pass cannot be part of this???)
hash keys. Setting/getting them employs a serialization method, by default JSON???
uplevel should allow execution AFTER the group is closed
There should be an indicator that a method is called “from” before/after uplevel. If uplevel is called “uplevel”ed, it should only register the requests; at the end of the enclosing processing of the uplevel-ed stuff, these postponed commands should be executed (reversing order again???).
Should reversing of order be customizable, and in which order one should put the inverted group and the non-inverted group????
Probably better to have more than 2 types, and for every “type” have the pre/post specified, and whether upleveling from the group reverses the order???
The type is a Unicode string starting with either PRE- or post-, and sorting as if it were followed by ".000…" with an infinite number of characters '0' (but before the possible suffix "-inv"). Then sorting lexicographically (with the actual closing happening between PRE- and post-, and with -inv groups inverted) gives the order of execution. (Still: inversion of postponed postpones???)
When sorting, one can literally pad by ".0"* up to the maximal length of the name of existing groups + 1!
Note that "string#" comes before "string" for any value of "string", and different strings “sort differently”!
#unset — but maybe it is going to make undo harder???
Recursive invocation is needed for massaging data between different levels:
• The hash
• Input file names
• Input strings
• Output sections
A top→down approach to recognizing the “reconfiguration directives”: but cannot we support a bottom→up approach too, when the preprocessor recognizes interlaced directives? How to deal with recursion: return an indicator that the state changed? But it is changing permanently, one needs to filter out changes — but how the internal calls know the filters needed for the enclosing calls???
the preprocessor needs a guarantee that it is going to be reentered (to postpone operations). A special call at end???
AUTHOR
Ilya Zakharevich, <ilyaz@cpan.com>
COPYRIGHT AND LICENSE
Copyright (C) 2025 by Ilya Zakharevich
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.22.2 or, at your option, any later version of Perl 5 you may have available.