NAME
XS::Parse::Keyword - XS functions to assist in parsing keyword syntax
DESCRIPTION
This module provides some XS functions to assist in writing syntax modules that provide new perl-visible syntax, primarily for authors of keyword plugins using the PL_keyword_plugin hook mechanism. It is unlikely to be of much use to anyone else, and highly unlikely to be of any use when writing perl code using these. Unless you are writing a keyword plugin using XS, this module is not for you.
This module is also currently experimental, and the design is still evolving and subject to change. Later versions may break ABI compatibility, requiring changes or at least a rebuild of any module that depends on it.
XS FUNCTIONS
boot_xs_parse_keyword
void boot_xs_parse_keyword(double ver);
Call this function from your BOOT section in order to initialise the module and parsing hooks.
ver should either be 0 or a decimal number for the module version requirement; e.g.
boot_xs_parse_keyword(0.01);
register_xs_parse_keyword
void register_xs_parse_keyword(const char *keyword,
const struct XSParseKeywordHooks *hooks, void *hookdata);
This function installs a set of parsing hooks to be associated with the given keyword. Such a keyword will then be handled automatically by a keyword parser installed by XS::Parse::Keyword itself.
PARSE HOOKS
The XSParseKeywordHooks structure provides the following hook stages, which are invoked in the given order.
flags
The following flags are defined:
XPK_FLAG_EXPR - The parse or build function is expected to return KEYWORD_PLUGIN_EXPR.
XPK_FLAG_STMT - The parse or build function is expected to return KEYWORD_PLUGIN_STMT.
These two flags mainly provide static information at registration time, so that static parsing or other related tasks can know what kind of grammatical element this keyword will produce.
XPK_FLAG_AUTOSEMI - The syntax forms a complete statement, which should be followed by a statement separator semicolon (;). This semicolon is optional at the end of a block.
The semicolon, if present, will be consumed automatically.
The permit Stage
const char *permit_hintkey;
bool (*permit) (pTHX_ void *hookdata);
Called by the installed keyword parser hook which is used to handle keywords registered by "register_xs_parse_keyword".
As a shortcut for the common case, the permit_hintkey may point to a string to look up from the hints hash. If the given key name is not found in the hints hash then the keyword is not permitted. If the key is present then the permit function is invoked as normal.
If the keyword was not rejected by the hint key lookup, the function part of the stage is called next. It should inspect whether the keyword is permitted at this time, perhaps by inspecting other lexical clues, and return true only if the keyword is permitted.
Both the string and the function are optional. Either or both may be present. If neither is present then the keyword is always permitted - which is likely not what you wanted to do.
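The decision described above can be modelled in a few lines of plain C. This is only a sketch of the documented behaviour, not the module's implementation: struct permit_fields, hints_has_key() and keyword_is_permitted() are hypothetical names, the hints hash is replaced by a NULL-terminated list of key names, and pTHX_ is omitted so that the code compiles without the Perl headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the two permit-stage fields of XSParseKeywordHooks.
 * ASSUMPTION: pTHX_ is dropped so this compiles without perl.h. */
struct permit_fields {
  const char *permit_hintkey;
  bool (*permit)(void *hookdata);
};

/* Hypothetical stand-in for the hints hash: a NULL-terminated list
 * of the key names that are currently present. */
static bool hints_has_key(const char *const *hints, const char *key)
{
  for (size_t i = 0; hints[i] != NULL; i++)
    if (strcmp(hints[i], key) == 0)
      return true;
  return false;
}

/* Models the documented permit decision. */
static bool keyword_is_permitted(const struct permit_fields *hooks,
                                 const char *const *hints, void *hookdata)
{
  /* A hint key that is set but not found rejects the keyword outright */
  if (hooks->permit_hintkey && !hints_has_key(hints, hooks->permit_hintkey))
    return false;

  /* Otherwise the function, if present, has the final say */
  if (hooks->permit)
    return hooks->permit(hookdata);

  /* Neither field set: the keyword is always permitted */
  return true;
}

/* Example permit function: hookdata points at an int on/off switch */
static bool permit_if_enabled(void *hookdata)
{
  return *(int *)hookdata != 0;
}
```

A typical syntax module would probably set only permit_hintkey, and arrange for its import method to store that key in the hints hash, so that the keyword is active only in lexical scopes that asked for it.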
The check Stage
void (*check)(pTHX_ void *hookdata);
Invoked once the keyword has been permitted. If present, this hook function can check the surrounding lexical context, state, or other information and throw an exception if it is unhappy that the keyword should apply in this position.
The parse Stage
This stage is invoked once the keyword has been checked, and actually parses the incoming text into an optree. It is implemented by calling the first of the following function pointers which is not NULL. The invoked function may optionally build an optree to represent the parsed syntax, and place it into the variable addressed by out. If it does not, then a simple OP_NULL will be constructed in its place.
lex_read_space() is called both before and after this stage is invoked, so in many simple cases the hook function itself does not need to bother with it.
int (*parse)(pTHX_ OP **out, void *hookdata);
If present, this should consume text from the parser buffer by invoking lex_* or parse_* functions and eventually return a KEYWORD_PLUGIN_* result value.
This is the most generic and powerful of the options, but requires the most implementation work.
int (*build)(pTHX_ OP **out, XSParseKeywordPiece *args[], size_t nargs, void *hookdata);
If parse is not present, this is called instead after parsing a sequence of arguments, of types given by the pieces field, which should be a zero-terminated array of piece types.
This alternative is somewhat less generic and powerful than providing parse yourself, but involves much less parsing work and is shorter and easier to implement.
int (*build1)(pTHX_ OP **out, XSParseKeywordPiece arg0, void *hookdata);
If neither parse nor build is present, this is called as a simpler variant of build when only a single argument is required. It takes its type from the piece1 field instead.
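The "first function pointer which is not NULL" rule can be pictured with the following plain C sketch. It is an illustration only: struct parse_fields, enum parse_route and choose_route() are hypothetical names, OP and XSParseKeywordPiece are reduced stand-ins, and pTHX_ is omitted so that the code compiles without the Perl headers. The source does not say what happens when all three pointers are NULL, so the sketch merely reports that case.

```c
#include <stddef.h>

/* ASSUMPTION: reduced stand-ins for the real Perl and module types */
typedef struct op OP;
typedef union { OP *op; int i; } XSParseKeywordPiece;

/* Stand-in for the three parse-stage fields of XSParseKeywordHooks */
struct parse_fields {
  int (*parse)(OP **out, void *hookdata);
  int (*build)(OP **out, XSParseKeywordPiece *args[], size_t nargs,
               void *hookdata);
  int (*build1)(OP **out, XSParseKeywordPiece arg0, void *hookdata);
};

enum parse_route { ROUTE_NONE, ROUTE_PARSE, ROUTE_BUILD, ROUTE_BUILD1 };

/* Picks the first non-NULL pointer, in the documented order */
static enum parse_route choose_route(const struct parse_fields *hooks)
{
  if (hooks->parse)  return ROUTE_PARSE;   /* hook does all the parsing  */
  if (hooks->build)  return ROUTE_BUILD;   /* syntax comes from pieces   */
  if (hooks->build1) return ROUTE_BUILD1;  /* syntax comes from piece1   */
  return ROUTE_NONE;                       /* not described by the docs  */
}

/* Placeholder hook functions; a real one would return a
 * KEYWORD_PLUGIN_* value rather than 0. */
static int example_parse(OP **out, void *hookdata)
{
  (void)out; (void)hookdata;
  return 0;
}

static int example_build(OP **out, XSParseKeywordPiece *args[],
                         size_t nargs, void *hookdata)
{
  (void)out; (void)args; (void)nargs; (void)hookdata;
  return 0;
}
```

One consequence of this ordering is that setting parse makes any build, build1, pieces or piece1 fields in the same structure irrelevant.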
PIECES AND PIECE TYPES
When using the build or build1 alternatives for the parse phase, the actual syntax is parsed automatically by this module, according to the specification given by the pieces or piece1 field. The result of that parsing step is placed into the args or arg0 parameter to the invoked function, using a union type consisting of the following fields:
typedef union {
  OP *op;
  CV *cv;
  SV *sv;
  int i;
  struct {
    SV *name;
    SV *value;
  } attr;
} XSParseKeywordPiece;
Which field is set depends on the type of the piece.
Some piece types are "atomic", whose definition is self-contained. Others are structural, defined in terms of inner pieces. Together these form an entire tree-shaped definition of the syntax that the keyword expects to find.
Atomic types generally provide exactly one argument into the list of args (with the exception of literal matches, which do not provide anything). Structural types may provide an initial argument themselves, followed by a list of the values of each sub-piece they contained inside them. Thus, while the data structure defining the syntax shape is a tree, the argument values it parses into are passed as a flat array to the build function.
Some structural types need to be able to determine whether or not syntax relating to some optional part of them is present in the incoming source text. In this case, the pieces relating to those optional parts must support "probing". This ability is also noted below.
The type of each piece should be one of the following macro values. Some macros additionally take a set of type flags, taken from the following list:
XPK_TYPEFLAG_SCOPED
On XPK_BLOCK_flags, this will wrap the returned optree in its own lexical scope, by calling op_scope().
XPK_TYPEFLAG_G_VOID, XPK_TYPEFLAG_G_SCALAR, XPK_TYPEFLAG_G_ARRAY
On optree-returning types, will contextualize the returned optree to put it in the given context, by calling op_contextualize().
Where both a plain and a _flags-suffixed version of the macro exists, the flags version will take the flags in an additional argument, and the non-flags version will pass zero extra flags.
XPK_BLOCK
atomic, emits op.
XPK_BLOCK
XPK_BLOCK_flags(flags)
A brace-delimited block of code is expected, passed as an optree in the op field. This will be parsed as a block within the current function scope.
Permits the flags XPK_TYPEFLAG_SCOPED and XPK_TYPEFLAG_G_*.
XPK_BLOCK_SCALARCTX, XPK_BLOCK_LISTCTX
Shortcuts for XPK_BLOCK_flags() which wrap a scalar or list-context scope around the block.
XPK_ANONSUB
atomic, emits cv.
A brace-delimited block of code is expected, and assembled into the body of a new anonymous subroutine. This will be passed as a protosub CV in the cv field.
XPK_TERMEXPR
atomic, emits op.
XPK_TERMEXPR
XPK_TERMEXPR_flags(flags)
A term expression is expected, parsed using parse_termexpr(), and passed as an optree in the op field.
Permits the flags XPK_TYPEFLAG_G_*.
XPK_TERMEXPR_SCALARCTX
A shortcut for XPK_TERMEXPR_flags() which puts the expression in scalar context.
XPK_LISTEXPR
atomic, emits op.
XPK_LISTEXPR
XPK_LISTEXPR_flags(flags)
A list expression is expected, parsed using parse_listexpr(), and passed as an optree in the op field.
Permits the flags XPK_TYPEFLAG_G_*.
XPK_LISTEXPR_LISTCTX
A shortcut for XPK_LISTEXPR_flags() which puts the expression in list context.
XPK_IDENT
atomic, emits sv.
A bareword identifier name is expected, and passed as an SV containing a PV in the sv field. An identifier is not permitted to contain a double colon (::).
XPK_PACKAGENAME
atomic, emits sv.
A bareword package name is expected, and passed as an SV containing a PV in the sv field. A package name is similar to an identifier, except it permits double colons in the middle.
XPK_LEXVARNAME
atomic, emits sv.
XPK_LEXVARNAME(kind)
A lexical variable name is expected, and passed as an SV containing a PV in the sv field. The kind argument specifies what kinds of variable are permitted, and should be a bitmask of one or more bits from XPK_LEXVAR_SCALAR, XPK_LEXVAR_ARRAY and XPK_LEXVAR_HASH. A convenient shortcut XPK_LEXVAR_ANY permits all three.
XPK_ATTRIBUTES
atomic, emits i followed by more args.
A list of :-prefixed attributes is expected, in the same format as sub or variable attributes. An optional leading : indicates the presence of attributes, then one or more of them are parsed. Attributes may be optionally separated by additional :s, but this is not required.
Each attribute is expected to be an identifier name, followed by an optional value wrapped in parentheses. Whitespace is NOT permitted between the name and value, as per standard Perl parsing rules.
:attrname
:attrname(value)
The i field indicates how many attributes were found. That number of additional arguments are then passed, each containing two SVs in the attr.name and attr.value fields. This number may be zero.
It is not an error for there to be no attributes present, or for the optional colon to be missing. In this case i will be set to zero.
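Because the attribute values are spliced into the same flat array as every other argument, a build function has to step over them to reach whatever piece follows. The sketch below shows the index arithmetic in plain C. It is an illustration under stated assumptions: the union is copied from the definition given earlier, but OP, CV and SV are stand-ins (with SV reduced to a struct holding a string) so that the code compiles without the Perl headers, and attribute_at() and attributes_end() are hypothetical helper names.

```c
#include <stddef.h>

/* ASSUMPTION: stand-ins for the real Perl types */
typedef struct op OP;
typedef struct cv CV;
typedef struct sv { const char *pv; } SV;

typedef union {
  OP *op;
  CV *cv;
  SV *sv;
  int i;
  struct {
    SV *name;
    SV *value;
  } attr;
} XSParseKeywordPiece;

/* Given that args[start] is the count argument emitted by
 * XPK_ATTRIBUTES, returns the k'th attribute (0-based). */
static XSParseKeywordPiece *attribute_at(XSParseKeywordPiece *args[],
                                         size_t start, int k)
{
  return args[start + 1 + (size_t)k];
}

/* Returns the index of the first argument after the attributes,
 * clamped to nargs as a defensive measure. */
static size_t attributes_end(XSParseKeywordPiece *args[], size_t nargs,
                             size_t start)
{
  if (start >= nargs)
    return nargs;

  size_t end = start + 1 + (size_t)args[start]->i;
  return end <= nargs ? end : nargs;
}
```

For a keyword whose pieces were XPK_ATTRIBUTES followed by XPK_IDENT, and source text carrying two attributes, the identifier would therefore be found at args[3] rather than args[1].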
XPK_VSTRING
atomic, can probe, emits sv.
A version string is expected, of the form v1.234 including the leading v character. It is passed as a version SV object in the sv field.
XPK_VSTRING_OPT
Identical to XPK_VSTRING except it is optional; if no version string is found then sv is set to NULL.
XPK_COLON
atomic, can probe, emits nothing.
A literal colon character (:) is expected. No argument value is passed.
XPK_EQUALS
atomic, can probe, emits nothing.
A literal equals character (=) is expected. No argument value is passed.
XPK_STRING
atomic, can probe, emits nothing.
XPK_STRING("literal")
A literal string match is expected. No argument value is passed.
This form should generally be avoided if at all possible, because it is very easy to abuse to make syntaxes which confuse humans and code tools alike. Generally it is best reserved just for the first component of a XPK_OPTIONAL or XPK_REPEATED sequence, to provide a "secondary keyword" that such a repeated item can look out for.
XPK_OPTIONAL
structural, emits i.
XPK_OPTIONAL(pieces ...)
A structural type which may find its contained pieces, or is happy not to. This will pass an argument whose i field contains either 1 or 0, depending on whether the contents were found. The first piece type within must support probe.
XPK_REPEATED
structural, emits i.
XPK_REPEATED(pieces ...)
A structural type which expects to find zero or more repeats of its contained pieces. This will pass an argument whose i field contains the count of the number of repeats it found. The first piece type within must support probe.
XPK_CHOICE
structural, emits i.
XPK_CHOICE(options ...)
A structural type which expects to find one of a number of alternative options. An ordered list of types is provided, all of which must support probe. This will pass an argument whose i field gives the index of the first choice that was accepted. The first option takes the value 0.
It is not an error if no choice matches. In that case, the i field will be set to -1.
If you require a failure message in this case, set the final choice to be of type XPK_FAILURE. This will cause an error message to be printed instead.
XPK_FAILURE("message string")
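The value that arrives in the i field can be modelled as a first-match search over the options. The following plain C sketch is only an illustration of that rule, not the module's code: choice_index() is a hypothetical name, and the would_match array stands in for the result of probing each option against the source text.

```c
#include <stdbool.h>
#include <stddef.h>

/* would_match[k] stands in for "option k's probe succeeded".
 * Returns the value a build function would see in the i field. */
static int choice_index(const bool *would_match, size_t noptions)
{
  for (size_t k = 0; k < noptions; k++)
    if (would_match[k])
      return (int)k;   /* first accepted option wins; numbering is 0-based */

  return -1;           /* no option matched, which is not an error */
}
```

A build function receiving this value would usually switch on it, and needs to treat -1 as a legitimate case unless the final option is an XPK_FAILURE.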
XPK_TAGGEDCHOICE
structural, emits i.
XPK_TAGGEDCHOICE(choice, tag, ...)
A structural type similar to XPK_CHOICE, except that each choice type is followed by an element of type XPK_TAG which gives an integer. It is that integer value, rather than the positional index of the choice within the list, which is passed in the i field.
XPK_TAG(value)
XPK_COMMALIST
structural, emits i.
A structural type which expects to find one or more repeats of its contained pieces, separated by literal comma (,) characters. This is somewhat similar to XPK_REPEATED, except that it needs at least one copy and needs commas between its items, but does not require that the first contained piece support probe (the comma itself is sufficient to indicate a repeat).
XPK_PARENSCOPE
structural, emits nothing.
XPK_PARENSCOPE(pieces ...)
A structural type which expects to find a sequence of pieces, all contained in parentheses as ( ... ). This will pass no extra arguments.
XPK_BRACKETSCOPE
structural, emits nothing.
XPK_BRACKETSCOPE(pieces ...)
A structural type which expects to find a sequence of pieces, all contained in square brackets as [ ... ]. This will pass no extra arguments.
XPK_BRACESCOPE
structural, emits nothing.
XPK_BRACESCOPE(pieces ...)
A structural type which expects to find a sequence of pieces, all contained in braces as { ... }. This will pass no extra arguments.
Note that it is not necessary to use this with XPK_BLOCK or XPK_ANONSUB; those will already consume a set of braces. This is intended for special constrained syntax that should not just accept an arbitrary block.
XPK_CHEVRONSCOPE
structural, emits nothing.
XPK_CHEVRONSCOPE(pieces ...)
A structural type which expects to find a sequence of pieces, all contained in angle brackets as < ... >. This will pass no extra arguments.
Remember that expressions like a > b are valid term expressions, so the contents of this scope shouldn't allow arbitrary expressions or the closing bracket will be ambiguous.
AUTHOR
Paul Evans <leonerd@leonerd.org.uk>