NAME
LaTeXML::Package - Support for package implementations and document customization.
SYNOPSIS
This package defines and exports most of the procedures users will need to customize or extend LaTeXML. The LaTeXML implementation of some package might look something like the following, but see the installed LaTeXML/Package directory for realistic examples.
use LaTeXML::Package;
use strict;
#
# Load "anotherpackage"
RequirePackage('anotherpackage');
#
# A simple macro, just like in TeX
DefMacro('\thesection', '\thechapter.\roman{section}');
#
# A constructor defines how a control sequence generates XML:
DefConstructor('\thanks{}', "<ltx:thanks>#1</ltx:thanks>");
#
# And a simple environment ...
DefEnvironment('{abstract}','<ltx:abstract>#body</ltx:abstract>');
#
# A math symbol \Real to stand for the Reals:
DefMath('\Real', "\x{211D}", role=>'ID');
#
# Or a semantic floor:
DefMath('\floor{}','\left\lfloor#1\right\rfloor');
#
# More esoteric ...
# Use a RelaxNG schema
RelaxNGSchema("MySchema");
# Or use a special DocType if you have to:
# DocType("rootelement",
# "-//Your Site//Your DocType",'your.dtd',
# prefix=>"http://whatever/");
#
# Allow sometag elements to be automatically closed if needed
Tag('prefix:sometag', autoClose=>1);
#
# Don't forget this, so perl knows the package loaded.
1;
DESCRIPTION
To provide a LaTeXML-specific version of a LaTeX package mypackage.sty or class myclass.cls (so that, e.g., \usepackage{mypackage} works), you create the file mypackage.sty.ltxml or myclass.cls.ltxml and save it in the search path (the current directory, one of the directories given to the --path option, or possibly one added to the variable SEARCHPATHS). Similarly, to provide document-specific customization for, say, mydoc.tex, you would create the file mydoc.latexml (typically in the same directory). However, in the first case, mypackage.sty.ltxml is loaded instead of mypackage.sty, while a file like mydoc.latexml is loaded in addition to mydoc.tex. In either case, you'll write use LaTeXML::Package; to import the various declarations and defining forms that allow you to specify what should be done with various control sequences, whether there is special treatment of certain document elements, and so forth. Using LaTeXML::Package also imports the functions and variables defined in LaTeXML::Global, so see that documentation as well.
Since LaTeXML attempts to mimic TeX, a familiarity with TeX's processing model is also helpful. Additionally, it is often useful, when implementing non-trivial behaviour, to think TeX-like.
Many of the following forms take code references as arguments or options. That is, either a reference to a defined sub, \&somesub, or an anonymous function sub { ... }. To document these cases, and the arguments that are passed in each case, we'll use a notation like CODE($token,..).
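The two ways of supplying a code reference can be sketched as follows. This is only an illustrative binding fragment: the control sequences \draftmarker and \finalmarker and the helper draftMarker are hypothetical names, while DefMacro and ExplodeText are the forms used elsewhere in this document.

```perl
use LaTeXML::Package;
use strict;
use warnings;

# Hypothetical named helper; a macro's CODE receives the gullet
# (and any arguments) and returns a list of Tokens.
sub draftMarker {
  my ($gullet) = @_;
  return ExplodeText('draft');
}

# Pass a reference to the named sub ...
DefMacro('\draftmarker', \&draftMarker);

# ... or supply an anonymous sub directly.
DefMacro('\finalmarker', sub { ExplodeText('final'); });

# So perl knows the file loaded.
1;
```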
Control Sequences
Many of the following forms define the behaviour of control sequences. In TeX you'll typically only define macros. In LaTeXML, we're effectively redefining TeX itself, so we define macros as well as primitives, registers, constructors and environments. These define the behaviour of these commands when processed during the various phases of LaTeXML's imitation of TeX's digestive tract.
The first argument to each of these defining forms (DefMacro, DefPrimitive, etc.) is a prototype consisting of the control sequence being defined along with the specification of parameters required by the control sequence. Each parameter describes how to parse tokens following the control sequence into arguments or how to delimit them. To simplify coding and capture common idioms in TeX/LaTeX programming, LaTeXML's parameter specifications are more expressive than TeX's \def or LaTeX's \newcommand. Examples of the prototypes for familiar TeX or LaTeX control sequences are:
DefConstructor('\usepackage[]{}',...
DefPrimitive('\multiply Variable SkipKeyword:by Number',..
DefPrimitive('\newcommand OptionalMatch:* {Token}[]{}', ...
Control Sequence Parameters
The general syntax for a control sequence parameter is something like
OpenDelim? Modifier? Type (: value (| value)* )? CloseDelim?
The enclosing delimiters, if any, are either {} or [] and affect the way the argument is delimited. With {}, a regular TeX argument (token or sequence balanced by braces) is read before parsing according to the type (if needed). With [], a LaTeX optional argument is read, delimited by (non-nested) square brackets.
The modifier can be either Optional or Skip, allowing the argument to be optional. For Skip, no argument is contributed to the argument list.
The shorthands {} and [] default the type to Plain and read a normal TeX argument or LaTeX optional argument with no special parsing.
The general syntax for parameter specification is
  {}            reads a regular TeX argument, a sequence of
                tokens delimited by braces, or a single token.
  {spec}        reads a regular TeX argument, then reparses it
                to match the given spec. The spec is parsed
                recursively, but usually should correspond to
                a single argument.
  [spec]        reads a LaTeX-style optional argument. If the
                spec is of the form Default:stuff, then stuff
                would be the default value.
  Type          reads an argument of the given type, where either
                Type has been declared, or there exists a ReadType
                function accessible from LaTeXML::Package::Pool.
  Type:value, or Type:value1:value2...
                These forms pass additional Tokens to the reader
                function.
  OptionalType  Similar to Type, but it is not considered
                an error if the reader returns undef.
  SkipType      Similar to OptionalType, but the value returned
                from the reader is ignored, and does not occupy a
                position in the arguments list.
The predefined argument types are as follows.
Plain, Semiverbatim
-
Reads a standard TeX argument, being either the next token or, if the next token is a {, the balanced token list. In the case of Semiverbatim, many catcodes are disabled, which is handy for URLs, labels and similar.
Token, XToken
-
Reads a single TeX Token. For XToken, if the next token is expandable, it is repeatedly expanded until an unexpandable token remains, which is returned.
Number, Dimension, Glue or MuGlue
-
Reads an Object corresponding to Number, Dimension, Glue or MuGlue, using TeX's rules for parsing these objects.
Until:match, XUntil:match
-
Reads tokens until a match to the tokens match is found, returning the tokens preceding the match. This corresponds to TeX delimited arguments. For XUntil, tokens are expanded as they are matched and accumulated.
UntilBrace
-
Reads tokens until the next open brace {. This corresponds to the peculiar TeX construct \def\foo#{...
Match:match(|match)*, Keyword:match(|match)*
-
Reads tokens expecting a match to one of the token lists match, returning the one that matches, or undef. For Keyword, case and catcode of the matches are ignored. Additionally, any leading spaces are skipped.
Balanced
-
Reads tokens until a closing }, but respecting nested {} pairs.
BalancedParen
-
Reads parenthesis-delimited tokens, but does not balance any nested parentheses.
Undigested, Digested, DigestUntil:match
-
These types alter the usual sequence of tokenization and digestion in separate stages (like TeX). An Undigested parameter inhibits digestion completely and remains in token form. A Digested parameter gets digested until the (required) opening { is balanced; this is useful when the content would usually need to have been protected in order to correctly deal with catcodes. DigestUntil digests tokens until a token matching match is found.
Variable
-
Reads a token, expanding if necessary, and expects a control sequence naming a writable register. If such is found, it returns an array of the corresponding definition object, and any arguments required by that definition.
SkipSpaces, Skip1Space
-
Skips any number of space tokens (SkipSpaces) or a single space token (Skip1Space), if present, but contributes nothing to the argument list.
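As a minimal sketch of how these parameter types combine in prototypes, the following binding fragment defines a few macros. All of the control sequences defined here (\pair, \mylabel, \upto with its delimiter \stop, and \shout) are hypothetical and chosen only for illustration; the parameter types are the ones listed above.

```perl
use LaTeXML::Package;
use strict;
use warnings;

# \pair[first]{second}: an optional argument with a default value,
# followed by a regular braced argument.
DefMacro('\pair[Default:x]{}', '(#1,#2)');

# \mylabel{...}: Semiverbatim disables many catcodes while the
# argument is read; the result is handed on to \label.
DefMacro('\mylabel Semiverbatim', '\label{#1}');

# \upto ... \stop: a TeX-style delimited argument; #1 is everything
# read before the matching \stop.
DefMacro('\upto Until:\stop', '[#1]');

# \shout*{...}: OptionalMatch:* occupies argument position 1 (whether
# or not the star is present), so the braced argument is #2.
DefMacro('\shout OptionalMatch:* {}', '\textbf{#2}');

# So perl knows the file loaded.
1;
```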
Control of Scoping
Most defining commands accept an option to control how the definition is stored, scope=>$scope, where $scope can be 'global' for global definitions, 'local' to be stored in the current stack frame, or a string naming a scope. A scope saves a set of definitions and values that can be activated at a later time.
Particularly interesting forms of scope are those that get automatically activated upon changes of counter and label. For example, definitions that have scope=>'section:1.1' will be activated when the section number is "1.1", and will be deactivated when the section ends.
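The three kinds of scope might be used as in the following sketch. The macro names \projectname, \draftnote and \sectionhint are hypothetical; only the scope values themselves come from the description above.

```perl
use LaTeXML::Package;
use strict;
use warnings;

# Stored globally, independent of any TeX grouping in effect.
DefMacro('\projectname', 'LaTeXML', scope => 'global');

# Stored in the current stack frame only.
DefMacro('\draftnote', 'work in progress', scope => 'local');

# Stored in a named scope; per the description above, it is activated
# while the section number is 1.1 and deactivated when that section ends.
DefMacro('\sectionhint', 'applies to section 1.1 only',
  scope => 'section:1.1');

# So perl knows the file loaded.
1;
```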
Macros
DefMacro($prototype,$string | $tokens | $code,%options);
-
Defines the macro expansion for $prototype; a macro control sequence that is expanded during macro expansion time (in the LaTeXML::Core::Gullet). If a $string is supplied, it will be tokenized at definition time. Any macro arguments will be substituted for parameter indicators (e.g. #1) at expansion time; the result is used as the expansion of the control sequence.
If defined by $code, the form is CODE($gullet,@args) and it must return a list of LaTeXML::Core::Token's.
DefMacro options are
- scope=>$scope
-
See "Control of Scoping".
- locked=>boolean
-
Whether this definition is locked out of changes in the TeX sources.
Examples:
DefMacro('\thefootnote','\arabic{footnote}');
DefMacro('\today',sub { ExplodeText(today()); });
DefMacroI($cs,$paramlist,$string | $tokens | $code,%options);
-
Internal form of DefMacro where the control sequence and parameter list have already been separated; useful for definitions from within code. Also, slightly more efficient for macros with no arguments (use undef for $paramlist), and useful for obscure cases like defining \begin{something*} as a Macro.
Conditionals
DefConditional($prototype,$test,%options);
-
Defines a conditional for $prototype; a control sequence that is processed during macro expansion time (in the LaTeXML::Core::Gullet). A conditional corresponds to a TeX \if. It evaluates $test, which should be CODE that is applied to the arguments, if any. Depending on whether that evaluation returns a true or false value (in the usual Perl sense), the result of the expansion is either the first or the else code following, in the usual TeX sense.
DefConditional options are
- scope=>$scope
-
See "Control of Scoping".
- locked=>boolean
-
Whether this definition is locked out of changes in the TeX sources.
Example:
DefConditional('\ifmmode',sub { LookupValue('IN_MATH'); });
DefConditionalI($cs,$paramlist,$test,%options);
-
Internal form of DefConditional where the control sequence and parameter list have already been parsed; useful for definitions from within code. Also, slightly more efficient for conditionals with no arguments (use undef for $paramlist).
Primitives
DefPrimitive($prototype,$replacement,%options);
-
Define a primitive control sequence; a primitive is processed during digestion (in the LaTeXML::Core::Stomach), after macro expansion but before Construction time. Primitive control sequences generate Boxes or Lists, generally containing basic Unicode content, rather than structured XML. Primitive control sequences are also executed for side effect during digestion, effecting changes to the LaTeXML::Core::State.
The $replacement is either a string, used as the Box's text content (the box gets the current font), or CODE($stomach,@args), which is invoked at digestion time, probably for side-effect, but returning Boxes or Lists. $replacement may also be undef, which contributes nothing to the document, but does record the TeX code that created it.
DefPrimitive options are
- mode=>(text|display_math|inline_math)
-
Changes to this mode during digestion.
- bounded=>boolean
-
If true, TeX grouping (i.e. {}) is enforced around this invocation.
- requireMath=>boolean,
- forbidMath=>boolean
-
These specify whether the given control sequence can only appear, or cannot appear, in math mode.
- font=>{fontspec...}
-
Specifies the font to use (see "MergeFont(%style);"). If the font change is to only apply to material generated within this command, you would also use bounded=>1; otherwise, the font will remain in effect afterwards as for a font switching command.
- beforeDigest=>CODE($stomach)
-
This option supplies a Daemon to be executed during digestion just before the main part of the primitive is executed. The CODE should either return nothing (return;) or a list of digested items (Box's,List,Whatsit). It can thus change the State and/or add to the digested output.
- afterDigest=>CODE($stomach)
-
This option supplies a Daemon to be executed during digestion just after the main part of the primitive is executed. It should either return nothing (return;) or digested items. It can thus change the State and/or add to the digested output.
- scope=>$scope
-
See "Control of Scoping".
- locked=>boolean
-
Whether this definition is locked out of changes in the TeX sources.
- isPrefix=>1
-
Indicates whether this is a prefix type of command; this is only used for the special TeX assignment prefixes, like \global.
Example:
DefPrimitive('\begingroup',sub { $_[0]->begingroup; });
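A primitive executed purely for its side effect on the State can be paired with a conditional that consults the same value. The sketch below assumes a hypothetical State value named my_draft_mode and hypothetical control sequences \draftmodeon, \draftmodeoff and \ifdraftmode; AssignValue and LookupValue are the State accessors imported along with LaTeXML::Package.

```perl
use LaTeXML::Package;
use strict;
use warnings;

# Primitives run at digestion time; these only change the State and
# contribute nothing to the digested output (they return an empty list).
DefPrimitive('\draftmodeon',
  sub { AssignValue(my_draft_mode => 1, 'global'); return; });
DefPrimitive('\draftmodeoff',
  sub { AssignValue(my_draft_mode => 0, 'global'); return; });

# A TeX-style conditional whose test is true (in the Perl sense)
# whenever the hypothetical flag has been switched on.
DefConditional('\ifdraftmode', sub { LookupValue('my_draft_mode'); });

# So perl knows the file loaded.
1;
```

In a document, \ifdraftmode ... \else ... \fi would then select its first branch after \draftmodeon has been digested.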
DefPrimitiveI($cs,$paramlist,CODE($stomach,@args),%options);
-
Internal form of DefPrimitive where the control sequence and parameter list have already been separated; useful for definitions from within code.
DefRegister($prototype,$value,%options);
-
Defines a register with the given initial value (a Number, Dimension, Glue, MuGlue or Tokens; Box registers are not yet handled). Usually, the $prototype is just the control sequence, but registers are also handled by prototypes like \count{Number}. DefRegister arranges that the register value can be accessed when a numeric, dimension, ... value is being read, and also defines the control sequence for assignment.
Options are
- readonly
-
specifies if it is not allowed to change this value.
- getter=>CODE(@args)
- setter=>CODE($value,@args)
-
By default the value is stored in the State's Value table under a name concatenating the control sequence and argument values. These options allow other means of fetching and storing the value.
Example:
DefRegister('\pretolerance',Number(100));
DefRegisterI($cs,$paramlist,$value,%options);
-
Internal form of
DefRegister
where the control sequence and parameter list have already been parsed; useful for definitions from within code.
Constructors
DefConstructor($prototype,$xmlpattern | $code,%options);
-
The Constructor is where LaTeXML really starts getting interesting; invoking the control sequence will generate an arbitrary XML fragment in the document tree. More specifically: during digestion, the arguments will be read and digested, creating a LaTeXML::Core::Whatsit to represent the object. During absorption by the LaTeXML::Core::Document, the Whatsit will generate the XML fragment according to the replacement $xmlpattern, or by executing CODE.
If code is supplied, the form is CODE($document,@args,%properties).
The $xmlpattern is simply a bit of XML as a string with certain substitutions to be made. The substitutions are of the following forms:
- #1, #2 ... #name
-
These are replaced by the corresponding argument (for #1) or property (for #name) stored with the Whatsit. Each is turned into a string when it appears in an attribute position, or recursively processed when it appears as content.
- &function(@args)
-
Another form of substituted value is prefixed with & which invokes a function. For example, &func(#1) would invoke the function func on the first argument to the control sequence; what it returns will be inserted into the document.
- ?COND(pattern) or ?COND(ifpattern)(elsepattern)
-
Patterns can be conditionalized using this form. The COND is any of the above expressions, considered true if the result is non-empty. Thus ?#1(<foo/>) would add the empty element foo if the first argument were given.
- ^
-
If the constructor begins with ^, the XML fragment is allowed to float up to a parent node that is allowed to contain it, according to the Document Type.
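The substitution forms above can be sketched in a few constructors. Everything named here is an assumption made for illustration: the control sequences \important, \remark and \shouted, the helper shoutText, and the choice of ltx: elements, which must of course be permitted by the schema in use.

```perl
use LaTeXML::Package;
use strict;
use warnings;

# Hypothetical helper for the &function(@args) form; whatever it
# returns is inserted into the document.
sub shoutText {
  my ($arg) = @_;
  return uc(ToString($arg));
}

# #1 in content position is processed recursively.
DefConstructor('\important{}', "<ltx:emph>#1</ltx:emph>");

# ?#1(...)(...) selects a pattern depending on whether the optional
# argument was given; in the attribute, #1 is turned into a string.
DefConstructor('\remark[]{}',
  "?#1(<ltx:note role='#1'>#2</ltx:note>)(<ltx:note>#2</ltx:note>)");

# &shoutText(#1) calls the helper on the first argument.
DefConstructor('\shouted{}', "<ltx:text>&shoutText(#1)</ltx:text>");

# So perl knows the file loaded.
1;
```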
The Whatsit property font is defined by default. Additional properties body and trailer are defined when captureBody is true, or for environments. By using $whatsit->setProperty(key=>$value); within afterDigest, or by using the properties option, other properties can be added.
DefConstructor options are
- mode=>(text|display_math|inline_math)
-
Changes to this mode during digestion.
- bounded=>boolean
-
If true, TeX grouping (i.e. {}) is enforced around this invocation.
- requireMath=>boolean,
- forbidMath=>boolean
-
These specify whether the given constructor can only appear, or cannot appear, in math mode.
- font=>{fontspec...}
-
Specifies the font to use (see "MergeFont(%style);"). If the font change is to only apply to material generated within this command, you would also use bounded=>1; otherwise, the font will remain in effect afterwards as for a font switching command.
- reversion=>$texstring or CODE($whatsit,#1,#2,...)
-
Specifies the reversion of the invocation back into TeX tokens (if the default reversion is not appropriate). The $texstring string can include #1,#2... The CODE is called with the $whatsit and digested arguments and must return a list of Token's.
- properties=>{prop=>value,...} or CODE($stomach,#1,#2...)
-
This option supplies additional properties to be set on the generated Whatsit. In the first form, the values can be of any type, but if a value is a code reference, it takes the same args ($stomach,#1,#2,...) and should return the value; it is executed before creating the Whatsit. In the second form, the code should return a hash of properties.
- beforeDigest=>CODE($stomach)
-
This option supplies a Daemon to be executed during digestion just before the Whatsit is created. The CODE should either return nothing (return;) or a list of digested items (Box's,List,Whatsit). It can thus change the State and/or add to the digested output.
- afterDigest=>CODE($stomach,$whatsit)
-
This option supplies a Daemon to be executed during digestion just after the Whatsit is created (and so the Whatsit already has its arguments and properties). It should either return nothing (return;) or digested items. It can thus change the State, modify the Whatsit, and/or add to the digested output.
- beforeConstruct=>CODE($document,$whatsit)
-
Supplies CODE to execute before constructing the XML (generated by $replacement).
- afterConstruct=>CODE($document,$whatsit)
-
Supplies CODE to execute after constructing the XML.
- captureBody=>boolean or Token
-
If true, arbitrary following material will be accumulated into a `body' until the current grouping level is reverted, or until the Token is encountered if the option is a Token. This body is available as the body property of the Whatsit. This is used by environments and math.
- alias=>$control_sequence
-
Provides a control sequence to be used when reverting Whatsit's back to Tokens, in cases where it isn't the command used in the $prototype.
- nargs=>$nargs
-
This gives a number of args for cases where it can't be inferred directly from the $prototype (e.g. when more args are explicitly read by Daemons).
- scope=>$scope
-
See "Control of Scoping".
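Several of these options might be combined as in the following sketch. The control sequence \keyterm, the property name class, and the State value seen_keyterm are hypothetical; the option names are those documented above.

```perl
use LaTeXML::Package;
use strict;
use warnings;

DefConstructor('\keyterm{}',
  # #class is a property (set below); #1 is the digested argument.
  "<ltx:text class='#class'>#1</ltx:text>",
  mode       => 'text',                   # digest the argument in text mode
  bounded    => 1,                        # enforce TeX grouping around the invocation
  properties => { class => 'keyterm' },   # extra property stored on the Whatsit
  reversion  => '\keyterm{#1}',           # how to turn the Whatsit back into TeX
  afterDigest => sub {
    my ($stomach, $whatsit) = @_;
    # Side effect only: note in the State that a key term occurred.
    AssignValue(seen_keyterm => 1, 'global');
    return; });

# So perl knows the file loaded.
1;
```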
DefConstructorI($cs,$paramlist,$xmlpattern | $code,%options);
-
Internal form of DefConstructor where the control sequence and parameter list have already been separated; useful for definitions from within code.
DefMath($prototype,$tex,%options);
-
A common shorthand constructor; it defines a control sequence that creates a mathematical object, such as a symbol, function or operator application. The options given can effectively create semantic macros that contribute to the eventual parsing of mathematical content. In particular, it generates an XMDual using the replacement $tex for the presentation. The content information is drawn from the name and options.
These DefConstructor options also apply: reversion, alias, beforeDigest, afterDigest, beforeConstruct, afterConstruct and scope.
Additionally, it accepts
- style=>astyle
-
adds a style attribute to the object.
- name=>aname
-
gives a name attribute for the object
- omcd=>cdname
-
gives the OpenMath content dictionary that name is from.
- role=>grammatical_role
-
adds a grammatical role attribute to the object; this specifies the grammatical role that the object plays in surrounding expressions. This direly needs documentation!
- font=>{fontspec}
-
Specifies the font to use (see "MergeFont(%style);").
- mathstyle=>(display|text|inline)
-
Controls whether this object will be presented in a specific mathstyle, or according to the current setting of mathstyle.
- scriptpos=>(mid|post)
-
Controls the positioning of any sub and super-scripts relative to this object; whether they be stacked over or under it, or whether they will appear in the usual position. TeX.pool defines a function doScriptpos() which is useful for operators like \sum in that it sets to mid position when in displaystyle, otherwise post.
- stretchy=>boolean
-
Whether or not the object is stretchy when displayed.
- operator_role=>grammatical_role
- operator_scriptpos=>boolean
- operator_stretchy=>boolean
-
These three are similar to role, scriptpos and stretchy, but are used in unusual cases. These apply the given attributes to the operator token in the content branch.
- nogroup=>boolean
-
Normally, these commands are digested with an implicit grouping around them, localizing changes to fonts, etc; nogroup=>1 inhibits this.
Example:
DefMath('\infty',"\x{221E}", role=>'ID', meaning=>'infinity');
DefMathI($cs,$paramlist,$tex,%options);
-
Internal form of DefMath where the control sequence and parameter list have already been separated; useful for definitions from within code.
DefEnvironment($prototype,$replacement,%options);
-
Defines an Environment that generates a specific XML fragment. $replacement is of the same form as for DefConstructor, but will generally include reference to the #body property. Upon encountering a \begin{env}: the mode is switched, if needed, else a new group is opened; then the environment name is noted; the beforeDigest daemon is run. Then the Whatsit representing the begin command (but ultimately the whole environment) is created and the afterDigestBegin daemon is run. Next, the body will be digested and collected until the balancing \end{env}. Then, any afterDigest daemon is run, the environment is ended, finally the mode is ended or the group is closed. The body and \end{env} whatsit are added to the \begin{env}'s whatsit as body and trailer, respectively.
It shares options with DefConstructor: mode, requireMath, forbidMath, properties, nargs, font, beforeDigest, afterDigest, beforeConstruct, afterConstruct and scope.
Additionally, afterDigestBegin is effectively an afterDigest for the \begin{env} control sequence.
Example:
DefConstructor('\emph{}', "<ltx:emph>#1</ltx:emph>", mode=>'text');
DefEnvironment gives slightly different interpretation to some of DefConstructor's options and adds some new ones:
- beforeDigest=>CODE($stomach)
-
This option is the same as for DefConstructor, but it applies to the \begin{environment} control sequence.
- afterDigestBegin=>CODE($stomach,$whatsit)
-
This option is the same as DefConstructor's afterDigest but it applies to the \begin{environment} control sequence. The Whatsit is the one for the beginning control sequence, but represents the environment as a whole. Note that although the arguments and properties are present in the Whatsit, the body of the environment is not.
- beforeDigestEnd=>CODE($stomach)
-
This option is the same as DefConstructor's beforeDigest but it applies to the \end{environment} control sequence.
- afterDigest=>CODE($stomach,$whatsit)
-
This option is the same as DefConstructor's afterDigest but it applies to the \end{environment} control sequence. Note, however, that the Whatsit is only for the ending control sequence, not the Whatsit for the environment as a whole.
- afterDigestBody=>CODE($stomach,$whatsit)
-
This option supplies a Daemon to be executed during digestion after the ending control sequence has been digested (and all four other digestion Daemons have executed) and after the body of the environment has been obtained. The Whatsit is the (useful) one representing the whole environment, and it now does have the body and trailer available, stored as properties.
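An environment definition using one of these daemons might look like the sketch below. The environment name remark and the property role are hypothetical, and whether ltx:note may hold the body depends on the schema in use; setProperty is the Whatsit method mentioned under DefConstructor.

```perl
use LaTeXML::Package;
use strict;
use warnings;

DefEnvironment('{remark}',
  # #body is the digested content between \begin{remark} and \end{remark}.
  "<ltx:note role='#role'>#body</ltx:note>",
  mode => 'text',
  # Runs once the Whatsit for \begin{remark} exists; that Whatsit
  # represents the whole environment, though its body is not yet present.
  afterDigestBegin => sub {
    my ($stomach, $whatsit) = @_;
    $whatsit->setProperty(role => 'remark');
    return; });

# So perl knows the file loaded.
1;
```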
DefEnvironmentI($name,$paramlist,$replacement,%options);
-
Internal form of
DefEnvironment
where the control sequence and parameter list have already been separated; useful for definitions from within code.
Inputing Content and Definitions
FindFile($name,%options);
-
Find an appropriate file with the given $name in the current directories in SEARCHPATHS. If a file ending with .ltxml is found, it will be preferred.
Note that if the $name starts with a recognized protocol (currently one of (literal|http|https|ftp)) followed by a colon, the name is returned, as is, and no search for files is carried out.
The options are:
- type=>type
-
specifies the file type. If not set, it will search for both $name.tex and $name.
- noltxml=>1
-
inhibits searching for a LaTeXML binding to use instead of the file itself ($name.$type.ltxml).
- notex=>1
-
inhibits searching for raw tex version of the file. That is, it will only search for the LaTeXML binding.
InputContent($request,%options);
-
InputContent is used for cases when the file (or data) is plain TeX material that is expected to contribute content to the document (as opposed to pure definitions). A Mouth is opened onto the file, and subsequent reading and/or digestion will pull Tokens from that Mouth until it is exhausted, or closed.
In some circumstances it may be useful to provide a string containing the TeX material explicitly, rather than referencing a file. In this case, the literal pseudo-protocol may be used:
InputContent('literal:\textit{Hey}');
If a file named $request.latexml exists, it will be read in as if it were a latexml binding file, before processing. This can be used for ad hoc customization of the conversion of specific files, without modifying the source, or creating more elaborate bindings.
The only option to InputContent is:
Input($request);
-
Input is analogous to LaTeX's \input, and is used in cases where it isn't completely clear whether content or definitions are expected. Once a file is found, the approach specified by InputContent or InputDefinitions is used, depending on which type of file is found.
InputDefinitions($request,%options);
-
InputDefinitions is used for loading definitions, i.e. various macros, settings, etc., rather than document content; it can be used to load LaTeXML's binding files, or for reading in raw TeX definitions or style files. It reads and processes the material completely before returning, even in the case of TeX definitions. This procedure optionally supports the conventions used for standard LaTeX packages and classes (see RequirePackage and LoadClass).
Options for InputDefinitions are:
- type=>$type
-
the file type to search for.
- noltxml=>boolean
-
inhibits searching for a LaTeXML binding; only raw TeX files will be sought and loaded.
- notex=>boolean
-
inhibits searching for raw TeX files, only a LaTeXML binding will be sought and loaded.
- noerror=>boolean
-
inhibits reporting an error if no appropriate file is found.
The following options are primarily useful when InputDefinitions is supporting standard LaTeX package and class loading.
- withoptions=>boolean
-
indicates whether to pass in any options from the calling class or package.
- handleoptions=>boolean
-
indicates whether options processing should be handled.
- options=>[...]
-
specifies a list of options to be passed (possibly in addition to any provided by the calling class or package).
- after
-
provides code or tokens to be processed by a $name.$type-hook macro.
- as_class
-
fishy option that indicates that this definitions file should be treated as if it were defining a class; typically shows up in latex compatibility mode, or AMSTeX.
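The search-related options might be used as in this sketch, where the file names mymacros and mytools are hypothetical:

```perl
use LaTeXML::Package;
use strict;
use warnings;

# Load raw TeX definitions from mymacros.tex, skipping any LaTeXML
# binding, and do not report an error if no such file is found.
InputDefinitions('mymacros', type => 'tex', noltxml => 1, noerror => 1);

# Load only a LaTeXML binding for mytools.sty, never the raw style file.
InputDefinitions('mytools', type => 'sty', notex => 1);

# So perl knows the file loaded.
1;
```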
Class and Packages
RequirePackage($package,%options);
-
Finds and loads a package implementation (usually *.sty.ltxml, unless raw is specified) for the required $package. It returns the pathname of the loaded package. The options are:
- type=>type
-
specifies the file type (default sty).
- options=>[...]
-
specifies a list of package options.
- noltxml=>1
-
inhibits searching for the LaTeXML binding for the file (i.e. $name.$type.ltxml).
- notex=>1
-
inhibits searching for raw tex version of the file. That is, it will only search for the LaTeXML binding.
LoadClass($class,%options);
-
Finds and loads a class definition (usually *.cls.ltxml). It returns the pathname of the loaded class. The only option is
LoadPool($pool,%options);
-
Loads a pool file, one of the top-level definition files, such as TeX, LaTeX or AMSTeX. It returns the pathname of the loaded file.
DeclareOption($option,$code);
-
Declares an option for the current package or class. The $code can be a string or Tokens (which will be macro expanded), or can be a code reference which is treated as a primitive.
If a package or class wants to accommodate options, it should start with one or more DeclareOption calls, followed by ProcessOptions().
PassOptions($name,$ext,@options);
-
Causes the given @options (strings) to be passed to the package (if $ext is sty) or class (if $ext is cls) named by $name.
ProcessOptions();
-
Processes the options that have been passed to the current package or class in a fashion similar to LaTeX. If the keyword inorder=>1 is given, the options are processed in the order they were used, like ProcessOptions*.
ExecuteOptions(@options);
-
Processes the options given explicitly in @options.
AtBeginDocument(@stuff);
-
Arranges for @stuff to be carried out after the preamble, at the beginning of the document. @stuff should typically be macro-level stuff, but carried out for side effect; it should be tokens, tokens lists, strings (which will be tokenized), or a sub (which presumably contains code as would be in a package file, such as DefMacro or similar).
This operation is useful for style files loaded with --preload or document specific customization files (i.e. ending with .latexml); normally the contents would be executed before LaTeX and other style files are loaded and thus can be overridden by them. By deferring the evaluation to begin-document time, these contents can override those style files. This is likely to only be meaningful for LaTeX documents.
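A binding for a package that accepts options could be organized along the lines of the sketch below. The option names draft and final, the State value my_draft_mode, and the redefinition of \abstractname are assumptions chosen for illustration; the sequence of DeclareOption calls followed by ProcessOptions() follows the description above.

```perl
use LaTeXML::Package;
use strict;
use warnings;

# Each option's code reference is treated as a primitive; here the
# options only record a hypothetical flag in the State.
DeclareOption('draft',
  sub { AssignValue(my_draft_mode => 1, 'global'); return; });
DeclareOption('final',
  sub { AssignValue(my_draft_mode => 0, 'global'); return; });

# Act on whatever options were passed to this package.
ProcessOptions();

# Load another package, passing it an explicit option list.
RequirePackage('graphicx', options => ['final']);

# Defer a definition so that it can override definitions made by
# style files loaded later in the preamble.
AtBeginDocument(sub { DefMacro('\abstractname', 'Summary'); });

# So perl knows the file loaded.
1;
```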
Counters and IDs
NewCounter($ctr,$within,%options);
-
Defines a new counter, like LaTeX's \newcounter, but extended. It defines a counter that can be used to generate reference numbers, and defines \the$ctr, etc. It also defines an "uncounter" which can be used to generate ID's (xml:id) for unnumbered objects. $ctr is the name of the counter. If defined, $within is the name of another counter which, when incremented, will cause this counter to be reset. The options are
  idprefix  Specifies a prefix to be used to generate ID's when using this counter
  nested    Not sure that this is even sane.
$num = CounterValue($ctr);
-
Fetches the value associated with the counter $ctr.
$tokens = StepCounter($ctr);
-
Analog of \stepcounter; steps the counter and returns the expansion of \the$ctr. Usually you should use RefStepCounter($ctr) instead.
$keys = RefStepCounter($ctr);
-
Analog of \refstepcounter; steps the counter and returns a hash containing the keys refnum=>$refnum, id=>$id. This makes it suitable for use in a properties option to constructors. The id is generated in parallel with the reference number to assist debugging.
$keys = RefStepID($ctr);
-
Like RefStepCounter, but only steps the "uncounter", and returns only the id. This is useful for unnumbered cases of objects that normally get both a refnum and id.
ResetCounter($ctr);
-
Resets the counter $ctr to zero.
GenerateID($document,$node,$whatsit,$prefix);
-
Generates an ID for nodes during the construction phase, useful for cases where the counter-based scheme is inappropriate. The calling pattern makes it appropriate for use in Tag, as in
Tag('ltx:para',afterClose=>sub { GenerateID(@_,'p'); })
If $node doesn't already have an xml:id set, it computes an appropriate id by concatenating the xml:id of the closest ancestor with an id (if any), the prefix (if any) and a unique counter.
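As a rough sketch of how the counter functions above fit together in a binding file: the counter name remark, the prefix rem, the commands \remark and \remarkstar, and the choice of ltx:note as the target element are all hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use LaTeXML::Package;

# Hypothetical counter 'remark', reset whenever 'section' is stepped;
# IDs generated from it use the (assumed) prefix 'rem'.
NewCounter('remark', 'section', idprefix => 'rem');

# Numbered case: RefStepCounter steps the counter and returns the
# refnum/id pairs, which become properties usable as #id in the pattern.
DefConstructor('\remark{}',
  "<ltx:note role='remark' xml:id='#id'>#1</ltx:note>",
  properties => sub { RefStepCounter('remark'); });

# Unnumbered case: RefStepID steps only the "uncounter" and returns just an id.
DefConstructor('\remarkstar{}',
  "<ltx:note role='remark' xml:id='#id'>#1</ltx:note>",
  properties => sub { RefStepID('remark'); });

1;
```

Returning the hash directly from the properties option keeps the numbering and the id generation in one place, which is the use the RefStepCounter description suggests.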
Document Model
Constructors define how TeX markup will generate XML fragments, but the Document Model is used to control exactly how those fragments are assembled.
Tag($tag,%properties);
-
Declares properties of elements with the name $tag. Note that Tag can set or add properties to any element from any binding file, unlike the properties set on control sequences by DefPrimitive, DefConstructor, etc. And, since the properties are recorded in the current Model, they are not subject to TeX grouping; once set, they remain in effect until changed or the end of the document.
The $tag can be specified in one of three forms:
prefix:name : matches a specific name in a specific namespace.
prefix:* : matches any tag in the specific namespace.
* : matches any tag in any namespace.
There are two kinds of properties:
- Scalar properties
-
For scalar properties, only a single value is returned for a given element. When the property is looked up, each of the above forms is considered (the specific element name, the namespace, and all elements); the first defined value is returned.
The recognized scalar properties are:
- autoOpen=>boolean
-
Specifies whether this $tag can be automatically opened if needed to insert an element that can only be contained by $tag. This property can help match the more SGML-like LaTeX to XML.
- autoClose=>boolean
-
Specifies whether this $tag can be automatically closed if needed to close an ancestor node, or insert an element into an ancestor. This property can help match the more SGML-like LaTeX to XML.
- Code properties
-
These properties provide a bit of code to be run at the times of certain events associated with an element. All the code bits that match a given element will be run, and since they can be added by any binding file, and be specified in an arbitrary order, a little bit of extra control is desirable.
Firstly, any early codes are run (eg. afterOpen:early), then any normal codes (without modifier) are run, and finally any late codes are run (eg. afterOpen:late).
Within each of those groups, the codes assigned for an element's specific name are run first, then those assigned for its namespace and finally the generic one (*); that is, the most specific codes are run first.
When code properties are accumulated by Tag for normal or late events, the code is appended to the end of the current list (if there were any previous codes added); for early events, the code is prepended.
The recognized code properties are:
afterOpen=>CODE($document,$box)
-
Provides CODE to be run whenever a node with this $tag is opened. It is called with the document being constructed, and the initiating digested object as arguments. It is called after the node has been created, and after any initial attributes due to the constructor (passed to openElement) are added.
afterOpen:early or afterOpen:late can be used in place of afterOpen; these will be run as a group before, or after (respectively) the unmodified blocks.
afterClose=>CODE($document,$box)
-
Provides CODE to be run whenever a node with this $tag is closed. It is called with the document being constructed, and the initiating digested object as arguments.
afterClose:early or afterClose:late can be used in place of afterClose; these will be run as a group before, or after (respectively) the unmodified blocks.
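The following sketch illustrates both kinds of property in a binding file. The list name my_trace is hypothetical and only records the order in which the code properties fire; the choice of ltx:p and ltx:para is likewise just for illustration.

```perl
use strict;
use warnings;
use LaTeXML::Package;

# Scalar property: allow ltx:p to be closed automatically when needed.
Tag('ltx:p', autoClose => 1);

# Code property, as in the GenerateID description: id each closed paragraph.
Tag('ltx:para', afterClose => sub { GenerateID(@_, 'p'); });

# Ordering sketch. The keys with modifiers must be quoted because of the colon.
# 'my_trace' is a hypothetical global list used only to observe the order.
Tag('ltx:para', 'afterOpen:late'  => sub { PushValue('my_trace', 'late'); });
Tag('ltx:para', afterOpen         => sub { PushValue('my_trace', 'normal'); });
Tag('ltx:para', 'afterOpen:early' => sub { PushValue('my_trace', 'early'); });

1;
```

If the ordering rules described above hold, each opened ltx:para should record early, then normal, then late, regardless of the order in which the three Tag calls were made.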
RelaxNGSchema($schemaname);
-
Specifies the schema to use for determining document model. You can leave off the extension; it will look for .rng, and maybe eventually, .rnc once that is implemented.
RegisterNamespace($prefix,$URL);
-
Declares the $prefix to be associated with the given $URL. These prefixes may be used in ltxml files, particularly for constructors, xpath expressions, etc. They are not necessarily the same as the prefixes that will be used in the generated document. Use the prefix #default for the default, non-prefixed, namespace. (See RegisterDocumentNamespace, as well as DocType or RelaxNGSchema).
RegisterDocumentNamespace($prefix,$URL);
-
Declares the $prefix to be associated with the given $URL used within the generated XML. They are not necessarily the same as the prefixes used in code (RegisterNamespace). This function is rarely needed, as the namespace declarations are generally obtained from the DTD or Schema themselves. Use the prefix #default for the default, non-prefixed, namespace. (See DocType or RelaxNGSchema).
DocType($rootelement,$publicid,$systemid,%namespaces);
-
Declares the expected rootelement, and the public and system IDs of the document type to be used in the final document. The hash %namespaces specifies the namespace prefixes that are expected to be found in the DTD, along with each associated namespace URI. Use the prefix #default for the default namespace (ie. the namespace of non-prefixed elements in the DTD).
The prefixes defined for the DTD may be different from the prefixes used in implementation CODE (eg. in ltxml files; see RegisterNamespace). The generated document will use the namespaces and prefixes defined for the DTD.
A related capability is adding commands to be executed at the beginning and end of the document:
AtBeginDocument($tokens,...)
-
Adds the $tokens to a list to be processed just after \begin{document}. These tokens can be used for side effect, or any content they generate will appear as the first children of the document (but probably after titles and frontmatter).
AtEndDocument($tokens,...)
-
Adds the $tokens to the list to be processed just before \end{document}. These tokens can be used for side effect, or any content they generate will appear as the last children of the document.
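A minimal sketch of these two hooks, assuming the tokens are built with Tokenize (described under Mid-Level support); the macro name \mydraftmark and the closing text are hypothetical.

```perl
use strict;
use warnings;
use LaTeXML::Package;

# Processed just after \begin{document}; used here only for its side effect,
# defining a hypothetical macro late so that style files cannot override it.
AtBeginDocument(Tokenize('\def\mydraftmark{DRAFT}'));

# Processed just before \end{document}; any content generated here
# becomes the last children of the document.
AtEndDocument(Tokenize('\par Generated with a custom binding.'));

1;
```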
Document Rewriting
During document construction, as each node gets closed, the text content gets simplified. We'll call it applying ligatures, for lack of a better name.
DefLigature($regexp,%options);
-
Applies the regular expression (given as a string: "/fa/fa/" since it will be converted internally to a true regexp) to the text content. The only option is fontTest=>CODE($font); if given, then the substitution is applied only when fontTest returns true.
Predefined Ligatures combine sequences of "." or single-quotes into appropriate Unicode characters.
DefMathLigature(CODE($document,@nodes));
-
CODE is called on each sequence of math nodes at a given level. If they should be replaced, return a list of ($n,$string,%attributes) to replace the text content of the first node with $string content and add the given attributes. The next $n-1 nodes are removed. If no replacement is called for, CODE should return undef.
Predefined Math Ligatures combine letter or digit Math Tokens (XMTok) into multicharacter symbols or numbers, depending on the font (non math italic).
After document construction, various rewriting and augmenting of the document can take place.
DefRewrite(%specification);
DefMathRewrite(%specification);
-
These two declarations define document rewrite rules that are applied to the document tree after it has been constructed, but before math parsing, or any other postprocessing, is done. The %specification consists of a sequence of key/value pairs with the initial specs successively narrowing the selection of document nodes, and the remaining specs indicating how to modify or replace the selected nodes.
The following select portions of the document:
- label =>$label
-
Selects the part of the document with label=$label
- scope =>$scope
-
The $scope could be "label:foo" or "section:1.2.3" or something similar. These select a subtree labelled 'foo', or a section with reference number "1.2.3"
- xpath =>$xpath
-
Select those nodes matching an explicit xpath expression.
- match =>$TeX
-
Selects nodes that look like what the processing of $TeX would produce.
- regexp=>$regexp
-
Selects text nodes that match the regular expression.
The following act upon the selected node:
Mid-Level support
$tokens = Expand($tokens);
-
Expands the given $tokens according to current definitions.
$boxes = Digest($tokens);
-
Processes and digests the $tokens. Any arguments needed by control sequences in $tokens must be contained within the $tokens itself.
@tokens = Invocation($cs,@args);
-
Constructs a sequence of tokens that would invoke the token $cs on the arguments.
RawTeX('... tex code ...');
-
RawTeX is a convenience function for including chunks of raw TeX (or LaTeX) code in a Package implementation. It is useful for copying portions of the normal implementation that can be handled simply using macros and primitives.
Let($token1,$token2);
-
Gives $token1 the same `meaning' (definition) as $token2; like TeX's \let.
StartSemiVerbatim(); ... ; EndSemiVerbatim();
-
Disables most TeX catcodes.
$tokens = Tokenize($string);
-
Tokenizes the $string using the standard catcodes, returning a LaTeXML::Core::Tokens.
$tokens = TokenizeInternal($string);
-
Tokenizes the $string according to the internal cattable (where @ is a letter), returning a LaTeXML::Core::Tokens.
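A short sketch of how some of these mid-level functions might be combined in a binding. The control sequences \myemph, \myproject and \shout are hypothetical, and T_CS is assumed to be available for building control-sequence tokens.

```perl
use strict;
use warnings;
use LaTeXML::Package;

# Copy a definition, like \let\myemph=\emph.
Let(T_CS('\myemph'), T_CS('\emph'));

# Definitions that are plain macros can simply be given as raw TeX.
RawTeX('\def\myproject{LaTeXML}');

# Build the tokens for \textbf{<argument>} and return them as the expansion.
DefMacro('\shout{}', sub {
    my ($gullet, $arg) = @_;
    return Invocation(T_CS('\textbf'), $arg); });

1;
```

Using Invocation rather than assembling the braces by hand leaves the argument delimiting to the definition of the invoked control sequence.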
Argument Readers
ReadParameters($gullet,$spec);
-
Reads from $gullet the tokens corresponding to $spec (a Parameters object).
DefParameterType($type,CODE($gullet,@values),%options);
-
Defines a new Parameter type, $type, with CODE for its reader.
Options are:
- reversion=>CODE($arg,@values);
-
This CODE is responsible for converting a previously parsed argument back into a sequence of Tokens.
- optional=>boolean
-
whether it is an error if no matching input is found.
- novalue=>boolean
-
whether the value returned should contribute to argument lists, or simply be passed over.
- semiverbatim=>boolean
-
whether the catcode table should be modified before reading tokens.
DefColumnType($proto,$expansion);
-
Defines a new column type for tabular and arrays. $proto is the prototype for the pattern, analogous to the pattern used for other definitions, except that the macro being defined is a single character. The $expansion is a string specifying what it should expand into, typically a more verbose column specification.
DefKeyVal($keyset,$key,$type,$default);
-
Defines a keyword $key used in keyval arguments for the set $keyset. If $type is given, it defines the type of value that must be supplied, such as 'Dimension'. If $default is given, that value will be used when $key is used without an equals sign and explicit value.
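As an illustration of DefKeyVal, the sketch below declares two keys in a hypothetical keyset mybox and a hypothetical command \mybox that reads them. The parameter type OptionalKeyVals:mybox is assumed to be available for reading an optional key/value argument against that keyset.

```perl
use strict;
use warnings;
use LaTeXML::Package;

# 'mybox' is a hypothetical keyset.
DefKeyVal('mybox', 'width', 'Dimension');    # width=<dimension>
DefKeyVal('mybox', 'frame', '', 'true');     # bare "frame" means frame=true

# Hypothetical command reading an optional keyval argument in that keyset;
# the keyvals are parsed but not otherwise used in this sketch.
DefConstructor('\mybox OptionalKeyVals:mybox {}',
  "<ltx:text>#2</ltx:text>");

1;
```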
Access to State
$value = LookupValue($name);
-
Looks up the current value associated with the string $name.
AssignValue($name,$value,$scope);
-
Assigns $value to be associated with the string $name, according to the given scoping rule.
Values are also used to specify most configuration parameters (which can therefore also be scoped). The recognized configuration parameters are:
VERBOSITY : the level of verbosity for debugging output, with 0 being default.
STRICT : whether errors (eg. undefined macros) are fatal.
INCLUDE_COMMENTS : whether to preserve comments in the source, and to add occasional line number comments. (Default true).
PRESERVE_NEWLINES : whether newlines in the source should be preserved (not 100% TeX-like). By default this is true.
SEARCHPATHS : a list of directories to search for sources, implementations, etc.
PushValue($name,@values);
-
This function, along with the next three, is like AssignValue, but maintains a global list of values. PushValue pushes the provided values onto the end of a list. The data stored for $name is global and must be a LIST reference; it is created if needed.
UnshiftValue($name,@values);
-
Similar to PushValue, but pushes a value onto the front of the list. The data stored for $name is global and must be a LIST reference; it is created if needed.
PopValue($name);
-
Removes and returns the value on the end of the list named by $name. The data stored for $name is global and must be a LIST reference. Returns undef if there is no data in the list.
ShiftValue($name);
-
Removes and returns the first value in the list named by $name. The data stored for $name is global and must be a LIST reference. Returns undef if there is no data in the list.
LookupMapping($name,$key);
-
This function maintains a hash association named by $name. It returns the value associated with $key within that mapping. The data stored for $name is global and must be a HASH reference. Returns undef if there is no data associated with $key in the mapping, or the mapping is not (yet) defined.
AssignMapping($name,$key,$value);
-
This function associates $value with $key within the mapping named by $name. The data stored for $name is global and must be a HASH reference; it is created if needed.
$value = LookupCatcode($char);
-
Looks up the current catcode associated with the character $char.
AssignCatcode($char,$catcode,$scope);
-
Sets $char to have the given $catcode, with the assignment made according to the given scoping rule.
This method is also used to specify whether a given character is active in math mode, by using math:$char for the character, and using a value of 1 to specify that it is active.
$meaning = LookupMeaning($token);
-
Looks up the current meaning of the given $token which may be a Definition, another token, or the token itself if it has not otherwise been defined.
$defn = LookupDefinition($token);
-
Looks up the current definition, if any, of the $token.
InstallDefinition($defn);
-
Installs the Definition $defn into $STATE under its control sequence.
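The state accessors above might be used as follows. All of the names beginning with mypkg_ are hypothetical, and the trailing comments only restate the behaviour described above rather than verified output.

```perl
use strict;
use warnings;
use LaTeXML::Package;

# A scalar value, assigned globally and read back.
AssignValue('mypkg_draft' => 1, 'global');
my $draft = LookupValue('mypkg_draft');

# A global list: push onto the end, unshift onto the front.
PushValue('mypkg_queue', 'a', 'b');
UnshiftValue('mypkg_queue', 'z');           # list should now be z, a, b
my $last  = PopValue('mypkg_queue');        # removes from the end ('b')
my $first = ShiftValue('mypkg_queue');      # removes from the front ('z')

# A global hash mapping.
AssignMapping('mypkg_labels', 'intro', 'S1');
my $known   = LookupMapping('mypkg_labels', 'intro');     # 'S1'
my $unknown = LookupMapping('mypkg_labels', 'missing');   # undef

1;
```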
Font Encoding
DeclareFontMap($name,$map,%options);
-
Declares a font map for the encoding $name. The map $map is an array of 128 or 256 entries; each element is either a Unicode string for the representation of that codepoint, or undef if that codepoint is not supported by this encoding. The only option currently is family, used because some fonts (notably cmr!) have different glyphs in some font families, such as family=>'typewriter'.
FontDecode($code,$encoding,$implicit);
-
Returns the Unicode string representing the given codepoint $code (an integer) in the given font encoding $encoding. If $encoding is undefined, the usual case, the current font encoding and font family are used for the lookup. Explicit decoding is used when \char or similar are invoked ($implicit is false), and the codepoint must be represented in the fontmap, otherwise undef is returned. Implicit decoding (ie. $implicit is true) occurs within the Stomach when a Token's content is being digested and converted to a Box; in that case only the lower 128 codepoints are converted; all codepoints above 128 are assumed to already be Unicode.
The font map for $encoding is automatically loaded if it has not already been loaded.
FontDecodeString($string,$encoding,$implicit);
-
Returns the Unicode string resulting from decoding the individual characters in $string according to FontDecode, above.
LoadFontMap($encoding);
-
Finds and loads the font map for the encoding named $encoding, if it hasn't been loaded before. It looks for encoding.fontmap.ltxml, which would typically define the font map using DeclareFontMap, possibly including extra maps for families like typewriter.
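A sketch of what a font map file could contain, following the descriptions above. The encoding name MYENC and its contents are hypothetical; a real map would normally live in its own encoding.fontmap.ltxml file so that LoadFontMap can find it.

```perl
use strict;
use warnings;
use LaTeXML::Package;

# Hypothetical 128-entry encoding: ASCII everywhere, except that
# codepoint 0x60 maps to a left single quotation mark and
# codepoint 0x7F is marked as unsupported.
my @map = map { chr($_) } 0 .. 127;
$map[0x60] = "\x{2018}";
$map[0x7F] = undef;
DeclareFontMap('MYENC', [@map]);

# Explicit decoding ($implicit false): look the codepoint up in the map.
my $quote = FontDecode(0x60, 'MYENC', 0);

1;
```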
Color
$color=LookupColor($name);
-
Looks up the color object associated with $name.
DefColor($name,$color,$scope);
-
Associates the $name with the given $color (a color object), with the given scoping.
DefColorModel($model,$coremodel,$tocore,$fromcore);
-
Defines a color model $model that is derived from the core color model $coremodel. The two functions $tocore and $fromcore convert a color object in that model to the core model, or from the core model to the derived model. Core models are rgb, cmy, cmyk, hsb and gray.
Low-level Functions
CleanID($id);
-
Cleans an $id of disallowed characters, trimming space.
CleanLabel($label,$prefix);
-
Cleans a $label of disallowed characters, trimming space. The prefix $prefix is prepended (or LABEL, if none given).
CleanIndexKey($key);
-
Cleans an index key, so it can be used as an ID.
CleanBibKey($key);
-
Cleans a bibliographic citation key, so it can be used as an ID.
CleanURL($url);
-
Cleans a url.
UTF($code);
-
Generates a UTF character, handy for the 8 bit characters. For example, UTF(0xA0) generates the non-breaking space.
MergeFont(%style);
-
Sets the current font by merging the font style attributes with the current font. The attributes and likely values (the values aren't required to be in this set):
family : serif, sansserif, typewriter, caligraphic, fraktur, script
series : medium, bold
shape : upright, italic, slanted, smallcaps
size : tiny, footnote, small, normal, large, Large, LARGE, huge, Huge
color : any named color, default is black
Some families will only be used in math. This function returns nothing so it can be easily used in beforeDigest, afterDigest.
@tokens = roman($number);
-
Formats the $number in (lowercase) roman numerals, returning a list of the tokens.
@tokens = Roman($number);
-
Formats the $number in (uppercase) roman numerals, returning a list of the tokens.
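A few of these low-level helpers in use, as a sketch only. The macros \mynbsp, \myfour and \myboldtext are hypothetical, and the bounded option of DefConstructor is assumed here to confine the font change to the command's argument.

```perl
use strict;
use warnings;
use LaTeXML::Package;

# UTF(0xA0) is the non-breaking space, as noted above.
DefMacro('\mynbsp', UTF(0xA0));

# roman() returns a list of tokens, so it can serve as a macro expansion.
DefMacro('\myfour', sub { roman(4); });

# MergeFont returns nothing, so it can be called directly in beforeDigest.
DefConstructor('\myboldtext{}', "<ltx:text>#1</ltx:text>",
  bounded      => 1,
  beforeDigest => sub { MergeFont(series => 'bold'); });

1;
```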
AUTHOR
Bruce Miller <bruce.miller@nist.gov>
COPYRIGHT
Public domain software, produced as part of work done by the United States Government & not subject to copyright in the US.