NAME

PerlPoint::Parser - a PerlPoint Parser

VERSION

This manual describes version 0.451.

SYNOPSIS

# load the module:
use PerlPoint::Parser;

# build the parser and run it
# to get intermediate data in @stream
my ($parser)=new PerlPoint::Parser;
$parser->run(
             stream => \@stream,
             files  => \@files,
            );

DESCRIPTION

The PerlPoint format, initially designed by Tom Christiansen, is intended to provide a simple and portable way to generate slides without the need of a proprietary product. Slides can be prepared in a text editor of your choice, generated on any platform where you find perl, and presented by any browser which can render the chosen output format.

To sum it up, PerlPoint Software takes an ASCII text and transforms it into slides written in a certain document description language. This is, by tradition, usually HTML, but you may decide to use another format like XML, SGML, TeX or whatever you want.

Well, this sounds fine, but how do you build a translator which transforms ASCII into the output format of your choice? That's what PerlPoint::Parser is made for. It performs the first translation step by parsing ASCII and transforming it into an intermediate stream format, which can be processed by a subsequently called translator backend. By separating parsing and output generation we get the flexibility to write as many backends as necessary, using the same parser frontend for all translators.

PerlPoint::Parser supports the complete GRAMMAR with the exception of certain tags. Tags are supported in the most general way: the parser recognizes any tag which is declared by the author of a translator. This way the parser can be used for various flavours of the PerlPoint language without having to be modified. So, if a certain new tag is needed, it can quickly be added without any change to PerlPoint::Parser.

The following chapters describe the input format (GRAMMAR) and the generated stream format (STREAM FORMAT). Finally, the class methods are described to show you how to build a parser.

GRAMMAR

This chapter describes how a PerlPoint ASCII slide description has to be formatted to pass PerlPoint::Parser parsers.

Please note that the input format does not completely determine how the output will be designed. The final format depends on the backend which has to be called after the parser to transform its output into a certain document description language. The final appearance depends on the browser's behaviour.

Each PerlPoint document is made of paragraphs.

The paragraphs

All paragraphs start at the beginning of their first line. The first character or string in this line determines which paragraph is recognized.

A paragraph is completed by an empty line (which may contain whitespaces). Exceptions are described.

Carriage returns in paragraphs which are completed by an empty line are transformed into a whitespace.

Comments

start with "//" and reach until the end of the line.

Headlines

start with one or more "=" characters. The number of "=" characters represents the headline level.

=First level headline

==Second level headline

===Multi
  line
 headline
example

It is possible to declare a "short version" of the headline title by appending a "~" and plain strings to the headline like in

=Very long headlines are expressive but may exceed the
 available space for example in HTML navigation bars or
 something like that ~ Long headlines

The "~" often stands for similarity, or represents the described object in encyclopedias or dictionaries. So one may think of this as "long title is (sometimes) similar to short title".
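
As a rough illustration of the headline rules (not the parser's actual implementation), the following Perl sketch derives the level and the optional short title from a headline paragraph. The helper name parse_headline and the keys of the returned hash are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: split a headline paragraph into level,
# long title and optional short title.
sub parse_headline {
    my ($paragraph) = @_;

    # the number of leading "=" characters is the headline level
    my ($marks, $text) = $paragraph =~ /^(=+)(.*)$/s
        or return;

    # line breaks within a paragraph become single whitespaces
    $text =~ s/\s*\n\s*/ /g;

    # an optional "~" separates the long title from the short one
    my ($long, $short) = split /\s*~\s*/, $text, 2;
    $long = '' unless defined $long;
    $long  =~ s/^\s+|\s+$//g;
    $short =~ s/^\s+|\s+$//g if defined $short;

    return {level => length $marks, title => $long, short => $short};
}

my $headline = parse_headline("==A rather long\n headline ~ Short one");
printf "%d: %s (%s)\n", @{$headline}{qw(level title short)};
```

The sketch ignores tags and macros within the title, which a real parser has to resolve as well.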

Lists

Points or unordered lists start with a "*" character.

* This is a first point.

* And, I forgot,
  there is something more to point out.

There are ordered lists as well, and they start with a hash sign ("#"):

# First, check the number of this.

# Second, don't forget the first.

The hash signs are intended to be replaced by numbers by a backend.

Because PerlPoint is paragraph based, any paragraph other than an ordered list point closes an ordered list. If you wish the list to be continued, use a double hash sign instead of the single one in the point that reopens the list.

# Here the ordered list begins.

? $includeMore

## This is point 2 of the list that started before.

# In subsequent points, the usual single hash sign
  works as expected again.

List continuation is list level specific (see below for level details). A list cannot be continued in another chapter. Using "##" in the first point of a new list has no special effect: the list will begin as usual (with number 1).
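
How a backend might number ordered list points can be sketched in a few lines of Perl. This is an assumption about one possible backend, limited to a single list level; the helper name number_points is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: number the ordered list points in a sequence of
# paragraphs (one list level only), honouring "##" continuation.
sub number_points {
    my @paragraphs = @_;
    my ($last, $open) = (0, 0);   # last number used, list currently open?
    my @result;

    for my $paragraph (@paragraphs) {
        if ($paragraph =~ /^(##?)\s*(.*)$/s) {
            my ($mark, $text) = ($1, $2);
            # "#" in a closed list restarts at 1, "##" continues counting
            $last = 0 if !$open && $mark eq '#';
            $last++;
            push @result, "$last. $text";
            $open = 1;
        }
        else {
            # any other paragraph closes the list,
            # a headline additionally forbids continuation
            $open = 0;
            $last = 0 if $paragraph =~ /^=/;
            push @result, $paragraph;
        }
    }
    return @result;
}

my @numbered = number_points(
    '# Here the ordered list begins.',
    '? $includeMore',
    '## This is point 2.',
    '# A third point.',
);
print "$_\n" for @numbered;
```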

Definition lists are a third list variant. Each item starts with the described phrase enclosed by a pair of colons, followed by the definition text:

:first things: are usually described first,

:others:       later then.

All lists can be nested. A new level is introduced by a special paragraph called "list indention" which starts with a ">". A list level can be terminated by a "list indention stop" paragraph starting with a "<" character. (These startup characters symbolize "level shifts".)

* First level.

* Still there.

>

* A list point of the 2nd level.

<

* Back on first level.

It is possible to shift more than one level by adding a number. There should be no whitespace between the level shift character and the level number.

* First level.

>

* Second level.

>

* Third level.

<2

* Back on first level.

Level shifts are accepted between list items only.

Please note that there is no need to shift levels back if a list is completed. Any non list paragraph will reset list indentation, as well as the end of the source.

Texts

are paragraphs like points but begin immediately without a startup character:

This is a simple text.

In this new text paragraph,
we demonstrate the multiline feature.

Optionally, a text paragraph can be started with a special character as well, which is a dot:

.This is a simple text with dot.

.In this new text paragraph,
we demonstrate the multiline feature.

This is intended to be used by generators which translate other formats into PerlPoint, to make sure the first character of a paragraph has no special meaning to the PerlPoint parser.

Blocks

are intended to contain examples or code with tag recognition. This means that the parser will discover embedded tags. On the other hand, it means that one may have to escape ">" characters embedded into tags. Blocks begin with an indentation and are completed by the next empty line.

* Look at these examples:

    A block.

    \I<Another> block.
    Escape ">" in tags: \C<<\>>.

Examples completed.

Subsequent blocks are joined together automatically: the intermediate empty lines which would usually complete a block are translated into real empty lines within the block. This makes it easier to integrate real code sequences as one block, regardless of the empty lines included. However, one may explicitly wish to separate subsequent blocks and can do so by delimiting them by a special control paragraph:

* Separated subsequent blocks:

    The first block.

-

    The second block.

Note that the control paragraph starts at the left margin.

Verbatim blocks

are similar to blocks in indentation but deactivate pattern recognition. That means the embedded text is not scanned for tags and empty lines and may therefore remain as it was in its original place, possibly a script.

These special blocks need a special syntax. They are implemented as here documents. Start with a here document clause flagging which string will close the "here document":

<<EOC

  PerlPoint knows various
  tags like \B, \C and \I. # unrecognized tags

EOC

Tables

are supported as well. They start with an @ sign which is followed by the column delimiter:

@|
 column 1   |   column 2   |  column 3
  aaa       |    bbb       |   ccc
  uuu       |    vvvv      |   www

The first line is automatically marked as a "table headline". Most converters emphasize such headlines by bold formatting, so there is no need to insert \B tags into the document.

If a table row contains fewer columns than the table headline, the "missing" columns are automatically added. That is,

@|
A | B | C
1
1 |
1 | 2
1 | 2 |
1 | 2 | 3

is streamed exactly like

@|
A | B | C
1 |   |
1 |   |
1 | 2 |
1 | 2 |
1 | 2 | 3

to make backend handling easier. (Empty HTML table cells, for example, are rendered oddly by certain browsers unless they are filled with invisible characters, so a converter to HTML can detect such cells thanks to normalization and handle them appropriately.)

Please note that normalization refers to the headline row. If another line contains more columns than the headline, normalization does not care. If the maximum column number is detected in another row, a warning is issued. (As a help for converter authors, the title and maximum column number are made part of a table tag as internal options __titleColumns__ and __maxColumns__.)
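
The normalization rule can be illustrated by a small Perl sketch. It is not taken from the parser; the helper name normalize_table is hypothetical, and the code only shows the described behaviour: cells are trimmed, short rows are padded to the width of the headline row, and wider rows are left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: split a table paragraph into rows of trimmed cells
# and pad short rows to the width of the headline row.
sub normalize_table {
    my ($paragraph) = @_;
    my ($first, @lines) = split /\n/, $paragraph;

    # the first line is "@" followed by the column delimiter
    my ($delimiter) = $first =~ /^\@(.+)$/
        or die "not a table paragraph\n";

    my @rows;
    for my $line (@lines) {
        # a limit of -1 keeps trailing empty cells
        my @cells = split /\Q$delimiter\E/, $line, -1;
        s/^\s+|\s+$//g for @cells;
        push @rows, \@cells;
    }
    return \@rows unless @rows;

    # pad with empty cells up to the headline width; wider rows are kept
    my $width = @{$rows[0]};
    for my $row (@rows) {
        push @$row, ('') x ($width - @$row) if @$row < $width;
    }
    return \@rows;
}

my $table = normalize_table("\@|\nA | B | C\n1\n1 | 2 |\n1 | 2 | 3");
print join(' | ', @$_), "\n" for @$table;
```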

In all tables, leading and trailing whitespaces of a cell are automatically removed, so you can use as many of them as you want to improve the readability of your source. The following table is absolutely equivalent to the last example:

@|
A                |       B         |      C
1                |                 |
 1               |                 |
  1              | 2               |
   1             |  2              |
    1            | 2               |      3

There is also a more sophisticated way to describe tables, see the tag section below.

Note: Although table paragraphs cannot be nested, tables declared by tag possibly can (and might be embedded into table paragraphs as well). To help converter authors handling nested tables, the opening table tag provides an internal option "__nestingLevel__".

Conditions

start with a "?" character. If active contents is enabled, the paragraph text is evaluated as Perl code. The (boolean) evaluation result then determines if subsequent PerlPoint is read and parsed. If the result is false, all subsequent paragraphs until the next condition are skipped.

Note that base data is made available by a global (package) hash reference $PerlPoint. See run() for details about how to set up these data.

Conditions can be used to maintain various language versions of a presentation in one source file:

? $PerlPoint->{targetLanguage} eq 'German'

Or you could enable parts of your document by date:

? time>$dateOfTalk

or by a special setting:

? flagSet('setting')

Please note that the condition code shares its variables with embedded and included code.

To make usage easier and to improve readability, condition code is evaluated with disabled warnings (the language variable in the example above may not even have been set).

Converter authors might want to provide predefined variables such as "$language" in the example.

Note: If a document uses document streams, be careful when intermixing docstream entry points and conditions. A condition placed in a skipped document stream will not be evaluated. A document stream entry point placed in a source area hidden by a false condition will not be recognized.

Variable assignment paragraphs

Variables can be used in the text and will be automatically replaced by their string values (if declared).

The next paragraph sets a variable.

$var=var

This variable is called $var.

All variables are made available to embedded and included Perl code as well as to conditions and can be accessed there as package variables of "main::" (or whatever package name the Safe object is set up with). Because a variable is already replaced by the parser if possible, you have to use the fully qualified name or guard the variable's "$" prefix character to do so:

\EMBED{lang=perl}join(' ', $main::var, \$var)\END_EMBED

Variable modifications by embedded or included Perl do not affect the variables visible to the parser. (This is true for conditions as well.) This means that

$var=10
\EMBED{lang=perl}$main::var*=2;\END_EMBED

causes $var to be different on parser and code side: the parser will still use a value of 10, while embedded code continues to work with a value of 20.

Macro or alias definitions

Sometimes certain text parts are used more than once. It would be a relief to have a shortcut instead of having to insert them again and again. The same is true for tag combinations a user may prefer to use. That's what aliases (or "macros") are designed for. They allow a presentation author to declare his own shortcuts and to use them like a tag. The parser will resolve such aliases, replace them by the defined replacement text and continue with this replacement.

An alias declaration starts with a "+" character followed immediately by the alias name (without backslash prefix), optionally followed immediately by an option default list in "{}", followed immediately by a colon. (No additional spaces here.)

All text after this colon up to the paragraph closing empty line is stored as the replacement text. So, wherever you use the new macro, the parser will replace it by this text and reparse the result. This means that your macro text can contain any valid constructions like tags or other macros.

The replacement text may contain strings embedded into doubled underscores like __this__. This is a special syntax to mark that the macro takes parameters of these names (e.g. this). If a macro is used and these parameters are set, their values will replace the mentioned placeholders. The special placeholder "__body__" is used to mark where the macro body is to be placed.

If a macro is used and defined options are unset, but there are defaults for them in the optional default list, these defaults will be used for the respective options.

Here are a few examples:

+RED:\FONT{color=red}<__body__>

+F:\FONT{color=__c__}<__body__>

+COLORED{c=blue}:\FONT{color=__c__}<__body__>

+IB:\B<\I<__body__>>

This \IB<text> is \RED<colored>.

Defaults: first, text in \COLORED{c=red}<Red>,
now text in \COLORED<Blue>.

+TEXT:Macros can be used to abbreviate longer
   texts as well as other tags
or tag combinations.

+HTML:\EMBED{lang=html}

Tags can be \RED<\I<nested>> into macros.
And \I<\F{c=blue}<vice versa>>.
\IB<\RED<This>> is formatted by nested macros.
\HTML This is <i>embedded HTML</i>\END_EMBED.

Please note: \TEXT

If no parameter is defined in the macro definition, options will not be recognized. The same is true for the body part. Unless __body__ is used in the macro definition, macro bodies will not be recognized. This means that with the definition

+OPTIONLESS:\B<__body__>

the construction

\OPTIONLESS{something=this}<more>

is evaluated as a usage of \OPTIONLESS without body, followed by the string {something=this}. Likewise, the definition

+BODYLESS:found __something__

causes

\BODYLESS{something=this}<more>

to be recognized as a usage of \BODYLESS with option something, followed by the string <more>. So this will be resolved as found this. Finally,

+JUSTTHENAME:Text phrase.

enforces these constructions

... \JUSTTHENAME, ...
... \JUSTTHENAME{name=Name}, ...
... \JUSTTHENAME<text>, ...
... \JUSTTHENAME{name=Name}<text> ...

to be translated into

... Text phrase., ...
... Text phrase.{name=Name}, ...
... Text phrase.<text>, ...
... Text phrase.{name=Name}<text> ...

The principle behind all this is to make macro usage easier and intuitive: why think of options or a body or of special characters possibly treated as option/body part openers unless the macro makes use of an option or body?
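
Placeholder replacement, including option defaults, can be sketched in Perl as follows. This only illustrates the described rules under simplifying assumptions (the replacement text is not reparsed here); the helper name expand_macro and its argument layout are hypothetical and not part of the parser's interface.

```perl
use strict;
use warnings;

# Hypothetical helper: fill the placeholders of a macro replacement text.
# $text     - replacement text as declared after the colon
# $defaults - option defaults from the optional "{}" list
# $options  - options passed where the macro is used
# $body     - macro body (may be undef)
sub expand_macro {
    my ($text, $defaults, $options, $body) = @_;
    my %values = (%$defaults, %$options);     # passed options win
    $values{body} = $body if defined $body;

    # replace every __name__ that has a value, leave others untouched
    $text =~ s/__(\w+?)__/exists $values{$1} ? $values{$1} : "__$1__"/ge;
    return $text;
}

# replacement text of: +COLORED{c=blue}:\FONT{color=__c__}<__body__>
my $definition = '\FONT{color=__c__}<__body__>';
print expand_macro($definition, {c => 'blue'}, {c => 'red'}, 'Red'),  "\n";
print expand_macro($definition, {c => 'blue'}, {},           'Blue'), "\n";
```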

An empty macro text undefines the macro (if it was already known).

// undeclare the IB alias
+IB:

An alias can be used like a tag.

Aliases named like a tag overwrite the tag (as long as they are defined).

Document stream entry points

A document stream is a "document in document" and best explained by example.

Consider a document talking about
two scripts and comparing them. A
typical review of this type is
structured this way: headline, notes
about script 1, notes about script 2,
new headline to discuss another aspect,
notes about script 1, notes about
script 2, and so on.

Everything said about item 1 is a document stream, and so is everything said about item 2. A third stream is implicitly built by all parts outside these two. In slide construction, each stream can have its own area, for example

-------------------------------------
|                                   |
|            main stream            |
|                                   |
-------------------------------------
|                 |                 |
|  item 1 stream  |  item 2 stream  |
|                 |                 |
-------------------------------------

But to construct a layout like this, streams need to be distinguished, and that is what "stream entry points" are made for.

A stream entry point starts with a "~" character, followed by a string which is the name of the stream. This may be an internal name only, or converters may turn it into a document part as well. The __ALL__ string is reserved for internal purposes. It is recommended to treat __MAIN__ as reserved as well, although it has no special meaning yet.

Once an entry point has been passed, all subsequent document parts belong to the declared stream, up to the next entry point or a headline which implicitly switches back to the "main stream".

The parser can be instructed to ignore certain streams, see run() for details. If this feature is used, please be careful when intermixing stream entry points and conditions. A condition placed in a skipped document stream will not be evaluated.

It is up to a converter how document streams are used. Certain converters may ignore them altogether. As a convenient solution, the parser can be instructed to transform stream entry points into headlines (one level below the current real headline level). See run() for details.

Tags

Tags are directives embedded into the text stream, commanding how certain parts of the text should be interpreted. Tags are declared by using one or more modules built on the basis of PerlPoint::Tags.

use PerlPoint::Tags::Basic;

PerlPoint::Parser parsers can recognize all tags which are built of a backslash and a number of capitals and numbers.

\TAG

Tag options are optional and follow the tag name immediately, enclosed by a pair of corresponding curly braces. Each option is a simple string assignment. The value has to be quoted if /^\w+$/ does not match it.

\TAG{par1=value1 par2="www.perl.com" par3="words and blanks"}
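
The option syntax can be illustrated by a small Perl sketch which turns the text between the curly braces into a hash. The helper name parse_options is hypothetical, and the sketch assumes that quoted values contain no embedded double quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the text between "{" and "}" into a hash.
sub parse_options {
    my ($string) = @_;
    my %options;

    # each option is name=value; the value is a bare word (/^\w+$/)
    # or a string in double quotes
    while ($string =~ /\G\s*(\w+)=(?:"([^"]*)"|(\w+))/gc) {
        $options{$1} = defined $2 ? $2 : $3;
    }
    return \%options;
}

my $options = parse_options('par1=value1 par2="www.perl.com" par3="words and blanks"');
print "$_ => $options->{$_}\n" for sort keys %$options;
```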

The tag body is anything you want to make the tag valid for. It is optional as well and immediately follows the optional parameters, enclosed by "<" and ">":

\TAG<body>
\TAG{par=value}<body>

Tags can be nested.

To provide a maximum of flexibility, tags are declared outside the parser. This way a translator programmer is free to implement the tags he needs. It is recommended to always support the basic tags declared by PerlPoint::Tags::Basic. On the other hand, a few tags of special meaning are reserved and cannot be declared by converter authors, because they are handled by the parser itself. These are:

\INCLUDE

It is possible to include a file into the input stream. Have a look:

\INCLUDE{type=HTML file=filename}

This imports the file "filename". The file contents is made part of the generated stream, but not parsed. This is useful to include target language specific, preformatted parts.

If, however, the file type is specified as "PP", the file contents is made part of the input stream and parsed. In this case a special tag option "headlinebase" can be specified to define a headline base level used as an offset to all headlines in the included document. This makes it easier to share partial documents with others, or to build complex documents by including separately maintained parts, or to include one and the same part at different headline levels.

Example: If "\INCLUDE{type=PP file=file headlinebase=20}" is
         specified and "file" contains a one level headline
         like "=Main topic of special explanations"
         this headline is detected with a level of 21.

Pass the special keyword "CURRENT_LEVEL" to this tag option if you want to set just the current headline level as an offset. This results in "subchapters".

Example:

===Headline 3

// let included chapters start on level 4
\INCLUDE{type=PP file=file headlinebase=CURRENT_LEVEL}

Similar to "CURRENT_LEVEL", "BASE_LEVEL" sets the current base headline level as an offset. The "base level" is the level above the current one. Using "BASE_LEVEL" results in parallel chapters.

Example:

===Headline 3

// let included chapters start on level 3
\INCLUDE{type=PP file=file headlinebase=BASE_LEVEL}

A given offset is reset when the included document is parsed completely.
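
The arithmetic behind "headlinebase" can be summarized in a short Perl sketch. The helper name headline_offset is hypothetical; the numbers reproduce the three examples given for this option.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the offset added to all headline levels
# of an included document.
# $base    - value of the "headlinebase" option
# $current - headline level at the place of the \INCLUDE tag
sub headline_offset {
    my ($base, $current) = @_;
    return $current     if $base eq 'CURRENT_LEVEL';
    return $current - 1 if $base eq 'BASE_LEVEL';
    return $base;       # a plain number
}

# a first level headline in the included file,
# included below "===Headline 3" in the last two cases
print 1 + headline_offset(20,              3), "\n";   # 21
print 1 + headline_offset('CURRENT_LEVEL', 3), "\n";   # 4
print 1 + headline_offset('BASE_LEVEL',    3), "\n";   # 3
```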

A second special option, smart, commands the parser to include the file only if this was not already done before. This is intended for inclusion of pure alias/macro definition or variable assignment files.

\INCLUDE{type=PP file="common-macros.pp" smart=1}

Included sources may declare variables of their own, possibly overwriting already assigned values. Option "localize" works like Perl's local(): such changes will be reversed after the nested source has been processed completely, so the original values will be restored. You can specify a comma separated list of variable names or the special string __ALL__ which flags that all current settings shall be restored.

\INCLUDE{type=PP file="nested.pp" localize=myVar}

\INCLUDE{type=PP file="nested.pp" localize="var1, var2, var3"}

\INCLUDE{type=PP file="nested.pp" localize=__ALL__}

PerlPoint authors can declare an input filter to preprocess the included file. This is done via option ifilter:

\INCLUDE{type=pp file="source.pod" ifilter="pod2pp()"}

An input filter is a snippet of user defined Perl code, taking the included file via @main::_ifilterText and the target type via $main::_ifilterType. The original filename can be accessed via $main::_ifilterFile. It should supply its result as an array of strings which will then be processed instead of the original file.
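
What such a filter function might look like is sketched below. The function name titles2pp and the "TITLE:" source format are invented for this illustration; the sketch fills @main::_ifilterText itself to simulate what the parser is described to provide.

```perl
use strict;
use warnings;

# Hypothetical filter: turn lines like "TITLE: text" of an imaginary
# source format into first level PerlPoint headlines and pass all
# other lines through unchanged.
sub titles2pp {
    return map { /^TITLE:\s*(.*)/ ? "=$1\n" : $_ } @main::_ifilterText;
}

# simulate the lines of the included file as provided to the filter
@main::_ifilterText = ("TITLE: Introduction\n", "\n", "Some text.\n");

# the returned strings would be processed instead of the original file
my @filtered = titles2pp();
print @filtered;
```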

Input filters are Active Content. If Active Content is disabled, \INCLUDE tags using input filters will be ignored completely.

As a simplified option, import allows you to use predefined import filters defined in PerlPoint::Import::... modules. To use such a filter, do not set the ifilter option but set import instead. import takes the name of the source format, like "POD", or a true number to indicate that the file extension should be used as the source format name. The uppercased name is used as the final part of the filter module name: for "POD", the module's name would be "PerlPoint::Import::POD". If this module is installed and has a function importFilter(), this function name is used like ifilter.

Here are a few examples:

\INCLUDE{file="source.pod" import=1}

\INCLUDE{file="source.pod" import=pod}

\INCLUDE{file=source import=pod}

Please note that in the last example import=1 will not work, as the source file has no extension that indicates its format is POD.

If ifilter is used together with import, import is ignored.

A PerlPoint file can be included wherever a tag is allowed, but sometimes it has to be arranged slightly: if you place the inclusion directive at the beginning of a new paragraph and your included PerlPoint starts by a paragraph of another type than text, you should begin the included file by an empty line to let the parser detect the correct paragraph type. Here is an example: if the inclusion directive is placed like

// include PerlPoint
\INCLUDE{type=pp file="file.pp"}

and file.pp immediately starts with a verbatim block like

<<VERBATIM
    verbatim
VERBATIM

then the inclusion directive already opens a new paragraph which is detected to be "text" (because there is no special startup character). From the parser's point of view, the included PerlPoint is then simply a continuation of this text, because a paragraph ends with an empty line. This trouble can be avoided by beginning the included file with an empty line, so that its first paragraph can be detected correctly.

The second special case is a file type of "Perl". If active contents is enabled, included Perl code is read into memory and evaluated like embedded Perl. The results are made part of the input stream to be parsed.

// execute a perl script and include the results
\INCLUDE{type=perl file="disk-usage.pl"}

As another option, files may be declared to be of type "example" or "parsedexample". This places the file into the source as a verbatim block (with "example") or a standard block (with "parsedexample"), without the need to copy its contents into the source.

// include an external script as an example
\INCLUDE{type=example file="script.csh"}

All lines of the example file are included as they are but can be indented on request. To do so, just set the special option "indent" to a positive numerical value equal to the number of spaces to be inserted before each line.

// external example source, indented by 3 spaces
\INCLUDE{type=example file="script.csh" indent=3}

Including external scripts this way can accelerate PerlPoint authoring significantly, especially if the included files are still subject to changes.

It is possible to filter the file types you wish to include (with the exception of "pp" and "example"), see below for details. In any case, the mentioned file has to exist.

\EMBED and \END_EMBED

Target format code does not necessarily need to be imported - it can be directly embedded as well. This means that one can write target language code within the input stream using \EMBED:

\EMBED{lang=HTML}
This is <i><b>embedded</b> HTML</i>.
The parser detects <i>no</i> PerlPoint
tag here, except of <b>END_EMBED</b>.
\END_EMBED

Because this is handled by tags, not by paragraphs, it can be placed directly in a text like this:

These \EMBED{lang=HTML}<i>italics</i>\END_EMBED
are formatted by HTML code.

Please note that the EMBED tag does not accept a tag body (to avoid ambiguities).

Both tag and embedded text are made part of the intermediate stream. It is the backend's task to deal with it. The only exception to this rule is the embedding of Perl code, which is evaluated by the parser. The reply of this code is made part of the input stream and parsed as usual.

PerlPoint authors can declare an input filter to preprocess the embedded text. This is done via option ifilter:

\EMBED{lang=pp ifilter="pod2pp()"}

=head1 POD formatted part

This part was written in POD.

\END_EMBED

An input filter is a snippet of user defined Perl code, taking the embedded text via @main::_ifilterText and the target language via $main::_ifilterType. The original filename can be accessed via $main::_ifilterFile (but please note that this is the source with the \EMBED tag). It should supply its result as an array of strings which will then be processed as usual.

Input filters are Active Contents. If Active Contents is disabled, embedded parts using input filters will be ignored completely.

It is possible to filter the languages you wish to embed (with the exception of "PP"), see below for details.

\TABLE and \END_TABLE

It was mentioned above that tables can be built by table paragraphs. Well, there is a tag variant of this:

\TABLE{bg=blue separator="|" border=2}
\B<column 1>  |  \B<column 2>  | \B<column 3>
   aaaa       |     bbbb       |  cccc
   uuuu       |     vvvv       |  wwww
\END_TABLE

This is slightly more powerful than the paragraph syntax: you can set up several table features like the border width yourself, and you can format the headlines as you like.

As in all tables, leading and trailing whitespaces of a cell are automatically removed, so you can use as many of them as you want to improve the readability of your source.

The default row separator (as in the example above) is a carriage return, so that each table line can be written as a separate source line. However, PerlPoint allows you to specify another string to separate rows by option rowseparator. This allows you to specify a table inlined into a paragraph.

\TABLE{bg=blue separator="|" border=2 rowseparator="+++"}
\B<column 1> | \B<column 2> | \B<column 3> +++ aaaa
| bbbb | cccc +++ uuuu | vvvv|  wwww \END_TABLE

This is exactly the same table as above.

If parser option nestedTables is set to a true value when calling run(), it is possible to nest tables. To help converter authors handle this, the opening table tag provides an internal option "__nestingLevel__".

Tables built by tag are normalized the same way as table paragraphs are.

What about special formatting?

Earlier versions of pp2html supported special format hints like the HTML expression "&gt;" for the ">" character, or "&uuml;" for "ü". PerlPoint::Parser does not support this directly because such hints are specific to the output format: if someone wants to translate into TeX, it might be strange for him to use HTML syntax in his ASCII text. Furthermore, such hints can be handled completely by a backend which finds them unchanged in the produced output stream.

The same is true for special headers and trailers. It is a backend task to add them if necessary. The parser handles the input only.

STREAM FORMAT

It is suggested to use PerlPoint::Backend to evaluate the intermediate format. Nevertheless, here is the documentation of this format.

The generated stream is an array of tokens. Most of them are very simple, representing just their contents - words, spaces and so on. Example:

"These three words."

could be streamed into

"These three" + " "+ "words."

(This shows the principle. Actually this complete sentence would be returned as one token for reasons of efficiency.)

Note that the final dot is part of the last token. From a document description view, this should make no difference, it's just a string containing special characters or not.

Well, besides this "main stream", there are formatting directives. They flag the beginning or completion of a certain logical entity, which means a whole document, a paragraph or a formatting like italicising. Almost every entity is embedded into a start and a completion directive, except for simple tokens.

In the current implementation, a directive is a reference to an array of mostly two fields: a directive constant showing which entity is related, and a start or completion hint which is a constant, too. The used constants are declared in PerlPoint::Constants. Directives can pass additional information by additional fields. By now, the headline directives use this feature to show the headline level, as well as the tag ones to provide tag type information and the document ones to keep the name of the original document. Furthermore, ordered list points can request a fixed number this way.

# this example shows a tag directive
... [DIRECTIVE_TAG, DIRECTIVE_START, "I"]
+ "formatted" + " " + "strings"
+ [DIRECTIVE_TAG, DIRECTIVE_COMPLETE, "I"] ...

To recognize whether a token is a basic token or a directive, the ref() function can be used. However, this handling should be done by PerlPoint::Backend transparently. The format may be subject to changes and is documented for information purposes only.
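
For illustration only, the following Perl sketch walks a hand-made stream and uses ref() to tell directives from simple tokens. The constants are local stand-ins with arbitrary values (the real ones come from PerlPoint::Constants), and the translation of the I tag into HTML is just an assumed example of what a backend could do.

```perl
use strict;
use warnings;

# Stand-ins for the constants declared in PerlPoint::Constants;
# the values used here are arbitrary.
use constant {DIRECTIVE_TAG => 1, DIRECTIVE_START => 2, DIRECTIVE_COMPLETE => 3};

my @stream = (
    [DIRECTIVE_TAG, DIRECTIVE_START, 'I'],
    'formatted', ' ', 'strings',
    [DIRECTIVE_TAG, DIRECTIVE_COMPLETE, 'I'],
);

my $html = '';
for my $token (@stream) {
    if (ref $token) {
        # a directive: entity constant, start/completion hint, extra fields
        my ($entity, $state, @extra) = @$token;
        next unless $entity == DIRECTIVE_TAG;
        my $name = lc $extra[0];
        $html .= $state == DIRECTIVE_START ? "<$name>" : "</$name>";
    }
    else {
        # a simple token: plain text
        $html .= $token;
    }
}
print "$html\n";   # <i>formatted strings</i>
```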

Original line numbers are no part of the stream but can be provided by embedded directives on request, see below for details.

This is the complete generator format. It is designed to be simple but powerful.

METHODS

new()

The constructor builds and prepares a new parser object.

Parameters:

The class name.

Return value: The new object in case of success.

Example:

my ($parser)=new PerlPoint::Parser;

run()

This function starts the parser to process a number of specified files.

Parameters: All parameters except for the object parameter are named (pass them by hash).

activeBaseData

This optional parameter allows you to pass common data to all active contents (conditions, embedded and included Perl) by a hash reference. By convention, a translator at least passes the target language and user settings by

activeBaseData => {
                   targetLanguage => "lang",
                   userSettings   => \%userSettings,
                  },

User settings are intended to allow the specification of per call settings by a user, e.g. to include special parts. By using this convention, users can easily specify such a part the following way

? flagSet('setting')

Special part.

? 1

It is up to a translator author to declare translator specific settings (and to document them). The passed values can be as complex as necessary as long as they can be duplicated by Storable::dclone().

Whenever active contents is invoked, the passed hash reference is copied (duplicated by Storable::dclone()) into the Safe object's namespace (see safe) as a global variable $PerlPoint. This way, modifications by invoked code do not affect subsequently called code snippets; base data are always fresh.
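The effect of the duplication can be sketched without the parser. The following simulation hands every "snippet" its own deep copy made by Storable::dclone(), so changes made by one snippet are invisible to the next. The function invoke_snippet() and the sample data are hypothetical and do not use a Safe compartment; this is not the parser's implementation.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical base data, shaped like the activeBaseData convention.
my %userSettings = (special => 1);
my $baseData = {
    targetLanguage => 'HTML',
    userSettings   => \%userSettings,
};

# Simulate an invocation: each snippet receives a fresh deep copy.
sub invoke_snippet {
    my ($code) = @_;
    my $PerlPoint = dclone($baseData);
    return $code->($PerlPoint);
}

# The first snippet modifies its copy ...
invoke_snippet(sub {
    my ($data) = @_;
    $data->{userSettings}{special} = 0;
    delete $data->{targetLanguage};
    return 1;
});

# ... but the second one still sees the original values.
my $seen = invoke_snippet(sub {
    my ($data) = @_;
    return join '/', $data->{targetLanguage}, $data->{userSettings}{special};
});

print "$seen\n";
```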

activeDataInit

Reserved to pass hook functions to be called when preparing every active contents invocation. The hook is still unimplemented.

cache

This optional parameter controls source file paragraph caching.

By default, a source file is parsed completely every time you pass it to the parser. This is no problem with tiny sources but can delay your work if you are dealing with large sources which have to be translated periodically into presentations while they are written. Typically most of the paragraphs remain unchanged from version to version, but nevertheless everything is reparsed, which is a waste of time. To improve this, a paragraph cache can be activated by setting this option to CACHE_ON.

The parser caches each initial source file individually. That means if three files are passed to the parser with activated caching, three cache files will be written. They are placed in the source file directory, named .<source file>.ppcache. Please note that the paragraphs of included sources are cached in the cache file of the main document because they may have to be evaluated differently depending on inclusion context.

What acceleration can be expected? Well, this strongly depends on your source structure. Efficiency will grow with longer paragraphs, reused paragraphs and the number of paragraphs. It will be reduced by heavy usage of active contents and embedding, because a paragraph that refers to parts defined externally is not fully determined by itself and therefore cannot be cached. Here is a list of all reasons which cause a paragraph to be excluded from caching:

Embedded parts

Obviously dynamic parts may change from one version to another, but even static parts could have to be interpreted differently because a user can set up new filters.

Included files

An \INCLUDE tag immediately disables caching for the paragraph it resides in because the loaded file may change its contents. This is not really a restriction because the included paragraphs themselves are cached if possible.

Filtered paragraphs

A paragraph filter can transform a source paragraph into whatever the author of a Perl function might think is useful, potentially depending on highly dynamic data. So it cannot be determined by the parser what the final translation of a certain source paragraph will be.

Document stream entry points

Depending on the parser's configuration, these points can be transformed into headlines or remain unchanged, so there is no fixed mapping between a source paragraph and its streamed expression.

Even with these restrictions about 70% of a real life document of more than 150 paragraphs could be cached. This saved more than 60% of parsing time in subsequent translator calls.

New cache entries are always added, which means that old entries are never replaced and a cache file tends to grow. If you ever wish to clean up a cache file completely, pass CACHE_CLEANUP to this option.

To deactivate caching explicitly, pass CACHE_OFF. An existing cache will not be destroyed.

Settings can be combined by addition.

# clean up the cache, then refill it
cache => CACHE_CLEANUP+CACHE_ON,

# clean up the cache and deactivate it
cache => CACHE_CLEANUP+CACHE_OFF,

The CACHE_OFF value is overridden by any other setting.

It is suggested to make this setting available to translator users to let them decide if a cache should be used.

Please note that there is a problem with line numbers if paragraphs are restored from cache because of the behaviour of Perl's paragraph mode. In this mode, the <> operator reads in any number of newlines between paragraphs but supplies only one of them. That is why I do not get the real number of lines in a paragraph and therefore cannot store them. To work around this, two strategies can be used. First, do not use more than exactly one newline between paragraphs. (This strategy is not for real life users, of course, but in this case restored numbers would be correct.) Second, remember that source line numbers are only interesting in error messages. If the parser detects an error after a cache hit has already occurred, it therefore reports the position as "there or later". If the real line number is needed, the parser can be reinvoked with the cache deactivated and will then report it.

Another known paragraph mode problem occurs if you parse on a UNIX system but your document (or parts of it) was written in DOS format. Paragraph mode then reads such a document as one single paragraph. Please convert the line ending character sequences to those of your system. (If you are using dos2unix under Solaris please invoke it with option -ascii to do this.)

Moreover, Perl's paragraph mode and PerlPoint treat whitespace lines differently. Because of the way it works, paragraph mode does not recognize them as "empty" while PerlPoint does for reasons of usability (invisible characters should not make a difference). This means that lines containing only whitespace separate PerlPoint paragraphs but not "Perl" paragraphs, which makes the cache work incorrectly, especially in examples. If paragraphs unintentionally disappear in the resulting presentation, please check the "empty lines" before them.
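The DOS format effect can be reproduced with plain Perl. The following sketch reads a text in paragraph mode from an in-memory file, before and after converting CRLF line endings to LF. Both helper functions are hypothetical names introduced for this illustration and are not part of the parser.

```perl
use strict;
use warnings;

# Split a text into records the way Perl's paragraph mode does.
sub paragraph_mode_records {
    my ($text) = @_;
    open(my $handle, '<', \$text) or die "Cannot open in-memory file: $!";
    local $/ = '';                 # paragraph mode
    my @records = <$handle>;
    close($handle);
    return @records;
}

# Replace DOS line endings (CRLF) by UNIX ones (LF).
sub dos_to_unix {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;
    return $text;
}

my $dos = "First paragraph.\r\n\r\nSecond paragraph.\r\n";

printf "DOS format:  %d paragraph(s)\n",
    scalar paragraph_mode_records($dos);
printf "UNIX format: %d paragraph(s)\n",
    scalar paragraph_mode_records(dos_to_unix($dos));
```

On a UNIX system the DOS version should come back as a single record, because the "empty" line still contains a carriage return. A conversion like dos_to_unix() (or a tool such as dos2unix) applied to the source files before parsing avoids this.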
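Both paragraph mode effects can be shown in a few lines of plain Perl. In this sketch a whitespace-only line fails to separate two paragraphs, and a run of several newlines is read as one separator. The helper names are hypothetical, and blanking whitespace-only lines is just one possible way to prepare a source; it is not something the parser is known to do.

```perl
use strict;
use warnings;

# Split a text into records the way Perl's paragraph mode does.
sub paragraph_mode_records {
    my ($text) = @_;
    open(my $handle, '<', \$text) or die "Cannot open in-memory file: $!";
    local $/ = '';                 # paragraph mode
    my @records = <$handle>;
    close($handle);
    return @records;
}

# Empty all whitespace-only lines so that paragraph mode and
# PerlPoint agree on what an "empty line" is.
sub blank_whitespace_lines {
    my ($text) = @_;
    $text =~ s/^[ \t]+$//mg;
    return $text;
}

# The line between paragraphs one and two contains a blank and a tab,
# paragraphs two and three are separated by several newlines.
my $source = "Paragraph one.\n \t\nParagraph two.\n\n\n\nParagraph three.\n";

printf "as written: %d record(s)\n",
    scalar paragraph_mode_records($source);
printf "normalized: %d record(s)\n",
    scalar paragraph_mode_records(blank_whitespace_lines($source));
```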

Consistent cache data depend on the versions of the parser, of constant declarations and of the module Storable which is used internally. If the parser detects a significant change in one of these versions, existing caches are automatically rebuilt.

Final cache note: cache files are not locked while they are used. If you need this feature please let me know.

criticalSemanticErrors

If set to a true value, semantic errors will cause the parser to terminate immediately. This defaults to false: errors are accumulated and finally reported.

display

This parameter is optional. It controls the display of runtime messages like information or warnings. By default, all messages are displayed. You can suppress these messages partially or completely by passing one or more of the "DISPLAY_..." constants declared in PerlPoint::Constants. Constants should be combined by addition.

docstreams2skip

By default, all document streams are made part of the result, but with this parameter one can exclude certain streams (all remaining ones will be streamed as usual).

The list should be supplied by an array reference.

It is suggested to take the values of this parameter from a user option, which by convention should be named -skipstream.

docstreaming

Specifies the way the parser handles stream entry points. The value passed can be either DSTREAM_DEFAULT, DSTREAM_IGNORE or DSTREAM_HEADLINES.

DSTREAM_HEADLINES instructs the parser to transform the entry points into headlines, one level below the current real headline level. This is an easy-to-implement and convenient way of docstream handling that seems to make sense in most target formats.

DSTREAM_IGNORE hides all streams except for the main stream. The effect is similar to a call with docstreams2skip set for all document streams in a source.

DSTREAM_DEFAULT treats the entry points as entry points and streams them as such. This is the default if the parameter is omitted.

Please note that filters applied by docstreams2skip work regardless of the docstreaming configuration, which only affects the way the parser passes docstream data to a backend.

It is recommended to take the value of this parameter from a user option, which by convention should be named -docstreaming. (A converter can define various more modes than provided by the parser and implement them itself, of course. See pp2sdf for a reference implementation.)

files

a reference to an array of files to be scanned.

Files are treated as PerlPoint sources except when their name has the prefix IMPORT:, as in IMPORT:podsource.pod. With this prefix, the parser tries to automatically transform the source into PerlPoint, using a standard import filter for the format indicated by the file extension (pod in our example). The filter must be installed as PerlPoint::Import::<uppercased format name>, e.g. PerlPoint::Import::POD.
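The naming rule for import filters can be sketched as a small string operation. The function import_filter_for() is a hypothetical helper that only illustrates the documented convention; how the parser actually resolves and loads the filter is not shown here.

```perl
use strict;
use warnings;

# Derive the import filter module name from a file specification.
# Returns nothing for native PerlPoint sources (no IMPORT: prefix)
# and for imported files without an extension.
sub import_filter_for {
    my ($spec) = @_;
    return unless $spec =~ /^IMPORT:(.+)$/;
    my $file = $1;
    my ($format) = $file =~ /\.([^.\/\\]+)$/ or return;
    return 'PerlPoint::Import::' . uc($format);
}

for my $spec ('IMPORT:podsource.pod', 'slides.pp') {
    my $filter = import_filter_for($spec);
    print "$spec: ", (defined $filter ? $filter : 'native PerlPoint source'), "\n";
}
```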

filter

a regular expression describing the target language. This setting, if used, prevents all embedded or included source code of other languages than the set one from inclusion into the generated stream. This accelerates both parsing and backend handling. The pattern is evaluated case insensitively.

Example: pass "html|perl" to allow HTML and Perl.

To illustrate this, imagine a translator to PostScript. If it reads a PerlPoint file which includes native HTML, this translator cannot handle such code. The backend would have to skip the HTML statements. With a "PostScript" filter, the HTML code will not appear in the stream.

This enables PerlPoint texts prepared for various target languages. If an author really needs plain target language code to be embedded into PerlPoint, he could provide versions for various languages. Translators using a filter will then receive exactly the code of their target language, if provided.

Please note that you cannot filter out PerlPoint code or example files.

By default, no filter is set.
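The filter semantics described above can be sketched as a case-insensitive pattern match on the language name of an embedded or included part. The function passes_filter() is hypothetical, and anchoring the pattern to the whole language name is an assumption of this sketch, not a documented detail of the parser.

```perl
use strict;
use warnings;

# Decide whether code of a certain language passes the filter.
# Assumption: the pattern has to match the complete language name.
sub passes_filter {
    my ($filter, $language) = @_;
    return 1 unless defined $filter;    # no filter set: everything passes
    return $language =~ /^(?:$filter)$/i ? 1 : 0;
}

for my $language (qw(HTML Perl LaTeX)) {
    printf "%-5s %s\n", $language,
        passes_filter('html|perl', $language) ? 'streamed' : 'skipped';
}
```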

this optional flag causes the parser to register all headline titles as anchors automatically. (Headlines are stored without possibly included tags which are stripped off.)

Registering anchors does not mean there are anchors included in the stream, it just means that they are known to exist at parsing time because they are added to an internal PerlPoint::Anchors object which is passed to all tag hooks and can be evaluated there. See PerlPoint::Tags and PerlPoint::Anchors for details.

It is recommended to make use of this feature if your converter automatically makes headlines an anchor named like the headline (this feature was initially introduced by Lorenz Domke's pp2html). (Nevertheless, usefulness may depend on dealing with the parser's anchor collection in tag hooks. See the documentation of the tag modules used for details.)

If your converter does not support automatic headline anchors the mentioned way, it is recommended to omit this option because it could confuse tag hooks that evaluate the parser's anchor collection.

libpath

An optional reference to an array of library paths to be searched for files specified by \INCLUDE tags. This array is intended to be filled with directories specified via a converter option. By convention, this option is named includelib and should be allowed multiple times (converter -includelib path1 -includelib path2 document.pp).

Please note that library paths can be set via environment variable PERLPOINTLIB as well, but directories specified via libpath are searched first.
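A converter could collect the suggested -includelib option with the standard Getopt::Long module, as in the following sketch. The function collect_libpath() is a hypothetical helper; the returned references are what one would then pass as libpath and files to run().

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Collect every -includelib occurrence from a converter command line.
# Returns the library paths and the remaining arguments (the files).
sub collect_libpath {
    my @args = @_;
    my @libpath;
    GetOptionsFromArray(\@args, 'includelib=s' => \@libpath)
        or die "Invalid options\n";
    return (\@libpath, \@args);
}

my ($libpath, $files) = collect_libpath(
    qw(-includelib path1 -includelib path2 document.pp)
);

print "libpath: @$libpath\n";
print "files:   @$files\n";
```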

linehints

If set to a true value, the parser will embed line hints into the stream whenever a new source line begins.

A line hint directive is provided as

[
 DIRECTIVE_NEW_LINE, DIRECTIVE_START,
 {file=>filename, line=>number}
]

and is suggested to be handled by a backend callback.

Please note that currently source line numbers are not guaranteed to be correct if stream parts are restored from cache (see there for details).

The default value is 0.
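How a backend might use line hints can be sketched on a hand-built stream fragment. The constants below are stand-ins defined only for this illustration (the real ones come from PerlPoint::Constants), and last_position() is a hypothetical helper, not a PerlPoint::Backend callback.

```perl
use strict;
use warnings;

# Stand-in constants for illustration only; the real values are
# declared in PerlPoint::Constants and may differ.
use constant {
    DIRECTIVE_NEW_LINE => 'newline',
    DIRECTIVE_START    => 'start',
};

# A hand-built stream fragment with two line hints.
my @stream = (
    [DIRECTIVE_NEW_LINE, DIRECTIVE_START, {file => 'talk.pp', line => 1}],
    'First', ' ', 'line.',
    [DIRECTIVE_NEW_LINE, DIRECTIVE_START, {file => 'talk.pp', line => 2}],
    'Second', ' ', 'line.',
);

# Remember the latest hint so that messages can name a source position.
sub last_position {
    my %position = (file => undef, line => undef);
    for my $token (@_) {
        next unless ref($token) eq 'ARRAY';
        next unless $token->[0] eq DIRECTIVE_NEW_LINE;
        %position = %{ $token->[2] };
    }
    return \%position;
}

my $position = last_position(@stream);
print "last seen position: $position->{file}, line $position->{line}\n";
```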

nestedTables

This is an optional flag which is by default set to 0, indicating if the parser shall accept nested tables or not. Table nesting can produce very nice results if it is supported by the target language. HTML, for example, allows tables to be nested, but other languages do not. So, using this feature can really improve the results if a user is focussed on supporting certain target formats only. If I want to produce nothing but HTML, why should I take care of target formats not able to handle table nesting? On the other hand, if a document shall be translated into several formats, it might cause trouble to nest tables therein.

Because of this, it is suggested to let converter users decide if they want to enable table nesting or not. If the target format does not support nesting, I recommend disabling nesting completely.

object

the parser object made by new();

safe

an object of the Safe class which comes with perl. It is used to evaluate embedded Perl code in a safe environment. By letting the caller of run() provide this object, a translator author can make the level of safety fully configurable by users. Usually, the following should work

use Safe;
...
$parser->run(safe=>new Safe, ...);

Safe is a really good module but unfortunately limited in loading modules transparently. So if a user wants to use modules in his embedded code, he might fail to get it working in a Safe compartment. If safety does not matter, he can decide to execute it without Safe, with full Perl access. To switch on this mode, pass a true scalar value (but no reference) instead of a Safe object.

To make all PerlPoint converters behave similarly, it is recommended to provide two related options -activeContents and -safeOpcode. -activeContents should flag that active contents shall be evaluated, while -safeOpcode controls the level of security. A special level ALL should mean that all code can be executed without any restriction, while any other settings should be treated as an opcode to configure the Safe object. So, the recommended rules are: pass 0 unless -activeContents is set. Pass 1 if the converter was called with -activeContents and -safeOpcode ALL. Pass a Safe object and configure it according to the user's -safeOpcode settings if -activeContents is used but without -safeOpcode ALL. See pp2sdf for an implementation example.

Active Perl contents is suppressed if this setting is omitted or if a false value is passed. (There are currently three types of active contents: embedded or included Perl and condition paragraphs.)
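The recommended rules for -activeContents and -safeOpcode can be sketched as a small mapping function. The name safe_setting() and its argument list are hypothetical, and the sketch assumes that every -safeOpcode value other than ALL is an opcode name or opcode tag that Safe's permit() method accepts. See pp2sdf for the reference implementation.

```perl
use strict;
use warnings;
use Safe;

# Map the two suggested converter options to a value for the
# "safe" parameter of run().
sub safe_setting {
    my ($activeContents, @safeOpcodes) = @_;

    # active contents is disabled
    return 0 unless $activeContents;

    # full Perl access was requested
    return 1 if grep { uc($_) eq 'ALL' } @safeOpcodes;

    # otherwise configure a compartment according to the user settings
    my $safe = Safe->new;
    $safe->permit(@safeOpcodes) if @safeOpcodes;
    return $safe;
}

my $setting = safe_setting(1, ':filesys_read');
print "run() would receive a ", (ref($setting) || 'plain scalar'), "\n";
```

The result would be passed as safe => $setting in the call of run().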

predeclaredVars

Variables are usually set by assignment paragraphs. However, it may be useful for a converter to predeclare a set of them to provide certain settings to the users. Predeclared variables, as any other PerlPoint variables, can be used both in pure PerlPoint and in active contents. To help users distinguish them from user defined vars, their names will be capitalized.

Just pass a hash of variable name / value pairs:

$parser->run(
             ...
             predeclaredVars => {
                                 CONVERTER_NAME    => 'pp2xy',
                                 CONVERTER_VERSION => $VERSION,
                                 ...
                                },
            );

Non capitalized variable names will be capitalized without further notice.

Please note that variables currently can only be scalars. Different data types will not be accepted by the parser.

Predeclared variables should be mentioned in the converters documentation.

The parser itself makes use of this feature by declaring _PARSER_VERSION (the version of this module used to parse the source) and _STARTDIR (the full path of the startup directory, as reported by Cwd::cwd()).

predeclaredVars needs var2stream to take effect.

skipcomments

By default comments are streamed and can be converted into comments of the target language. But often they are of limited use in generated files: especially if they are intended to help the author of a document, not the reader of the source of generated results. So with this option one can suppress comments from being streamed.

It is suggested to get this setting via user option, which by convention should be named -skipcomments.

stream

A reference to an array in which the generated output stream will be stored.

Application programmers may want to tie this array if the target ASCII texts are expected to be large (long ASCII texts can result in large stream data which may occupy a lot of memory). Because of the fact that the parser stores stream data by paragraph, memory consumption can be reduced significantly by tying the stream array.

It is recommended to pass an empty array. Stored data will not be overwritten, the parser appends its data instead (by push()).

trace

This parameter is optional. It is intended to activate trace code while the method runs. You may pass any of the "TRACE_..." constants declared in PerlPoint::Constants, combined by addition as in the following example:

# show the traces of both
# lexical and syntactical analysis
trace => TRACE_LEXER+TRACE_PARSER,

If you omit this parameter or pass TRACE_NOTHING, no traces will be displayed.

var2stream

If set to a true value, the parser will propagate variable settings into the stream by adding additional DIRECTIVE_VARSET directives.

A variable propagation has the form

[
 DIRECTIVE_VARSET, DIRECTIVE_START,
 {var=>varname, value=>value}
]

and is suggested to be handled by a backend callback.

The default value is 0.
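A backend that wants to know the variable settings could collect them from the stream as sketched below. The constants are stand-ins defined only for this illustration (the real ones come from PerlPoint::Constants), the stream fragment is hand-built, and collect_variables() is a hypothetical helper, not a PerlPoint::Backend callback.

```perl
use strict;
use warnings;

# Stand-in constants for illustration only; the real values are
# declared in PerlPoint::Constants and may differ.
use constant {
    DIRECTIVE_VARSET => 'varset',
    DIRECTIVE_START  => 'start',
};

# A hand-built stream fragment with two variable propagations.
my @stream = (
    [DIRECTIVE_VARSET, DIRECTIVE_START, {var => 'CONVERTER_NAME', value => 'pp2xy'}],
    'Some', ' ', 'text.',
    [DIRECTIVE_VARSET, DIRECTIVE_START, {var => 'author', value => 'me'}],
);

# Collect all propagated settings, later settings override earlier ones.
sub collect_variables {
    my %vars;
    for my $token (@_) {
        next unless ref($token) eq 'ARRAY';
        next unless $token->[0] eq DIRECTIVE_VARSET;
        my $setting = $token->[2];
        $vars{ $setting->{var} } = $setting->{value};
    }
    return \%vars;
}

my $vars = collect_variables(@stream);
print "$_ = $vars->{$_}\n" for sort keys %$vars;
```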

vispro

activates "process visualization" which simply means that a user will see progress messages while the parser processes documents. The numerical value of this setting determines how often the progress message shall be updated, by a chapter interval:

# inform every five chapters
vispro => 5,

Process visualization is automatically suppressed if STDERR is not connected to a terminal, if this option is omitted, if display is set to DISPLAY_NOINFO, or if parser traces are activated.

Return value: A "true" value in case of success, "false" otherwise. A call is performed successfully if there was neither a syntactical nor a semantic error in the parsed files.

Example:

$parser->run(
             stream => \@streamData,
             files  => \@ARGV,
             filter => 'HTML',
             cache  => CACHE_ON,
             trace  => TRACE_PARAGRAPHS,
            );

anchors()

A class method that supplies all anchors collected by the parser.

Example:

my $anchors=PerlPoint::Parser::anchors;

EXAMPLE

The following code shows a minimal but complete parser.

# pragmata
use strict;

# load modules
use PerlPoint::Parser;

# declare variables
my (@streamData);

# build parser
my ($parser)=new PerlPoint::Parser;
# and call it
$parser->run(
             stream  => \@streamData,
             files   => \@ARGV,
            );

NOTES

Converter namespace

It is suggested to avoid operating in namespace main::. In order to emulate the behaviour of the Safe module by eval() in case a user wishes to get full Perl access for active contents, active contents needs to be executed in this namespace. Safe does not allow this to be changed, so the documented default for active contents executed with or without Safe needs to be main::. This means that both the parser and active contents will pollute main::. To avoid being affected, choose a different converter namespace. The PerlPoint::Converter:: hierarchy is reserved for this purpose. The recommended namespace is PerlPoint::Converter::<converter name>, e.g. PerlPoint::Converter::pp2sdf.

Format

The PerlPoint format was initially designed by Tom Christiansen, who wrote an HTML slide generator for it, too.

Lorenz Domke added a number of additional, useful and interesting features to the original implementation. At a certain point, we decided to redesign the tool to make it a base for slide generation not only into HTML but into various document description languages.

The PerlPoint format implemented by this parser version is slightly different from the original design. Presentations written for Perl Point 1.0 will not pass the parser but can simply be converted into the new format. We designed the new format as a team of Lorenz Domke, Stephen Riehm and me.

Storable updates

From version 0.24 on the Storable module is a prerequisite of the parser package because Storable is used to store and retrieve cache data in files. If you update your Storable installation it might happen that its internal format changes and therefore stored cache data becomes unreadable. To avoid this, the parser automatically rebuilds existing caches in case of Storable updates.

FILES

If caches are used, the parser writes cache files where the initial sources are stored. They are named .<source file>.ppcache.

SEE ALSO

PerlPoint::Backend

A frame class to write backends based on the STREAM FORMAT.

PerlPoint::Constants

Constants used by parser functions and in the STREAM FORMAT.

PerlPoint::Tags

Tag declaration base class.

pp2sdf

A reference implementation of a PerlPoint converter, distributed with the parser package.

pp2html

The initial PerlPoint tool designed and provided by Tom Christiansen. A new translator by Lorenz Domke using PerlPoint::Package.

SUPPORT

A PerlPoint mailing list is set up to discuss usage, ideas, bugs, suggestions and translator development. To subscribe, please send an empty message to perlpoint-subscribe@perl.org.

If you prefer, you can contact me via perl@jochen-stenzel.de as well.

AUTHOR

Copyright (c) Jochen Stenzel (perl@jochen-stenzel.de), 1999-2001. All rights reserved.

This module is free software, you can redistribute it and/or modify it under the terms of the Artistic License distributed with Perl version 5.003 or (at your option) any later version. Please refer to the Artistic License that came with your Perl distribution for more details.

The Artistic License should have been included in your distribution of Perl. It resides in the file named "Artistic" at the top-level of the Perl source tree (where Perl was downloaded/unpacked - ask your system administrator if you don't know where this is). Alternatively, the current version of the Artistic License distributed with Perl can be viewed on-line on the World-Wide Web (WWW) from the following URL: http://www.perl.com/perl/misc/Artistic.html.

PerlPoint::Parser is built using Parse::Yapp in a way that users do not have to explicitly install Parse::Yapp themselves. According to the copyright note of Parse::Yapp I have to mention the following:

"The Parse::Yapp module and its related modules and shell scripts are copyright (c) 1998-1999 Francois Desarmenien, France. All rights reserved.

You may use and distribute them under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file."

DISCLAIMER

This software is distributed in the hope that it will be useful, but is provided "AS IS" WITHOUT WARRANTY OF ANY KIND, either expressed or implied, INCLUDING, without limitation, the implied warranties of MERCHANTABILITY and FITNESS FOR A PARTICULAR PURPOSE.

The ENTIRE RISK as to the quality and performance of the software IS WITH YOU (the holder of the software). Should the software prove defective, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

IN NO EVENT WILL ANY COPYRIGHT HOLDER OR ANY OTHER PARTY WHO MAY CREATE, MODIFY, OR DISTRIBUTE THE SOFTWARE BE LIABLE OR RESPONSIBLE TO YOU OR TO ANY OTHER ENTITY FOR ANY KIND OF DAMAGES (no matter how awful - not even if they arise from known or unknown flaws in the software).

Please refer to the Artistic License that came with your Perl distribution for more details.
