NAME

PerlPoint::Parser - a PerlPoint Parser

VERSION

This manual describes version 0.30.

SYNOPSIS

# load the module:
use PerlPoint::Parser;

# build the parser and run it
# to get intermediate data in @stream
# (%tags and @files are filled as described under run())
my (@stream, %tags, @files);
my ($parser)=new PerlPoint::Parser;
$parser->run(
             stream  => \@stream,
             tags    => \%tags,
             files   => \@files,
            );

DESCRIPTION

The PerlPoint format, initially designed by Tom Christiansen, is intended to provide a simple and portable way to generate slides without the need for a proprietary product. Slides can be prepared in a text editor of your choice, generated on any platform where you find perl, and presented by any browser which can render the chosen output format.

To sum it up, PerlPoint Software takes an ASCII text and transforms it into slides written in a certain document description language. This is, by tradition, usually HTML, but you may decide to use another format like XML, SGML, TeX or whatever you want.

Well, this sounds fine, but how do you build a translator which transforms ASCII into the output format of your choice? That's what PerlPoint::Parser is made for. It performs the first translation step by parsing ASCII and transforming it into an intermediate stream format, which can be processed by a subsequently called translator backend. By separating parsing and output generation, we get the flexibility to write as many backends as necessary while using the same parser frontend for all translators.

PerlPoint::Parser supports the complete GRAMMAR with the exception of certain tags. Tags are supported in a generic way: the parser recognizes any tag which is declared by the author of a translator. This way the parser can be used for various flavours of the PerlPoint language without having to be modified. So, if there is a need for a new tag, it can quickly be added without any change to PerlPoint::Parser.

The following chapters describe the input format (GRAMMAR) and the generated stream format (STREAM FORMAT). Finally, the class methods are described to show you how to build a parser.

GRAMMAR

This chapter describes how a PerlPoint ASCII slide description has to be formatted to be accepted by PerlPoint::Parser.

Please note that the input format does not completely determine how the output will be designed. The final format depends on the backend which has to be called after the parser to transform its output into a certain document description language. The final appearance depends on the browser's behaviour.

Each PerlPoint document is made of paragraphs.

The paragraphs

All paragraphs start at the beginning of their first line. The first character or string in this line determines which paragraph is recognized.

A paragraph is completed by an empty line (which may contain whitespace). Exceptions are described below.

Carriage returns in paragraphs which are completed by an empty line are transformed into whitespace.
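To illustrate how the first characters select a paragraph type, here is a minimal Perl sketch covering the types described in this chapter. The helpers paragraph_type() and headline_level() are hypothetical and not part of the PerlPoint::Parser API; they ignore details such as here-document termination, multi-line paragraphs and the joining of subsequent blocks.

```perl
use strict;
use warnings;

# Hypothetical classifier: the leading character(s) of a paragraph decide its
# type. Order matters: "<<" (verbatim block) must be tested before "<".
sub paragraph_type {
    my ($paragraph) = @_;
    return 'comment'        if $paragraph =~ m{^//};
    return 'headline'       if $paragraph =~ /^=/;
    return 'point'          if $paragraph =~ /^\*/;
    return 'ordered point'  if $paragraph =~ /^#/;
    return 'definition'     if $paragraph =~ /^:/;
    return 'verbatim block' if $paragraph =~ /^<</;
    return 'list shift'     if $paragraph =~ /^[<>]/;
    return 'table'          if $paragraph =~ /^\@/;
    return 'condition'      if $paragraph =~ /^\?/;
    return 'variable'       if $paragraph =~ /^\$\w+=/;
    return 'alias'          if $paragraph =~ /^\+\w+:/;
    return 'block'          if $paragraph =~ /^\s/;
    return 'text';
}

# The headline level is the number of leading "=" characters.
sub headline_level {
    my ($paragraph) = @_;
    return $paragraph =~ /^(=+)/ ? length $1 : 0;
}

for my $p ('==Second level headline', '<<EOC', '    A block.', 'Simple text.') {
    printf "%-25s %s\n", $p, paragraph_type($p);
}
```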

Comments

start with "//" and reach until the end of the line.

Headlines

start with one or more "=" characters. The number of "=" characters represents the headline level.

=First level headline

==Second level headline

===Multi
  line
 headline
example

Lists

Points or unordered lists start with a "*" character.

* This is a first point.

* And, I forgot,
  there is something more to point out.

There are ordered lists as well, and they start with a hash sign ("#"):

# First, check the number of this.

# Second, don't forget the first.

The hash signs are intended to be replaced by numbers by a backend.

Because PerlPoint works on a paragraph basis, any paragraph other than an ordered list point closes an ordered list. If you wish the list to be continued, use a double hash sign instead of the single one in the point that reopens the list.

# Here the ordered list begins.

? $includeMore

## This is point 2 of the list that started before.

# In subsequent points, the usual single hash sign works as
  expected again.

List continuation works per list level (see below for level details). A list cannot be continued in another chapter. Using "##" in the first point of a new list has no special effect: the list will begin as usual (with number 1).
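A backend replacing the hash signs by numbers could follow these rules roughly as in this sketch. number_points() is a hypothetical helper for a single list level that ignores chapter boundaries and list shifts; it only models that other paragraphs close a list, that "##" continues the last count, and that "##" without a preceding list still starts at 1.

```perl
use strict;
use warnings;

# Hypothetical backend-side numbering for one list level: "#" continues an open
# list or starts a new one at 1, any other paragraph closes the list, and "##"
# continues the count of the list that was closed last.
sub number_points {
    my @paragraphs = @_;
    my ($open, $count) = (0, 0);
    my @out;
    for my $paragraph (@paragraphs) {
        if ($paragraph =~ /^(##?)\s*(.*)/s) {
            my ($mark, $text) = ($1, $2);
            $count = 0 unless $open or $mark eq '##';
            $open = 1;
            push @out, ++$count . ". $text";
        }
        else {
            $open = 0;                   # any other paragraph closes the list
            push @out, $paragraph;
        }
    }
    return @out;
}

print "$_\n" for number_points(
    '# Here the ordered list begins.',
    '? $includeMore',
    '## This is point 2 of the list that started before.',
    '# In subsequent points, the usual single hash sign works.',
);
```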

Definition lists are a third list variant. Each item starts with the described phrase enclosed by a pair of colons, followed by the definition text:

:first things: are usually described first,

:others:       later then.

All lists can be nested. A new level is introduced by a special paragraph called "list indention" which starts with a ">". A list level can be terminated by a "list indention stop" paragraph consisting of a "<" character. (These startup characters symbolize "level shifts".)

* First level.

* Still there.

>

* A list point of the 2nd level.

<

* Back on first level.

Level shifts are accepted between list items only.

Texts

are paragraphs like points but begin immediately without a startup character:

This is a simple text.

In this new text paragraph,
we demonstrate the multiline feature.

Blocks

are intended to contain examples or code with tag recognition. This means that the parser will discover embedded tags. On the other hand, it means that one may have to escape ">" characters outside tags. Blocks begin with an indentation and are completed by the next empty line.

* Look at these examples:

    A block.

    I<Another> block.
    Escape "\>".

Examples completed.

Subsequent blocks are joined together automatically: the intermediate empty lines which would usually complete a block are translated into real empty lines within the block. This makes it easier to integrate real code sequences as one block, regardless of the empty lines included. However, one may explicitly wish to separate subsequent blocks and can do so by delimiting them by a special control paragraph:

* Separated subsequent blocks:

    The first block.

    -

    The second block.

Verbatim blocks

are similar to blocks in indentation but deactivate pattern recognition. That means the embedded text is scanned neither for tags nor for empty lines, so it can remain exactly as it was in its original place, possibly a script.

These special blocks need a special syntax. They are realized as here documents. Start with a here document clause flagging which string will close the "here document":

<<EOC

  # compare
  $rc=3>2?4:5; # contains ">" which does not need to be escaped

EOC

Tables

are supported as well; they start with an @ sign which is followed by the column delimiter:

@|
 column 1   |   column 2   |  column 3
  aaa       |    bbb       |   ccc
  uuu       |    vvvv      |   www

There is also a more sophisticated way to describe tables, see the tag section below.
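The simple table syntax can be split into rows and cells as in the following sketch. parse_table() is hypothetical; in particular, trimming the whitespace around each cell is an assumption, not documented parser behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch of splitting a simple table paragraph: the first line is
# "@" followed by the column delimiter, every further line is one row.
# Trimming whitespace around each cell is an assumption of this sketch.
sub parse_table {
    my ($paragraph) = @_;
    my ($head, @lines) = split /\n/, $paragraph;
    my ($delim) = $head =~ /^\@(.+?)\s*$/ or die "not a table paragraph\n";
    my @rows;
    for my $line (@lines) {
        my @cells = map { s/^\s+|\s+$//gr } split /\Q$delim\E/, $line;
        push @rows, \@cells;
    }
    return @rows;
}

my @rows = parse_table(join "\n",
    '@|',
    ' column 1   |   column 2   |  column 3',
    '  aaa       |    bbb       |   ccc',
    '  uuu       |    vvvv      |   www',
);
print join(' / ', @$_), "\n" for @rows;
```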

Conditions

start with a "?" character. If active contents is enabled, the paragraph text is evaluated as Perl code. The (boolean) evaluation result then determines whether the subsequent PerlPoint text is read and parsed. If the result is false, all subsequent paragraphs until the next condition are skipped.

Note that base data is made available by a global (package) hash reference $PerlPoint. See run() for details about how to set these data up.

Conditions can be used to maintain various language versions of a presentation in one source file:

? $PerlPoint->{targetLanguage} eq 'German'

Or you could enable parts of your document by date:

? time>$dateOfTalk

or by a special setting:

? $PerlPoint->{userSettings}{setting}

Please note that the condition code shares its variables with embedded and included code.

To make usage easier and to improve readability, condition code is evaluated with disabled warnings (the language variable in the example above may not even have been set).

Translator authors might want to provide predefined variables such as "$language" in the example.
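The skipping rule can be modelled in a few lines of Perl. In this sketch, filter_paragraphs() and the fixed-result evaluator are hypothetical stand-ins: the parser itself evaluates the condition text as Perl code (in the Safe compartment passed to run()) instead of looking it up in a table.

```perl
use strict;
use warnings;

# Hypothetical model of the skipping rule: a "?" paragraph is evaluated by
# the passed callback; while the last condition was false, every following
# paragraph is dropped until the next condition paragraph.
sub filter_paragraphs {
    my ($evaluate, @paragraphs) = @_;
    my $active = 1;                      # text before any condition is read
    my @kept;
    for my $paragraph (@paragraphs) {
        if ($paragraph =~ /^\?\s*(.*)/s) {
            $active = $evaluate->("$1") ? 1 : 0;
            next;                        # conditions themselves produce no text
        }
        push @kept, $paragraph if $active;
    }
    return @kept;
}

# Stand-in evaluator with fixed results instead of real Perl evaluation.
my %result = ('$PerlPoint->{userSettings}{setting}' => 0, '1' => 1);
my @kept = filter_paragraphs(
    sub { $result{ $_[0] } },
    'Intro.',
    '? $PerlPoint->{userSettings}{setting}',
    'Special part.',
    '? 1',
    'Outro.',
);
print join(' | ', @kept), "\n";    # Intro. | Outro.
```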

Variable assignment paragraphs

Variables can be used in the text and will be automatically replaced by their string values (if declared).

The next paragraph sets a variable.

$var=var

This variable is called $var.

All variables are made available to embedded and included Perl code as well as to conditions and can be accessed there as package variables of "main::" (or of whatever package the Safe object is set up for). Because a variable is already replaced by the parser where possible, you have to use the fully qualified name or guard the variable's "$" prefix character to do so:

\EMBED{lang=perl}join(' ', $main::var, \$var)\END_EMBED
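The replacement rule and the guarding trick can be sketched as a single substitution. substitute_vars() is a hypothetical helper, not the parser's implementation; it only handles $name and ${name} with word characters and assumes that undeclared variables stay in the text unchanged.

```perl
use strict;
use warnings;

# Hypothetical sketch: replace $name and ${name} by declared values. Undeclared
# variables stay unchanged (an assumption); a guarded "\$" keeps the variable
# literally and only the backslash is removed.
sub substitute_vars {
    my ($text, $vars) = @_;
    my $replace = sub {
        my ($guard, $literal, $name) = @_;
        return $literal if $guard;                         # "\$var" -> "$var"
        return exists $vars->{$name} ? $vars->{$name} : $literal;
    };
    $text =~ s/(\\?)(\$(?:\{(\w+)\}|(\w+)))/$replace->($1, $2, defined $3 ? $3 : $4)/ge;
    return $text;
}

my %vars = (var => 'var');
print substitute_vars('This variable is called $var.', \%vars), "\n";
```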

Variable modifications by embedded or included Perl do not affect the variables visible to the parser. (This is true for conditions as well.) This means that

$var=10
\EMBED{lang=perl}$main::var*=2;\END_EMBED

causes $var to differ between parser and code side: the parser will still use a value of 10, while embedded code continues with a value of 20.

Macro or alias definitions

Sometimes certain text parts are used more than once. It would be a relief to have a shortcut instead of having to insert them again and again. The same is true for tag combinations a user may prefer to use. That's what aliases (or "macros") are designed for. They allow a presentation author to declare his own shortcuts and to use them like a tag. The parser will resolve such aliases, replace them by the defined replacement text and continue with this replacement.

An alias declaration starts with a "+" character followed immediately by the alias name (without backslash prefix), followed immediately by a colon. (No additional spaces here.) All text after this colon up to the paragraph closing empty line is stored as the replacement text. So, wherever you use the new macro, the parser will replace it by this text and reparse the result. This means that your macro text can contain any valid construction like tags or other macros.

The replacement text may contain strings embedded into doubled underscores like "__this__". This special syntax marks that the macro takes parameters of these names (e.g. "this"). If the macro is used and these parameters are set, their values replace the mentioned placeholders. The special placeholder "__body__" marks the place where the macro body is to be inserted.

Here are a few examples:

+RED:\FONT{color=red}<__body__>

+F:\FONT{color=__c__}<__body__>

+IB:\B<\I<__body__>>

This \IB<text> is \RED<colored>.

+TEXT:Macros can be used to abbreviate longer
texts as well as other tags
or tag combinations.

+HTML:\EMBED{lang=html}

Tags can be \RED<\I<nested>> into macros. And \I<\F{c=blue}<vice versa>>.
\IB<\RED<This>> is formatted by nested macros.
\HTML This is <i>embedded HTML</i>\END_EMBED.

Please note: \TEXT

An empty macro text undefines the macro (if it was already known).

// undeclare the IB alias
+IB:

Please note that the current implementation is still considered experimental because there may still be untested cases.

An alias can be used like a tag.
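Placeholder filling can be sketched as one substitution pass over the replacement text. expand_macro() is hypothetical: it does not parse the macro call itself, it leaves placeholders without a value unchanged (an assumption), and it only returns the filled text, which the parser would then reparse.

```perl
use strict;
use warnings;

# Hypothetical sketch: fill a macro's replacement text. "__body__" receives the
# macro body, other "__name__" placeholders receive the matching parameter;
# placeholders without a value are left unchanged (an assumption).
sub expand_macro {
    my ($template, $params, $body) = @_;
    my $fill = sub {
        my ($name) = @_;
        return $body if $name eq 'body' and defined $body;
        return exists $params->{$name} ? $params->{$name} : "__${name}__";
    };
    # One pass only: inserted values are not scanned for placeholders again.
    (my $text = $template) =~ s/__([A-Za-z]\w*?)__/$fill->($1)/ge;
    return $text;
}

print expand_macro('\FONT{color=__c__}<__body__>', {c => 'blue'}, 'vice versa'), "\n";
```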

What about special formatting?

Earlier versions of pp2html supported special format hints like the HTML expression "&gt;" for the ">" character, or "&uuml;" for "ü". PerlPoint::Parser does not support this directly because such hints are specific to the output format: if someone wants to translate into TeX, it might seem odd to use HTML syntax in the ASCII text. Furthermore, such hints can be handled completely by a backend, which finds them unchanged in the produced output stream.

The same is true for special headers and trailers. It is a backend task to add them if necessary. The parser handles the input only.

STREAM FORMAT

It is suggested to use PerlPoint::Backend to evaluate the intermediate format. Nevertheless, here is the documentation of this format.

The generated stream is an array of tokens. Most of them are very simple representing just their contents - words, spaces and so on. Example:

"These three words."

would be streamed into

"These" + " " + "three" + " "+ "words."

Note that the final dot is part of the last token. From a document description view, this should make no difference; it's just a string containing special characters or not.

Well, besides this "main stream", there are formatting directives. They flag the beginning or completion of a certain format, which means a whole document, a paragraph or a real formatting like italicising. Every format is enclosed by a start and a completion directive; simple tokens are the only exception.

In the current implementation, a directive is a reference to an array of mostly two fields: a directive constant showing which format is related, and a start or completion hint, which is a constant, too. The used constants are declared in PerlPoint::Constants. Directives can pass additional information in additional fields. Currently, the headline directives use this feature to show the headline level, the tag directives to provide tag type information, and the document directives to keep the name of the original document. Furthermore, ordered list points can request a fixed number this way.

# this example shows a tag directive
... [DIRECTIVE_TAG, DIRECTIVE_START, "I"]
+ "formatted" + " " + "strings"
+ [DIRECTIVE_TAG, DIRECTIVE_COMPLETE, "I"] ...

To recognize whether a token is a basic token or a directive, the ref() function can be used. However, this handling should be done by PerlPoint::Backend transparently and is documented here for information purposes only.
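For information purposes, a stream walk using ref() might look like the following sketch. The constant values are hypothetical stand-ins for those declared in PerlPoint::Constants, and render() only turns tag directives into angle-bracket markers; a real backend would rather use PerlPoint::Backend.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: the real constants come from PerlPoint::Constants
# and their actual values may differ.
use constant {
    DIRECTIVE_TAG      => 'TAG',
    DIRECTIVE_START    => 'START',
    DIRECTIVE_COMPLETE => 'COMPLETE',
};

# Walk a stream: plain strings are basic tokens, array references are
# directives of the form [type, start/complete hint, extra fields...].
sub render {
    my @stream = @_;
    my $out = '';
    for my $token (@stream) {
        if (ref($token) eq 'ARRAY') {
            my ($type, $hint, @info) = @$token;
            next unless $type eq DIRECTIVE_TAG;    # other directives ignored here
            $out .= ($hint eq DIRECTIVE_START) ? "<$info[0]>" : "</$info[0]>";
        }
        else {
            $out .= $token;                        # basic token: copy as is
        }
    }
    return $out;
}

my @stream = (
    [DIRECTIVE_TAG, DIRECTIVE_START, 'I'],
    'formatted', ' ', 'strings',
    [DIRECTIVE_TAG, DIRECTIVE_COMPLETE, 'I'],
);
print render(@stream), "\n";    # prints "<I>formatted strings</I>"
```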

Original line numbers are not part of the stream.

This is the complete generator format. It is designed to be simple but powerful.

METHODS

new()

The constructor builds and prepares a new parser object.

Parameters:

The class name.

Return value: The new object in case of success.

Example:

my ($parser)=new PerlPoint::Parser;

run()

This function starts the parser for a number of passed files.

Parameters: All parameters except the object parameter are named (pass them as a hash).

activeBaseData

This optional parameter allows passing common data to all active contents (conditions, embedded and included Perl) by a hash reference. By convention, a translator passes at least the target language and user settings by

activeBaseData => {
                   targetLanguage => "lang",
                   userSettings   => \%userSettings,
                  },

User settings are intended to allow the specification of per-call settings by a user, e.g. to include special parts. By using this convention, users can easily specify such a part the following way:

? $PerlPoint->{userSettings}{setting}

Special part.

? 1

It is up to a translator author to declare translator specific settings (and to document them). The passed values can be as complex as necessary as long as they can be duplicated by Storable::dclone().

Whenever active contents is invoked, the passed hash reference is copied (duplicated by Storable::dclone()) into the Safe object's namespace (see safe) as a global variable $PerlPoint. This way, modifications by invoked code do not affect subsequently called code snippets; base data are always fresh.
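The copy-per-invocation behaviour can be reproduced with the core Safe and Storable modules. The run_snippet() helper below is a hypothetical sketch, not the parser's code; it uses Safe's varglob() to install a fresh dclone() of the base data as $PerlPoint before each reval().

```perl
use strict;
use warnings;
use Safe;
use Storable qw(dclone);

# Base data as a translator might pass it via activeBaseData.
my %userSettings = (setting => 1);
my $baseData = {
    targetLanguage => 'German',
    userSettings   => \%userSettings,
};

# Hypothetical helper: run one snippet in the compartment after installing a
# fresh deep copy of the base data as its global $PerlPoint.
sub run_snippet {
    my ($safe, $code) = @_;
    ${ $safe->varglob('PerlPoint') } = dclone($baseData);
    return $safe->reval($code);
}

my $safe = Safe->new;

# A condition like "? $PerlPoint->{targetLanguage} eq 'German'".
my $ok = run_snippet($safe, q{$PerlPoint->{targetLanguage} eq 'German'});

# This snippet modifies its own copy only ...
run_snippet($safe, q{$PerlPoint->{targetLanguage} = 'English'; 1});

# ... so the next snippet and the caller still see the original value.
my $lang = run_snippet($safe, q{$PerlPoint->{targetLanguage}});
print 'condition: ', ($ok ? 'true' : 'false'), ", next snippet sees: $lang\n";
```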

activeDataInit

Reserved for passing hook functions to be called before every active contents invocation. The hook is not implemented yet.

cache

This optional parameter controls source file paragraph caching.

By default, a source file is parsed completely every time you pass it to the parser. This is no problem with tiny sources but can delay your work if you are dealing with large sources which have to be translated periodically into presentations while they are written. Typically most of the paragraphs remain unchanged from version to version, but nevertheless everything is usually reparsed, which is a waste of time. To improve this, a paragraph cache can be activated by setting this option to CACHE_ON.

The parser caches each initial source file individually. That means that if three files are passed to the parser with activated caching, three cache files will be written. They are placed in the source file directory, named .<source file>.ppcache. Please note that the paragraphs of included sources are cached in the cache file of the main document because they may have to be evaluated differently depending on inclusion context.

What acceleration can be expected? Well, this strongly depends on your source structure. Efficiency will grow with longer paragraphs, reused paragraphs and the number of paragraphs. It will be reduced by heavy usage of active contents, macros and embedding, because every paragraph that refers to parts defined externally is not fully determined by itself and therefore cannot be cached. Here is a list of all reasons which cause a paragraph to be excluded from caching:

Usage of variables: Every occurrence of anything looking like a replaceable variable ($var or ${var}). Even if this variable has no assigned value at caching time, this could have changed when the paragraph is reread later.
Usage of macros: A macro means "unreproducible contents" because its definition is (potentially) subject to change.
Embedded parts: Obviously dynamic parts may change from one version to another, but even static parts could have to be interpreted differently because a user can set up new filters.
Included files: An \INCLUDE tag immediately disables caching for the paragraph it resides in because the loaded file may change its contents. This is not really a restriction because the included paragraphs themselves are cached if possible.
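The exclusion rules can be sketched as a predicate in front of a text-keyed store that is saved with Storable. cacheable(), parse_cached(), the stand-in parse callback and the cache file name are hypothetical, and the actual cache format of PerlPoint::Parser is not documented here. Note that cacheable() only knows the aliases declared at caching time, which is the root of the alias pitfall described below.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Storable qw(nstore retrieve);

# Hypothetical sketch of the exclusion rules above: a paragraph is cacheable
# only if it uses no variables, no embedded or included parts and no macro
# that is known at caching time.
sub cacheable {
    my ($paragraph, $macros) = @_;               # $macros: known alias names
    return 0 if $paragraph =~ /\$(?:\w+|\{\w+\})/;        # $var or ${var}
    return 0 if $paragraph =~ /\\(?:EMBED|INCLUDE)\b/;    # embedded / included
    while ($paragraph =~ /\\(\w+)/g) {
        return 0 if exists $macros->{$1};                 # uses a macro
    }
    return 1;
}

# Reuse a stored result for an unchanged paragraph; otherwise parse it and
# add a new entry if allowed. Existing entries are never replaced.
sub parse_cached {
    my ($paragraph, $cache, $macros, $parse) = @_;
    return $cache->{$paragraph} if exists $cache->{$paragraph};
    my $result = $parse->($paragraph);
    $cache->{$paragraph} = $result if cacheable($paragraph, $macros);
    return $result;
}

my %cache;
my $parse = sub { [ split / /, $_[0] ] };        # stand-in for real parsing
parse_cached($_, \%cache, {}, $parse) for 'Plain text.', 'Uses $var.';

# Persist and reload the cache, here in a temporary directory.
my $file = File::Spec->catfile(tempdir(CLEANUP => 1), '.talk.pp.ppcache');
nstore(\%cache, $file);
my $restored = retrieve($file);
print join(', ', sort keys %$restored), "\n";    # Plain text.
```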

Even with all these restrictions about 50% of a real life document of more than 150 paragraphs (with a large number of used macros) could be cached. This saved 20% of the runtime in subsequent parser calls (with the pp2html translator by Lorenz Domke).

New cache entries are always added, which means that old entries are never replaced and a cache file tends to grow. If you ever wish to clean up a cache file completely, pass CACHE_CLEANUP to this option.

To avoid trouble, CACHE_CLEANUP is strongly recommended after adding new alias definitions (that means, alias names which were unknown before), unless the special control tag \ACCEPT_ALL is declared (see tags for details). Otherwise, unchanged paragraphs containing the backslashed string which is an alias now would be restored from cache so that the new alias would take no effect.

// Text before: \NEW has no meaning and is evaluated
// as a simple word "NEW" - the backslash is removed silently.
// This paragraph will be cached because it does not contain an
// alias/macro.
There could be \NEW aliases someday.

// Now consider this is the next turn and \NEW was declared
// an alias now. But the paragraph using new remained unchanged,
// so I<it is restored from cache>.

This seems to be a rather rare case, but users should be informed about this behaviour if you provide the cache feature to them.

To deactivate caching explicitly pass CACHE_OFF. An existing cache will not be destroyed.

Settings can be combined by addition.

# clean up the cache, then refill it
cache => CACHE_CLEANUP+CACHE_ON,

# clean up the cache and deactivate it
cache => CACHE_CLEANUP+CACHE_OFF,

The CACHE_OFF value is overwritten by any other setting.
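Combination by addition works when every constant is a distinct bit. The values in this sketch are hypothetical (the real ones are declared in PerlPoint::Constants), but choosing CACHE_OFF as 0 matches the statement that it is overwritten by any other setting.

```perl
use strict;
use warnings;

# Hypothetical values; the real CACHE_* constants are declared in
# PerlPoint::Constants and may use other numbers.
use constant {
    CACHE_OFF     => 0,
    CACHE_ON      => 1,
    CACHE_CLEANUP => 2,
};

# Adding distinct single-bit flags gives the same value as OR-ing them, so each
# flag can be tested with a bitwise AND. (Adding the same flag twice would not.)
sub describe_cache {
    my ($setting) = @_;
    my @actions;
    push @actions, 'cleanup' if $setting & CACHE_CLEANUP;
    push @actions, ($setting & CACHE_ON) ? 'cache on' : 'cache off';
    return join ', ', @actions;
}

print describe_cache(CACHE_CLEANUP + CACHE_ON),  "\n";   # cleanup, cache on
print describe_cache(CACHE_CLEANUP + CACHE_OFF), "\n";   # cleanup, cache off
```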

It is suggested to make this setting available to translator users to let them decide when a cache should be used.

Please note that there is a problem with line numbers if paragraphs are restored from cache, because of the behaviour of Perl's paragraph mode. In this mode, the <> operator reads in any number of newlines between paragraphs but supplies only one of them. That is why I do not get the real number of lines in a paragraph and therefore cannot store them. To work around this, two strategies can be used. First, do not use more than exactly one empty line between paragraphs. (This strategy is not for real life users, of course, but in this case restored numbers would be correct.) Second, remember that source line numbers are only interesting in error messages. If the parser detects an error after a cache hit has already occurred, it therefore reports the error position as "there or later". If the exact number is needed, the parser can then be reinvoked with deactivated cache and will report it.

Second note: cache files are not locked while using them. If you need this feature please let me know.

display

This parameter is optional. It controls the display of runtime messages like information or warnings. By default, all messages are displayed. You can suppress these messages partially or completely by passing one or more of the "DISPLAY_..." constants declared in PerlPoint::Constants. Constants should be combined by addition.

files

a reference to an array of files to be scanned.

filter

a regular expression describing the target language. This setting, if used, prevents all embedded or included source code of other languages than the set one from inclusion into the generated stream. This accelerates both parsing and backend handling. The pattern is evaluated case insensitively.

Example: pass "html|perl" to allow HTML and Perl.

To illustrate this, imagine a translator to PostScript. If it reads a PerlPoint file which includes native HTML, this translator cannot handle such code. The backend would have to skip the HTML statements. With a "PostScript" filter, the HTML code will not appear in the stream.

This enables PerlPoint texts prepared for various target languages. If an author really needs plain target language code to be embedded into PerlPoint, he could provide versions for various languages. Translators using a filter will then receive exactly the code of their target language, if provided.

Please note that you cannot filter out PerlPoint code or files.

By default, no filter is set.
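Applied to embedded parts, the filter could work as in this sketch. The part records and apply_filter() are hypothetical, and anchoring the pattern at both ends is an assumption; the documentation above only states that the pattern is evaluated case insensitively.

```perl
use strict;
use warnings;

# Hypothetical embedded parts, each tagged with its language.
my @parts = (
    { lang => 'HTML', code => '<i>embedded HTML</i>' },
    { lang => 'perl', code => 'join(" ", 1, 2)' },
    { lang => 'TeX',  code => '\\emph{TeX}' },
);

# Keep only parts whose language matches the filter, case-insensitively.
# Anchoring the pattern (so "html" does not also accept "xhtml") is an
# assumption of this sketch.
sub apply_filter {
    my ($filter, @candidates) = @_;
    return @candidates unless defined $filter;      # no filter: keep all
    return grep { $_->{lang} =~ /^(?:$filter)$/i } @candidates;
}

print join(', ', map { $_->{lang} } apply_filter('html|perl', @parts)), "\n";
```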

linehints

If set to a true value, the parser will embed line hints into the stream whenever a new source line begins.

A line hint has the form

[DIRECTIVE_NEW_LINE, DIRECTIVE_START, {file=>filename, line=>number}]

and is suggested to be handled by a backend callback.

Please note that currently source line numbers are not guaranteed to be correct if stream parts are restored from cache (see cache for details).

The default value is 0.

object

the parser object made by new();

safe

an object of the Safe class which comes with perl. It is used to evaluate embedded Perl code in a safe environment. By letting the caller of run() provide this object, a translator author can make the level of safety fully configurable by users. Usually, the following should work

use Safe;
...
$parser->run(safe=>new Safe, ...);

Active Perl contents is suppressed if this setting is omitted or if anything other than a Safe object is passed. (There are currently three types of active contents: embedded Perl, included Perl and condition paragraphs.)

stream

A reference to an array where the generated output stream should be stored in.

Application programmers may want to tie this array if the target ASCII texts are expected to be large (long ASCII texts can result in large stream data which may occupy a lot of memory). Because the parser stores stream data paragraph by paragraph, memory consumption can be reduced significantly by tying the stream array.

It is recommended to pass an empty array. Stored data will not be overwritten, the parser appends its data instead (by push()).

tags

A reference to a hash whose keys are the tags that should be recognized. For example, pass "I", "B" and "C" to implement the well known POD tags.

Take care to pass capitalized tag keys only. Non capitalized keys cannot be recognized by convention.

If a tag is discovered, the parser will produce an open and a close directive in the output stream containing the tag's name as stored in the hash. Look at this example:

# you pass
tags => {I=>1, B=>1, C=>1},

# the tags are recognized in a text like
"... \I<bla \B<blu> \C<> blo>..."

# for which the parser will produce something like
... [DIRECTIVE_TAG, DIRECTIVE_START, 'I']
+ "bla" + " "
+ [DIRECTIVE_TAG, DIRECTIVE_START, 'B']
+ "blu"
+ [DIRECTIVE_TAG, DIRECTIVE_COMPLETE, 'B']
+ " "
+ [DIRECTIVE_TAG, DIRECTIVE_START, 'C']
+ [DIRECTIVE_TAG, DIRECTIVE_COMPLETE, 'C']
+ " " + "blo"
+ [DIRECTIVE_TAG, DIRECTIVE_COMPLETE, 'I']

It is suggested to handle tags by backend callbacks.

The parser pays no attention to the hash values.

Note that any tag that is not declared in this hash cannot be discovered by the parser and will not be passed to the stream as a tag; such tags are treated as simple strings (without the leading backslash).

As a new experimental feature, this default behaviour can be modified. If a tag name "\ACCEPT_ALL" is passed, anything that looks like a tag will be recognized as a tag. (Take care to guard all backslashes which shall not start a tag or macro!) This feature is built in to simplify processing by different backends which may implement different tag sets.
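A toy version of declared-tag recognition reproduces the stream shown above. tokenize() and show() are hypothetical helpers with stand-in constants; they handle neither tag options in braces, nor escaped ">" characters, nor macros and \ACCEPT_ALL.

```perl
use strict;
use warnings;

# Hypothetical stand-ins; the real values come from PerlPoint::Constants.
use constant {
    DIRECTIVE_TAG      => 'TAG',
    DIRECTIVE_START    => 'START',
    DIRECTIVE_COMPLETE => 'COMPLETE',
};

# Toy tokenizer for \NAME<...> tags without options: declared names produce
# start/complete directives, undeclared backslashed words lose the backslash.
sub tokenize {
    my ($text, $tags) = @_;
    my (@stream, @open);
    while (length $text) {
        if ($text =~ /^\\([A-Z_]+)</ and exists $tags->{$1}) {
            my $name = $1;
            substr($text, 0, length($name) + 2) = '';    # drop "\NAME<"
            push @open, $name;
            push @stream, [DIRECTIVE_TAG, DIRECTIVE_START, $name];
        }
        elsif (@open and $text =~ s/^>//) {              # closes innermost tag
            push @stream, [DIRECTIVE_TAG, DIRECTIVE_COMPLETE, pop @open];
        }
        elsif ($text =~ s/^\\(\w+)//)         { push @stream, $1 }  # plain word
        elsif ($text =~ s/^(\s+)//)           { push @stream, $1 }  # whitespace
        elsif ($text =~ s/^([^\s\\<>]+|.)//s) { push @stream, $1 }  # word or char
    }
    return @stream;
}

# Flatten a stream for display: "[I" opens, "I]" closes a tag.
sub show {
    return join '', map {
        ref $_ ? (($_->[1] eq DIRECTIVE_START) ? "[$_->[2]" : "$_->[2]]") : $_
    } @_;
}

print show(tokenize('\I<bla \B<blu> \C<> blo>', {I => 1, B => 1, C => 1})), "\n";
```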

trace

This parameter is optional. It is intended to activate trace code while the method runs. You may pass any of the "TRACE_..." constants declared in PerlPoint::Constants, combined by addition as in the following example:

# show the traces of both
# lexical and syntactical analysis
trace => TRACE_LEXER+TRACE_PARSER,

If you omit this parameter or pass TRACE_NOTHING, no traces will be displayed.

var2stream

If set to a true value, the parser will propagate variable settings into the stream by adding additional DIRECTIVE_VARSET directives.

A variable propagation has the form

[DIRECTIVE_VARSET, DIRECTIVE_START, {var=>varname, value=>value}]

and is suggested to be handled by a backend callback.

The default value is 0.
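Both optional directives (line hints and variable propagation) lend themselves to callbacks keyed by directive type. The constants and the dispatch() helper in this sketch are hypothetical stand-ins; PerlPoint::Backend is the suggested place for such callbacks, and its interface is not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical stand-ins; the real values come from PerlPoint::Constants.
use constant {
    DIRECTIVE_NEW_LINE => 'NEW_LINE',
    DIRECTIVE_VARSET   => 'VARSET',
    DIRECTIVE_START    => 'START',
};

my (%vars, $position);

# Callbacks keyed by directive type (note "," instead of "=>", so that the
# constants are called rather than quoted).
my %callback = (
    DIRECTIVE_NEW_LINE, sub { my ($h) = @_; $position = "$h->{file}:$h->{line}" },
    DIRECTIVE_VARSET,   sub { my ($h) = @_; $vars{ $h->{var} } = $h->{value} },
);

# Dispatch every directive to its callback; basic tokens are ignored here.
sub dispatch {
    for my $token (@_) {
        next unless ref($token) eq 'ARRAY';
        my ($type, undef, $data) = @$token;
        $callback{$type}->($data) if exists $callback{$type};
    }
}

dispatch(
    [DIRECTIVE_NEW_LINE, DIRECTIVE_START, {file => 'talk.pp', line => 12}],
    [DIRECTIVE_VARSET,   DIRECTIVE_START, {var => 'var', value => 10}],
    'Some', ' ', 'words.',
);
print "$position: \$var is $vars{var}\n";    # talk.pp:12: $var is 10
```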

vispro

activates "process visualization" which simply means that a user will see progress messages while the parser reads the documents. The numerical value of this setting determines how often the progress message shall be updated by a paragraph interval:

# inform every five paragraphs
vispro => 5,

Process visualization is automatically suppressed if this option is omitted, if STDERR is not connected to a terminal, if display was set to DISPLAY_NOINFO, or if parser traces are activated.

Return value: A "true" value in case of success, "false" otherwise. A call is performed successfully if there was neither a syntactical nor a semantic error in the parsed files.

Example:

# CACHE_ON and TRACE_PARAGRAPHS are declared in PerlPoint::Constants
$parser->run(
             stream  => \@streamData,
             tags    => \%tagHash,
             files   => \@ARGV,
             filter  => 'HTML',
             cache   => CACHE_ON,
             trace   => TRACE_PARAGRAPHS,
            );

EXAMPLE

The following code shows a minimal but complete parser.

# pragmata
use strict;

# load modules
use PerlPoint::Parser;

# declare variables
my (@streamData, %tagHash);

# declare list of tag openers
@tagHash{qw(B C I IMG E)}=();

# build parser
my ($parser)=new PerlPoint::Parser;
# and call it
$parser->run(
             stream  => \@streamData,
             tags    => \%tagHash,
             files   => \@ARGV,
            ) or die "parsing failed\n";

NOTES

Format

The PerlPoint format was initially designed by Tom Christiansen, who wrote an HTML slide generator for it, too.

Lorenz Domke added a number of additional useful and interesting features to the original implementation. At a certain point, we decided to redesign the tool to make it a base for slide generation not only into HTML but into various document description languages.

The PerlPoint format implemented by this parser version is slightly different from the original design. Presentations written for Perl Point 1.0 will not pass the parser but can simply be converted into the new format. We designed the new format as a team of Lorenz Domke, Stephen Riehm and me.

Storable updates

From version 0.24 on, the Storable module is a prerequisite of the parser package because Storable is used to store and retrieve cache data in files. If you update your Storable installation, its internal format might change so that stored cache data becomes unreadable. Simply remove the cache files (see FILES) and rerun the parser to rebuild the cache, or call the parser's run() method with cache set to CACHE_CLEANUP+CACHE_ON for the same effect.

FILES

If caches are used, the parser writes cache files where the initial sources are stored. They are named .<source file>.ppcache.

AUTHOR

This module is free software, you can redistribute it and/or modify it under the terms of the Artistic License distributed with Perl version 5.003 or (at your option) any later version. Please refer to the Artistic License that came with your Perl distribution for more details.

The Artistic License should have been included in your distribution of Perl. It resides in the file named "Artistic" at the top-level of the Perl source tree (where Perl was downloaded/unpacked; ask your system administrator if you don't know where this is). Alternatively, the current version of the Artistic License distributed with Perl can be viewed on-line on the World-Wide Web (WWW) from the following URL: http://www.perl.com/perl/misc/Artistic.html.

PerlPoint::Parser is built using Parse::Yapp in a way that users do not have to install Parse::Yapp themselves. According to the copyright note of Parse::Yapp, I have to mention the following:

"You may use and distribute them under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file."

DISCLAIMER

This software is distributed in the hope that it will be useful, but is provided "AS IS" WITHOUT WARRANTY OF ANY KIND, either expressed or implied, INCLUDING, without limitation, the implied warranties of MERCHANTABILITY and FITNESS FOR A PARTICULAR PURPOSE.

The ENTIRE RISK as to the quality and performance of the software IS WITH YOU (the holder of the software). Should the software prove defective, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

IN NO EVENT WILL ANY COPYRIGHT HOLDER OR ANY OTHER PARTY WHO MAY CREATE, MODIFY, OR DISTRIBUTE THE SOFTWARE BE LIABLE OR RESPONSIBLE TO YOU OR TO ANY OTHER ENTITY FOR ANY KIND OF DAMAGES (no matter how awful - not even if they arise from known or unknown flaws in the software).

Please refer to the Artistic License that came with your Perl distribution for more details.
