NAME

GraphViz2::Marpa - A Marpa-based parser for Graphviz dot files

Synopsis

o Display help
perl scripts/g2m.pl -h
o Run the parser
perl scripts/g2m.pl -input_file data/16.gv
perl scripts/g2m.pl -input_file data/16.gv -max info

The "FAQ" discusses the way the parsed data is stored in RAM.

o Run the parser and the default renderer
perl scripts/g2m.pl -input_file data/16.gv -output_file ./16.gv

./16.gv will be the rendered Graphviz dot file.

See scripts/test.utf8.sh for comparing the output of running the parser, and dot, on all data/utf8.*.gv files.

See also "Scripts".

Description

GraphViz2::Marpa provides a Marpa::R2-based parser for Graphviz graph definitions.

Demo output: http://savage.net.au/Perl-modules/html/graphviz2.marpa/index.html.

Marpa's homepage: http://savage.net.au/Marpa.html.

Articles:

o Overview

Announcing this module

o Building the Grammar

Conditional preservation of whitespace

This module will be re-written, again, now that its BNF has been incorporated into GraphViz2::Marpa, and patched along the way.

Modules

o GraphViz2::Marpa

The current module, which documents the set of modules.

Accepts a Graphviz graph definition and builds a corresponding data structure representing the parsed graph. It can, optionally, pass that data to the default renderer, GraphViz2::Marpa::Renderer::Graphviz, which then renders it to a text file ready to be input to dot. Such 'round-tripping', as it's called, is the best way to test a renderer.

See scripts/g2m.pl and scripts/test.utf8.sh.

o GraphViz2::Marpa::Renderer::Graphviz

The default renderer. Optionally called by the parser.

o GraphViz2::Marpa::Config

Auxiliary code, used to help generate the demo page.

o GraphViz2::Marpa::Utils

Auxiliary code, used to help generate the demo page.

Sample Data

o Input files: data/*.gv

These are valid Graphviz graph definition files.

Some data/*.gv files may contain deliberate mistakes, which may or may not stop production of output files. They may cause various warning messages to be printed by dot when being rendered.

See the demo page for details.

o Output files: html/*.svg

The html/*.svg files are the SVG renderings of the data/*.gv files, output by scripts/generate.demo.sh.

The round trip shows that the lex/parse process does not lose information along the way, except that comments are discarded.

This set, and the set xt/author/html/*.svg just below, are generated by running scripts/generate.demo.sh. This in turn runs both scripts/generate.svg.sh and scripts/generate.demo.pl.

o Input files: xt/author/data/*.gv

As for data/*.gv above, but these files are copied from Graphviz V 2.38.0, and are often quite complex.

See find.candidates.pl, below.

o Output files: xt/author/html/*.svg

As for html/*.svg above.

Scripts

These are in the scripts/ directory.

o copy.config.pl

For use by the author. Output:

Copied config/.htgraphviz2.marpa.conf to /home/ron/.config/Perl/GraphViz2-Marpa

o find.candidates.pl

For use by the author.

This scans an unpacked distro of Graphviz V 2.38.0 and finds *.gv matching these criteria:

o In ~/Downloads/Graphviz/graphviz-2.38.0/
o Not too big

I.e. the file must be < 10,000 bytes in size, otherwise it may take too long to process.

o Not a fake

Currently, only ~/Downloads/Graphviz/graphviz-2.38.0/tclpkg/gv/META.gv fits this definition.

o Not already present in xt/author/data

Any candidates found have their names printed, for easy one-at-a-time copying from Graphviz and testing via scripts/test.1.sh.

o find.config.pl

For use by the author. Output:

Using: File::HomeDir -> my_dist_config('GraphViz2-Marpa', '.htgraphviz2.marpa.conf'):
Found: /home/ron/.config/Perl/GraphViz2-Marpa/.htgraphviz2.marpa.conf

o g2m.pl

Runs the parser. Try running with -h.

o g2m.sh

Simplifies running g2m.pl.

o generate.demo.pl

See generate.demo.sh.

o generate.demo.sh

For use by the author. Actions:

o Runs dot on all data/*.gv files; outputs to html/*.svg
o Runs scripts/generate.demo.pl; outputs to html/index.html
o Copies html/* to various places

o generate.svg.sh

Converts all data/*.gv files into html/*.svg.

Used by generate.demo.sh.

o gv2svg.sh

Converts one data/*.gv file into $DR/Perl-modules/html/graphviz2.marpa/*.svg.

o pod2html.sh

Converts all *.pm files to *.html, and copies them into my web server's dir structure (in Debian's RAM disk).

o test.1.sh

Runs both the parser and dot so I can compare the output.

o test.html.pl

Uses method perform_1_test() in GraphViz2::Marpa::Utils, to test the stand-alone BNF used for HTML-like tables.

Note: t/test.t also calls perform_1_test().

o test.utf8.sh

Tests one data/utf8*.gv file more thoroughly than test.1.sh does.

Distributions

This module is available as a Unix-style distro (*.tgz).

See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing distros.

Installation

Install GraphViz2::Marpa as you would for any Perl module:

Run:

cpanm GraphViz2::Marpa

or run:

sudo cpan GraphViz2::Marpa

or unpack the distro, and then either:

perl Build.PL
./Build
./Build test
sudo ./Build install

or:

perl Makefile.PL
make (or dmake or nmake)
make test
make install

Constructor and Initialization

new() is called as my($g2m) = GraphViz2::Marpa -> new(k1 => v1, k2 => v2, ...).

It returns a new object of type GraphViz2::Marpa.

Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. description([$graph])]):

o description => $graphDescription

Read the Graphviz graph definition from the command line.

You are strongly encouraged to surround this string with '...' to protect it from your shell.

See also the 'input_file' option to read the description from a file.

The 'description' option takes precedence over the 'input_file' option.

Default: ''.

o input_file => $aDotInputFileName

Read the Graphviz graph definition from a file.

See also the 'description' option to read the graph definition from the command line.

The 'description' option takes precedence over the 'input_file' option.

Default: ''.

See the distro for data/*.gv.

o logger => $aLoggerObject

Specify a logger compatible with Log::Handler, for the lexer and parser to use.

Default: A logger of type Log::Handler which writes to the screen.

To disable logging, just set 'logger' to the empty string (not undef).

o maxlevel => $logOption1

This option affects Log::Handler.

See the Log::Handler::Levels docs.

Default: 'notice'.

o minlevel => $logOption2

This option affects Log::Handler.

See the Log::Handler::Levels docs.

Default: 'error'.

No lower levels are used.

o output_file => $aRenderedDotInputFileName

Specify the name of a file for the renderer to write.

That is, write the DOT-style graph definition to a file.

When this file and the input file are both run through dot, they should produce identical *.svg files.

If an output file name is specified, an object of type GraphViz2::Marpa::Renderer::Graphviz is created and called after the input file has been successfully parsed.

Default: ''.

The default means the renderer is not called.

o renderer => aGraphViz2::Marpa::Renderer::Graphviz-compatible object

Specify a renderer for the parser to use.

See output_file just above.

Default: undef.

If an output file is specified, then an object of type GraphViz2::Marpa::Renderer::Graphviz is created and its run() method is called.

o trace_terminals => $Boolean

This allows g2m.pl to control the trace_terminals setting passed to Marpa::R2::Scanless::R.
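
The options above can be exercised from a short script. What follows is only a sketch: parse_dot_string() is a hypothetical helper, not part of the distro, and the module is loaded at run time so that the script still runs (and reports no result) when GraphViz2::Marpa is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the distro): parse a DOT string held in
# memory. Returns run()'s result, or undef when the module is unavailable
# or the parse dies.
sub parse_dot_string
{
	my($dot) = @_;

	# Load the module at run time so this sketch degrades gracefully.
	return undef if (! eval{require GraphViz2::Marpa; 1});

	my($result) = eval
	{
		GraphViz2::Marpa -> new
		(
			description => $dot, # Takes precedence over input_file.
			logger      => '',   # Empty string (not undef) disables logging.
		) -> run;
	};

	return $result;
}

my($result) = parse_dot_string('digraph g {a -> b}');

print defined($result)
	? "Parse result: $result (0 is success)\n"
	: "No result (module missing or parse died)\n";
```

Passing the graph via 'description' keeps the sketch independent of any file on disk. Swap in 'input_file' to read one of the data/*.gv files instead.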

Methods

description([$graph])

The [] indicate an optional parameter.

Get or set the Graphviz graph definition string.

The value supplied by the 'description' option takes precedence over the value read from the 'input_file'.

See also "input_file()".

'description' is a parameter to "new()". See "Constructor and Initialization" for details.

input_file([$graph_file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the file to read the Graphviz graph definition from.

The value supplied by the 'description' option takes precedence over the value read from the 'input_file'.

See also the "description()" method.

'input_file' is a parameter to "new()". See "Constructor and Initialization" for details.

log($level, $s)

If a logger is defined, this logs the message $s at level $level.

logger([$logger_object])

Here, the [] indicate an optional parameter.

Get or set the logger object.

To disable logging, just set 'logger' to the empty string (not undef), in the call to "new()".

This logger is passed to other modules.

'logger' is a parameter to "new()". See "Constructor and Initialization" for details.

maxlevel([$string])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if GraphViz2::Marpa::Lexer or GraphViz2::Marpa::Parser use or create an object of type Log::Handler. See Log::Handler::Levels.

'maxlevel' is a parameter to "new()". See "Constructor and Initialization" for details.

minlevel([$string])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if GraphViz2::Marpa::Lexer or GraphViz2::Marpa::Parser use or create an object of type Log::Handler. See Log::Handler::Levels.

'minlevel' is a parameter to "new()". See "Constructor and Initialization" for details.

new()

See "Constructor and Initialization" for details on the parameters accepted by "new()".

output_file([$file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the file for the renderer to write.

If an output file name is specified, an object of type GraphViz2::Marpa::Renderer::Graphviz is created and called after the input file has been successfully parsed.

'output_file' is a parameter to "new()". See "Constructor and Initialization" for details.

renderer([$renderer_object])

Here, the [] indicate an optional parameter.

Get or set the renderer object.

This renderer is called if output_file() is given a value.

'renderer' is a parameter to "new()". See "Constructor and Initialization" for details.

run()

This is the only method the caller needs to call. All parameters are supplied to "new()" (or via other methods before run() is called).

See scripts/g2m.pl.

Returns 0 for success and 1 for failure.

trace_terminals([$Boolean])

Here, the [] indicate an optional parameter.

Get or set the trace_terminals option passed to Marpa::R2::Scanless::R.

FAQ

How is the parsed data held in RAM?

The parsed output is held in a tree managed by Tree::DAG_Node.

Here and below, the word node (usually) refers to nodes in this tree, not Graphviz-style nodes.

The root node always looks like this when printed by Tree::DAG_Node's tree2string() method:

root. Attributes: {type => "root_literal", uid => "0", value => "root"}

Interpretation:

o The node name

Here, root.

o The node's attributes

Key fields:

o type

Here, root_literal.

The type (or name) of the value. The word 'name' is not used to avoid confusion with the name of the node.

o uid

A unique integer assigned to each node. Counts up from 0. Not used.

o value

The value of the node.

Here, root.
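
To make the shape of a node concrete, here is a minimal sketch that models one node as a plain hash and prints it in the same layout as the line above. The hash and format_node() are stand-ins invented for illustration. The real tree holds Tree::DAG_Node objects, and this sketch does not touch that class.

```perl
use strict;
use warnings;

# Plain-hash stand-in for one tree node: a name plus a hashref of
# attributes. This models the printed form only, not Tree::DAG_Node.
sub format_node
{
	my($node)  = @_;
	my($attrs) = $$node{attributes};

	# Emit the keys in sorted order: type, uid, value.
	my($pairs) = join(', ', map{qq($_ => "$$attrs{$_}")} sort keys %$attrs);

	return "$$node{name}. Attributes: {$pairs}";
}

my($root) =
{
	name       => 'root',
	attributes => {type => 'root_literal', uid => 0, value => 'root'},
};

print format_node($root), "\n";
```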

Can you explain this tree in more detail?

Sure. Firstly, we examine a sample graph, assuming the module's pre-reqs are installed. Let's use data/10.gv. Here it is as an svg.

Run one of these:

scripts/g2m.sh data/10.gv -max info
perl -Ilib scripts/g2m.pl -input_file data/10.gv -max info

The former echoes the input file to STDOUT before running the latter.

Using -max notice, which is the default, produces no output from g2m.pl.

This is the input:

STRICT digraph graph_10
{
	edge ["color" = "green"];
	node [shape=rpromoter]
	terminator [label = "\nterminator" shape = terminator;];

	rpromoter -> terminator [label = Transformer]
}

And this is the output:

root. Attributes: {type => "root_literal", uid => "0", value => "root"}
   |--- prolog. Attributes: {type => "prolog_literal", uid => "1", value => "prolog"}
   |   |--- literal. Attributes: {type => "strict_literal", uid => "3", value => "strict"}
   |   |--- literal. Attributes: {type => "digraph_literal", uid => "4", value => "digraph"}
   |--- graph. Attributes: {type => "graph_literal", uid => "2", value => "graph"}
       |--- node_id. Attributes: {type => "node_id", uid => "5", value => "graph_10"}
       |--- literal. Attributes: {type => "open_brace", uid => "6", value => "{"}
       |   |--- class. Attributes: {type => "class", uid => "7", value => "edge"}
       |   |   |--- literal. Attributes: {type => "open_bracket", uid => "8", value => "["}
       |   |   |--- attribute. Attributes: {type => "color", uid => "9", value => "green"}
       |   |   |--- literal. Attributes: {type => "close_bracket", uid => "10", value => "]"}
       |   |--- class. Attributes: {type => "class", uid => "11", value => "node"}
       |   |   |--- literal. Attributes: {type => "open_bracket", uid => "12", value => "["}
       |   |   |--- attribute. Attributes: {type => "shape", uid => "13", value => "rpromoter"}
       |   |   |--- literal. Attributes: {type => "close_bracket", uid => "14", value => "]"}
       |   |--- node_id. Attributes: {type => "node_id", uid => "15", value => "terminator"}
       |   |   |--- literal. Attributes: {type => "open_bracket", uid => "16", value => "["}
       |   |   |--- attribute. Attributes: {type => "label", uid => "17", value => "\nterminator"}
       |   |   |--- attribute. Attributes: {type => "shape", uid => "18", value => "terminator"}
       |   |   |--- literal. Attributes: {type => "close_bracket", uid => "19", value => "]"}
       |   |--- node_id. Attributes: {type => "node_id", uid => "20", value => "rpromoter"}
       |   |--- edge_id. Attributes: {name => "directed_edge", uid => "21", value => "->"}
       |   |--- node_id. Attributes: {type => "node_id", uid => "22", value => "terminator"}
       |       |--- literal. Attributes: {type => "open_bracket", uid => "23", value => "["}
       |       |--- attribute. Attributes: {type => "label", uid => "24", value => "Transformer"}
       |       |--- literal. Attributes: {type => "close_bracket", uid => "25", value => "]"}
       |--- literal. Attributes: {type => "close_brace", uid => "26", value => "}"}
Parse result:  0 (0 is success)

You can see from this output that words special to Graphviz (e.g. STRICT) are accepted no matter what case they are in. Such tokens are stored in lower-case.

A more detailed analysis follows.

The root node has 2 daughters:

o The prolog sub-tree

The prolog node is the root of a sub-tree holding everything before the graph's ID, if any.

The node is called prolog, and its hashref of attributes is {type => "prolog_literal", uid => "1", value => "prolog"}.

It has 1 or 2 daughters. The possibilities are:

o Input: 'digraph ...'

The 1 daughter is named literal, and its attributes are {type => "digraph_literal", uid => "3", value => "digraph"}.

o Input: 'graph ...'

The 1 daughter is named literal, and its attributes are {type => "graph_literal", uid => "3", value => "graph"}.

o Input: 'strict digraph ...'

The 2 daughters are named literal, and their attributes are, respectively, {type => "strict_literal", uid => "3", value => "strict"} and {type => "digraph_literal", uid => "4", value => "digraph"}.

o Input: 'strict graph ...'

The 2 daughters are named literal, and their attributes are, respectively, {type => "strict_literal", uid => "3", value => "strict"} and {type => "graph_literal", uid => "4", value => "graph"}.

And yes, the graph ID, if any, is under the graph node. The reason for this is that for every subgraph within the graph, the same structure applies: First the (sub)graph ID, then a literal '{', then that (sub)graph's details, and finally a literal '}'.

o The 'graph' sub-tree

The graph node is the root of a sub-tree holding everything about the graph, including the graph's ID, if any.

The node is called graph, and its hashref of attributes is {type => "graph_literal", uid => "2", value => "graph"}.

The graph node has as many daughters, with their own daughters, as is necessary to hold the output of parsing the remainder of the input.

In particular, if the input graph has an ID, i.e. the input is of the form 'digraph my_id ...' (or various versions thereof) then the 1st daughter will be called node_id, and its attributes will be {type => "node_id", uid => "5", value => "my_id"}.

Further, the 2nd daughter will be called literal, and its attributes will be {type => "open_brace", uid => "6", value => "{"}. A subsequent daughter will eventually (for a syntax-error-free input file, of course) also be called literal, and its attributes will be {type => "close_brace", uid => "#", value => "}"}.

Naturally, if the graph has no ID (i.e. input lacks the 'my_id' token) then the uids will differ slightly.

As mentioned, this pattern of optional (sub)graph id followed by a matching pair of '{', '}' nodes, is used for all graphs and subgraphs.

In the case the input contains an explicit subgraph, then just before the node representing 'my_id' or '{', there will be another node representing the subgraph token.

Its name will be literal, and its attributes will be {type => "subgraph_literal", uid => "#", value => "subgraph"}.
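
The four possible shapes of the prolog sub-tree can be summarised in a few lines of code. This is a sketch only: prolog_daughters() is a hypothetical function written for this explanation, and it returns plain hashes rather than tree nodes.

```perl
use strict;
use warnings;

# Hypothetical helper: list the attributes of the prolog's daughters.
# $strict is a Boolean, and $kind is either 'digraph' or 'graph'.
sub prolog_daughters
{
	my($strict, $kind) = @_;
	my($uid)           = 3; # uids 0 .. 2 belong to root, prolog and graph.
	my(@daughters);

	push @daughters, {type => 'strict_literal', uid => $uid++, value => 'strict'} if ($strict);
	push @daughters, {type => "${kind}_literal", uid => $uid, value => $kind};

	return @daughters;
}

for my $daughter (prolog_daughters(1, 'digraph') )
{
	print qq(literal. Attributes: {type => "$$daughter{type}", uid => "$$daughter{uid}", value => "$$daughter{value}"}\n);
}
```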

How many different names can these nodes have?

The list of possible node names follows. You should always examine the type and value keys of the node's attributes to determine the exact nature of the node.

o attribute

In this case, the node's attributes contain a hashref like {type => "arrowhead", uid => "33", value => "odiamond"}, meaning the type field holds the type (i.e. name) of the attribute, and the 'value' field holds the value of the attribute.

o class

This is used when any of edge, graph, or node appear at the start of the (sub)graph, and is the mother of the attributes attached to the class. The value of the attribute will be edge, graph, or node.

The 1st and last daughters will be literals whose attribute values are '[' and ']' respectively, and the middle daughter(s) will be nodes of type attribute (as just discussed).

o edge_id

The value of the attribute will be either '--' or '->'.

Thus the tail of the edge will be the previous daughter (node or subgraph), and the head of the edge will be the next.

Samples are:

n1 -> n2
n1 -> {n2}
{n1} -> n2

In a daisy chain of nodes, the last node in the chain may have daughters that are the attributes of each edge in the chain. This is how Graphviz syntax attaches edge attributes to a path. The class edge can also be used to provide attributes for the edge.

o graph

There is only ever 1 node called graph.

o literal

literal is the name of some nodes, with the value key in the attributes having one of these values:

o {

Indicates the start of a (sub)graph.

o }

Indicates the end of a (sub)graph.

o [

This indicates the start of a set of attributes for a specific class, edge or node, or the edge attributes at the end of a path.

The 1st and last daughters will be literals whose attribute value keys are '[' and ']' respectively.

Between these 2 nodes will be 1 node for each attribute, as seen above with edge ["color" = "green"].

Note: Graphviz allows an abbreviated syntax for setting the attributes of a (sub)graph. So, instead of needing:

graph [rankdir = LR]

You can just use:

rankdir = LR

In such cases, these attributes are not surrounded by '[' and ']'.

o ]

See the previous point.

o digraph_literal
o graph_literal
o strict_literal
o subgraph_literal
o node_id

The value of the attributes is the name of the graph, a node, or a subgraph.

Note: A node name can appear more than once in succession, either as a declaration of the node's existence and then as the tail of an edge, or, as in this fragment of data/56.gv:

node [shape=rpromoter colorscheme=rdbu5 color=1 style=filled fontcolor=3]; Hef1a; TRE; UAS;
Hef1aLacOid; Hef1aLacOid [label="Hef1a-LacOid"];

This is a case where tree compression could be done, but isn't done yet.

o prolog

There is only ever 1 node called prolog.

o root

There is only ever 1 node called root.
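
The edge_id rule above (the tail is the previous daughter, the head is the next) can be sketched as a scan over one list of sibling nodes. The code below works on plain hashes, not on the real tree, and find_edges() is a hypothetical helper. It only reports the immediate neighbours, so for an edge whose tail or head is a subgraph it would report the neighbouring brace literal rather than the subgraph's contents.

```perl
use strict;
use warnings;

# Hypothetical helper: given the daughters of one '{' node, as plain hashes
# with name and value keys, pair each edge_id with its 2 neighbours.
sub find_edges
{
	my(@daughters) = @_;
	my(@edges);

	for my $i (1 .. $#daughters - 1)
	{
		next if ($daughters[$i]{name} ne 'edge_id');

		# The tail is the previous daughter, and the head is the next.
		push @edges, [$daughters[$i - 1]{value}, $daughters[$i]{value}, $daughters[$i + 1]{value}];
	}

	return @edges;
}

my(@edges) = find_edges
(
	{name => 'node_id', value => 'rpromoter'},
	{name => 'edge_id', value => '->'},
	{name => 'node_id', value => 'terminator'},
);

print join(' ', @$_), "\n" for @edges;
```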

How are nodes, ports and compass points represented in the (above) tree?

Input contains this fragment of data/16.gv:

node_16_1:p11 -> node_16_2:p22:s
[
	arrowhead = "odiamond";
	arrowtail = "odot",
	color     = red
	dir       = both;
];

The output log contains:

|   |--- node_id. Attributes: {type => "node_id", uid => "29", value => "node_16_1:p11"}
|   |--- edge_id. Attributes: {name => "directed_edge", uid => "30", value => "->"}
|   |--- node_id. Attributes: {type => "node_id", uid => "31", value => "node_16_2:p22:s"}

You can see the ports and compass points have been incorporated into the value attribute.
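
If you need the parts separately, the value can be split by the caller. The following is a minimal sketch, not code from this module: split_node_id() is a hypothetical helper, it assumes an unquoted ID in which colons only ever act as separators, and it assumes that a lone suffix which spells a compass point is meant as one.

```perl
use strict;
use warnings;

# Compass points as listed in the DOT language definition.
my(%is_compass) = map{($_ => 1)} qw(n ne e se s sw w nw c _);

# Hypothetical helper: split a node_id value into name, port and compass.
# Missing parts are returned as undef.
sub split_node_id
{
	my($value)                 = @_;
	my($name, $port, $compass) = split(/:/, $value, 3);

	# Assumption: with a single suffix, a compass point is not a port.
	if (defined($port) && ! defined($compass) && $is_compass{$port})
	{
		($port, $compass) = (undef, $port);
	}

	return ($name, $port, $compass);
}

my($name, $port, $compass) = split_node_id('node_16_2:p22:s');

print "name: $name, port: $port, compass: $compass\n";
```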

How are HTML-like labels handled?

The main grammar (see $self -> bnf in the source) is used to hold the definitions of strings (see strict_literal). Thus Marpa, via the main parser $self -> recce, is used to identify all types of strings.

Then, if the string starts with '<', _process_html() is called, and has a separate grammar (see bnf4html). This in turn uses a separate grammar object (grammar4html) and a separate parser (recce4html). _process_html() traps any apparent parsing errors, found when lexemes (text) follow the HTML, and saves the label's value. This method also sets $pos to the first char after the HTML, so when control returns to the main parser, and the main grammar, the main parser is not aware of the existence of the HTML, and just keeps on parsing from where the HTML parser finished.

How are comments stored in the tree?

They aren't stored; they are discarded. And this in turn means rendered dot files can't ever contain them.

What is the homepage of Marpa?

http://savage.net.au/Marpa.html.

That page has a long list of links.

Why do I get error messages like the following?

Error: <stdin>:1: syntax error near line 1
context: digraph >>>  Graph <<<  {

Graphviz reserves some words as keywords, meaning they can't be used as an ID, e.g. for the name of the graph.

So, don't do this:

strict graph graph{...}
strict graph Graph{...}
strict graph strict{...}
etc...

Likewise for non-strict graphs, and digraphs. You can however add double-quotes around such reserved words:

strict graph "graph"{...}

Even better, use a more meaningful name for your graph...

The keywords are: node, edge, graph, digraph, subgraph and strict. Compass points are not keywords.

See keywords in the discussion of the syntax of DOT for details.
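
If graph names come from user input, one way to avoid this error is to quote any name that collides with a keyword before writing the dot file. This is a sketch only: safe_id() is a hypothetical helper, not something this module provides, and it does not escape double-quotes embedded in the name.

```perl
use strict;
use warnings;

# The keywords listed above.
my(%is_keyword) = map{($_ => 1)} qw(node edge graph digraph subgraph strict);

# Hypothetical helper: double-quote an ID if it collides with a keyword.
# Keywords match in any case, so 'Graph' is quoted too.
sub safe_id
{
	my($id) = @_;

	return $is_keyword{lc $id} ? qq("$id") : $id;
}

print 'strict graph ', safe_id($_), " {...}\n" for qw(graph Graph strict my_graph);
```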

Does this package support Unicode in the input dot file?

Yes.

But you are strongly encouraged to put node names using utf8 glyphs in double-quotes, even though it is not always necessary.

See xt/author/data/utf8.*.gv and scripts/test.utf8.sh. In particular, see xt/author/data/utf8.01.gv.

How can I switch from Marpa::XS to Marpa::PP?

Don't use either of them. Use Marpa::R2.

If I input x.old.gv and output x.new.gv, should these 2 files be identical?

Yes - at least in the sense that running dot on them will produce the same output files. This is assuming the default renderer is used.

See scripts/test.utf8.sh for how to do just that.

As mentioned above, comments in input files are discarded, so they can never be in the output file.

How are custom graph attributes handled?

They are treated like any other attribute. That is, syntax checking is not performed at that level, but only at the grammatical level. If the construct matches the grammar, this code accepts it.

See data/32.gv.

How are the demo files generated?

See scripts/generate.demo.sh.

How do I run author tests?

This runs both standard and author tests:

shell> perl Build.PL; ./Build; ./Build test; ./Build authortest

There are currently (V 2.00) 91 standard tests, and in xt/author/*.t, 4 pod tests and 355 author tests. Combined, they take almost 2m 30s to run.

See Also

MarpaX::Demo::StringParser. The significance of this module is that during the re-write of GraphViz2::Marpa, the string-handling code was perfected in MarpaX::Demo::StringParser.

Later, that code was improved within this module, and will be back-ported into MarpaX::Demo::StringParser.

Machine-Readable Change Log

The file CHANGES was converted into Changelog.ini by Module::Metadata::Changes.

Version Numbers

Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.

Thanks

Many thanks are due to the people who worked on Graphviz.

Jeffrey Kegler wrote Marpa::XS, and has a blog on it at http://blogs.perl.org/users/jeffrey_kegler/.

And thanks to rns (Ruslan Shvedov) for writing the grammar for double-quoted strings used in MarpaX::Demo::SampleScripts's scripts/quoted.strings.02.pl. I adapted it to HTML (see scripts/quoted.strings.05.pl in that module), and then incorporated the grammar into this module. For details, search for bnf4html, grammar4html and recce4html in the source of the current module.

Repository

https://github.com/ronsavage/GraphViz2-Marpa

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=GraphViz2::Marpa.

Author

GraphViz2::Marpa was written by Ron Savage <ron@savage.net.au> in 2012.

Marpa's homepage: <http://savage.net.au/Marpa.html>.

My homepage: http://savage.net.au/.

Copyright

Australian copyright (c) 2012, Ron Savage.

All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Artistic License 2.0, a copy of which is available at:
http://opensource.org/licenses/alphabetical.