NAME
GraphViz2::Marpa - A Marpa-based parser for Graphviz dot files
Synopsis
o Display help
perl scripts/g2m.pl -h
o Run the parser
perl scripts/g2m.pl -input_file data/16.gv
perl scripts/g2m.pl -input_file data/16.gv -max info
The "FAQ" discusses the way the parsed data is stored in RAM.
o Run the parser and the default renderer
perl scripts/g2m.pl -input_file data/16.gv -output_file ./16.gv
./16.gv will be the rendered Graphviz dot file.
See scripts/test.utf8.sh for comparing the output of running the parser, and dot, on all data/utf8.*.gv files.
See also "Scripts".
Description
GraphViz2::Marpa provides a Marpa::R2-based parser for Graphviz graph definitions.
Demo output: http://savage.net.au/Perl-modules/html/graphviz2.marpa/index.html.
Articles:
o Overview
o Building the Grammar
o Conditional preservation of whitespace
This module will be re-written, again, now that its BNF has been incorporated into GraphViz2::Marpa, and patched along the way.
Modules
o GraphViz2::Marpa
The current module, which documents the set of modules.
It can, optionally, use the default renderer GraphViz2::Marpa::Renderer::Graphviz.
It accepts a Graphviz graph definition and builds a corresponding data structure representing the parsed graph. It can pass that data to the default renderer, GraphViz2::Marpa::Renderer::Graphviz, which can then render it to a text file ready to be input to dot. Such 'round-tripping', as it's called, is the best way to test a renderer.
See scripts/g2m.pl and scripts/test.utf8.sh.
o GraphViz2::Marpa::Renderer::Graphviz
The default renderer. Optionally called by the parser.
o GraphViz2::Marpa::Config
Auxiliary code, used to help generate the demo page.
o GraphViz2::Marpa::Utils
Auxiliary code, used to help generate the demo page.
Sample Data
o Input files: data/*.gv
These are valid Graphviz graph definition files.
Some data/*.gv files may contain deliberate mistakes, which may or may not stop production of output files. They may cause various warning messages to be printed by dot when being rendered.
See the demo page for details.
o Output files: html/*.svg
The html/*.svg files are produced from the data/*.gv files by scripts/generate.demo.sh.
The round trip shows that the lex/parse process does not lose information along the way, except that comments are discarded.
This set, and the set xt/author/html/*.svg just below, are generated by running scripts/generate.demo.sh. This in turn runs both scripts/generate.svg.sh and scripts/generate.demo.pl.
o Input files: xt/author/data/*.gv
As for data/*.gv above, but these files are copied from Graphviz V 2.38.0, and are often quite complex.
See find.candidates.pl, below.
o Output files: xt/author/html/*.svg
As for html/*.svg above.
Scripts
These are in the scripts/ directory.
o copy.config.pl
For use by the author. Output:
Copied config/.htgraphviz2.marpa.conf to /home/ron/.config/Perl/GraphViz2-Marpa
o find.candidates.pl
For use by the author.
This scans an unpacked distro of Graphviz V 2.38.0 and finds *.gv files matching these criteria:
o In ~/Downloads/Graphviz/graphviz-2.38.0/
o Not too big
That is, the file must be < 10,000 bytes in size, otherwise it may take too long to process.
o Not a fake
Currently, only ~/Downloads/Graphviz/graphviz-2.38.0/tclpkg/gv/META.gv fits this definition.
Any candidates found have their names printed, for easy one-at-a-time copying from Graphviz and testing via scripts/test.1.sh.
o find.config.pl
For use by the author. Output:
Using: File::HomeDir -> my_dist_config('GraphViz2-Marpa', '.htgraphviz2.marpa.conf'): Found: /home/ron/.config/Perl/GraphViz2-Marpa/.htgraphviz2.marpa.conf
o g2m.pl
Runs the parser. Try running with -h.
o g2m.sh
Simplifies running g2m.pl.
o generate.demo.pl
See generate.demo.sh.
o generate.demo.sh
For use by the author. It runs both generate.svg.sh and generate.demo.pl.
o generate.svg.sh
Converts all data/*.gv into html/*.svg.
Used by generate.demo.sh.
o gv2svg.sh
Converts one data/*.gv file into $DR/Perl-modules/html/graphviz2.marpa/*.svg.
o pod2html.sh
Converts all *.pm files to *.html, and copies them into my web server's dir structure (in Debian's RAM disk).
o test.1.sh
Runs both the parser and dot so I can compare the output.
o test.html.pl
Uses method perform_1_test() in GraphViz2::Marpa::Utils to test the stand-alone BNF used for HTML-like tables.
Note: t/test.t also calls perform_1_test().
o test.utf8.sh
Tests one data/utf8*.gv file more thoroughly than test.1.sh does.
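The selection criteria listed for find.candidates.pl (*.gv files, under 10,000 bytes, not a known fake) can be sketched with the core module File::Find. This is a minimal illustration, not the script's actual code: the helper name find_candidates() and its max_size/fakes options are hypothetical.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper (not part of the distro) applying the criteria described
# for find.candidates.pl: *.gv files under $root, smaller than max_size bytes,
# and not listed as known fakes (compared by full path).
sub find_candidates {
    my ($root, %opt) = @_;
    my $max_size = $opt{max_size} // 10_000;
    my %is_fake  = map { $_ => 1 } @{ $opt{fakes} // [] };
    my @found;

    File::Find::find(
        {
            no_chdir => 1,    # keeps $File::Find::name as the full path
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path && $path =~ /\.gv\z/;
                return if $is_fake{$path};
                return unless (-s _ || 0) < $max_size;    # '_' reuses the stat from -f
                push @found, $path;
            },
        },
        $root,
    );

    return sort @found;
}
```

For example, find_candidates("$ENV{HOME}/Downloads/Graphviz/graphviz-2.38.0", fakes => ["$ENV{HOME}/Downloads/Graphviz/graphviz-2.38.0/tclpkg/gv/META.gv"]) would return the paths to print. Perl does not expand '~', hence $ENV{HOME}.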
Distributions
This module is available as a Unix-style distro (*.tgz).
See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing distros.
Installation
Install GraphViz2::Marpa as you would for any Perl module:
Run:
cpanm GraphViz2::Marpa
or run:
sudo cpan GraphViz2::Marpa
or unpack the distro, and then either:
perl Build.PL
./Build
./Build test
sudo ./Build install
or:
perl Makefile.PL
make (or dmake or nmake)
make test
make install
Constructor and Initialization
new() is called as my($g2m) = GraphViz2::Marpa -> new(k1 => v1, k2 => v2, ...).
It returns a new object of type GraphViz2::Marpa.
Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. description([$graph])]):
o description => $graphDescription
Read the Graphviz graph definition from the command line.
You are strongly encouraged to surround this string with '...' to protect it from your shell.
See also the 'input_file' option to read the description from a file.
The 'description' option takes precedence over the 'input_file' option.
Default: ''.
o input_file => $aDotInputFileName
Read the Graphviz graph definition from a file.
See also the 'description' option to read the graph definition from the command line.
The 'description' option takes precedence over the 'input_file' option.
Default: ''.
See the distro for data/*.gv.
o logger => $aLoggerObject
Specify a logger compatible with Log::Handler, for the lexer and parser to use.
Default: A logger of type Log::Handler which writes to the screen.
To disable logging, just set 'logger' to the empty string (not undef).
o maxlevel => $logOption1
This option affects Log::Handler.
See the Log::Handler::Levels docs.
Default: 'notice'.
o minlevel => $logOption2
This option affects Log::Handler.
See the Log::Handler::Levels docs.
Default: 'error'.
No lower levels are used.
o output_file => $aRenderedDotInputFileName
Specify the name of a file for the renderer to write.
That is, write the DOT-style graph definition to a file.
When this file and the input file are both run through dot, they should produce identical *.svg files.
If an output file name is specified, an object of type GraphViz2::Marpa::Renderer::Graphviz is created and called after the input file has been successfully parsed.
Default: ''.
The default means the renderer is not called.
o renderer => a GraphViz2::Marpa::Renderer::Graphviz-compatible object
Specify a renderer for the parser to use.
See output_file just above.
Default: undef.
If an output file is specified, then an object of type GraphViz2::Marpa::Renderer::Graphviz is created and its run() method is called.
o trace_terminals => $Boolean
This allows g2m.pl to control the trace_terminals setting passed to Marpa::R2::Scanless::R.
Methods
description([$graph])
The [] indicate an optional parameter.
Get or set the Graphviz graph definition string.
The value supplied by the 'description' option takes precedence over the value read from the 'input_file'.
See also "input_file()".
'description' is a parameter to "new()". See "Constructor and Initialization" for details.
input_file([$graph_file_name])
Here, the [] indicate an optional parameter.
Get or set the name of the file to read the Graphviz graph definition from.
The value supplied by the 'description' option takes precedence over the value read from the 'input_file'.
See also the "description()" method.
'input_file' is a parameter to "new()". See "Constructor and Initialization" for details.
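The get/set accessors above, and the rule that 'description' takes precedence over 'input_file', can be illustrated with a small stand-alone class. This is a hypothetical sketch (the package My::GraphSource and its graph_text() method are invented for illustration), not GraphViz2::Marpa's own implementation; reading the file as UTF-8 is also an assumption.

```perl
use strict;
use warnings;

package My::GraphSource;    # hypothetical class, not part of GraphViz2::Marpa

sub new {
    my ($class, %arg) = @_;
    return bless {
        description => $arg{description} // '',
        input_file  => $arg{input_file}  // '',
    }, $class;
}

# Combined getter/setter: with an argument it sets the value,
# and it always returns the current value.
sub description {
    my $self = shift;
    $self->{description} = shift if @_;
    return $self->{description};
}

sub input_file {
    my $self = shift;
    $self->{input_file} = shift if @_;
    return $self->{input_file};
}

# A non-empty description wins; otherwise the graph is read from input_file.
sub graph_text {
    my ($self) = @_;
    return $self->description if length $self->description;
    my $file = $self->input_file;
    die "Neither description nor input_file was given\n" unless length $file;
    open my $fh, '<:encoding(UTF-8)', $file or die "Can't open $file: $!\n";
    local $/;    # slurp mode
    return scalar <$fh>;
}

package main;
```

With both options supplied, graph_text() returns the description string and never opens the file, which mirrors the precedence rule stated above.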
log($level, $s)
If a logger is defined, this logs the message $s at level $level.
logger([$logger_object])
Here, the [] indicate an optional parameter.
Get or set the logger object.
To disable logging, just set 'logger' to the empty string (not undef), in the call to "new()".
This logger is passed to other modules.
'logger' is a parameter to "new()". See "Constructor and Initialization" for details.
maxlevel([$string])
Here, the [] indicate an optional parameter.
Get or set the value used by the logger object.
This option is only used if GraphViz2::Marpa::Lexer or GraphViz2::Marpa::Parser use or create an object of type Log::Handler. See Log::Handler::Levels.
'maxlevel' is a parameter to "new()". See "Constructor and Initialization" for details.
minlevel([$string])
Here, the [] indicate an optional parameter.
Get or set the value used by the logger object.
This option is only used if GraphViz2::Marpa::Lexer or GraphViz2::Marpa::Parser use or create an object of type Log::Handler. See Log::Handler::Levels.
'minlevel' is a parameter to "new()". See "Constructor and Initialization" for details.
new()
See "Constructor and Initialization" for details on the parameters accepted by "new()".
output_file([$file_name])
Here, the [] indicate an optional parameter.
Get or set the name of the file for the renderer to write.
If an output file name is specified, an object of type GraphViz2::Marpa::Renderer::Graphviz is created and called after the input file has been successfully parsed.
'output_file' is a parameter to "new()". See "Constructor and Initialization" for details.
renderer([$renderer_object])
Here, the [] indicate an optional parameter.
Get or set the renderer object.
This renderer is called if output_file() is given a value.
'renderer' is a parameter to "new()". See "Constructor and Initialization" for details.
run()
This is the only method the caller needs to call. All parameters are supplied to "new()" (or via other methods before run() is called).
See scripts/g2m.pl.
Returns 0 for success and 1 for failure.
trace_terminals([$Boolean])
Here, the [] indicate an optional parameter.
Get or set the trace_terminals option passed to Marpa::R2::Scanless::R.
FAQ
How is the parsed data held in RAM?
The parsed output is held in a tree managed by Tree::DAG_Node.
Here and below, the word node (usually) refers to nodes in this tree, not Graphviz-style nodes.
The root node always looks like this when printed by Tree::DAG_Node's tree2string() method:
root. Attributes: {type => "root_literal", uid => "0", value => "root"}
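Interpretation: the text before '. Attributes:' is the node's name (here, root), and the hashref holds the node's attributes: its type, its uid and its value.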
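Because every line printed by tree2string() in this FAQ has the same shape (a name, then '. Attributes: ', then a hashref of quoted values), the log output can be picked apart with a regex. The sketch below works on the text format shown here only; parse_tree_line() is a hypothetical helper and does not use Tree::DAG_Node itself.

```perl
use strict;
use warnings;

# Parse one line of tree2string() output, e.g.
#   |--- node_id. Attributes: {type => "node_id", uid => "5", value => "graph_10"}
# Returns (name, \%attributes), or an empty list if the line does not match.
sub parse_tree_line {
    my ($line) = @_;
    return unless $line =~ /(\w+)\. Attributes: \{(.*)\}\s*\z/;
    my ($name, $body) = ($1, $2);
    my %attr;
    # Each attribute is key => "value"; backslash escapes such as \n are kept as-is.
    while ($body =~ /(\w+) => "((?:[^"\\]|\\.)*)"/g) {
        $attr{$1} = $2;
    }
    return ($name, \%attr);
}
```

Note that the greedy (.*) inside the braces is what lets values such as "}" (the close_brace literal) survive intact.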
Can you explain this tree in more detail?
Sure. Firstly, we examine a sample graph, assuming the module's pre-reqs are installed. Let's use data/10.gv. Here it is as an svg.
Run one of these:
scripts/g2m.sh data/10.gv -max info
perl -Ilib scripts/g2m.pl -input_file data/10.gv -max info
The former echoes the input file to STDOUT before running the latter.
Using -max notice, which is the default, produces no output from g2m.pl.
This is the input:
STRICT digraph graph_10
{
edge ["color" = "green"];
node [shape=rpromoter]
terminator [label = "\nterminator" shape = terminator;];
rpromoter -> terminator [label = Transformer]
}
And this is the output:
root. Attributes: {type => "root_literal", uid => "0", value => "root"}
|--- prolog. Attributes: {type => "prolog_literal", uid => "1", value => "prolog"}
| |--- literal. Attributes: {type => "strict_literal", uid => "3", value => "strict"}
| |--- literal. Attributes: {type => "digraph_literal", uid => "4", value => "digraph"}
|--- graph. Attributes: {type => "graph_literal", uid => "2", value => "graph"}
|--- node_id. Attributes: {type => "node_id", uid => "5", value => "graph_10"}
|--- literal. Attributes: {type => "open_brace", uid => "6", value => "{"}
| |--- class. Attributes: {type => "class", uid => "7", value => "edge"}
| | |--- literal. Attributes: {type => "open_bracket", uid => "8", value => "["}
| | |--- attribute. Attributes: {type => "color", uid => "9", value => "green"}
| | |--- literal. Attributes: {type => "close_bracket", uid => "10", value => "]"}
| |--- class. Attributes: {type => "class", uid => "11", value => "node"}
| | |--- literal. Attributes: {type => "open_bracket", uid => "12", value => "["}
| | |--- attribute. Attributes: {type => "shape", uid => "13", value => "rpromoter"}
| | |--- literal. Attributes: {type => "close_bracket", uid => "14", value => "]"}
| |--- node_id. Attributes: {type => "node_id", uid => "15", value => "terminator"}
| | |--- literal. Attributes: {type => "open_bracket", uid => "16", value => "["}
| | |--- attribute. Attributes: {type => "label", uid => "17", value => "\nterminator"}
| | |--- attribute. Attributes: {type => "shape", uid => "18", value => "terminator"}
| | |--- literal. Attributes: {type => "close_bracket", uid => "19", value => "]"}
| |--- node_id. Attributes: {type => "node_id", uid => "20", value => "rpromoter"}
| |--- edge_id. Attributes: {name => "directed_edge", uid => "21", value => "->"}
| |--- node_id. Attributes: {type => "node_id", uid => "22", value => "terminator"}
| |--- literal. Attributes: {type => "open_bracket", uid => "23", value => "["}
| |--- attribute. Attributes: {type => "label", uid => "24", value => "Transformer"}
| |--- literal. Attributes: {type => "close_bracket", uid => "25", value => "]"}
|--- literal. Attributes: {type => "close_brace", uid => "26", value => "}"}
Parse result: 0 (0 is success)
You can see from this output that words special to Graphviz (e.g. STRICT) are accepted no matter what case they are in. Such tokens are stored in lower-case.
A more detailed analysis follows.
The root node has 2 daughters:
o The prolog sub-tree
The prolog node is the root of a sub-tree holding everything before the graph's ID, if any.
The node is called prolog, and its hashref of attributes is {type => "prolog_literal", uid => "1", value => "prolog"}.
It has 1 or 2 daughters. The possibilities are:
o Input: 'digraph ...'
The 1 daughter is named literal, and its attributes are {type => "digraph_literal", uid => "3", value => "digraph"}.
o Input: 'graph ...'
The 1 daughter is named literal, and its attributes are {type => "graph_literal", uid => "3", value => "graph"}.
o Input: 'strict digraph ...'
The 2 daughters are named literal, and their attributes are, respectively, {type => "strict_literal", uid => "3", value => "strict"} and {type => "digraph_literal", uid => "4", value => "digraph"}.
o Input: 'strict graph ...'
The 2 daughters are named literal, and their attributes are, respectively, {type => "strict_literal", uid => "3", value => "strict"} and {type => "graph_literal", uid => "4", value => "graph"}.
And yes, the graph ID, if any, is under the graph node. The reason for this is that for every subgraph within the graph, the same structure applies: first the (sub)graph ID, then a literal '{', then that (sub)graph's details, and finally a literal '}'.
o The 'graph' sub-tree
The graph node is the root of a sub-tree holding everything about the graph, including the graph's ID, if any.
The node is called graph, and its hashref of attributes is {type => "graph_literal", uid => "2", value => "graph"}.
The graph node has as many daughters, with their own daughters, as are necessary to hold the output of parsing the remainder of the input.
In particular, if the input graph has an ID, i.e. the input is of the form 'digraph my_id ...' (or variants thereof), then the 1st daughter will be called node_id, and its attributes will be {type => "node_id", uid => "5", value => "my_id"}.
Further, the 2nd daughter will be called literal, and its attributes will be {type => "open_brace", uid => "6", value => "{"}. A subsequent daughter will eventually (for a syntax-error-free input file, of course) also be called literal, and its attributes will be {type => "close_brace", uid => "#", value => "}"}.
Naturally, if the graph has no ID (i.e. the input lacks the 'my_id' token), then the uids will differ slightly.
As mentioned, this pattern of an optional (sub)graph ID followed by a matching pair of '{', '}' nodes is used for all graphs and subgraphs.
In the case where the input contains an explicit subgraph, then just before the node representing 'my_id' or '{', there will be another node representing the subgraph token.
Its name will be literal, and its attributes will be {type => "subgraph_literal", uid => "#", value => "subgraph"}.
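The four prolog cases above can be reproduced by a few lines of Perl. This hypothetical sketch (prolog_daughters() is not a function in the module) assumes the header words are separated by whitespace, matches keywords case-insensitively, stores them in lower case, and starts uids at 3 because root, prolog and graph take uids 0, 1 and 2 in the sample output.

```perl
use strict;
use warnings;

# Hypothetical sketch: build the attribute hashrefs of the prolog node's
# daughters from a graph header such as 'STRICT digraph graph_10'.
sub prolog_daughters {
    my ($header) = @_;
    my @words = split ' ', $header;
    my $uid   = 3;
    my @daughters;

    # Optional 'strict', in any case.
    if (@words && lc($words[0]) eq 'strict') {
        shift @words;
        push @daughters, {type => 'strict_literal', uid => $uid++, value => 'strict'};
    }
    # Then 'graph' or 'digraph', in any case.
    if (@words && lc($words[0]) =~ /\A(?:di)?graph\z/) {
        my $kind = lc shift @words;
        push @daughters, {type => "${kind}_literal", uid => $uid++, value => $kind};
    }
    return @daughters;
}
```

For the data/10.gv header 'STRICT digraph graph_10' this yields the two daughters shown in the sample output: strict_literal (uid 3) and digraph_literal (uid 4).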
How many different names can these nodes have?
The list of possible node names follows. You should always examine the type and value keys of the node's attributes to determine the exact nature of the node.
o attribute
In this case, the node's attributes contain a hashref like {type => "arrowhead", uid => "33", value => "odiamond"}, meaning the type field holds the type (i.e. name) of the attribute, and the value field holds the value of the attribute.
o class
This is used when any of edge, graph, or node appear at the start of the (sub)graph, and is the mother of the attributes attached to the class. The value of the attribute will be edge, graph, or node.
The 1st and last daughters will be literals whose attribute values are '[' and ']' respectively, and the middle daughter(s) will be nodes of type attribute (as just discussed).
o edge_id
The value of the attribute will be either '--' or '->'.
The tail of the edge will be the previous daughter (node or subgraph), and the head of the edge will be the next.
Samples are:
n1 -> n2
n1 -> {n2}
{n1} -> n2
In a daisy chain of nodes, the last node in the chain may have daughters that are the attributes of each edge in the chain. This is how Graphviz syntax attaches edge attributes to a path. The class edge can also be used to provide attributes for the edge.
o graph
There is only ever 1 node called graph.
o literal
literal is the name of some nodes, with the value key in the attributes having one of these values:
o {
Indicates the start of a (sub)graph.
o }
Indicates the end of a (sub)graph.
o [
This indicates the start of a set of attributes for a specific class, edge or node, or the edge attributes at the end of a path.
The 1st and last daughters will be literals whose attribute value keys are '[' and ']' respectively. Between these 2 nodes will be 1 node for each attribute, as seen above with edge ["color" = "green"].
Note: Graphviz allows an abbreviated syntax for setting the attributes of a (sub)graph. So, instead of needing:
graph [rankdir = LR]
you can just use:
rankdir = LR
In such cases, these attributes are not surrounded by '[' and ']'.
o ]
See the previous point.
o digraph_literal
o graph_literal
o strict_literal
o subgraph_literal
o node_id
The value of the attributes is the name of the graph, a node, or a subgraph.
Note: A node name can appear more than once in succession, either as a declaration of the node's existence and then as the tail of an edge, or, as in this fragment of data/56.gv:
node [shape=rpromoter colorscheme=rdbu5 color=1 style=filled fontcolor=3]; Hef1a; TRE; UAS; Hef1aLacOid; Hef1aLacOid [label="Hef1a-LacOid"];
This is a case where tree compression could be done, but isn't done yet.
o prolog
There is only ever 1 node called prolog.
o root
There is only ever 1 node called root.
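The edge_id rule above (the previous daughter is the tail, the next is the head) can be applied to a flat list of sibling daughters. In this hypothetical sketch, edges_from_daughters() takes [name, value] pairs in tree order; it only pairs node_id neighbours, so subgraph tails and heads ('{' ... '}') are deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical sketch: pair each edge_id daughter with its neighbours.
# Input: a (sub)graph's daughters in order, each as [name, value].
# Output: a list of [tail, edge_op, head] triples.
sub edges_from_daughters {
    my @daughters = @_;
    my @edges;
    for my $i (1 .. $#daughters - 1) {
        my ($name, $op) = @{ $daughters[$i] };
        next unless $name eq 'edge_id';
        my ($tail_name, $tail) = @{ $daughters[$i - 1] };
        my ($head_name, $head) = @{ $daughters[$i + 1] };
        # Only node-to-node edges are handled in this sketch.
        next unless $tail_name eq 'node_id' && $head_name eq 'node_id';
        push @edges, [$tail, $op, $head];
    }
    return @edges;
}
```

A daisy chain such as a -- b -- c yields two triples, with b serving as the head of the first edge and the tail of the second.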
How are nodes, ports and compass points represented in the (above) tree?
Input contains this fragment of data/16.gv:
node_16_1:p11 -> node_16_2:p22:s
[
arrowhead = "odiamond";
arrowtail = "odot",
color = red
dir = both;
];
The output log contains:
| |--- node_id. Attributes: {type => "node_id", uid => "29", value => "node_16_1:p11"}
| |--- edge_id. Attributes: {name => "directed_edge", uid => "30", value => "->"}
| |--- node_id. Attributes: {type => "node_id", uid => "31", value => "node_16_2:p22:s"}
You can see the ports and compass points have been incorporated into the value attribute.
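Since ports and compass points stay inside the node_id value, a consumer of the tree may need to split them back out. This hypothetical split_node_id() is a sketch under two stated assumptions: the node name itself contains no ':' (e.g. it was not a quoted name with a colon), and a single suffix that is a compass point is read as a compass point rather than a port name. The compass point list (n, ne, e, se, s, sw, w, nw, c, _) is Graphviz's.

```perl
use strict;
use warnings;

# Graphviz compass points.
my %is_compass = map { $_ => 1 } qw(n ne e se s sw w nw c _);

# Hypothetical sketch: split a node_id value such as 'node_16_2:p22:s'
# into (node, port, compass); missing parts are undef.
sub split_node_id {
    my ($value) = @_;
    my ($node, @rest) = split /:/, $value;
    die "Unexpected node_id value: $value\n" if @rest > 2;
    return ($node, @rest)        if @rest == 2;    # node:port:compass
    return ($node, undef, undef) if @rest == 0;    # plain node
    # One suffix: treating a bare compass point as a compass point is an assumption.
    return $is_compass{ $rest[0] } ? ($node, undef, $rest[0]) : ($node, $rest[0], undef);
}
```

Applied to the log above, 'node_16_2:p22:s' gives the node node_16_2, the port p22 and the compass point s.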
How are HTML-like labels handled?
The main grammar (see $self -> bnf in the source) is used to hold the definitions of strings (see strict_literal). Thus Marpa, via the main parser $self -> recce, is used to identify all types of strings.
Then, if the string starts with '<', _process_html() is called, and has a separate grammar (see bnf4html). This in turn uses a separate grammar object (grammar4html) and a separate parser (recce4html).
_process_html() traps any apparent parsing errors, found when lexemes (text) follow the HTML, and saves the label's value. This method also sets $pos to the first char after the HTML, so when control returns to the main parser and the main grammar, the main parser is not aware of the existence of the HTML, and just keeps on parsing from where the HTML parser finished.
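The hand-off described above hinges on finding the first character after the HTML-like label, which in DOT is delimited by '<' and '>'. The sketch below swaps in a much simpler technique, plain '<'/'>' depth counting, purely to show where the main parser would resume; it is not the module's bnf4html grammar, and html_label_end() is a hypothetical name. It does not handle a '>' inside a quoted HTML attribute value.

```perl
use strict;
use warnings;

# Swapped-in technique (NOT the module's Marpa-based bnf4html grammar):
# given $text and the position of the opening '<' of an HTML-like label,
# return the position of the first char after the matching '>'.
sub html_label_end {
    my ($text, $pos) = @_;
    die "No '<' at position $pos\n" unless substr($text, $pos, 1) eq '<';
    my $depth = 0;
    for my $i ($pos .. length($text) - 1) {
        my $char = substr $text, $i, 1;
        if    ($char eq '<') { $depth++ }
        elsif ($char eq '>') { return $i + 1 if --$depth == 0 }
    }
    die "Unterminated HTML-like label\n";
}
```

For 'label = <<b>x</b>> shape = box', starting at the first '<', the returned position points at ' shape = box', which is where parsing of ordinary DOT would continue.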
How are comments stored in the tree?
They aren't stored; they are discarded. And this in turn means rendered dot files can't ever contain them.
What is the homepage of Marpa?
http://savage.net.au/Marpa.html.
That page has a long list of links.
Why do I get error messages like the following?
Error: <stdin>:1: syntax error near line 1
context: digraph >>> Graph <<< {
Graphviz reserves some words as keywords, meaning they can't be used as an ID, e.g. for the name of the graph.
So, don't do this:
strict graph graph{...}
strict graph Graph{...}
strict graph strict{...}
etc...
Likewise for non-strict graphs, and digraphs. You can however add double-quotes around such reserved words:
strict graph "graph"{...}
Even better, use a more meaningful name for your graph...
The keywords are: node, edge, graph, digraph, subgraph and strict. Compass points are not keywords.
See keywords in the discussion of the syntax of DOT for details.
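Following the advice above to double-quote reserved words, here is a hypothetical helper (quote_if_keyword() is not part of the module) that quotes an ID only when it is one of the keywords listed above. Graphviz treats these keywords case-independently, which is why 'Graph' in the error message is rejected too.

```perl
use strict;
use warnings;

# The DOT keywords, matched case-insensitively. Compass points are not keywords.
my %is_keyword = map { $_ => 1 } qw(node edge graph digraph subgraph strict);

# Hypothetical helper: wrap an ID in double-quotes if it is a keyword,
# so it can safely be used as, e.g., a graph name.
sub quote_if_keyword {
    my ($id) = @_;
    return $is_keyword{ lc $id } ? qq("$id") : $id;
}
```

For example, "strict graph " . quote_if_keyword('graph') . " {...}" produces strict graph "graph" {...}, the quoted form shown above, while an ordinary ID such as graph_10 is left untouched.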
Does this package support Unicode in the input dot file?
Yes.
But you are strongly encouraged to put node names using utf8 glyphs in double-quotes, even though it is not always necessary.
See xt/author/data/utf8.*.gv and scripts/test.utf8.sh. In particular, see xt/author/data/utf8.01.gv.
How can I switch from Marpa::XS to Marpa::PP?
Don't use either of them. Use Marpa::R2.
If I input x.old.gv and output x.new.gv, should these 2 files be identical?
Yes, at least in the sense that running dot on them will produce the same output files. This assumes the default renderer is used.
See scripts/test.utf8.sh for how to do just that.
As mentioned above, comments in input files are discarded, so they can never be in the output file.
How are custom graph attributes handled?
They are treated like any other attribute. That is, syntax checking is not performed at that level, but only at the grammatical level. If the construct matches the grammar, this code accepts it.
See data/32.gv.
How are the demo files generated?
See scripts/generate.demo.sh.
How do I run author tests?
This runs both standard and author tests:
shell> perl Build.PL; ./Build; ./Build test; ./Build authortest
There are currently (V 2.00) 91 standard tests, and in xt/author/*.t, 4 pod tests and 355 author tests. Combined, they take almost 2m 30s to run.
See Also
Marpa::Demo::StringParser. The significance of this module is that during the re-write of GraphViz2::Marpa, the string-handling code was perfected in Marpa::Demo::StringParser.
Later, that code was improved within this module, and will be back-ported into Marpa::Demo::StringParser.
Machine-Readable Change Log
The file CHANGES was converted into Changelog.ini by Module::Metadata::Changes.
Version Numbers
Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.
Thanks
Many thanks are due to the people who worked on Graphviz.
Jeffrey Kegler wrote Marpa::XS, and has a blog on it at http://blogs.perl.org/users/jeffrey_kegler/.
And thanks to rns (Ruslan Shvedov) for writing the grammar for double-quoted strings used in MarpaX::Demo::SampleScripts's scripts/quoted.strings.02.pl. I adapted it to HTML (see scripts/quoted.strings.05.pl in that module), and then incorporated the grammar into this module. For details, search for bnf4html, grammar4html and recce4html in the source of the current module.
Repository
https://github.com/ronsavage/GraphViz2-Marpa
Support
Email the author, or log a bug on RT:
https://rt.cpan.org/Public/Dist/Display.html?Name=GraphViz2::Marpa.
Author
GraphViz2::Marpa was written by Ron Savage <ron@savage.net.au> in 2012.
Marpa's homepage: <http://savage.net.au/Marpa.html>.
My homepage: http://savage.net.au/.
Copyright
Australian copyright (c) 2012, Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Artistic License 2.0, a copy of which is available at:
http://opensource.org/licenses/alphabetical.