NAME
GraphViz2::Marpa::PathUtils - Provide various analyses of Graphviz dot files
SYNOPSIS
All scripts and input and output files listed here are shipped with the distro.
Finding clusters
Either pass parameters in to new():
GraphViz2::Marpa::PathUtils -> new
(
input_file => 'data/clusters.in.09.gv',
output_file => 'out/clusters.out.09', # Actually a prefix. See FAQ.
report_clusters => 1,
) -> find_clusters;
Or call methods to set parameters:
my($parser) = GraphViz2::Marpa::PathUtils -> new;
$parser -> input_file('data/clusters.in.09.gv');
$parser -> output_file('out/clusters.out.09'); # Actually a prefix. See FAQ.
$parser -> report_clusters(1);
And then:
$parser -> find_clusters;
Command line usage:
shell> perl scripts/find.clusters.pl -h
Or see scripts/find.clusters.sh, which hard-codes my test data values.
Finding fixed length paths
Either pass parameters in to new():
GraphViz2::Marpa::PathUtils -> new
(
allow_cycles => 0,
input_file => 'data/fixed.length.paths.in.01.gv',
output_file => 'out/fixed.length.paths.out.01.gv',
path_length => 3,
report_paths => 1,
start_node => 'Act_1',
) -> find_fixed_length_paths;
Or call methods to set parameters;
my($parser) = GraphViz2::Marpa::PathUtils -> new;
$parser -> allow_cycles(0);
$parser -> input_file('data/fixed.length.paths.in.01.gv');
$parser -> output_file('out/fixed.length.paths.in.01.gv');
$parser -> path_length(3);
$parser -> report_paths(1);
$parser -> start_node('Act_1');
And then:
$parser -> find_fixed_length_paths;
Command line usage:
shell> perl scripts/find.fixed.length.paths.pl -h
Or see scripts/fixed.length.paths.sh, which hard-codes my test data values.
DESCRIPTION
GraphViz2::Marpa::PathUtils parses Graphviz dot files and processes the output in various ways.
Features:
- o Find all groups of nodes, called clusters
-
Nodes within such groups are connected to each other, but have no links to nodes outside the cluster.
Graphviz uses 'cluster' is a slightly-different sense. Here, you could say 'sub-tree' instead of 'cluster'.
See "find_clusters()".
- o Find all fixed length paths, starting from a given node
-
This algorithm does not handle edges into or out of subgraphs.
Sample output: http://savage.net.au/Perl-modules/html/graphviz2.marpa.pathutils/index.html.
This class is a descendent of GraphViz2::Marpa, and hence inherits all its keys to new(), and all its methods.
Scripts shipped with this distro
All scripts are in the scripts/ directory. This means they do not get installed along with the package.
Input data files are in data/, output data files are in out/, and html and svg files are in html/.
- o copy.config.pl
-
This is only for use by the author.
- o find.clusters.pl
-
Try shell> perl find.clusters.pl -h
This runs the "find_clusters()" method.
- o find.clusters.sh
-
This runs find.clusters.pl with hard-coded parameters, and is what I use for testing new code.
- o find.fixed.length.paths.pl
-
This runs the "find_fixed_length_paths()" method.
Try shell> perl find.fixed.length.paths.pl -h
- o find.fixed.length.paths.sh
-
This runs find.fixed.length.paths.pl with hard-coded parameters, and is what I use for testing new code.
- o generate.demo.pl
-
Uses the Text::Xslate template htdocs/assets/templates/graphviz2/marpa/pathutils/pathutils.tx to generate html/index.html.
- o generate.demo.sh
-
Runs generate.demo.pl.
Then copies html/*.html and html/*.svg to my web server's dir, $DR/Perl-modules/html/graphviz2.marpa.pathutils/.
- o pod2html.sh
-
Converts all *.pm files to *.html, and copies them in my web server's dir structure.
See also t/test.t.
Distributions
This module is available as a Unix-style distro (*.tgz).
See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing distros.
Installation
Install GraphViz2::Marpa::PathUtils as you would for any Perl
module:
Run:
cpanm GraphViz2::Marpa::PathUtils
or run:
sudo cpan GraphViz2::Marpa::PathUtils
or unpack the distro, and then either:
perl Build.PL
./Build
./Build test
sudo ./Build install
or:
perl Makefile.PL
make (or dmake or nmake)
make test
make install
Constructor and Initialization
Calling new()
new()
is called as my($obj) = GraphViz2::Marpa::PathUtils -> new(k1 => v1, k2 => v2, ...)
.
It returns a new object of type GraphViz2::Marpa::PathUtils
.
This class is a descendent of GraphViz2::Marpa, and hence inherits all its parameters.
In particular, see "Constructor and Initialization" in GraphViz2::Marpa for these parameters:
- o description
- o input_file
- o logger
- o maxlevel
- o minlevel
- o output_file
-
See the "FAQ" for the 2 interpretations of this parameter.
Further, these key-value pairs are accepted in the parameter list (see corresponding methods for details [e.g. "path_length($integer)"]):
- o allow_cycles => $integer
-
Specify whether or not cycles are allowed in the paths found.
Values for $integer:
- o 0 - Do not allow any cycles
-
This is the default.
- o 1 - Allow any node to be included once or twice.
-
Note: allow_cycles => 1 is not implemented yet in V 2.
Default: 0.
This option is only used when calling "find_fixed_length_paths()".
- o path_length => $integer
-
Specify the length of all paths to be included in the output.
Here, length means the number of edges between nodes.
Default: 0.
This parameter is mandatory, and must be > 0.
This option is only used when calling "find_fixed_length_paths()".
- o report_clusters => $Boolean
-
Specify whether or not to print a report of the clusters found.
Default: 0 (do not print).
This option is only used when calling "find_clusters()".
- o report_paths => $Boolean
-
Specify whether or not to print a report of the paths found.
Default: 0 (do not print).
This option is only used when calling "find_fixed_length_paths()".
- o start_node => $theNameOfANode
-
Specify the name of the node where all paths must start from.
Default: ''.
This parameter is mandatory.
The name can be the empty string, but must not be undef.
This option is only used when calling "find_fixed_length_paths()".
Methods
This class is a descendent of GraphViz2::Marpa, and hence inherits all its methods.
In particular, see "Methods" in GraphViz2::Marpa for these methods:
- o description()
- o input_file()
- o logger()
- o maxlevel()
- o minlevel()
- o output_file()
-
See the "FAQ" for the 2 interpretations of this method.
Further, these methods are implemented.
allow_cycles([$integer])
Here the [] indicate an optional parameter.
Get or set the value determining whether or not cycles are allowed in the paths found.
'allow_cycles' is a parameter to "new()". See "Constructor and Initialization" for details.
Note: allow_cycles => 1 is not implemented yet in V 2.
cluster_sets()
Returns a hashref. The keys are integers and the values are sets of type Set::Tiny.
Each set contains the nodes of each independent cluster within the input file. Here, independent means there are no edges joining one cluster to another.
Each set is turned into a tree during a call to "find_clusters()". You can retrieve these trees by calling "cluster_trees()".
Both "find_clusters()" and "find_fixed_length_paths()" populate this hashref, but the former adds stand-alone nodes because they are clusters in their own right.
So why doesn't "find_fixed_length_paths()" add stand-alone nodes? Because you can't have a path of length 0, so stand-alone nodes cannot participate in the search for fixed length paths.
cluster_trees()
Returns a hashref. The keys are integers and the values are Tree::DAG_Node trees.
This is only populated by "find_clusters()".
The input comes from calling "cluster_sets()", so the output is one tree per cluster. This means there is 1 tree per set returned by "cluster_sets()".
find_clusters()
This is one of the 2 methods which do all the work, and hence must be called. The other is "find_fixed_length_paths()".
The program ignores port and compass suffixes on node names. See http://savage.net.au/Perl-modules/html/graphviz2.marpa.pathutils/clusters.in.10.html for a sample.
See the "Synopsis" and scripts/find.clusters.pl.
Returns 0 for success and 1 for failure.
find_fixed_length_paths()
This is one of the 2 methods which do all the work, and hence must be called. The other is "find_clusters()".
Notes:
- o Edges pointing in to or out of subgraphs
-
The code does not handle these cases, which will be difficult to implement.
- o Edge tails or heads using ports and compasses
-
Just specify the start node without the port+compass details.
See the "Synopsis" and scripts/find.fixed.length.paths.pl.
Returns 0 for success and 1 for failure.
fixed_path_set()
Returns the arrayref of paths found. Each element is 1 path, and paths are stored as an arrayref of objects of type Tree::DAG_Node.
See the source code of "output_fixed_length_gv()", or "report_fixed_length_paths()", for sample usage.
new()
See "Constructor and Initialization" for details on the parameters accepted by "new()".
output_clusters()
Output a set of files, one per cluster, if new() was called as new(output_file => '...').
output_fixed_length_gv($tree, $title)
Output a DOT file for given $tree, with the given $title, if new() was called as new(output_file => '...').
path_length([$integer])
Here the [] indicate an optional parameter.
Get or set the length of the paths to be searched for.
'path_length' is a parameter to "new()". See "Constructor and Initialization" for details.
report_cluster_members()
This prints the clusters found, if new() was called as new(report_clusters => 1).
report_clusters([$Boolean])
Here the [] indicate an optional parameter.
Get or set the option which determines whether or not the clusters found are printed.
'report_clusters' is a parameter to "new()". See "Constructor and Initialization" for details.
report_fixed_length_paths()
This prints the fixed length paths found, if new() was called as new(report_paths => 1).
report_paths([$Boolean])
Here the [] indicate an optional parameter.
Get or set the option which determines whether or not the fixed length paths found are printed.
'report_paths' is a parameter to "new()". See "Constructor and Initialization" for details.
start_node([$string])
Here the [] indicate an optional parameter.
Get or set the name of the node from where all paths must start.
Specify the start node without the port+compass details.
'start_node' is a parameter to "new()". See "Constructor and Initialization" for details.
FAQ
How does find_fixed_length_paths() handle port+compass detail?
Just specify the start node without the port+compass details, and it will work.
Check GraphViz2::Marpa::PathUtils::Demo line 82. The corresponding data is data/fixed.length.paths.in.04.gv, out/* and html/* files.
I can't get this module to work with Path::Tiny objects
If your input and output file names are Path::Tiny objects, do this:
new(input_file => "$input_file", output_file => "$output_file", ...)
What are the 2 interpretations of output_file
?
This section discusses input/output DOT file naming. HTML file names follow the same system.
- o For clusters
-
Each cluster's DOT syntax is written to a different file. So
output_file
must be the prefix of these output files.Hence, in scripts/find.clusters.sh,
GV2=clusters.out.$1
(using $1 = '09', say) means output clusters' DOT files will be calledclusters.out.09.001.gv
up toclusters.out.09.007.gv
.So, the prefix has ".$n.gv" attached as a suffix, for clusters 1 .. N.
- o For fixed length paths
-
Since there is only ever 1 DOT file output here,
output_file
is the name of that file.Hence, in scripts/find.fixed.length.path.sh, the value supplied is
out/fixed.length.paths.out.$FILE.gv
, where $FILE will be '01', '02' or '03'. So the output file, using $FILE='01', will beout/fixed.length.paths.out.01.gv
.
I used a node label of "\N" and now your module doesn't work!
The most likely explanation is that you're calling find_fixed_path_lengths() and you've specified all nodes, or at least some, to have a label like "\N".
This escape sequence triggers special processing in AT&T's Graphviz to generate labels for nodes, overriding the code I use to re-name nodes in the output of find_fixed_path_lengths().
See _prepare_fixed_length_output() for the gory details.
The purpose of my re-numbering code is to allow a node to appear in the output multiple times but to stop Graphviz automatically regarding all such references to be the same node. Giving a node different names (which are un-seen) but the same label (which is seen) makes Graphviz think they are really different nodes.
The 3 samples in part 2 of the demo page should make this issue clear.
What is the homepages of Graphviz and Marpa?
See also Jeffrey's annotated blog.
How are clusters named?
They are simply counted in the order discovered in the input file.
How are cycles in fixed path length analysis handled?
Note: allow_cycles => 1 is not implemented yet in V 2.
This is controlled by the allow_cycles option to new(), or the corresponding method "allow_cycles($integer)".
The code keeps track of the number of times each node is entered. If new(allow_cycles => 0) was called, nodes are only considered if they are entered once. If new(allow_cycles => 1) was called, nodes are also considered if they are entered a second time.
Sample code: Using the input file data/fixed.length.paths.in.01.gv we can specify various combinations of parameters like this:
allow_cycles path_length start node solutions
0 3 Act_1 4
1 3 Act_1 ?
0 4 Act_1 5
1 4 Act_1 ?
Are all (fixed length) paths found unique?
Yes, as long as they are unique in the input. Something like this produces 8 identical solutions (starting from A, of path length 3) because each node B, C, D, can be entered in 2 different ways, and 2**3 = 8.
digraph G
{
A -> B -> C -> D;
A -> B -> C -> D;
}
See data/fixed.length.paths.in.03.gv and html/fixed.length.paths.out.03.svg.
The number of options is confusing
Agreed. Remember that this code calls GraphViz2::Marpa's run() method, which expects a large number of options.
Why do I get error messages like the following?
Error: <stdin>:1: syntax error near line 1
context: digraph >>> Graph <<< {
Graphviz reserves some words as keywords, meaning they can't be used as an ID, e.g. for the name of the graph.
The keywords are: digraph, edge, graph, node, strict and subgraph. Compass points are not keywords.
See keywords in the discussion of the syntax of DOT for details.
So, don't do this:
strict graph graph{...}
strict graph Graph{...}
strict graph strict{...}
etc...
Likewise for non-strict graphs, and digraphs. You can however add double-quotes around such reserved words:
strict graph "graph"{...}
Even better, use a more meaningful name for your graph...
TODO
Problems:
- o Fixed length paths and subgraphs
-
If the edge is pointing to a subgraph, all nodes in that subgraph are heads of the edge.
Likewise, if the start node is inside a subgraph, the edge can be outside the subgraph.
- o Re-implement allow_cycles()
-
This has been ignored in the re-write for V 2.
Search the source for 'TODO' and 'allow_cycles' to find where patches must be made.
References
Combinatorial Algorithms for Computers and Calculators, A Nijenhuis and H Wilf, p 240.
This books very clearly explains the backtracking parser I used to process the combinations of nodes found at each point along each path. Source code in the book is in Fortran.
The book is available as a PDF from http://www.math.upenn.edu/~wilf/website/CombAlgDownld.html.
Version Numbers
Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.
Machine-Readable Change Log
The file CHANGES was converted into Changelog.ini by Module::Metadata::Changes.
Repository
https://github.com/ronsavage/GraphViz2-Marpa-PathUtils
Support
Email the author, or log a bug on RT:
https://rt.cpan.org/Public/Dist/Display.html?Name=GraphViz2::Marpa::PathUtils.
Author
GraphViz2::Marpa::PathUtils was written by Ron Savage <ron@savage.net.au> in 2012.
Home page: http://savage.net.au/index.html.
Copyright
Australian copyright (c) 2012, Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Artistic License, a copy of which is available at:
http://www.opensource.org/licenses/index.html