NAME

GraphViz2::Marpa::PathUtils - Provide various analyses of Graphviz dot files

SYNOPSIS

All scripts and input and output files listed here are shipped with the distro.

Finding clusters

Either pass parameters in to new():

GraphViz2::Marpa::PathUtils -> new
(
    input_file      => 'data/clusters.in.09.gv',
    output_file     => 'out/clusters.out.09', # Actually a prefix. See FAQ.
    report_clusters => 1,
) -> find_clusters;

Or call methods to set parameters:

my($parser) = GraphViz2::Marpa::PathUtils -> new;

$parser -> input_file('data/clusters.in.09.gv');
$parser -> output_file('out/clusters.out.09'); # Actually a prefix. See FAQ.
$parser -> report_clusters(1);

And then:

$parser -> find_clusters;

Command line usage:

shell> perl scripts/find.clusters.pl -h

Or see scripts/find.clusters.sh, which hard-codes my test data values.

Finding fixed length paths

Either pass parameters in to new():

GraphViz2::Marpa::PathUtils -> new
(
    allow_cycles => 0,
    input_file   => 'data/fixed.length.paths.in.01.gv',
    output_file  => 'out/fixed.length.paths.out.01.gv',
    path_length  => 3,
    report_paths => 1,
    start_node   => 'Act_1',
) -> find_fixed_length_paths;

Or call methods to set parameters;

my($parser) = GraphViz2::Marpa::PathUtils -> new;

$parser -> allow_cycles(0);
$parser -> input_file('data/fixed.length.paths.in.01.gv');
$parser -> output_file('out/fixed.length.paths.in.01.gv');
$parser -> path_length(3);
$parser -> report_paths(1);
$parser -> start_node('Act_1');

And then:

$parser -> find_fixed_length_paths;

Command line usage:

shell> perl scripts/find.fixed.length.paths.pl -h

Or see scripts/fixed.length.paths.sh, which hard-codes my test data values.

DESCRIPTION

GraphViz2::Marpa::PathUtils parses Graphviz dot files and processes the output in various ways.

Features:

o Find all groups of nodes, called clusters

Nodes within such groups are connected to each other, but have no links to nodes outside the cluster.

Graphviz uses 'cluster' is a slightly-different sense. Here, you could say 'sub-tree' instead of 'cluster'.

See "find_clusters()".

o Find all fixed length paths, starting from a given node

This algorithm does not handle edges into or out of subgraphs.

See "find_fixed_length_paths()".

Sample output: http://savage.net.au/Perl-modules/html/graphviz2.marpa.pathutils/index.html.

This class is a descendent of GraphViz2::Marpa, and hence inherits all its keys to new(), and all its methods.

Scripts shipped with this distro

All scripts are in the scripts/ directory. This means they do not get installed along with the package.

Input data files are in data/, output data files are in out/, and html and svg files are in html/.

o copy.config.pl

This is only for use by the author.

o find.clusters.pl

Try shell> perl find.clusters.pl -h

This runs the "find_clusters()" method.

o find.clusters.sh

This runs find.clusters.pl with hard-coded parameters, and is what I use for testing new code.

o find.fixed.length.paths.pl

This runs the "find_fixed_length_paths()" method.

Try shell> perl find.fixed.length.paths.pl -h

o find.fixed.length.paths.sh

This runs find.fixed.length.paths.pl with hard-coded parameters, and is what I use for testing new code.

o generate.demo.pl

Uses the Text::Xslate template htdocs/assets/templates/graphviz2/marpa/pathutils/pathutils.tx to generate html/index.html.

o generate.demo.sh

Runs generate.demo.pl.

Then copies html/*.html and html/*.svg to my web server's dir, $DR/Perl-modules/html/graphviz2.marpa.pathutils/.

o pod2html.sh

Converts all *.pm files to *.html, and copies them in my web server's dir structure.

See also t/test.t.

Distributions

This module is available as a Unix-style distro (*.tgz).

See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing distros.

Installation

Install GraphViz2::Marpa::PathUtils as you would for any Perl module:

Run:

cpanm GraphViz2::Marpa::PathUtils

or run:

sudo cpan GraphViz2::Marpa::PathUtils

or unpack the distro, and then either:

perl Build.PL
./Build
./Build test
sudo ./Build install

or:

perl Makefile.PL
make (or dmake or nmake)
make test
make install

Constructor and Initialization

Calling new()

new() is called as my($obj) = GraphViz2::Marpa::PathUtils -> new(k1 => v1, k2 => v2, ...).

It returns a new object of type GraphViz2::Marpa::PathUtils.

This class is a descendent of GraphViz2::Marpa, and hence inherits all its parameters.

In particular, see "Constructor and Initialization" in GraphViz2::Marpa for these parameters:

o description
o input_file
o logger
o maxlevel
o minlevel
o output_file

See the "FAQ" for the 2 interpretations of this parameter.

Further, these key-value pairs are accepted in the parameter list (see corresponding methods for details [e.g. "path_length($integer)"]):

o allow_cycles => $integer

Specify whether or not cycles are allowed in the paths found.

Values for $integer:

o 0 - Do not allow any cycles

This is the default.

o 1 - Allow any node to be included once or twice.

Note: allow_cycles => 1 is not implemented yet in V 2.

Default: 0.

This option is only used when calling "find_fixed_length_paths()".

o path_length => $integer

Specify the length of all paths to be included in the output.

Here, length means the number of edges between nodes.

Default: 0.

This parameter is mandatory, and must be > 0.

This option is only used when calling "find_fixed_length_paths()".

o report_clusters => $Boolean

Specify whether or not to print a report of the clusters found.

Default: 0 (do not print).

This option is only used when calling "find_clusters()".

o report_paths => $Boolean

Specify whether or not to print a report of the paths found.

Default: 0 (do not print).

This option is only used when calling "find_fixed_length_paths()".

o start_node => $theNameOfANode

Specify the name of the node where all paths must start from.

Default: ''.

This parameter is mandatory.

The name can be the empty string, but must not be undef.

This option is only used when calling "find_fixed_length_paths()".

Methods

This class is a descendent of GraphViz2::Marpa, and hence inherits all its methods.

In particular, see "Methods" in GraphViz2::Marpa for these methods:

o description()
o input_file()
o logger()
o maxlevel()
o minlevel()
o output_file()

See the "FAQ" for the 2 interpretations of this method.

Further, these methods are implemented.

allow_cycles([$integer])

Here the [] indicate an optional parameter.

Get or set the value determining whether or not cycles are allowed in the paths found.

'allow_cycles' is a parameter to "new()". See "Constructor and Initialization" for details.

Note: allow_cycles => 1 is not implemented yet in V 2.

cluster_sets()

Returns a hashref. The keys are integers and the values are sets of type Set::Tiny.

Each set contains the nodes of each independent cluster within the input file. Here, independent means there are no edges joining one cluster to another.

Each set is turned into a tree during a call to "find_clusters()". You can retrieve these trees by calling "cluster_trees()".

Both "find_clusters()" and "find_fixed_length_paths()" populate this hashref, but the former adds stand-alone nodes because they are clusters in their own right.

So why doesn't "find_fixed_length_paths()" add stand-alone nodes? Because you can't have a path of length 0, so stand-alone nodes cannot participate in the search for fixed length paths.

cluster_trees()

Returns a hashref. The keys are integers and the values are Tree::DAG_Node trees.

This is only populated by "find_clusters()".

The input comes from calling "cluster_sets()", so the output is one tree per cluster. This means there is 1 tree per set returned by "cluster_sets()".

find_clusters()

This is one of the 2 methods which do all the work, and hence must be called. The other is "find_fixed_length_paths()".

The program ignores port and compass suffixes on node names. See http://savage.net.au/Perl-modules/html/graphviz2.marpa.pathutils/clusters.in.10.html for a sample.

See the "Synopsis" and scripts/find.clusters.pl.

Returns 0 for success and 1 for failure.

find_fixed_length_paths()

This is one of the 2 methods which do all the work, and hence must be called. The other is "find_clusters()".

Notes:

o Edges pointing in to or out of subgraphs

The code does not handle these cases, which will be difficult to implement.

o Edge tails or heads using ports and compasses

Just specify the start node without the port+compass details.

See the "Synopsis" and scripts/find.fixed.length.paths.pl.

Returns 0 for success and 1 for failure.

fixed_path_set()

Returns the arrayref of paths found. Each element is 1 path, and paths are stored as an arrayref of objects of type Tree::DAG_Node.

See the source code of "output_fixed_length_gv()", or "report_fixed_length_paths()", for sample usage.

new()

See "Constructor and Initialization" for details on the parameters accepted by "new()".

output_clusters()

Output a set of files, one per cluster, if new() was called as new(output_file => '...').

output_fixed_length_gv($tree, $title)

Output a DOT file for given $tree, with the given $title, if new() was called as new(output_file => '...').

path_length([$integer])

Here the [] indicate an optional parameter.

Get or set the length of the paths to be searched for.

'path_length' is a parameter to "new()". See "Constructor and Initialization" for details.

report_cluster_members()

This prints the clusters found, if new() was called as new(report_clusters => 1).

report_clusters([$Boolean])

Here the [] indicate an optional parameter.

Get or set the option which determines whether or not the clusters found are printed.

'report_clusters' is a parameter to "new()". See "Constructor and Initialization" for details.

report_fixed_length_paths()

This prints the fixed length paths found, if new() was called as new(report_paths => 1).

report_paths([$Boolean])

Here the [] indicate an optional parameter.

Get or set the option which determines whether or not the fixed length paths found are printed.

'report_paths' is a parameter to "new()". See "Constructor and Initialization" for details.

start_node([$string])

Here the [] indicate an optional parameter.

Get or set the name of the node from where all paths must start.

Specify the start node without the port+compass details.

'start_node' is a parameter to "new()". See "Constructor and Initialization" for details.

FAQ

How does find_fixed_length_paths() handle port+compass detail?

Just specify the start node without the port+compass details, and it will work.

Check GraphViz2::Marpa::PathUtils::Demo line 82. The corresponding data is data/fixed.length.paths.in.04.gv, out/* and html/* files.

I can't get this module to work with Path::Tiny objects

If your input and output file names are Path::Tiny objects, do this:

new(input_file => "$input_file", output_file => "$output_file", ...)

What are the 2 interpretations of output_file?

This section discusses input/output DOT file naming. HTML file names follow the same system.

o For clusters

Each cluster's DOT syntax is written to a different file. So output_file must be the prefix of these output files.

Hence, in scripts/find.clusters.sh, GV2=clusters.out.$1 (using $1 = '09', say) means output clusters' DOT files will be called clusters.out.09.001.gv up to clusters.out.09.007.gv.

So, the prefix has ".$n.gv" attached as a suffix, for clusters 1 .. N.

o For fixed length paths

Since there is only ever 1 DOT file output here, output_file is the name of that file.

Hence, in scripts/find.fixed.length.path.sh, the value supplied is out/fixed.length.paths.out.$FILE.gv, where $FILE will be '01', '02' or '03'. So the output file, using $FILE='01', will be out/fixed.length.paths.out.01.gv.

I used a node label of "\N" and now your module doesn't work!

The most likely explanation is that you're calling find_fixed_path_lengths() and you've specified all nodes, or at least some, to have a label like "\N".

This escape sequence triggers special processing in AT&T's Graphviz to generate labels for nodes, overriding the code I use to re-name nodes in the output of find_fixed_path_lengths().

See _prepare_fixed_length_output() for the gory details.

The purpose of my re-numbering code is to allow a node to appear in the output multiple times but to stop Graphviz automatically regarding all such references to be the same node. Giving a node different names (which are un-seen) but the same label (which is seen) makes Graphviz think they are really different nodes.

The 3 samples in part 2 of the demo page should make this issue clear.

What is the homepages of Graphviz and Marpa?

Graphviz. Marpa.

See also Jeffrey's annotated blog.

How are clusters named?

They are simply counted in the order discovered in the input file.

How are cycles in fixed path length analysis handled?

Note: allow_cycles => 1 is not implemented yet in V 2.

This is controlled by the allow_cycles option to new(), or the corresponding method "allow_cycles($integer)".

The code keeps track of the number of times each node is entered. If new(allow_cycles => 0) was called, nodes are only considered if they are entered once. If new(allow_cycles => 1) was called, nodes are also considered if they are entered a second time.

Sample code: Using the input file data/fixed.length.paths.in.01.gv we can specify various combinations of parameters like this:

allow_cycles  path_length  start node  solutions
0             3            Act_1       4
1             3            Act_1       ?

0             4            Act_1       5
1             4            Act_1       ?

Are all (fixed length) paths found unique?

Yes, as long as they are unique in the input. Something like this produces 8 identical solutions (starting from A, of path length 3) because each node B, C, D, can be entered in 2 different ways, and 2**3 = 8.

digraph G
{
    A -> B -> C -> D;
    A -> B -> C -> D;
}

See data/fixed.length.paths.in.03.gv and html/fixed.length.paths.out.03.svg.

The number of options is confusing

Agreed. Remember that this code calls GraphViz2::Marpa's run() method, which expects a large number of options.

Why do I get error messages like the following?

Error: <stdin>:1: syntax error near line 1
context: digraph >>>  Graph <<<  {

Graphviz reserves some words as keywords, meaning they can't be used as an ID, e.g. for the name of the graph.

The keywords are: digraph, edge, graph, node, strict and subgraph. Compass points are not keywords.

See keywords in the discussion of the syntax of DOT for details.

So, don't do this:

strict graph graph{...}
strict graph Graph{...}
strict graph strict{...}
etc...

Likewise for non-strict graphs, and digraphs. You can however add double-quotes around such reserved words:

strict graph "graph"{...}

Even better, use a more meaningful name for your graph...

TODO

Problems:

o Fixed length paths and subgraphs

If the edge is pointing to a subgraph, all nodes in that subgraph are heads of the edge.

Likewise, if the start node is inside a subgraph, the edge can be outside the subgraph.

o Re-implement allow_cycles()

This has been ignored in the re-write for V 2.

Search the source for 'TODO' and 'allow_cycles' to find where patches must be made.

References

Combinatorial Algorithms for Computers and Calculators, A Nijenhuis and H Wilf, p 240.

This books very clearly explains the backtracking parser I used to process the combinations of nodes found at each point along each path. Source code in the book is in Fortran.

The book is available as a PDF from http://www.math.upenn.edu/~wilf/website/CombAlgDownld.html.

Version Numbers

Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.

Machine-Readable Change Log

The file CHANGES was converted into Changelog.ini by Module::Metadata::Changes.

Repository

https://github.com/ronsavage/GraphViz2-Marpa-PathUtils

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=GraphViz2::Marpa::PathUtils.

Author

GraphViz2::Marpa::PathUtils was written by Ron Savage <ron@savage.net.au> in 2012.

Home page: http://savage.net.au/index.html.

Copyright

Australian copyright (c) 2012, Ron Savage.

All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Artistic License, a copy of which is available at:
http://www.opensource.org/licenses/index.html