NAME

Graph::Dijkstra - Dijkstra's shortest path algorithm with methods to input/output graph datasets using supported file formats

SYNOPSIS

 # create the object
 use Graph::Dijkstra;
 my $graph = Graph::Dijkstra->new();

 # input graph from a supported file format
 $graph->inputGraphfromGML('astro-ph.gml');
 $graph->inputGraphfromCSV('another.csv');
 $graph->inputGraphfromJSON('another.json');
 $graph->inputGraphfromGraphML('graphML.xml', {keyEdgeValueID => 'weight', keyNodeLabelID => 'name'} );
 
 #SET methods to create graph nodes and edges "manually"
 $graph->node( 0, 'nodeA');
 $graph->node( 1, 'nodeB');
 $graph->node( 2, 'nodeC');
 $graph->edge(0, 1, 3);  #create or update an edge between source and target;  cost(dist) must be > 0
 $graph->edge(1, 2, 2);
 $graph->removeNode( 0 ); #removes the node with ID = 0 and all associated edges; returns "self" or undef if node not found
 $graph->removeEdge( 0, 1 ); #removes the edge that connects the two nodes; returns "self" or undef if either node or the edge not found
 
 #GET methods for graph nodes and edges
 $graph->node( 0 ); #returns the label associated with node ID '0' or undef if there is no node with that id
 $graph->nodeExists( 0 );  #returns true if a node with an ID value of '0' has been defined; false if not.
 $graph->edge( 0, 1 ); #returns the cost (distance) associated with the edge between node's 0 and 1; undef if there is no edge
 $graph->edgeExists( 0, 1 );  #returns true if an edge between nodes with id values of '0' and '1' has been defined; false if not.
 $graph->nodeList();  #returns 2 dimensional array of all nodes, each node (array) element contains ID and LABEL values
 $graph->adjacent( 0, 1 ); #returns true or false if there is an edge between nodeA and nodeB
 $graph->adjacentNodes( 0 ); #returns a list of node IDs of immediately connected nodes (edges)
 
 #methods to output the internal graph to a supported file format
 $graph->outputGraphtoGML('mygraphfile.gml', 'creator name');
 $graph->outputGraphtoCSV('mygraphfile.csv');
 $graph->outputGraphtoJSON('mygraphfile.json');
 $graph->outputGraphtoGraphML('mygraphfile.xml', {keyEdgeWeightID => 'weight',keyEdgeWeightAttrName => 'weight', keyNodeLabelID => 'name', keyNodeLabelID => 'name'});
 
 #Dijkstra shortest path methods
 
 use Data::Dumper;
 my %Solution = ();
 
 #shortest path to farthest node from origin node
 if (my $solutionCost = $graph->farthestNode( 0, \%Solution )) {
 	print Dumper(\%Solution);
 }
 
 #compute shortest path between two nodes (from origin to destination)
 if (my $pathCost = $graph->shortestPath( 0, 2, \%Solution ) ) {
 	print Dumper(\%Solution);
 }
 

DESCRIPTION

Efficient implementation of Dijkstras shortest path algorithm in Perl using a Minimum Priority Queue (Array::Heap::ModifiablePriorityQueue**).

Includes methods to input and output graph datasets from/to the following file formats.

GML (Graph Modelling Language, not to be confused with Geospatial Markup Language), 
JSON Graph Specification (latest draft specification, edge weights/costs input using metadata "value" attribute), 
GraphML (XML based specification), and
CSV (a simple row column format modelled after GML developed by author).

At this time, Graph::Dijkstra supports undirected graphs only. All edges are treated as bi-directional (undirected). The inputGraphfrom[file format] methods croak if they encounter a graph dataset with directed edges.

In this initial release, the internal graph data model is minimal. Nodes (vertices) contain an ID value (simple scalar), a label (string), and a list (hash) of edges (the ID values of connected nodes). Edges contain the ID values of the target and source nodes and the numeric cost (value/distance/amount). The edge cost must be a positive number (integer or real).

The outputGraphto[file format] methods output data elements from the internal graph. If converting between two supported formats (eg., GML and GraphML), unsupported attributes from the input file (which are not saved in the internal graph) are *not* be written to the output file. Later releases will extend the internal graph data model.

This initial release has not been sufficiently tested with real-world graph data sets. It can handle rather large datasets (tens of thousands of nodes, hundreds of thousands of edges).

If you encounter a problem or have suggested improvements, please email the author and include a sample dataset. If providing a sample data set, please scrub it of any sensitive or confidential data.

**Array::Heap::ModifiablePriorityQueue, written in Perl, uses Array::Heap, an xs module.

PLATFORM DEPENDENCIES

Graph::Dijkstra uses XML::LibXML to input and output GraphML format files. At this time, XML::LibXML is not available on the ActiveState PPM repository for Windows x86 or x64. Author has emailed ActiveState requesting that they make it available. Until such time as XML::LibXML is available on the ActiveState PPM repository for Windows, this module will not build on the ActiveState PPM repository for Windows.

Conversely, this module was developed using ActiveState Perl 5.20.2 for Windows (x86) using the XML::LibXML PPM installed from the Bribes PPM repository. It should be possible for ActiveState Windows users to install Graph::Dijkstra as follows.

Install XML::LibXML from the Bribes PPM repository http://www.bribes.org/perl/ppmdir.html using the following command.

	ppm install http://www.bribes.org/perl/ppm/XML-LibXML.ppd

Then install the following from the ActiveState PPM repository as normal (using the PPM install command).

	Array::Heap::ModifiablePriorityQueue
	Regexp::Common
	Scalar::Util
	HTML::Entities

You should then be able to install this module directly from CPAN using the CPAN command.

METHODS

Graph::Dijkstra->VERBOSE( $bool );

Class method that turns on or off informational output to STDOUT.

my $graph = Graph::Dijsktra->new();

Create a new, empty graph object. Returns the object on success.

Node methods

$graph->node( $id, $label );

SET method that adds new or updates existing node and returns self. Node ID values must be simple scalars.

$graph->node( $id );

GET method that returns the label associated with the node ID or returns undef if the node ID does not exist. Note: nodes may have blank ('') labels. Use nodeExists method to test for existance.

$graph->nodeExists( $id );

GET method that returns true if a node with that ID values exists or false if not.

my @list = $graph->nodeList();

Returns unsorted, 2 dimensional array (list of lists) of all nodes in the graph. Each list element includes a node ID value (element 0) and LABEL value (element 1). $list[0][0] is the ID value of the first node and $list[0][1] is the LABEL value of the first node.

$graph->removeNode( $id );

Removes node identified by $id and all connecting edges and returns self. Returns undef if $id does not exist.

Edge methods

$graph->edge( $sourceID, $targetID, $cost );

SET method that creates new or updates existing edge between $sourceID and $targetID and returns $self. $cost must be > 0. Returns undef if $cost <= 0 or if $sourceID or $targetID do not exist.

$graph->edge( $sourceID, $targetID );

GET method that returns existing cost (distance) of edge between $sourceID and $targetID. Returns 0 if there is no edge between $sourceID and $targetID. Returns undef if $sourceID or $targetID do not exist.

$graph->edgeExists( $sourceID, $targetID );

GET method that returns true if an edge connects the source and target IDs or false if an edge has not been defined.

$graph->adjacent( $sourceID, $targetID );

GET method that returns true if an edge connects $sourceID and $targetID or false if not. Returns undef if $sourceID or $targetID do not exist.

my @list = $graph->adjacentNodes( $id );

Returns unsorted list of node IDs connected to node $id by an edge. Returns undef if $id does not exist.

$graph->removeEdge( $sourceID, $targetID );

Removes edge between $source and $target (and $target and $source) and returns self. Returns undef if $source or $target do not exist.

Dijkstra computation methods

my $solutioncost = $graph->farthestNode( $originID, $solutionHref );

Returns the cost of the shortest path to the farthest node from the origin. Carps and returns 0 if $originID does not exist. When there is more than one solution (two or more farthest nodes from the origin with the same cost), the solution hash includes multiple "path" elements, one for each solution.

Sample code

my %Solution = ();
if ( my $solutionCost = $graph->farthestNode('I', \%Solution) ) {
	print Dumper(\%Solution);
	foreach my $i (1 .. $Solution{count}) {
		print "From origin $Solution{origin}, solution path ($i) to farthest node $Solution{path}{$i}{destination} at cost $Solution{cost}\n";
		foreach my $edgeHref (@{$Solution{path}{$i}{edges}}) {
			print "\tsource='$edgeHref->{source}' target='$edgeHref->{target}' cost='$edgeHref->{cost}'\n";
		}
	}
}

Produces the following output

	$VAR1 = {
          'cost' => 18,
          'origin' => 'I',
          'desc' => 'farthest',
          'count' => 2,
          'path' => {
                      '1' => {
                               'destination' => 'A',
                               'edges' => [
                                            {
                                              'source' => 'I',
                                              'target' => 'L',
                                              'cost' => 4
                                            },
                                            {
                                              'cost' => 6,
                                              'target' => 'H',
                                              'source' => 'L'
                                            },
                                            {
                                              'source' => 'H',
                                              'target' => 'D',
                                              'cost' => 5
                                            },
                                            {
                                              'cost' => 3,
                                              'target' => 'A',
                                              'source' => 'D'
                                            }
                                          ]
                             },
                      '2' => {
                               'destination' => 'C',
                               'edges' => [
                                            {
                                              'source' => 'I',
                                              'target' => 'J',
                                              'cost' => 2
                                            },
                                            {
                                              'cost' => 9,
                                              'target' => 'K',
                                              'source' => 'J'
                                            },
                                            {
                                              'target' => 'G',
                                              'source' => 'K',
                                              'cost' => 2
                                            },
                                            {
                                              'cost' => 5,
                                              'source' => 'G',
                                              'target' => 'C'
                                            }
                                          ]
                             }
                    }
        };

	From origin I, solution path (1) to farthest node A at cost 18
		source='I' target='L' cost='4'
		source='L' target='H' cost='6'
		source='H' target='D' cost='5'
		source='D' target='A' cost='3'
	From origin I, solution path (2) to farthest node C at cost 18
		source='I' target='J' cost='2'
		source='J' target='K' cost='9'
		source='K' target='G' cost='2'
		source='G' target='C' cost='5'
my $solutioncost = $graph->shortestPath( $originID, $destinationID, $solutionHref );

Returns cost of shortest path between $originID and $destinationID or 0 if there is no path between $originID and $destinationID. Carps if $originID or $destinationID do not exist. Populates $solutionHref (hash reference) with origin, destination, cost, and shortest path edges.

Sample code

my %Solution = ();
if ( my $pathCost = $graph->shortestPath('I','A', \%Solution) ) {
	print Dumper(\%Solution);
	print "Solution path from origin $Solution{origin} to destination $Solution{destination} at cost $Solution{cost}\n";
	foreach my $edgeHref (@{$Solution{edges}}) {
		print "\tsource='$edgeHref->{source}' target='$edgeHref->{target}' cost='$edgeHref->{cost}'\n";
	}
}

Produces the following output

	$VAR1 = {
          'destination' => 'A',
          'cost' => 18,
          'desc' => 'path',
          'origin' => 'I',
          'edges' => [
                       {
                         'cost' => 4,
                         'source' => 'I',
                         'target' => 'L'
                       },
                       {
                         'target' => 'H',
                         'source' => 'L',
                         'cost' => 6
                       },
                       {
                         'target' => 'D',
                         'cost' => 5,
                         'source' => 'H'
                       },
                       {
                         'cost' => 3,
                         'source' => 'D',
                         'target' => 'A'
                       }
                     ]
        };
        
	Solution path from origin I to destination A at cost 18
		source='I' target='L' cost='4'
		source='L' target='H' cost='6'
		source='H' target='D' cost='5'
		source='D' target='A' cost='3'

Input graph methods

$graph->inputGraphfromJSON($filename);

Inputs nodes and edges from a JSON format file following the draft JSON Graph Specification. Edge values/weights/costs are input using the metadata "value" attribute.

Example edge that includes metadata value attribute per JSON Graph Specification. { "source" : "1", "target" : "2", "metadata" : { "value" : "0.5" } },

See JSON Graph Specification https://www.npmjs.com/package/json-graph-specification

$graph->inputGraphfromGML($filename);

Inputs nodes and edges from a Graphics Modelling Language format file (not to be confused with the Geospatial Markup Language XML format). Implemented using pattern matching (regexp's) on "node" and "edge" constructs. An unmatched closing bracket (']') inside a quoted string attribute value will break the pattern matching. Quoted string attribute values (e.g., a label value) should not normally include an unmatched closing bracket. Report as a bug and I'll work on re-implementing using a parser.

See Graph Modelling Language https://en.wikipedia.org/wiki/Graph_Modelling_Language

$graph->inputGraphfromCSV($filename);

Inputs nodes and edges from a CSV format file loosely modelled after GML. The first column in each "row" is either a "node" or "edge" value. For "node" rows, the next two columns are the ID and LABEL values. For "edge" rows, the next three columns are "source", "target", and "value" values. No column header row.

Example

node,A,"one"
node,B,"two"
node,C,"three"
node,D,"four"
node,E,"five"
node,F,"six"
edge,A,B,4
edge,A,F,5
edge,A,E,7
edge,A,D,3
edge,D,E,5
$graph->inputGraphfromGraphML($filename, {keyEdgeValueID => 'weight', keyNodeLabelID => 'name'} );

Inputs nodes and edges from an XML format file following the GraphML specification. EXPERIMENTAL... has not been tested with real world data sets.

Input files must contain only a single graph and cannot contain embedded graphs. Hyperedges are not supported.

The options hash reference (second parameter following the filename) is used to provide the key element ID values for edge weight/value/cost/distance and node label/name/description.

If either is not provided, the method will search the key elements for (1) edge attributes (for="edge") with an attr.name value of weight, value, cost, or distance; and (2) node attributes (for="node") with an attr.name value of label, name, description, or nlabel.

Graphs must contain a "key" attribute for edges that identifies the edge weight/value/cost/distance such as for="edge" attrib.name="weight". If this key element includes a child element that specifies a default value, that default value will be used to populate the weight (cost/value/distance) for each edge node that does not include a weight/value/cost/distance data element. Seems odd to specify a default edge weight but it will be accepted.

  <key id="d1" for="edge" attr.name="weight" attr.type="double">
    <default>2.2</default>
  </key>

	<edge id="7" source="1" target="2">
		<data key="weight">0.5</data>
	</edge>

Graphs should contain a "key" attribute for nodes that identifies the node label / name / description such as for="node" attrib.name="name" or for="node" attrib.name="label". These are used to populate the internal graph "label" value for each node. If not included, the internal node labels will be empty strings.

<key id="name" for="node" attr.name="name" attr.type="string"/>

<node id="4">
	<data key="name">josh</data>
</node>

See GraphML Primer http://graphml.graphdrawing.org/primer/graphml-primer.html and GraphML example http://gephi.org/users/supported-graph-formats/graphml-format/

Output graph methods

$graph->outputGraphtoGML($filename, $creator);

Using the internal graph, outputs a file in GML format. Includes a "creator" entry on the first line of the file with a date and timestamp. Note that non-numeric node IDs and label values are HTML encoded.

$graph->outputGraphtoJSON($filename);

Using the internal graph, outputs a file following the JSON graph specification. See the inputGraphfromJSON method for format details.

$graph->outputGraphtoCSV($filename);

Using the internal graph, outputs a file in CSV format. See the inputGraphfromCSV method for format details.

$graph->outputGraphtoGraphML($filename, {keyEdgeWeightID => '',keyEdgeWeightAttrName => '', keyNodeLabelID => '', keyNodeLabelID => ''} );

Using the internal graph, outputs a file in XML format following the GraphML specification (schema). The option attributes keyEdgeWeightID and keyEdgeWeightAttrName both default to 'weight'. keyNodeLabelID and keyNodeLabelID both default to 'name'. Set these attributes values only if you need to output different values for these key attributes.

PERFORMANCE

On an unremarkable laptop, with a test graph of 16+K nodes and 121+K edges, both farthest and shortestPath complete in under 1 second. Reading files from disk takes longer than computing the shortest path to the farthest node.

LIMITATIONS

Node ID values must be simple scalars.

Currently only works with undirected graphs.

For simplicity, InputGraphfromGML implemented using pattern matching (regexp's). An unmatched closing bracket inside a quoted string (value) will break it. Will re-implement using a parser as necessary.

TODOs

Support input and output of addditional data attributes for nodes and edges in a comprehensive manner. For example, add support for edge IDs and labels. Update $graph->inputGraphfrom*, $graph->outputGraphto*, $graph->node*, and $graph->edge* methods.

Support input, Dijkstra path computations, and output of directed graphs (graphs with directed / uni-directional edges) in a comprehensive manner.

Add computation methods.

Support additional graph file formats.

Input welcome. Please email author with suggestions. Graph data sets (purged of sensitive/confidential data) are welcome and appreciated.

SEE ALSO

Array::Heap::ModifiablePriorityQueue

Graph Modelling Language https://en.wikipedia.org/wiki/Graph_Modelling_Language

JSON Graph Specification https://www.npmjs.com/package/json-graph-specification

GraphML Primer http://graphml.graphdrawing.org/primer/graphml-primer.html

GraphML example http://gephi.org/users/supported-graph-formats/graphml-format/

AUTHOR

D. Dewey Allen <ddallen16@gmail.com>

COPYRIGHT/LICENSE

Copyright (C) 2015, D. Dewey Allen

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.