Mark is a graph advocate and field engineer for Neo Technology, the company behind the Neo4j graph database. As a field engineer, Mark helps customers embrace graph data and Neo4j building sophisticated solutions to challenging data problems. When he's not with customers Mark is a developer on Neo4j and writes his experiences of being a graphista on a popular blog at http://markhneedham.com/blog. He tweets at @markhneedham. Mark is a DZone MVB and is not an employee of DZone and has posted 536 posts at DZone. You can read more from them at their website. View Full User Profile

Algorithm of the Week: Kruskal’s Algorithm using union find in Ruby

01.29.2013
| 4980 views |
  • submit to reddit
I recently wrote a blog post describing my implementation of Kruskal’s algorithm – a greedy algorithm using to find a minimum spanning tree (MST) of a graph – and while it does the job it’s not particularly quick.

It takes 20 seconds to calculate the MST for a 500 node, ~2000 edge graph.

One way that we can improve the performance of the algorithm is by storing the MST in a union find/disjoint set data structure.

To refresh, Kruskal’s algorithm does the following:

  • sort edges E in order of increasing cost
  • let T = minimum spanning tree, m = number of edges
  • for i=1 to m:
    • let e = selected edge (u,v)
    • if there’s no way to go between u and v using edges in T:
      • add e to T
  • return T

In the first version of the algorithm we stored the MST in an array and then we did a depth first search to check that adding an edge to the array wasn’t going to create a cycle which would mean we no longer had an MST.

As I understand it the way that we use the union find data structure is that we start out with n connected components i.e. every node is in its own connected component.

We then merge these components together as we come across new edges which connect nodes in different components until we only have one component left at which point we should have our MST.

  • sort edges E in order of increasing cost
  • initialise union-find uf with n connected components
  • let T = minimum spanning tree, m = number of edges
  • for i=1 to m:
    • let e = selected edge (u,v)
    • if u and v not in same connected component in uf:
      • merge u and v into same connected component in uf
      • add e to T
  • return T

I came across this bit of code written by Michael Luckeneder which implements the union find data structure in Ruby and adapted that slightly to fit the nomenclature used in the Algorithms 2 videos.

The union find data structure looks like this:

class UnionFind
  def initialize(n)
    @leaders = 1.upto(n).inject([]) { |leaders, i| leaders[i] = i; leaders }
  end
 
  def connected?(id1,id2)
    @leaders[id1] == @leaders[id2]
  end
 
  def union(id1,id2)
    leader_1, leader_2 = @leaders[id1], @leaders[id2]
    @leaders.map! {|i| (i == leader_1) ? leader_2 : i }
  end
end

We have two methods:

  • connected? which we use to check whether or not two nodes are in the same connected component.
  • union which we use to put two nodes into the same connected component.

The way it’s implemented is that we have a collection of ‘leaders’ and initially each node is its own leader. As we find edges which belong in the MST we call union passing the two nodes as arguments. After this method executes every node which has the same leader as the first node will now instead have the same leader as the second node.

For example if we have a simple graph with edges 1 -> 2 -> 3 -> 4 -> 5 our initial union find data structure would look like this:

> uf = UnionFind.new 5
=> #<UnionFind:0x45e5a9b3 @leaders=[nil, 1, 2, 3, 4, 5]>

At this stage each node is in its own connected component but if we process the edge 1 -> 2 we’d first check if those two nodes were already in the same connected component:

> uf.connected?(1,2)
=> false

Given that they aren’t we can call union:

> uf.union(1,2)
=> [nil, 2, 2, 3, 4, 5]

As we can see from the output nodes 1 & 2 now both have the same leader since they are in the same connected component while the other nodes still remain on their own.

We could then process the 2 -> 3 edge which would put nodes 1, 2 & 3 together:

> uf.union(2,3)
=> [nil, 3, 3, 3, 4, 5]

The outline for Kruskal’s algorithm which makes use of this data structure is like so:

set = UnionFind.new number_of_nodes
 
@minimum_spanning_tree = []
 
edges = file.drop(1).
        map { |x| x.gsub(/\n/, "").split(" ").map(&:to_i) }.
        map { |one, two, weight| { :from => one, :to => two, :weight => weight}}.
        sort_by { |x| x[:weight]}
 
edges.each do |edge|
  if !set.connected?(edge[:from], edge[:to])
    @minimum_spanning_tree << edge 
    set.union(edge[:from], edge[:to])
  end  
end
 
puts "MST: #{@minimum_spanning_tree}"
puts "Cost: #{@minimum_spanning_tree.inject(0) { |acc, x| acc + x[:weight]}}"

This version of the algorithm runs in 1.9 seconds, a significant improvement on the initial version. The full code is on github as usual!

Published at DZone with permission of Mark Needham, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)