In this tutorial, I’ll be walking you through the code implementation of Kruskal’s Algorithm.
To restate myself:
First, let’s get a few typedefs and function prototypes out of the way:
#include <iostream>
#include <vector>
#include <utility>
#include <algorithm>
/* Define an edge struct to handle edges more easily. */
struct edge {
    int first, second, weight;
};
/* typedef to save keystrokes (and copy/paste) */
typedef std::vector< std::vector< std::pair<int,int> > > adj_list;
std::vector<edge> kruskal(const adj_list& graph);
int get_pred(int vertex, const std::vector<int>& pred);
Like in my Bellman-Ford program, we will use an adjacency list to manage graph data.
typedef std::vector< std::vector< std::pair<int,int> > > adj_list;
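This describes a vector of lists: graph[u] holds one (neighbor, weight) pair for each edge leaving vertex u. The typedef simply reduces the keystrokes necessary and makes our code more readable and expressive (for us dumb humans, that is).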
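Here is a minimal sketch of how a single edge lands in an adj_list. The add_edge(...) helper is hypothetical, because the tutorial does this inline in main(). It assumes 0-based vertex indices.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Same shape as the tutorial's adj_list: graph[u] holds (neighbor, weight) pairs.
typedef std::vector< std::vector< std::pair<int,int> > > adj_list;

// Hypothetical helper for illustration: record an edge u -> v with weight w
// (0-based indices), the same way main() stores each input edge.
void add_edge(adj_list& graph, int u, int v, int w) {
    assert(u >= 0 && u < static_cast<int>(graph.size()));  // u must be a valid vertex
    graph[u].push_back(std::make_pair(v, w));
}
```

For example, after add_edge(g, 0, 1, 4) and add_edge(g, 0, 2, 1) on a three-vertex graph, g[0] holds the pairs (1, 4) and (2, 1), and g[1] and g[2] stay empty.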
Next, you’ll see two function declarations:
std::vector<edge> kruskal(const adj_list& graph);
int get_pred(int vertex, const std::vector<int>& pred);
The former will contain the actual algorithm, and the latter is a helper/subroutine (fancy word!). kruskal(...) will return a vector of edges (see the struct declaration above), because we are looking for all the edges in the Minimum Spanning Tree.
Input
int main() {
    int n,m; std::cin >> n >> m;
    adj_list graph(n);
    int f,s,w;
    while (m-- > 0) {
        std::cin >> f >> s >> w;
        if (f == s) continue; /* avoid loops */
        /* input vertices are numbered from 1; store them 0-based */
        graph[ f-1 ].push_back( std::make_pair( s-1 , w ) );
    }
    //...
}
n is the number of nodes, or vertices, and m is the number of edges. We declare an adj_list called graph and then fill it up with the m edges. Vertices in the input are numbered from 1, so we subtract 1 to get 0-based indices. We store each edge only once, as if the graph were directed. Kruskal’s algorithm looks at every edge on its own, so the reverse copy an undirected adjacency list would normally hold would only serve as clutter here.
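To test the input step without typing into the console, you can pull the reading loop out into a function that accepts any std::istream. This is a sketch. read_graph(...) is a hypothetical name, and the range check is an added assumption about well-formed input, not part of the original program.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <utility>
#include <vector>

typedef std::vector< std::vector< std::pair<int,int> > > adj_list;

// Hypothetical read_graph(): the same loop as main(), reading from any stream.
// Expected format: "n m", then m lines "f s w" with 1-based vertex numbers.
adj_list read_graph(std::istream& in) {
    int n = 0, m = 0;
    in >> n >> m;
    adj_list graph(n);
    int f, s, w;
    while (m-- > 0 && in >> f >> s >> w) {
        assert(f >= 1 && f <= n && s >= 1 && s <= n);  // assumed well-formed input
        if (f == s) continue;                          // skip self-loops
        graph[f - 1].push_back(std::make_pair(s - 1, w));  // store 0-based
    }
    return graph;
}
```

For example, passing std::istringstream("3 3\n1 2 5\n2 3 1\n3 3 7\n") gives one edge out of vertex 0 and one out of vertex 1. The 3 3 self-loop is dropped.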
Find MST
The next step is to find the MST:
int main() {
    //...
    std::vector<edge> result = kruskal(graph);
    std::cout << "Here is the minimal tree:\n";
    for (auto& _edge : result) {
        /* print vertex 0 as 'A', 1 as 'B', ... */
        std::cout << char(_edge.first + 'A') << " connects to " << char(_edge.second + 'A') << std::endl;
    }
    return 0;
}
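Above is the rest of main(). I’m tossing it in now so you can see what we expect back from kruskal(...). Each MST edge is printed with its vertices as letters (vertex 0 as A, vertex 1 as B, and so on), which only reads well for up to 26 vertices. Now, we’ll jump into the kruskal(...) function.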
Kruskal Function
std::vector<edge> kruskal(const adj_list& graph) {
    std::vector<edge> edges, minimum_spanning_tree;
    /*
        `pred` will represent our disjoint sets by naming a set head.
        In the beginning, each node is its own head in its own set.
        We merge sets in the while loop.
    */
    std::vector<int> pred(graph.size());
    for (int i = 0, n = graph.size(); i < n; i++) {
        for (auto& _edge : graph[i])
            edges.push_back( { i, _edge.first, _edge.second } );
        pred[i] = i;
    }
    //...
}
First, we declare two edge vectors: edges, to hold all of the edges, and minimum_spanning_tree, to be returned as the result.
We also declare an int vector called pred, which we do not need to worry about just yet. For now, just know that pred keeps track of which subset of the MST each vertex currently belongs to, i.e., which vertices are already connected to each other.
In the for loop, we populate our edges vector. Additionally, pred[i] = i ensures that each vertex starts out in its own disjoint set. At this point, each vertex is its own subset of the final MST (yet to be discovered). None of these subsets has any edges yet, so no subset is connected to any other. Next, we have to find the lowest-weight (least costly) edges that connect all the subsets without creating cycles.
std::vector<edge> kruskal(const adj_list& graph) {
    //...
    /*
        Let's reverse-sort our edge vector
        so that we can just pop off the last (smallest)
        element.
    */
    auto comp = [](const edge& left, const edge& right) { return left.weight > right.weight; };
    std::sort(edges.begin(), edges.end(), comp);
    //...
}
The next step is to sort the edges in descending order. We want the smallest edge to be at the end of the vector so that we can just pop it off with std::vector::pop_back().
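The sketch below shows that the descending sort plus pop_back() visits edges from lightest to heaviest. weights_in_pop_order(...) is a hypothetical helper, not part of the tutorial.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct edge { int first, second, weight; };

// Hypothetical helper: sort heaviest-first, as kruskal() does, then drain the
// vector from the back and record the weights in the order they come off.
std::vector<int> weights_in_pop_order(std::vector<edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const edge& l, const edge& r) { return l.weight > r.weight; });
    std::vector<int> order;
    while (!edges.empty()) {
        order.push_back(edges.back().weight);  // back() is the lightest remaining
        edges.pop_back();
    }
    return order;
}
```

Sorting in ascending order and walking the vector from the front would work just as well. Popping from the back simply avoids keeping an index.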
std::vector<edge> kruskal(const adj_list& graph) {
    //...
    while( !edges.empty() ) {
        /* get shortest/least-heavy edge */
        edge shortest = edges.back();
        edges.pop_back();
        int f_head,s_head; /* first_head, second... */
        f_head = get_pred(shortest.first, pred);
        s_head = get_pred(shortest.second, pred);
        /*
            If the nodes making up a certain edge are
            not already in the same set...
        */
        if (f_head != s_head) {
            /* Add that edge to the Min. Span. Tree */
            minimum_spanning_tree.push_back(shortest);
            /*
                Merge the sets by setting
                the head of one set to the head
                of the other set.
                If the head of one set is A and the other is C,
                as long as we point C to A, all nodes that were part
                of the set headed by C will reach A by following
                their pred links.
            */
            if (f_head < s_head)
                pred[s_head] = f_head;
            else
                pred[f_head] = s_head;
        }
    }
    return minimum_spanning_tree;
}
Now the real fun happens! We’re going to pop the shortest edge off the edges vector (as promised) and decide whether we need it in our MST. The logic is rather simple:
- If the two vertices connected by a given edge already belong to the same disjoint set, the edge would create a cycle and is, thus, unneeded.
- If the two vertices belong to two different disjoint sets, the edge is added to the MST. It is the smallest remaining edge connecting those two sets, because we sorted the edges. The two sets are then joined.
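The two rules above can be sketched as a hypothetical try_add(...) helper that handles a single edge. It reuses the tutorial’s get_pred(...), repeated here so the snippet compiles on its own. It also uses the same merge rule, where the smaller head index becomes the head of the joined set.

```cpp
#include <cassert>
#include <vector>

struct edge { int first, second, weight; };

// The tutorial's get_pred(): follow pred links until a vertex is its own head.
int get_pred(int vertex, const std::vector<int>& pred) {
    while (pred[vertex] != vertex) vertex = pred[vertex];
    return vertex;
}

// Hypothetical helper applying the two rules to one edge.
// Returns true if the edge was added to the MST.
bool try_add(const edge& e, std::vector<int>& pred, std::vector<edge>& mst) {
    int f_head = get_pred(e.first, pred);
    int s_head = get_pred(e.second, pred);
    if (f_head == s_head) return false;          // same set: edge would form a cycle
    mst.push_back(e);                            // different sets: keep the edge
    if (f_head < s_head) pred[s_head] = f_head;  // join the sets; smaller head wins
    else pred[f_head] = s_head;
    return true;
}
```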
Let’s walk through the process of joining two sets. pred keeps track of which set a vertex belongs to by pointing each vertex at its predecessor, another vertex in the same set. Following these links always ends at the head of the set, the one vertex that is its own predecessor.
Here’s an example. To join the sets of vertex F and vertex C, we first need to find the set to which each belongs, and then join those. If vertex F belongs to the set headed by A, and vertex C belongs to the set headed by D, we take set D and tack it on to A. We can’t just attach C to A, because other vertices might be attached to D, and they would get lost. That would cause extra edges to be added or a failure to find a correct MST. To solve this, we attach D to A instead. All the vertices with D as their head still lead to D, and from there they follow D on to A.
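To replay the F/C/A/D example in code, number the vertices A = 0 through F = 5. Start with F pointing at A, and C and E both pointing at D. E is an extra vertex, added here as an assumption to show why re-pointing the head matters. The join(...) helper is hypothetical and performs the same merge as the if/else in kruskal(...).

```cpp
#include <cassert>
#include <vector>

// The tutorial's get_pred(), repeated so the snippet is self-contained.
int get_pred(int vertex, const std::vector<int>& pred) {
    while (pred[vertex] != vertex) vertex = pred[vertex];
    return vertex;
}

// Hypothetical helper: join the sets of u and v by re-pointing one *head*,
// never the vertices themselves, so nothing attached to either head is lost.
void join(int u, int v, std::vector<int>& pred) {
    int u_head = get_pred(u, pred);
    int v_head = get_pred(v, pred);
    if (u_head == v_head) return;               // already in the same set
    if (u_head < v_head) pred[v_head] = u_head;
    else pred[u_head] = v_head;
}
```

After join(F, C), pred[D] becomes A, while C and E still point at D. Both of them now reach A through D.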
To better understand this, let’s take a look at get_pred(...)
:
int get_pred(int vertex, const std::vector<int>& pred) {
    /*
        We stop when a node/vertex is its own predecessor.
        This means we have found the head of the set.
    */
    while(pred[vertex] != vertex)
        vertex = pred[vertex];
    return vertex;
}
It looks for the vertex that is its own predecessor, which means that this particular vertex is the head of its set. Once we point the head of one set at the head of another, every vertex under the old head will find its way to the new head the next time get_pred(...) is called on it in a later iteration of the big while loop in kruskal(...).
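get_pred(...) may walk a long chain of predecessors. A common refinement called path compression, which the tutorial’s code does not use, re-points every vertex on the path straight at the head once the head is found. The sketch below is such a variant. get_pred_compress(...) is a hypothetical name, and it needs a non-const pred because it modifies the links.

```cpp
#include <cassert>
#include <vector>

// Hypothetical path-compression variant of get_pred(): find the head, then
// re-point every vertex on the walked path directly at that head.
int get_pred_compress(int vertex, std::vector<int>& pred) {
    int head = vertex;
    while (pred[head] != head) head = pred[head];  // first pass: find the head
    while (pred[vertex] != head) {                 // second pass: flatten the path
        int next = pred[vertex];
        pred[vertex] = head;
        vertex = next;
    }
    return head;
}
```

It returns the same head as get_pred(...), so it could replace it in kruskal(...) without changing which edges end up in the MST.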
The algorithm is done when all the vertices belong to the same set. If we run out of edges to evaluate before this happens, then one or more vertices are unreachable from the rest, so no spanning tree exists. What we have found instead is a minimum spanning forest, with one minimum spanning tree per connected component.
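kruskal(...) never keeps an edge that closes a cycle, so its result is always a forest. A forest on n vertices is a single spanning tree exactly when it has n - 1 edges. A hypothetical check along those lines:

```cpp
#include <cassert>
#include <vector>

struct edge { int first, second, weight; };

// Hypothetical check: kruskal()'s output is acyclic, so it spans all n vertices
// exactly when it has n - 1 edges. Fewer edges means a minimum spanning forest.
bool is_spanning_tree(int n, const std::vector<edge>& result) {
    if (n == 0) return result.empty();
    return static_cast<int>(result.size()) == n - 1;
}
```

The same n - 1 test could also break out of the while loop in kruskal(...) early. Once every vertex is in one set, no remaining edge can be added.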
Conclusion
I know that this concept of disjoint sets may cause some confusion, but rest assured that it will come to you soon. This video will be helpful to you (it’s different from the one I showed you in Part 3.0).
Also, note that there can be multiple correct MSTs in a particular graph.
That’s it for this tutorial! I’m happy to answer any questions, whether below or on IRC (#0x00sec on freenode). Additionally, if you have any concerns or suggestions relating to how I can better explain some concepts/techniques, PM me.
Full source:
Later…
@oaktree