In this tutorial, I’ll be walking you through the code-implementation of Kruskal’s Algorithm.

To restate myself:

First, let’s get a few `typedef`

s and function prototypes out of the way:

```
#include <iostream>
#include <vector>
#include <utility>
#include <algorithm>
/* Define an edge struct to handle edges more easily.*/
struct edge {
int first, second, weight;
};
/* Needed to typedef to reduce keystrokes or copy/paste */
typedef std::vector< std::vector< std::pair<int,int> > > adj_list;
std::vector<edge> kruskal(const adj_list& graph);
int get_pred(int vertex, const std::vector<int>& pred);
```

Like in my Bellman Ford program, we will use an **adjacency list** to to manage graph data.

```
typedef std::vector< std::vector< std::pair<int,int> > > adj_list
```

The above simply reduces the keystrokes necessary and allows our code to be more readable and expressive (for us dumb humans, that is).

Next, you’ll see two function declarations:

```
std::vector<edge> kruskal(const adj_list& graph);
int get_pred(int vertex, const std::vector<int>& pred);
```

The former will contain the actual algorithm, and the latter is a helper/subroutine (fancy word!). `kruskal(...)`

will return a `vector`

of `edge`

s (see struct declaration above), because we are looking for all the *edges* in the **Minimum Spanning Tree**.

# Input

```
int main() {
int n,m; std::cin >> n >> m;
adj_list graph(n);
int f,s,w;
while (m-- > 0) {
std::cin >> f >> s >> w;
if (f == s) continue; /* avoid loops */
graph[ f-1 ].push_back( std::make_pair( s-1 , w ) );
}
//...
}
```

`n`

is the number of nodes, or vertices, and `m`

is the number of edges. We declare an `adj_list`

called `graph`

and then proceed to fill it up with the `m`

edges. We’re going to treat our graph as if it were directed, because redundant edges (used in an undirected graph) would only serve as clutter (here).

# Find MST

The next step is to find the MST:

```
int main() {
//...
std::vector<edge> result = kruskal(graph);
std::cout << "Here is the minimal tree:\n";
for (auto& _edge : result) {
std::cout << char(_edge.first+65) << " connects to " << char(_edge.second+65) << std::endl;
}
return 0;
}
```

Above is the rest of `main()`

. I’m tossing it there now so you can see our expectations. Now, we’ll jump into the `kruskal(...)`

function.

# Kruskal Function

```
std::vector<edge> kruskal(const adj_list& graph) {
std::vector<edge> edges, minimum_spanning_tree;
/*
`pred` will represent our Disjointed sets by naming a set head.
In the beginning, each node is its own head in its own set.
We merge sets in the while loop.
*/
std::vector<int> pred(graph.size());
for (int i = 0, n = graph.size(); i < n; i++) {
for (auto& _edge : graph[i])
edges.push_back( { i, _edge.first, _edge.second } );
pred[i] = i;
}
//...
}
```

First, we declare two `edge`

`vector`

s, `edges`

to hold all of the edges, and `minimum_spanning_tree`

to be returned as the result.

We also declare an `int`

`vector`

called `pred`

, which we do not need to worry about just yet. For now, just know that `pred`

helps us keep track of what is already in the MST, or a subset of the MST.

In the `for`

loop, we populate our `edges`

array/`vector`

. Additionally, `pred[i] = i`

assures that each vertex belongs to its own disjoint set. At this point of the graph, each vertex is its own subset of the final MST (yet to be discovered). Each of these subsets currently has no edges, so no subset is connected to any other subset. Next, we have to find the lowest-weight/least-costly edges that will connect all the subsets *without cycles*.

```
std::vector<edge> kruskal(const adj_list& graph) {
//...
/*
Let's reverse-sort our edge vector
so that we can just pop off the last (smallest)
element.
*/
auto comp = [&](edge left, edge right) { return left.weight > right.weight; };
std::sort(edges.begin(), edges.end(), comp);
//...
}
```

The next step is to sort the edges in descending order. We want the smallest edge to be at the end of the array so that we can just pop (`std::vector::pop_back()`

) it off.

```
std::vector<edge> kruskal(const adj_list& graph) {
//...
while( !edges.empty() ) {
/* get shortest/least-heavy edge */
edge shortest = edges.back();
edges.pop_back();
int f_head,s_head; /* first_head, second... */
f_head = get_pred(shortest.first, pred);
s_head = get_pred(shortest.second, pred);
/*
If the nodes making up a certain edge are
not already in the same set...
*/
if (f_head != s_head) {
/* Add that edge to the Min. Span. Tree*/
minimum_spanning_tree.push_back(shortest);
/*
Merge the sets by setting
the head of one set to the head
of the other set.
If the head of one set is A and the other is C,
as long as we point C to A, all nodes part of the
set once headed by C will find A in linear time.
*/
if (f_head < s_head)
pred[s_head] = f_head;
else
pred[f_head] = s_head;
}
}
return minimum_spanning_tree;
}
```

Now the real fun happens! We’re going to pop off the shortest edge from the `edges`

array (as promised) and see if we need it in our MST. The logic is rather trivial:

- If the two vertices connected by a given edge belong to the same
*disjoint set*, the edge would create a*cycle*and is, thus, unneeded. - If the two vertices are members of two different
*disjoint sets*, the edge should be added to the MST, since it is the smallest edge connecting the two particular vertices (because we sorted the edges). The sets of the vertices are*joined*.

Let’s walk through the process of joining two sets. `pred`

keeps track of which set a vertex belongs to by pointing to a vertex’s predecessor, a vertex in the same set, which was added to the set before this vertex.

Here’s an example. If we join the sets of vertex `F`

and vertex `C`

, we need to find the set to which each belongs *first* and join those. If vertex `F`

belongs to the set headed by `A`

, and vertex `C`

belongs to the set headed by `D`

, we take set `D`

and tack it on to `A`

. We can’t just add `C`

to `A`

, because there might be other vertexes attached to `D`

that would get lost, and this would cause extra edges to be added or failure in finding a correct MST. To solve this, we just add `D`

to `A`

, and then all the vertexes with `D`

as their head will point to `D`

and then follow `D`

to `A`

.

To better understand this, let’s take a look at `get_pred(...)`

:

```
int get_pred(int vertex, const std::vector<int>& pred) {
/*
We stop when a node/vertex is its own predecessor.
This means we have found the head of the set.
*/
while(pred[vertex] != vertex)
vertex = pred[vertex];
return vertex;
}
```

It is looking for the vertex with itself as it’s predecessor. This means that this particular vertex is the head of its own set. If we join the head of a set to another set, all vertices under a particular head will find their way to the new head in the next iteration of the big `while`

loop in `kruskal(...)`

, if necessary.

The algorithm is done when all the vertexes belong to the same set. If we run out of edges to evaluate before this happens, then one or more vertices is unreachable, meaning that we haven’t found the MST, but a *minimum spanning forest* instead, a subset of the undiscoverable MST.

# Conclusion

I know that this concept of *disjoint sets* may cause some confusion, but rest assured that it will come to you soon. This video will be helpful to you (it’s different from the one I showed you in Part 3.0.

Also, note that *there can be multiple correct MSTs in a particular graph.*

That’s it for this tutorial! I’m happy to answer any questions, whether below or on IRC (##0x00sec on freenode). Additionally, if you have any concerns or suggestions relating to how I can better explain some concepts/techniques, PM me.

Full source:

Later…

@oaktree