Let me begin by telling you what a topological ordering of a directed graph is. Essentially, it's an ordering of the vertices of a graph so that all of the arcs, the directed edges of the graph, only go forward in the ordering. So let me encode an ordering by a labeling of the vertices with the numbers one through n. This is just to encode the position of each vertex in this ordering. So formally there's going to be a function which takes vertices of G and maps them to integers between 1 and n. Each of the numbers 1 through n should be taken on by exactly one vertex. Here n is the number of vertices of G. So that's just a way to encode an ordering, and then here's really the important property that every directed edge of G goes forward in the ordering. That is if u v is directed edge of the directed graph G, then it should be that the f value of the tail is less than the f value of the head. That is this directed edge has the higher f value as traverse in the correct direction. Let me give you an example just to make this more clear. So suppose we have this very simple directed graph, with four vertices. Let me show you two different, totally legitimate topological orderings of this graph. So the first thing you could do, is you could label s1, v2, w3 and t4. Another option would be to label them the same way, except you can swap the labels of v and w. So if you want, you can label v 3 and w 2. So again, what these labelings are really meant to encode is an ordering of the vertices. So the blue labeling, you can think of as encoding the ordering in which we put s first then v then w and then t. Whereas the green labeling can be thought of as the same ordering of the nodes except with w coming before v. What's important is that the pattern of the edge is exactly the same in both cases, and in particular all of the edges go forward in this ordering. So in either case we have s with edges from s to v, and s to w. So that looks the same way pictorially, whichever order v or w are in, and then symmetrically there are edges from v and w to t. So you'll notice that no matter which order that we put v and w in, all four of these edges go forward in each of these orderings. Now if you try to put v before s it wouldn't work because the edge from s to v would be going backward if v preceded s. Similarly, if you put t anywhere other than the final position, you would not have a topological ordering. So in fact, these are the only two topological orderings of this directed graph. I encourage you to convince yourself of that. Now, who cares about topological orderings? Well, this is actually a very useful subroutine. This has been come up in all kinds of applications. Really, whenever you want to sequence a bunch of tasks when there's precedent constraints among them. By precedence constraint I mean one task has to be finished before another. You can think, for example, about the courses in some kind of undergraduate major like a computer science major. Here the vertices are going to correspond to all of the course and there's a directed edge from course A to course B if course A is a prerequisite for course B, if you have to take it first. So then of course, you'd like to know a sequence in which you can take these courses so that you always take a course after you've taken its pre-requisite. And that's exactly what a topological ordering will accomplish. So it's reasonable to ask the question when does a directed graph have a topological ordering and when a graph does have such an ordering, how do we get our grubby little hands on it? Well there's a very clear necessary condition for a graph to have a topological ordering, which is it had better be a cyclic. Put differently, if a directed graph has a directed cycle then there is certainly no way there is going to be a topological ordering. So I hope the reason for this is fairly clear. Consider any directed graph which does have a directed cycle and consider any purported way of ordering the vertices. Well, now just traverse the edges of the cycle, one by one. So you start somewhere of the cycle, and if the first edge goes backward, well, you're already screwed. You already know that this ordering is not topological. No edges can go backward. So evidently, the first edge of this cycle has to go forward. But now, you have to traverse the rest of the edges on this cycle, and eventually you come back to where you started. So if you started out by going forward, at some point you have to go backward. So that edge goes backward in the ordering, violating the property of the topological ordering. That's true for every ordering, so directed cycles exclude the possibility of topological ordering. Now the question is what if you don't have a cycle? Is that a strong enough condition that you're guaranteed to have a topological ordering? Is the only obstruction to sequencing jobs without conflicts the obvious one of having circular precedence constraints? So it turns out not only is the answer yes, as long as you don't have any directed cycles, you're guaranteed a topological ordering. But we can even compute on in linear time no less via depth-first search. So before I show you the super slick and super efficient reduction of computing topological orderings of depth-first search. Let me first show over a pretty good but slightly less slick and slightly less efficient solution to help build up your intuition about directed acyclic graphs and their topological orderings. So for the straightforward solution, we're going to begin with a simple observation. Every directed acyclic graph has what I'm going to call a sink vertex. That is a vertex without any outgoing arcs. So in the forenode, directed acyclic graph we were exploring on the last slide there's exactly one source of sink vertex, and that's this right-most vertex here. That has no outgoing arcs, the other three vertices all have at least one outgoing arc. Now why is it the case that a directed acyclic graph has to have a sink vertex? Well, suppose it didn't, suppose it had no sink vertex. That would mean every single vertex has at least one outgoing arc. So what could we do if every vertex has one outgoing arc? Well, we can start in an arbitrary node. We know it's not a sink vertex, because we're assuming there aren't any. So there's an outgoing arc, so let's follow it. We get to some other node. By assumption there's no sink vertex, so this isn't a sink vertex, so there's an outgoing arc, so let's follow it. We get to some other node. That also has an outgoing arc, let's follow that, and so on. So we just keep following outgoing arcs, and we do this as long as we want because every vertex has at least one outgoing arc. Well there's a finite number of vertices, right this graph has say N vertices. So if we follow N arcs, we are going to see N+1 vertices. So by the pigeon-hole principle, we're going to have to see a repeat. Right so, if N+1 vertices, is only N distinct vertices, we're going to see some vertex twice. So for example, maybe after I take the outgoing arc from this vertex, I get back to this one that I saw previously. Well, what have we done? What happens when we get a repeated vertex? By tracing these outgoing arcs and repeating a vertex, we have exhibited a directed cycle. And that's exactly what we're assuming doesn't exist. We're talking about directed acyclic graphs. So, put differently, we just prove that a vertex with no sink vertex has to have a directed cycle. So a directed acyclic graph therefore has to have at least one sink vertex. So here's how we use this one very simple observation now to compute a topological ordering of a directed acyclic graph. Well let's do a little thought experiment. Suppose in fact this graph did have a topological order. Let's think about the vertex which goes last in this topological ordering. Remember, any arc which goes backward in the ordering is a violation. So we have to avoid that. We have to make sure every arc goes forward in the ordering. Now any vertex which has an outgoing arc, we better put somewhere other than in the final position, right? So the node that we put in the final position, all of its outgoing arcs are going to wind up going backward in the topological ordering. There's no where else they can go, this vertex is last. So in other words, if we plan to successfully compute a topological ordering, the only candidate vertices for that final position in the ordering are the sink vertices. That's all that's going to work. We put a non-sink vertex there, we're toast, it's not going to happen. Fortunately, if it's directed acyclic, we know there is a sink vertex. So let v be a sink vertex of g, if there's many sink vertices, we pick one arbitrarily. We set v's label to be the maximum possible, so if there's n vertices we're going to put that in the nth position. And now we just recurse on the rest of the graph, which has only n-1 vertices. So how would this work in the example on the right? Well in the first iteration, or the first outermost recursive call, the only sink vertex is this right most one, circled in green. So there's four vertices, so we're going to give that the label 4. So, then having labeled that 4, we delete that vertex and all the edges incident to it. And we recurse on what's left of the graph, so that would be the left-most three vertices plus the left-most two edges. Now, this graph has two sink vertices, after we've deleted 4 and everything from it. So both this top vertex and this bottom vertex are sinks in the residual graph. So now in the next recursive call, we can choose either of those as our sink vertex. Because we have two choices, that generates two topological orderings. Those are exactly the ones that we saw in the example. But if, for example, we choose this one to be our sink vertex, then that gets the label 3. Then we recurse just on the northwestern most two edges. This vertex is the unique sink in that graph, that gets the label 2. And then it recurs on the one node that we graph, and that gets the label 1. So, why is this algorithm work? Well, there's just two quick observations we need. So, first of all, we need to argue that it makes sense that in every iteration or in every recursive call, we can indeed find the sink vertex, that we can assign in the final position that's still unfilled. And the reason for that is just if you take a directed acyclic graph and you delete one or more vertices from it, you're still going to have a directed acyclic graph, right? You can't create cycles by just getting rid of stuff. You can only destroy cycles, and we started with no cycles. So through all the intermediate recursive calls we have no cycles by our first observation is always the sink. So the second that that we have to argue is that we really do produce a topological ordering. So remember what that means, that means for every edge of the graph, it goes forward in the ordering. That is the head of the arc is given a position later than the tail of the arc. And this simply follows because we always use sink vertices. So consider the vertex v which is assigned to the position i. This means then, that when we're down to a graph that only has i vertices remaining, v is the sink vertex. If v is the sink vertex when only the first i vertices remain, what property does it have in the original graph? Well, it means all of outgoing arcs that it has have to go to vertices that were already deleted and assigned higher positions. So for every vertex, by the time it actually gets assigned a position, it's a sink and it only has incoming arcs from the as yet unsigned vertices. It's outgoing arcs all go forward to vertices that were already assigned higher positions, and got deleted previously from the graph. So now we have under our belt a pretty reasonable solution for computing a topological ordering of a directed acyclic graph. In particular, remember we observed that if a graph does have a directed cycle, then of course there's no way there's a topological ordering. However, you order the vertices summage of the side that's going to have to go backward. And the solution on the previous slide shows that, as long as you don't have a cycle, it guarantees the topological ordering does indeed exist. And in fact, it's a constructive proof, constructive argument, that gives an algorithm. What you do is you would keep plucking off sinks, sink vertices one at a time, and populating the ordering from right to left, as you keep peeling off these sinks. So that's a pretty good algorithm, it's not too slow, and actually if you implement it just so, you can even get it to run in linear time. But I want to conclude this video with an application of depth first search, which is a very slick, very efficient computation of a topological ordering of a directed acyclic graph. So, we're just going to make two really quite minor modifications to our previous depth first search subroutine. The first thing is we have to embed it in a for loop, just like we did with breadth first search when we were computing the connected components of an undirected graph. That's because, in completing a topological ordering, we better give every single vertex a label, we'd better look at every vertex at least once. So to do that, we'll just make sure there's an outer for loop and then if we have multiple components, we'll just make sure to invoke DFS as often as we need to. The second thing we'll do is we'll add a little bit of bookkeeping, and this will make sure that every node gets a label. And in fact, these labels will define a topological order. So let's not forget the code for depth first search. This is where you're given a graph, G, in this case we're interested in a directed acyclic graph, and you're given a start vertex, S. And what you do is, as soon as you get to S you very aggressively start trying to explore its neighbors. Of course, you don't visit any vertex you've already been to. You keep track of who you visited. And if you find any vertex that you haven't seen before, you immediately start recursing on that node. So I said the first modification we need is to embed this into an outer for loop to ensure that every single node gets labeled. So I'm going to call that subroutine DFS-Loop. It does not take a start vertex. Initialization, all nodes start out on an explorative course. And we're also going to keep track of a global variable, which I'll call current_label. This is going to be initialized to n, and we're going to count down each time we finish exploring a new node. And these will be precisely the f values. These will be exactly the positions of the vertices in the topological ordering that we output. In the main loop we're going to iterate over all of the nodes of the graph. So for example, we just do a scan through the node array. As usual, we don't want to do any work twice, so if a vertex has already been explored in some previous invocation of DFS, we don't search from it. This should all be familiar from our embedding of breadth first search in a for loop when we computed the connected components of an undirected graph. And if we get to a vertex v of the graph, that we haven't explored yet, then we just invoke DFS in the graph, with that vertex as the starting point. So, the final thing I need to add is I need to tell you what the f values are, what the actual assignments of vertices to positions are. And as I foreshadowed, we're going to use this global current_label variable. And that'll have us assign vertices to positions from right to the left. Very much mimicking what was going on in our recursive solution, where we plucked off sink vertices one at a time. So, when's the right time to assign a vertex its position? Well, it turns out, the right time is when we've completely finished with that vertex. So we're about to pop the recursive call from the stack corresponding to that vertex. So, after we've gone through the for loop of all the edges outgoing from a given vertex, we set f (s) = to whatever the current_label is and then we decrement the current_label. And that's it, that is the entire algorithm. So, the claim is going to be that the f values produced, which you'll notice are going to be the integers between n through 1, because DFS will be called eventually once on every vertex, and it will get some integer assignment at the end. And everybody is going to get a distinct value, and the largest one is n and the smallest one is 1. The claim is that is a topological ordering. Clearly this algorithm is just as blazingly fast as DFS itself, with just a trivial amount of extra bookkeeping. Lets see how it works on our running example. So lets just say we have this four node directed graph that we're getting quite used to. So this has four vertices, so we initialize the current label variable to be equal to 4. So let's say that in the outer DFS loop, let's say we start somewhere like the vertex v. So notice in this outer for loop, we wind up considering the vertices in a totally arbitrary order. So let's say we first call DFS from this vertex v. So what happens, well, the only place you can go from v is to t, and then at t, there's nowhere to go. So we recursively call DFS at t, there's no edges to go through, we finish the for loop, and so t is going to be assigned an f value equal to the current label, which is n, and here, n is the number of vertices, which is 4. So f(t) is going to get, so our t is going to get the assignment, the label 4. So then now we're done with t, we backtrack back to v. We decrement the current label as we finish up with t, we get to v, and now there's no more outgoing arcs to explore, so for loops finish. So we're done with it in depth-first search. So it gets what's the new current label, which is now 3, and again, having finished with v, we decrement the current label, which is now down to 2. So now we go back to the outer for loop, maybe the next vertex we consider is the vertex t. But we've already been there, so we don't bother to DFS on t. And then maybe after that, we try it on s. So maybe s is the third vertex that the for loop considers. We haven't seen s yet, so we invoke DFS, starting from the vertex s. From s, there's two arcs to explore, the one with v, v we've already seen, so nothing's going to happen with the arc sv. But on the other hand, the arc sW will cause us to recursively call DFS on w. From w, we try to look at the arc from w to t, but we've already been to t, so we don't do anything. That finishes up with w, so depth-first search then finishes up at the vertex w, w gets the assignment of the current label. So f(w) = 2. We decrement current label, now its value is 1. Now we backtrack to s, so we've already considered all of s's outgoing arcs, so we're done with s. It gets the current label, which is 1. And this is indeed one of the two topological orderings of this graph that we exhibited a couple slides ago. So that's the full description of the algorithm and how it works in a concrete example. Let's just discuss what are its key properties, its running time and its correctness. So, as far as the running time of this algorithm the running time is linear. It's exactly what you'd want it to be. And the reason the running time is linear is for the usual reasons that these search algorithms run in linear time. You're explicitly keeping track of which nodes you've been to so that you don't visit them twice, so you only do a constant amount of work for each of the n nodes. And each edge, in a directed graph, you actually only look at each edge once, when you visit the tail of that edge. So you only do a constant amount of work per edge as well. Of course, the other key property is correctness. That is, we need to show that you are guaranteed to get a topological ordering. So what does that mean? That means every arc travels forward in the ordering. So if (u,v) is an edge, then f(u), the label assigned to u in this algorithm is less than the label assigned to v. The proof of correctness splits into two cases, depending on which of the vertices u or v is visited first by depth-first search. Because of our for loop, which iterates over all of the vertices of the graph g, depth-first search is going to be invoked exactly once from each of the vertices. Either u or v could be first, both are possible. So first let's assume that u was visited by DFS before v, so then what happens? Well, remember what depth first search does, when you invoke it from a node, it's going to find everything findable from that node. So if u is visited before v, that means v isn't getting explored, so it's a candidate for being discovered. Moreover, there's a an arc straight from u to v, so certainly DFS invoked at u is going to discover v. Furthermore, the recursive call corresponding to the node v is going to finish, it's going to get popped off the program stack before that of u. The easiest way to see this is just to think about the recursive structure of depth-first search. So when you call depth-first search from u, that recursive call, that's going to make further recursive calls to all of the relevant neighbors including v, and u's call is not going to get popped off the stack until v's does beforehand. That's because of the last in, first out nature of a stack or of a recursive algorithm. So because v's recursive call finishes before that of u, that means it will be assigned a larger label than u. Remember, the labels keep decreasing as more and more recursive calls get popped off the stack. So that's exactly what we wanted. Now, what's up in the second case, case two? So this is where v is visited before u. And here's where we use the fact that the graph has no cycles. So there's a direct arc from u to v. That means there cannot be any directed path from v all the way back to u. That would create a directed cycle. Therefore, DFS invoked from v is not going to discover u. There's no directed path from v to u, again, if there was, there'd be a directed cycle. So it doesn't find u at all. So the recursive call of v again is going to get popped before u's is even pushed onto the stack. So we're totally done with v before we even start to consider u. So therefore, for the same reasons, since v's recursive call finishes first, its label is going to be larger, which is exactly what we wanted to prove. So that concludes the first quite interesting application of depth-first search. In the next video, we'll look at an even more interesting one which computes the strongly connected components of a directed graph. This time, we can't do it in one depth-first search, we'll need two.