Okay. So let's change gears here and start talking about the implementation of different types of cache coherence systems for multiprocessor systems. The first thing we're going to start off with is small symmetric multiprocessors. Now, why do I call these things symmetric multiprocessors? Well, in a symmetric multiprocessor, everything is the same distance away from memory. So we have processors across the top here, they have a shared CPU-memory bus, and memory is sitting over here. These processors are all equally distant from this memory. The shared memory bus also communicates with the I/O bus, where you have things like disks, graphics controllers, and networking, so any processor can do any I/O, and any processor can communicate with memory. They're symmetric.

Now, let's zoom in on what this bus looks like, because it's going to actually influence our design. And I want to point out that buses are only one design that you could come up with for a multiprocessor system. You could also think about having a point-to-point interconnect. What I mean by that is one processor connects to another processor directly, but then a third processor connects to the first processor and not vice versa, so you could need some sort of routing. And this is what you'll see when we start to talk about large multi-cores or large multiprocessor systems. But for today, we're going to constrain ourselves to thinking about small symmetric multiprocessors, where all of the processors are equidistant from memory, and they sit on a shared bus.

So let's take a look at what a shared bus looks like. Here we have a diagram representing a multi-drop memory bus, and we're going to look at all of the different signal types that you need in this multi-drop memory bus. Before we do that, let's describe what multi-drop means. Multi-drop just means that it's a shared medium, a shared wire, that all of the devices connect into. So here we have processor 1, processor 2, and main memory all tapping into this bus. It's just a wire, and then you have taps coming off the wire, and this is why we call it a multi-drop bus.

Now, there are some positives and negatives to a multi-drop bus. The positive is that you don't have to route. If one processor wants to communicate with main memory, or read from main memory, it can just shout, "Who has address 5?", and main memory can respond with the data. The downside is that processor 1 and processor 2 can't both shout at the same time. So as we start to add more processors, something like a shared multi-drop bus might become a problem. We're going to talk about that once we get to large multiprocessor systems, or large parallel systems, at the end of this class. But for now, let's focus on multi-drop memory buses, and let's look at all the different wires you're going to need here.

We'll start from the bottom. At the bottom, we just have a clock, and this is driven externally. Processor 1, processor 2, and main memory are not going to be driving this; it's just something they all receive to keep everybody synchronized. Now, let's start at the top: arbitration. What does arbitration mean? Well, arbitration means you need some way to determine who is allowed to shout, or who is allowed to utilize the bus, at a given time.
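As a quick aside, here's a minimal Python sketch of that shared-medium model. The names here, MultiDropBus, snoop, broadcast, are my own hypothetical illustration, not anything from the lecture. Every device on the wire hears every broadcast, but if two devices tried to drive the wire at once you'd get a conflict, which is exactly what arbitration exists to prevent:

```python
# A minimal sketch, with hypothetical names of my own invention: a
# multi-drop bus is one shared medium that every device taps into, so any
# broadcast is visible to everyone, but only one device may drive it at a
# time.

class MultiDropBus:
    def __init__(self):
        self.devices = []          # every tap on the shared wire
        self.busy = False

    def attach(self, device):
        self.devices.append(device)

    def broadcast(self, sender, message):
        if self.busy:
            # Two devices shouting at once: the conflict arbitration solves.
            raise RuntimeError("bus conflict")
        self.busy = True
        for device in self.devices:
            if device is not sender:
                device.snoop(message)  # everyone on the wire hears the shout
        self.busy = False

class Memory:
    def __init__(self):
        self.data = {5: 0xCAFE}    # pretend address 5 holds some data

    def snoop(self, message):
        kind, addr = message
        if kind == "read" and addr in self.data:
            print(f"memory: address {addr} -> {self.data[addr]:#x}")

class Processor:
    def snoop(self, message):
        pass                        # ignores reads it didn't issue

bus = MultiDropBus()
p1, p2, mem = Processor(), Processor(), Memory()
for device in (p1, p2, mem):
    bus.attach(device)
bus.broadcast(p1, ("read", 5))      # P1 shouts: who has address 5?
```

With only one shouter at a time, a read is just a broadcast of an address that memory answers. With two shouters, the broadcast would be garbled, so something has to pick a winner, and that something is the arbitration logic.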
So, these sets of wires are going to be used to determine which of the three devices on this bus is allowed to use the bus, or shout on the bus, at any given time. And how do we go about doing this? Well, there are a couple of different ways you can build arbitration logic.

One way is to have what's known as a pull-down bus. Let's say you have a wire per processor, or a wire per entity that wants to communicate on this bus, and when you want to use the bus, you pull down your wire. The wires encode a priority: if processor 1 is pulling down its wire and processor 2 is also pulling down its wire, one of them always wins. But usually that's not the best thing to do, because it forces you into some sort of fixed priority.

Instead, you could think about having a request and grant system. In a request and grant system, let's say you have a chip which is an arbiter, and you have three entities here with three request signals: REQ1, REQ2, and REQ3. This arbiter can implement something like a round-robin scheme, or try to enforce some sort of fairness. What happens is, at the beginning of a memory bus cycle, whoever needs to use the bus that cycle will assert their request wire. The arbiter will take all of the requests into consideration, and it might keep some state inside. Then it will notify exactly one of the entities on the bus with a grant signal. There are three grant signals here, and the arbiter will make a decision and assert only one of them, saying processor 1 wins, or processor 2 wins, depending on which wire gets asserted. So multiple entities can request, but only one wins.

So the first thing you're going to do to use this bus is arbitrate for it, and there's a set of wires for that. Okay, what happens next? Well, on the control wires, you're going to say what you want to achieve. So you might have a request that says, I want to do a read on the bus. Now, we haven't yet said what address we want to read, and if you look at this multi-drop bus, we have wires for that: we have an address bus. So you'll say, I want to do a read, and I want to read address 5, we'll say. Then, in a traditional multi-drop bus, you'll wait. You'll assert the arbitration, the control, and the address, and you'll be waiting around, saying, I want to read address 5. Then main memory will say, I have address 5, and it will assert the data for address 5 onto the data bus here. And then processor 1 can read in that data.

Now, what are some downsides of doing something like this? Well, as you go to build this multi-drop bus, you're basically going to reserve the bus for the entire time that you're doing one memory transaction. You need to hold the bus the whole time while the arbitration, control, and address go out and the data comes back. And that could be a long time, because main memory can take a long time to return data. This is a problem, so what did people think about doing? Well, they applied ideas from processor design and said, maybe we can try to pipeline the bus. So note, as we flip back and forth here for a second, the title of the slide changes, but the content doesn't. The pipelined bus actually looks the same.
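Here's a small sketch of that request/grant idea in Python. The lecture doesn't specify the arbiter's internals, so the round-robin policy below is just one reasonable guess at the "state inside" it mentions: the arbiter remembers the last winner and rotates priority so a requester that just won goes to the back of the line.

```python
# A sketch (my own illustration, not the lecture's exact design) of a
# request/grant arbiter: several request lines come in each bus cycle, and
# the arbiter asserts exactly one grant line, rotating priority round-robin
# so no requester is starved the way a fixed-priority pull-down scheme
# could starve it.

class RoundRobinArbiter:
    def __init__(self, num_requesters):
        self.n = num_requesters
        self.last_granted = self.n - 1   # internal state: the last winner

    def arbitrate(self, requests):
        """requests: one boolean per REQ wire.
        Returns the index whose GRANT wire is asserted, or None."""
        for offset in range(1, self.n + 1):
            candidate = (self.last_granted + offset) % self.n
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None                      # nobody requested this cycle

arb = RoundRobinArbiter(3)
print(arb.arbitrate([True, True, False]))   # -> 0 (requester 0 wins first)
print(arb.arbitrate([True, True, False]))   # -> 1 (requester 1 wins next)
print(arb.arbitrate([True, False, False]))  # -> 0
```

Compare this with the pull-down scheme, where processor 1 would win every one of those contested cycles; the rotating state is what buys the fairness.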
So it has the same signals, but now, instead of arbitrating for and winning the entire bus, and holding the entire bus for a long period of time, we subdivide all these different categories and pipeline access to them, using each set of wires only when it's needed. We can look at this as a picture here. On a pipelined bus, let's say processor 1 is trying to do something. It'll assert itself onto the arbitration lines, and let's say it wins. Then, in the next cycle, it'll assert that it wants to do a load. In the cycle after that, it asserts the address. And finally, let's say main memory returns the data very quickly over here.

Now, why is this good? Well, because it's pipelined. In the next cycle, someone else can be arbitrating for the bus. The cycle after the control signals are used for the load, someone else can be putting a different transaction on them. Likewise with the address: the next cycle, someone else can be putting an address there, and their data can be coming back the cycle after. So you don't have to hold all the wires for the whole duration of one memory transaction; instead, you can pipeline those transactions. This is just to give you an idea of how the physical implementation of the wiring of small symmetric multiprocessors works.

In reality, these buses are a little more complex. One thing we're not going to cover in depth, but which you'll see people do when they go to build these pipelined buses, is what's called a split-phase transaction bus. Instead of waiting for the data to come back (for instance, in this example, it's very possible that the data from main memory takes a couple of cycles to come back, and stalling there would slow down your pipeline), you can issue a request, and then some time in the future, main memory might arbitrate for the bus again and return the data. That's why it's called a split-phase transaction: there are multiple phases to one transaction. The first phase might be requesting the data, where you might have to use all of the portions of the bus. And then the response will be main memory arbitrating for the bus, saying that it's going to do a data response, reasserting the address, and then giving the data back. So you can see that that's a better way to use the bus, because you don't have to hold the bus for a long period of time.

Now, one of the challenges here is that you still have everybody trying to scream on the bus at the same time. If you were to take everyone in this room and have them all scream at the same time, we would not be able to understand what each other is saying. That's why we need arbitration: it's there, if you will, to pass around a token so only one person can speak at a time. But if you want to have multiple people speaking at a time, we're going to have to look at more complex systems, and we're going to talk about that in two lectures.
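To make the overlap concrete, here's a short Python sketch of a pipelined schedule. The phase names and one-cycle-per-phase timing are simplifying assumptions of mine, not a real bus specification. A new transaction enters the pipeline each cycle, and the "mem: data response" entry models the split-phase idea, with main memory arbitrating for the bus itself to return data from an earlier request:

```python
# Illustrative only: each transaction walks through the four phases in
# successive cycles, so every set of wires serves a different owner each
# cycle instead of being held for one access's full duration.

PHASES = ["arbitrate", "control", "address", "data"]

def schedule(transactions):
    """Map cycle -> {phase: owner}, launching one transaction per cycle."""
    timeline = {}
    for start, owner in enumerate(transactions):
        for offset, phase in enumerate(PHASES):
            timeline.setdefault(start + offset, {})[phase] = owner
    return timeline

# "mem: data response" stands in for the split-phase reply, where main
# memory re-arbitrates for the bus to deliver data requested earlier.
timeline = schedule(["P1: load", "P2: store", "mem: data response"])
for cycle, use in sorted(timeline.items()):
    print(f"cycle {cycle}: " + ", ".join(f"{p}={o}" for p, o in use.items()))
```

In the unpipelined design, each of those three transactions would hold every wire for the full four cycles or more; here, each set of wires changes hands every cycle, and the split-phase response is just another transaction competing for the bus.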