OK. So let's take a look at the problems with having a bus and, actually, the bigger problems with having a cache. So here we show a little example which has two CPUs, two caches, and main memory, and everything sits on a bus. Now, why do we want a cache? Well, we've discussed this in great detail in this class: caches make your program go faster, because you don't have to go communicate with the distant memory to access some piece of data that you've accessed temporally recently, or that has spatial locality with some other reference you've made recently. So let's take a look at what happens in this basic example. Let's say at address A we start off with everyone having the value 100 in their respective caches and in main memory. So address A has value 100. Now, let's suppose that CPU-1 updates address A to value 200. We'll look at this in two cases. The first is the write-back case, so all the caches here are write-back. CPU-1 updates address A to value 200, but this is a write-back cache. So what happens to the values in memory and in cache-2? Well, all of a sudden we have a stale value. The newest value is 200, but in memory and in cache-2 we have the old value. So if CPU-2 tries to read address A, it's going to continually get the old value. Likewise, main memory has an out-of-date value. Now, where does this become a problem? Well, CPU-2 never sees the value that got updated for address A. More to the point, what happens if CPU-2 tries to update address A in cache-2 with, we'll say, a value of 300? Now we have a big problem: three different values. This is going to cause problems for our memory consistency model, but also just for using this system at all. How do you know when a value has been updated and is visible?
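To make that concrete, here is a toy Python model (a hypothetical sketch, not any real hardware) of two write-back caches in front of one main memory, with no coherence mechanism at all. It reproduces the three-different-values situation from the example above.

```python
# Toy model: two write-back caches sharing one main memory, no coherence.
class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory   # backing store (shared main memory)
        self.lines = {}        # addr -> value held locally

    def read(self, addr):
        # Hit: return the (possibly stale) cached value; miss: fill from memory.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write-back: update only the local line; main memory stays stale.
        self.lines[addr] = value

memory = {"A": 100}
cache1 = WriteBackCache(memory)
cache2 = WriteBackCache(memory)
cache1.read("A"); cache2.read("A")   # both caches now hold A = 100

cache1.write("A", 200)               # CPU-1 updates A to 200
cache2.write("A", 300)               # CPU-2 updates its own stale copy to 300

# Three different values for the same address -- the coherence problem:
print(cache1.read("A"), memory["A"], cache2.read("A"))  # 200 100 300
```

Nothing here is a real protocol; the point is only that with private write-back lines and no communication, the same address can legally hold three different values at once.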
And we can see that it is because we have caches that this problem exists. It's because we have two caches that we get stale values, either in other caches or in main memory. OK, so the question comes up: maybe this is just a problem with write-back caches. Write-back caches are an optimization; they write back only when they need to, and they can store dirty values inside themselves. Maybe we should be using write-through caches instead, because then at least everything goes out to main memory. So let's take a look at the same example. We reset, so we have 100, 100, and 100. Now CPU-1 goes and does a write to address A with value 200, but now it's write-through. So that writes 200 into cache-1 and 200 into main memory. The question comes up: do cache-2 and CPU-2 see this update? No. We have no mechanism for that, and this is going to motivate why we want to build cache coherence protocols. So we do the update, and we have value 200 in cache-1 and 200 in memory because we wrote through, but CPU-2 will never see that updated value, because it already has address A with value 100 in its cache. So, two questions here. First, do these stale values matter? Yes, the stale values definitely matter. You will never see the updates in the other CPU, and hence there's basically no way to communicate. Second question: what is the view of shared memory for programming? Well, you need some notion of when a store to a particular address shows up in another processor, and right now there's no mechanism for the other processor to ever see that updated value. Even if CPU-2 does a million reads, it'll never get the updated value, because it already has address A with value 100 in its cache.
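The write-through version of the same toy model (again a sketch, not real hardware) shows why writing through to memory doesn't help the other cache: the stale copy in cache-2 is never invalidated, so CPU-2 hits it forever.

```python
# Toy model: write-through caches, still with no invalidation on the bus.
class WriteThroughCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write-through: update the local line AND main memory...
        self.lines[addr] = value
        self.memory[addr] = value
        # ...but nothing removes stale copies from other caches.

memory = {"A": 100}
cache1 = WriteThroughCache(memory)
cache2 = WriteThroughCache(memory)
cache2.read("A")          # cache-2 pulls in A = 100

cache1.write("A", 200)    # write-through: memory is now up to date

# CPU-2 can do a million reads and still hit its stale cached copy:
assert all(cache2.read("A") == 100 for _ in range(1_000_000))
print(memory["A"], cache2.read("A"))  # 200 100
```

Memory is up to date, but cache-2 never misses on A again, so it never sees the new value.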
CPU-2 could sit there and read, it could sit there and do writes that might update main memory, but it will still never see the update from CPU-1. That's problematic, because how do you build programming models on top of that? And more to the point, this affects your consistency models. So let's take a look at write-back caches with sequential consistency. This is the same example we used in class last time. Just to recap: we have two threads sharing a memory space, and we'll call one T1 and the other T2. T1 stores 1 to X and 11 to Y. Concurrently executing, T2 loads Y into a register and then stores that register out to Y prime, a different memory address; then it loads X into R2 and stores R2 into X prime, another address different from X. So what's going on here? You'll notice we purposely made a tricky case: we made Thread 1 write X and then Y, and then Thread 2 read Y and write it to a different address, Y prime, and then read X and write it to a different address, X prime. So we've purposely flipped the order of those two accesses. Let's take a look at what happens with a write-back cache, and we'll see that just by virtue of having a bus with a write-back cache, we can violate sequential consistency. We said that in sequential consistency, all execution orderings and interleavings in which we did not reorder a thread's own instructions need to be valid. So we can choose a purposely hard case. We're going to have Thread 1 execute first, to completion. We have two caches, cache-1 and cache-2, and main memory, and we're going to watch what's in these locations as time goes on, with time moving down the graph. The first thing we have are the memory locations for X and Y: X has 0 in it and Y has 10 in it.
And those are just the initial conditions of this problem. OK, so T1 gets executed. That writes 1 and 11 into cache-1. Now note that because this is a write-back cache, we have not updated memory here, so it has a stale value, and this is going to cause us problems in our sequentially consistent memory model. What happens next? Well, let's say X and Y are in different cache lines, so they can be written back independently. We're going to say that cache-1 writes back Y but not X. You should start seeing that there's going to be a problem here. So in main memory, Y has 11 and X has 0. That's a state we don't want to see, but let's keep moving forward. Then we run Thread T2 to completion. It reads the values from main memory, brings them into its cache, and does its updates, but it doesn't actually write back these new values yet. Then, let's say, cache-1 writes back X, so now main memory says 1 and 11. That's not good either, but at least it's what program T1 would produce if you executed it normally. What's not good about it is that cache-2 is not consistent with main memory: X has 1 in memory, but this cache says X has 0. And then finally cache-2 does a write-back of X prime and Y prime, so it writes back the values it had, which were 0 and 11. Now, you may recall from the previous class that this is a sequentially inconsistent memory outcome; it is never supposed to happen in a sequentially consistent system. So all of a sudden, we took a memory system, we introduced caches, and by virtue of having these caches we took a system which would potentially implement sequential consistency and we broke it. We broke it because of the stale values in caches. So this is the write-back case.
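The whole write-back walkthrough can be replayed as a short script. This is just the timeline above written out step by step (the dicts standing in for cache lines and memory are illustrative, and the adversarial write-back order is chosen by hand, exactly as in the lecture).

```python
# Replay of the write-back interleaving: T1 = {X<-1; Y<-11},
# T2 = {R1<-Y; Y'<-R1; R2<-X; X'<-R2}, with Y written back before X.
memory = {"x": 0, "y": 10, "xp": None, "yp": None}
cache1, cache2 = {}, {}

# T1 runs to completion in cache-1 (write-back: memory untouched).
cache1["x"], cache1["y"] = 1, 11

# x and y live in different cache lines, so cache-1 may write y back first.
memory["y"] = cache1["y"]            # memory now: x = 0, y = 11

# T2 runs to completion: misses fill from memory, results stay in cache-2.
cache2["y"] = memory["y"]            # reads y = 11 (the new value)
cache2["yp"] = cache2["y"]
cache2["x"] = memory["x"]            # reads x = 0 (stale!)
cache2["xp"] = cache2["x"]

memory["x"] = cache1["x"]            # cache-1 finally writes x back
memory["xp"], memory["yp"] = cache2["xp"], cache2["yp"]  # cache-2 writes back

# y' = 11 but x' = 0: T2 observed the store to Y without the earlier store
# to X. No sequentially consistent interleaving can produce this outcome.
print(memory["xp"], memory["yp"])    # 0 11
```

Under sequential consistency, if T2 sees Y = 11 it must also see X = 1, since T1 wrote X before Y; the delayed write-back of X is exactly what breaks that.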
So write-back caches can basically cause sequentially inconsistent values to end up in main memory when you're done, and likewise in caches. OK, let's walk through a similar case for write-through caches, because write-through caches, we're about to find, are not particularly better either. But to get this case to come out wrong, we need to be a little more contrived. Same program here: cache-1, memory, cache-2. At the beginning of time, we're going to say that cache-2 holds X with value 0, which is consistent with main memory. For some reason cache-2 read value X in a previous program a long time ago. So cache-2 has X being 0, and what we're going to show is that this X being 0 in cache-2 is going to cause a stale value to live on beyond the expected time. So let's say Thread 1 executes. It updates X with 1 and Y with 11, and it's write-through, so both of these get pushed to main memory. The two caches are on a bus, but there is no communication between them happening on that bus; modules are just communicating with main memory. So the other cache doesn't react. OK, let's see what happens now. Thread T2 executes. It reads from main memory, but you'll notice that it doesn't need to go read X, because X is already sitting in its cache: it has the stale value there. Why does it not need to do anything? Because we haven't said that we need to change how the bus works. We just said you can go read something and pull it into your cache, and once things are pulled into a cache, there's no way to ever take them out unless they fall out because of capacity issues or conflict issues. So T2 reads this stale value and does its writes, let's say to X prime and Y prime, and we get values 0 and 11 for X prime and Y prime.
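Here is the write-through version of the same replay (again just the lecture's timeline as a script, with the leftover line in cache-2 set up by hand). Notice how few steps it takes once the stale line is in place.

```python
# Replay of the write-through interleaving. The only contrivance is that
# cache-2 already holds x = 0 from some long-ago program.
memory = {"x": 0, "y": 10, "xp": None, "yp": None}
cache2 = {"x": 0}        # stale line left over in cache-2

# T1 executes with a write-through cache: both stores reach memory.
memory["x"], memory["y"] = 1, 11

# T2 executes: y misses and fills from memory, but x hits the stale line,
# so no memory access ever happens for x.
cache2["y"] = memory["y"]                 # y = 11
r1 = cache2["y"]
r2 = cache2["x"]                          # stale x = 0
memory["yp"], memory["xp"] = r1, r2       # write-through the results

print(memory["xp"], memory["yp"])         # 0 11 -- again not sequentially
                                          # consistent
```

Same broken outcome as the write-back case: seeing the later store to Y without the earlier store to X.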
And this causes sequential consistency to break: this is a non-sequentially-consistent execution. So just because you have write-through caches on the bus does not guarantee that you're going to have sequentially consistent execution. OK, so now the question is: we spent all of last class talking about sequential consistency; what good was that if, by putting a cache into the system, all of a sudden we break that whole model? What's the solution? Well, we're going to introduce an idea called cache coherence, and we're going to contrast that with memory consistency models. We're going to define a cache coherence protocol as something that ensures that all writes by one processor are, at some point in the future, eventually seen by other processors. And we're going to say that you typically have some sort of cache coherence protocol, an underlying system, which allows you to maintain some consistency model. So in effect, cache coherence protocols are the underlying implementation which allows you to provide these stronger guarantees, and the stronger guarantees are what make the programmer's life easier, as we discussed in the last lecture. A programmer wants some sort of guarantee, and we discussed sequential consistency as one way to go about providing that. OK, so we're going to spend the whole rest of the class talking about how to go build a reasonable cache coherence protocol. And as I said, memory consistency models are just the rules that the cache coherence protocol tries to observe. An important thing here is that you can have different cache coherence protocols and different consistency models, so just because you have a cache-coherent system doesn't mean you have, for instance, sequential consistency. In fact, sequential consistency is a very strict form of a consistency model.
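To preview the kind of mechanism the rest of the class builds, here is a minimal sketch of one classic idea: snoop the bus, and on every write, invalidate other caches' copies of that line. This is a toy model of invalidation-based write-through coherence, not any specific real protocol.

```python
# Toy invalidation-based coherence: every write is broadcast on the bus
# so other caches drop their (now stale) copies.
class SnoopingBus:
    def __init__(self):
        self.caches = []

    def broadcast_invalidate(self, addr, writer):
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)   # drop any stale copy

class CoherentWriteThroughCache:
    def __init__(self, memory, bus):
        self.memory, self.lines, self.bus = memory, {}, bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(addr, self)  # other copies die first
        self.lines[addr] = value
        self.memory[addr] = value                  # write-through

memory = {"A": 100}
bus = SnoopingBus()
c1 = CoherentWriteThroughCache(memory, bus)
c2 = CoherentWriteThroughCache(memory, bus)
c2.read("A")          # c2 caches A = 100
c1.write("A", 200)    # broadcast invalidates c2's copy
print(c2.read("A"))   # 200: the next read misses and sees the new value
```

With the invalidation, CPU-2's next read of A misses and picks up the new value, which is exactly the "writes are eventually seen by other processors" property defined above.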
There are much looser models out there. In fact, most commercial multiprocessors that you go buy are not sequentially consistent. They do have cache coherence, but that cache coherence implements something that is typically looser than sequential consistency. People actually define different consistency models, and we talked about a few of those last time, like total store ordering and weak consistency models, things like that. So it's the combination of the cache coherence protocol and the consistency model it implements which allows you to actually go build useful software systems on multiprocessors.