So we start here with a bit of a less intuitive distance metric, namely the cosine distance. We're going to start off again with two points in two-dimensional space just to highlight our example. And hopefully, from the lines that we just drew, it should be clear that this is already shaping up to be much different than the L1 and L2 metrics that we just discussed. What we really care about with the cosine distance is the angle between these two points. This metric is based on the cosine of the angle between the two vectors defined by each of these points, and the distance is one minus that cosine. The formula still holds as we move up to higher dimensions: we take the dot product, as you see in the numerator, over the product of the norms of the two points in the denominator.

The key to the cosine distance is that it remains insensitive to scaling with respect to the origin. That is, we can move one of those points, as we have here, along that same line, and the distance will remain the same. So any two points on the same ray passing through the origin will have a distance of 0 from one another. And the idea is that we care about the relationship between recency and visits for one point versus the other much more than we care about the actual physical distance between the two. So recency of 1 and visits of 1 is, as far as cosine distance is concerned, the same as recency of 10 and visits of 10, no matter how far apart the points are, because both lie along that same ray. So for two vectors that are pointing in the same direction, our cosine distance will spit out 0; it will treat them as essentially exactly the same. Euclidean distance, on the other hand, may treat them as very far apart, depending on where those values actually lie, even if they are on the same line.

So how is this useful, treating points as exactly the same whenever they point in the same direction? Let's say we have text data, and our features are counts of different words within the documents. Now, just because one document is longer than the other, and so has higher counts of each of these words, does not mean the two need to be far away from one another and thus cluster differently. Maybe they're about the exact same thing. Maybe one of those articles is a summary of the other. In that case, you want to mark them as close to one another, and cosine distance will come in handy in that situation. So if one document has 3 counts of the term data science and 10 counts of the word application, and another has 30 counts of data science and 100 of application, then you probably want to assume those belong to the same category and cluster them together, even though their Euclidean distance may be large. The two vectors point in the exact same direction, so their cosine distance is 0; we'll sketch that calculation in code in just a moment. Another advantage of the cosine distance is that it's more robust against the curse of dimensionality. Euclidean distance can lose meaning when we have a lot of features, as we saw in our initial discussion of that curse of dimensionality. So our takeaway here is that the best choice of distance is going to depend heavily on what our application is.
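To make that word-count example concrete, here's a minimal sketch in Python using NumPy. The document vectors and variable names are just illustrative, not from any real dataset.

```python
import numpy as np

# Word-count vectors: [count of "data science", count of "application"]
doc_a = np.array([3, 10])
doc_b = np.array([30, 100])   # same proportions, just a longer document

# Cosine distance = 1 - cos(angle) = 1 - (a . b) / (||a|| * ||b||)
cosine_dist = 1 - doc_a.dot(doc_b) / (np.linalg.norm(doc_a) * np.linalg.norm(doc_b))

# Euclidean (L2) distance, for comparison
euclidean_dist = np.linalg.norm(doc_a - doc_b)

print(f"Cosine distance:    {cosine_dist:.4f}")     # ~0.0, same direction
print(f"Euclidean distance: {euclidean_dist:.2f}")  # ~93.96, looks far apart
```

The cosine distance comes out to essentially 0 because doc_b is just doc_a scaled by 10, while the Euclidean distance is large, which is exactly the behavior we just described.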
Another distance metric to keep in mind is the Jaccard distance, which will be useful for text as well, and it applies to sets. An example that comes up pretty often, and the one we'll walk through here, is unique word occurrence. Say we have a sentence A: "I like chocolate ice cream." The set A is just the unique words in that sentence: I, like, chocolate, ice, and cream. Say sentence B is "do I want chocolate cream or vanilla cream." So set B is do, I, want, chocolate, cream, or, and vanilla; again, not counting that second cream, only the unique values. The Jaccard distance is then one minus the number of shared values over the total, that is, the intersection over the union: the words shared by the two sentences, divided by the total number of unique words across both sentences. We'll run the calculation on this example in just a second. It can be used as another option for grouping text documents about similar topics together. So using this example, we can calculate the score between our two sentences. Running through it, we see that the intersection ends up with three words (I, chocolate, and cream), and there are nine unique words in total. So the distance is 1 - 3/9 = 1 - 1/3 = 2/3, or about 0.67, and that will be our distance. I'll put a quick code sketch of this calculation at the very end of this section.

So that closes out our different distance metrics. Just to recap this discussion overall: we talked about the importance of having different measures of distance between two points, as well as the applications of distance measures to clustering, and how the measure of distance or similarity we pick will ultimately have a large effect on the groupings that we end up creating. With that, we discussed Euclidean distance as our most common metric, where we use the old math we learned back in the day, a squared plus b squared equals c squared. We discussed the Manhattan distance, which is the sum of the absolute differences along each individual feature. We discussed the cosine similarity, which highlighted the angle between our points. And finally, we discussed the Jaccard distance, which is useful for showing the differences and similarities between different sets of values.
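As promised, here's a minimal sketch of that Jaccard calculation on our two sentences. It assumes we simply lowercase the text and split on whitespace, and the function name jaccard_distance is just for illustration.

```python
def jaccard_distance(sentence_a, sentence_b):
    """Jaccard distance between the unique-word sets of two sentences."""
    set_a = set(sentence_a.lower().split())
    set_b = set(sentence_b.lower().split())
    shared = set_a & set_b   # intersection: words that appear in both
    total = set_a | set_b    # union: all unique words across both
    return 1 - len(shared) / len(total)

sentence_a = "I like chocolate ice cream"
sentence_b = "do I want chocolate cream or vanilla cream"

print(jaccard_distance(sentence_a, sentence_b))  # 1 - 3/9 = 0.666...
```

That matches the 0.67 we calculated by hand. All right, I'll see you in the next video.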