The beta distribution is a distribution over a probability, so the values it takes as input range from 0 to 1. The beta distribution also has the property that it is the conjugate prior of the binomial distribution. A class of priors is conjugate for a sampling model, given by the probability of y given theta, if it makes the posterior probability of theta given y have the same form as the prior. The PDF for the beta distribution is shown here. What this means is that if we have a beta prior and a likelihood that has a binomial form, the posterior will also be a beta distribution. Not only do we know the form, we can compute the posterior exactly in closed form.

So what does that look like in notation? The prior for theta is a beta distribution, or theta can be drawn from a beta distribution, parameterized by a and b. The sampling distribution is a binomial distribution, which here is described by num_p and total, where num_p corresponds to the number of successes and total corresponds to the total number of events. The posterior can then be computed as another beta distribution, which is now parameterized by num_p + a and total - num_p + b. The observed data y is represented by num_p and total, and the parameter theta is the probability of the success event that num_p counts.

Here, a and b, the parameters of the prior beta distribution, can be considered pseudo counts of successes and failures. If we set a and b to 1, we get a uniform distribution. The mean and variance of a beta distribution with parameters a and b are given by the terms here: the mean equals a / (a + b), and the variance equals a times b over the quantity (a + b) squared times (a + b + 1), that is, a * b / ((a + b)^2 * (a + b + 1)).
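To make the closed-form update concrete, here is a minimal Python sketch of the formulas just described. The function names beta_posterior_params and beta_mean_var are made up for illustration and are not taken from the lecture's notebook; only the arithmetic comes from the text.

```python
def beta_posterior_params(num_p, total, a, b):
    """Posterior Beta parameters for a Beta(a, b) prior and binomial data.

    num_p is the number of successes, total the number of events.
    (Hypothetical helper name, used here only for illustration.)
    """
    if not 0 <= num_p <= total:
        raise ValueError("num_p must lie between 0 and total")
    # Successes are added to a, failures (total - num_p) are added to b.
    return a + num_p, b + (total - num_p)


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# Uniform prior Beta(1, 1), then 7 successes observed in 10 events.
a_post, b_post = beta_posterior_params(num_p=7, total=10, a=1, b=1)
post_mean, post_var = beta_mean_var(a_post, b_post)
print(a_post, b_post)        # 8 4
print(round(post_mean, 4))   # 0.6667
```

With the uniform prior, the posterior mean of 8 / 12 sits slightly below the raw success ratio of 7 / 10, because the two pseudo counts pull the estimate toward 0.5.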
So let's look at an example of data that can be modeled using a beta distribution. As shown above, the beta distribution can be used as a prior distribution for modeling the batting average in baseball, given by theta. For a new player, since we have no information, we can rely on historical batting averages to form a prior. Since batting averages are known to lie in a certain range, we can use that information to define a beta prior for theta, the batting average of the new player. The number of hits and misses can then be represented by a binomial distribution, that is, as the number of successes and failures. After every game, the posterior for theta can be computed from these values. That posterior can then be used as the prior for the next game. Keep in mind here that the beta distribution takes values between 0 and 1 as input, and that if we set a and b to 1, we get a uniform distribution.

In this function, we're going to plot the posterior distribution, which is a beta distribution. Given a beta prior and a binomial likelihood, we compute what the posterior for theta looks like. The function takes as input the binomial parameters, that is, num_p, the number of positive events, and total, the total number of events. It also takes the parameters a and b of the beta prior, and then it computes the parameters of the posterior beta distribution and plots it. You can see here that as I increase the value of num_p, the number of successful events, the posterior distribution for theta shifts to the right. That is understandable, because it implies that theta, the success parameter, is higher.
Similarly, if we increase the total number of events while the number of successes stays fixed, the posterior shifts to the left, because theta, the success parameter, which is the ratio of the number of successful events to the total number of events, is smaller.
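The shifts described here can be checked numerically without the plot. The following is a rough sketch using only the Python standard library, not the lecture's actual plotting function: it evaluates the posterior density on a grid and reports where the curve peaks. The names beta_pdf, posterior_curve and posterior_peak are invented for illustration, and the batting prior Beta(81, 219) and the per-game numbers are made-up values, not data from the lecture.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    )


def posterior_curve(num_p, total, a, b, n_points=99):
    """Grid of theta values and the posterior density at each one."""
    a_post, b_post = a + num_p, b + (total - num_p)
    # Interior grid 0.01, 0.02, ..., 0.99 (the endpoints are excluded).
    thetas = [(i + 1) / (n_points + 1) for i in range(n_points)]
    density = [beta_pdf(t, a_post, b_post) for t in thetas]
    return thetas, density


def posterior_peak(num_p, total, a, b):
    """Grid value of theta where the posterior density is highest."""
    thetas, density = posterior_curve(num_p, total, a, b)
    return thetas[density.index(max(density))]


base = posterior_peak(num_p=3, total=10, a=1, b=1)
more_successes = posterior_peak(num_p=7, total=10, a=1, b=1)
more_events = posterior_peak(num_p=3, total=20, a=1, b=1)
print(base, more_successes, more_events)
# more_successes lies to the right of base, more_events to the left.

# Game-by-game updating: each posterior becomes the next prior.
a, b = 81, 219                     # hypothetical prior, mean 0.27
games = [(1, 4), (2, 5), (0, 3)]   # made-up (hits, at_bats) per game
for hits, at_bats in games:
    a, b = a + hits, b + (at_bats - hits)
print(a, b, round(a / (a + b), 4))  # final parameters and posterior mean
```

The thetas and density lists returned by posterior_curve are what one would hand to a plotting library to draw the curve itself.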