In session 2, we learned how to set up and run a simulation. This time, we'll look closer at the simulation results. Since the simulation provides us with estimates of reward and risk measures, we need to understand how precise those estimates are if we are to use simulation results for comparing alternative decisions. We'll set up a simulation with a larger number of simulation runs and compare its results with those of a short simulation to see how the precision of simulation estimates changes with the number of simulation runs. Finally, we will build histograms of simulated random quantities to see what the simulated distributions of the random inputs to and random outputs of a simulation look like. Okay, in session 3 we will look at the interpretation and visualization of simulation results. In session 2, we set up and ran a simulation of monthly payment values under the new wireless data plan that our consultant is considering. Here are the results of this simulation, copied from the Excel file DataPlan10. We used ten simulation runs in that simulation. In other words, we generated ten random instances of monthly data usage taken from a normal distribution with a mean of 23 GB and a standard deviation of 5 GB. Those values are shown in green. For each of those data usage values, we have calculated the corresponding monthly payment value. So we have a sample of ten monthly payment values, and they're shown in blue. Ultimately, we're interested in the distribution of monthly payment values. But first, let's look at the sample of data usage values. Why? Because we actually know the distribution they came from, and we can compare the sample mean and the sample standard deviation for these ten values to the mean and the standard deviation of the distribution from which this sample was generated. The sample mean, which is around 25 GB, is just the average of the ten random data usage values that Excel generated in this particular simulation.
It is, of course, just an approximation to the true expected value of the distribution from which these ten numbers were generated. That true expected value is 23 GB. In the same way, the sample standard deviation, which is around 8 GB, is just a reflection of the ten random data usage values that were generated in the simulation. It is an approximation to the true standard deviation of the random variable from which the ten numbers were generated, 5 GB. So let's now look at the estimates of the reward and the risk that we get from the simulation. In other words, let's look at the sample mean and the sample standard deviation of the ten monthly payment values generated by our simulation. The estimate for the reward measure is around $253, and for the risk measure it is around $92. In this simulation, we only generated ten random data usage values. The more data usage values we generate, the closer the resulting sample of data usage values will be to the true probability distribution. As a result, the sample mean and the sample standard deviation will also be closer to the true values. With this in mind, instead of using ten simulation runs, let's do 1000 simulation runs, and let's see how our estimates of the reward and the risk change. Let's open our Excel file DataPlan10 and modify it to create a simulation with 1000 runs. Let's set up a simulation of the monthly data usage and payment values under the new plan with 1,000 simulation runs. We start with the file DataPlan10 and modify it to increase the number of simulation runs. Let's delete the random numbers from cells D2 through D11, remove the Sample Mean and Sample Standard Deviation headers, and also remove the formulas and values from cells E3 through E14. We'll keep the formula in E2 for later use. So we're deleting all of these numbers here, here, and here as well. And now we're ready to extend our simulation file.
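The effect described above can be sketched outside of Excel as well. Here is a minimal Python sketch (the seed, and the use of NumPy rather than Excel's Analysis ToolPak, are my own choices for illustration) that draws 10 and then 1000 values from a Normal(23, 5) distribution and compares the sample mean and sample standard deviation with the true values of 23 and 5:

```python
import numpy as np

# Draw n monthly data usage values from Normal(mean=23 GB, sd=5 GB)
# and compare the sample statistics with the true parameters.
rng = np.random.default_rng(123)  # illustrative seed; not Excel's generator

for n in (10, 1000):
    usage = rng.normal(loc=23, scale=5, size=n)
    # ddof=1 matches Excel's STDEV (sample standard deviation)
    print(f"n={n}: mean={usage.mean():.2f}, sd={usage.std(ddof=1):.2f}")
```

With only 10 runs the sample statistics can land noticeably far from 23 and 5; with 1000 runs they typically land within a few tenths of a GB.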
Let's increase the count of the simulation runs in column C up to the cell C1001. So we copy and paste the formula in the cell all the way down to C1001. Let's make sure we got the numbers right: we go to the last number, and yes, it is a count of 1000. Next, let's call the Analysis ToolPak. We go to the Data tab, click on Data Analysis, select Random Number Generation, and click OK. For Number of Variables we put 1, for Number of Random Numbers we now put 1000, and the Distribution is Normal, with parameters 23 and 5. Let's put 123 as our random seed, and as the output range let's select D2 again. Now we can click OK. And you can see that Excel has generated 1000 instances of the random data usage and put them in the cells D2 through D1001. We can go to D1001 and check that we have exactly 1000 samples. Let's change the font in cells D2 through D1001 to green and bold, to designate that here we have the random input values. Now we just need to copy and paste the formula in E2 into the cells E3 through E1001. Okay. Before we calculate the average and the standard deviation of the simulated values, let's hide the rows 11 through 1000 to make sure that the entire file looks compact. So we're going all the way here, and do Hide. Okay, now let's calculate the sample mean and the sample standard deviation values for both samples, for the data usage and for the payment values. Let's put the corresponding headers into the cells C1003 and C1004: Sample Mean and Sample Standard Deviation. Okay, let's calculate the sample mean and the sample standard deviation for the monthly usage data and put those values into the cells D1003 and D1004. So we take the AVERAGE of all the data usage values, D2 through D1001, and in D1004 we take the standard deviation of the sample, so we have STDEV of D2 through D1001. We can mark those two cells appropriately, and then we can just copy and paste those formulas into the cells for the payment values, and change the color.
Okay, we have our new reward and risk estimates for the new data plan based on the simulation with 1000 runs. The reward estimate is about $220, and the risk estimate is about $58. Here's a copy of the Excel file DataPlan1000, which reflects the results of the simulation with 1000 runs. Let's now compare the results of a simulation with 10 runs with those of a simulation with 1000 runs. Looking at the sample mean and the sample standard deviation for the data usage values, we see that the numbers for the long simulation, 1000 runs, 23.3 and 4.9, are pretty close to the true values, 23 and 5. Just like the sample mean and the sample standard deviation of the data usage values generated by the simulation with 1000 simulation runs are much closer to the true values, the sample mean and the sample standard deviation for the payment values from the same simulation are likely to be, in a colloquial sense, much closer to their respective true values. In other words, the longer our simulation is, the more simulation runs it contains, the more confident we are in the estimates we obtain. Now that we're comparing short simulations with long ones, it is a good time to return to the notion of the random seed. What seed value should we be using in a simulation? To get a handle on this question, I have run a short simulation, the one with 10 runs, with three different seed values, and recorded the resulting values for the sample mean and the sample standard deviation of the data usage values. Here's what I got using the seed 123. And this is what I got when I used the seed 1826, which is the year in which Joseph Wharton, the founder of the Wharton School, was born. And here's what the results were for the simulation with the seed 19104, which is the zip code where the Wharton School is located. As you can see, the sample mean values jump all over the place, from around 25 to around 19, and then back to around 25. The sample standard deviation values are not stable either.
They go from around 8 to around 5, to around 3. But now let's see what happens if we run a long simulation, a simulation with 1000 runs, with these three seed values. Here's what we get using the seed 123. And here's what we get for the seed 1826. Now, what about the seed 19104? Here are the results. As you can see, the Sample Mean and Sample Standard Deviation values do not depend much on the seed values for this long simulation. So, we should just run long simulations and use whatever seed values we want. As long as the simulation has many simulation runs, the seed value does not matter much. Now that we understand that simulations convert probability distributions of input random variables into probability distributions of output random variables, can we go beyond looking at the sample mean and the sample standard deviation numbers? Can we look at the entire distributions? Well, histograms of simulation results allow us to get a full picture of what happens with our random variables. In the data plan example, our random input is data usage, and the random output is the payment amount. Let's go back to our Excel file DataPlan1000 and build histograms of the data usage and of the payment values. We will use the Excel file DataPlan1000 as a starting point for building histograms of inputs and outputs of a simulation. This file contains the results of a simulation of the monthly data usage and the monthly payment values for the new wireless data plan. This simulation was conducted using 1000 simulation runs. The random sample of monthly data usage values is located in the cells D2 through D1001, and the corresponding 1000 monthly payment values are located in the cells E2 through E1001. Just keep in mind that we hid rows 11 through 1000 in displaying this file to make sure that we have a compact picture of the simulation results. First, we'll build a histogram for the simulated values of the monthly data usage.
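The seed experiment above can also be sketched in Python. Note that NumPy's random generator is not the same as Excel's, so the seeds below (123, 1826, and 19104, the same ones used in the lecture) will not reproduce the exact numbers from the slides; the qualitative effect, however, is the same:

```python
import numpy as np

def summarize(seed, n, mean=23.0, sd=5.0):
    """Sample mean and sample sd of n draws from Normal(mean, sd)."""
    rng = np.random.default_rng(seed)
    sample = rng.normal(mean, sd, size=n)
    return sample.mean(), sample.std(ddof=1)

for seed in (123, 1826, 19104):
    m10, s10 = summarize(seed, 10)        # short simulation
    m1000, s1000 = summarize(seed, 1000)  # long simulation
    print(f"seed {seed}: n=10 -> ({m10:.1f}, {s10:.1f}), "
          f"n=1000 -> ({m1000:.1f}, {s1000:.1f})")
```

Across the three seeds, the n=10 statistics move around noticeably, while the n=1000 statistics stay close to the true values of 23 and 5.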
A histogram is a frequency chart that displays how many simulated values fall within a certain range. The central concept for any histogram is a so-called bin. A bin is a range of values for a simulated random variable. In order to build a histogram of the monthly data usage, we need to provide Excel with a set of bins, and then Excel will count, and put on a chart, the numbers of random usage values within each bin. How do we select a set of bins for a histogram of a random variable? Let us look at the monthly usage data as an example. Let's go to the cells D1005 and D1006 and calculate the smallest and the largest among the random monthly data usage values in the cells D2 through D1001. In particular, we put the formula MIN(D2:D1001) into the cell D1005, and the formula MAX(D2:D1001) into the cell D1006. Let's also add headers, MIN and MAX, into the cells C1005 and C1006. We see that the smallest random data usage value that Excel has generated for us is about 9.1 GB, and the largest is about 38.5 GB. Clearly, if we want to generate a meaningful frequency chart, our bins should approximately cover the interval between 9.1 and 38.5 GB. So let's choose the set of bins that starts with the one that covers the interval up to 10 GB, followed by the bin that covers the interval between 10 and 11 GB, then the bin that covers the interval between 11 and 12 GB, and so on, with the last bin covering the interval between 38 and 39 GB. In Excel, the bins are defined by the largest value that falls within a particular bin, so the bin that covers the interval up to 10 GB is defined by the value 10 GB. Such a bin covers the values of data usage that are less than or equal to 10 GB. In a similar way, the next bin, defined by the value 11 GB, covers the values of the data usage that are strictly greater than 10 GB and less than or equal to 11 GB.
The last bin will be defined by the value 39 GB, and will cover the values of data usage that are strictly greater than 38 GB and less than or equal to 39 GB. Let's create the set of bin values before we pass it to Excel. In the cell A1008, let's put in a header, U Bins, to indicate that this is the set of bin values for the monthly data usage. Then, in the cell A1009, let's put the value 10 to designate the bin for the smallest values of data usage generated in our simulation. Next, in the cells A1010 through A1038, let's put the defining numbers for the rest of the bins. Let's just check: 39, that's good. Okay, we're now ready to create a histogram of the data usage values generated by our simulation. Let's go to Data, Data Analysis, select Histogram, and click OK. Now, in the Histogram dialog box, we first go to the input range and select the cells D2 through D1001 as our input values. Then, in the bin range, we select A1009 through A1038 to tell Excel which bins to use when calculating the frequencies for the histogram. Finally, in the output options, we select the output range. Let's choose a cell, let's say B1008, right here. B1008 will be the top left corner of the table of histogram frequencies that Excel will generate for us. Let's also check Chart Output to make sure that Excel will also present the results in chart format, and then let's click OK. As a result, we get a table and a chart as the table's graphical representation. The table contains the frequency for each bin. Each frequency value indicates how many out of 1,000 simulated data usage numbers fall within a particular bin. For example, the frequency value 2 that corresponds to the bin label 10 indicates that exactly 2 out of 1,000 data usage values in the cells D2 through D1001 are less than or equal to 10 GB.
Another example: the frequency value 4 that corresponds to the bin label 12 indicates that exactly 4 out of 1,000 simulated data usage values are strictly greater than 11 GB and less than or equal to 12 GB. If you look at the bottom of the frequency table, you will see that Excel has added another, catch-all bin, labeled More, that counts any simulated data usage values that are strictly greater than 39 GB. The frequency count for this bin is 0. This should not be surprising: remember, the highest simulated data usage value was about 38.5 GB. There are simply no values in the simulated sample that exceed 39 GB. With respect to the chart, one can always make some minor cosmetic modifications to satisfy our personal tastes. For example, we can change the color of the bars on our chart to green to reflect the fact that this is a chart of the input values for our simulation. We can also use Format Data Series to change the gap between the bars, to make it look a little bit better. Okay. We can also use Format Axis and go to Labels to make sure that we have a label for every bin. For example, we can do it like this. Okay. And also change the font here to make sure that we see all the labels. We can also remove the legend, and we can change the axis designation to Data Usage, U, in GB. We can do many other things, too, like I said, to satisfy our tastes. Okay, we're done with the histogram of the simulated data usage values. Let's now move on to building a histogram for the simulated monthly payment values reported in cells E2 through E1001. As before, let's first decide on the bins that we'll be using for our histogram. We copy and paste the formulas for the smallest and the largest values from cells D1005 and D1006, I'm trying to find them right here, to the cells E1005 and E1006, like so. Now we're talking about the minimum and the maximum values of the monthly payment.
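For readers who want to reproduce the frequency table programmatically, here is a small Python sketch of Excel's binning convention described above: each bin labeled b counts values strictly greater than the previous label and less than or equal to b, plus a final catch-all "More" bin. The seed and the use of NumPy are illustrative assumptions, so the exact counts will differ from the lecture's Excel output:

```python
import numpy as np

rng = np.random.default_rng(123)
usage = rng.normal(23, 5, size=1000)   # simulated monthly usage, GB

labels = np.arange(10, 40)             # bin labels 10, 11, ..., 39 GB

# searchsorted with side="left" assigns each value x the first label b
# with x <= b, i.e., Excel's right-closed bins (prev_label, b].
idx = np.searchsorted(labels, usage, side="left")
freq = np.bincount(idx, minlength=len(labels) + 1)  # last slot = "More"

for b, f in zip([*labels, "More"], freq):
    print(b, f)
```

Note that np.histogram uses left-closed bins by default, which is why searchsorted is used here instead, to match Excel's right-closed convention exactly.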
So we see that all the thousand payment values coming out of our simulation are between $160 and a little over $438. So if we select the set of bins that goes from 160 to, say, 440 in steps of 10, to make sure that we do not have too many bins, we will be able to cover in our histogram all the monthly payment values we simulated. Okay, so let's go to the cell A1042 and add the header P Bins. After that, let's put 160 into the cell A1043, and let's keep adding ten to each preceding value until we reach 440 in the cell A1071. Let's check: 1,071, see? Okay, now we have our bins; let's call the Histogram tool. We do Data > Data Analysis > Histogram, and click OK. Input Range: well, this time we're talking about the input range from E2 through E1001. The bin range: well, we have new bins. Here we have to go all the way down and put in the bins from A1043 to A1071. Output range: well, let's go and put our table next to the bins. So, let's put it here, B1042, and let's request the chart as well, and let's click OK. So, we have the frequency table and the corresponding histogram chart. We can find the chart and apply the same improvements to this particular chart; we'll do it real quick. Let's change the gap, as we did before. Something like this. Well, that's too much, maybe a little smaller. Okay, and let's also go here and change the axis designation to Payment, P, in dollars. Let's make sure here that we have labels for every bin, okay? So here's a copy of the histogram for the data usage values we've created. We stored the result in a new file called DataPlan1000_Histogram. And here's what we got for the payment values. Now, the data usage values, as the histogram indicates as well, came from a normal distribution. The distribution of the payment values, though, does not really look normal.
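The exact payment formula lives in cell E2 of the Excel file and is not reproduced in this session, but the shape of the payment histogram, a spike at a $160 floor with a right tail, suggests a plan with a flat fee plus per-GB overage charges. The sketch below uses hypothetical parameters of my own (a $160 flat fee covering up to 20 GB, and $15 per extra GB), chosen only because they roughly reproduce the figures from the lecture; they are an assumption, not the plan's actual terms:

```python
import numpy as np

def payment(usage_gb, base=160.0, cap=20.0, rate=15.0):
    # Hypothetical plan: flat $160 covers up to 20 GB; $15 per GB over.
    return base + rate * np.maximum(usage_gb - cap, 0.0)

rng = np.random.default_rng(123)
usage = rng.normal(23, 5, size=1000)   # simulated monthly usage, GB
pay = payment(usage)

print(f"mean payment: ${pay.mean():.0f}")
print(f"sd of payment: ${pay.std(ddof=1):.0f}")
print(f"share exactly at the $160 floor: {np.mean(pay == 160.0):.0%}")
```

Under these assumed terms, roughly a quarter of the simulated months hit the $160 floor, which is why the payment distribution, unlike the usage distribution, does not look normal.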
One interesting thing about this histogram is that it indicates that in about 250 cases out of the 1,000 we simulated, the monthly payment amount was exactly equal to $160. In other words, this histogram tells us that there's about a 25% chance that under the new plan, the payment amount will be exactly $160. In many settings, simulation is the tool that must be used to gain insights like this into the nature of future outcomes. And histograms are useful visual complements to simulations. So let's look again at our roadmap. When we make decisions in high-uncertainty settings, we should first define the reward and the risk measures. In the data plan example, the expected monthly payment was selected as the reward measure, and the standard deviation of the monthly payments as the risk measure. Next, we should use simulation to obtain estimates for the reward and risk measures. Here are the estimates we have obtained from a simulation with 1000 simulation runs in the data plan example. Finally, we can use the optimization toolkit to choose the best alternative, using, for example, the reward as the objective and the risk as a constraint or constraints. You will see how this is done in week four. If you want to run large, complex simulations in practice, there is a number of commercial simulation software packages that can help you set up and run simulations, as well as visualize their results. Here's a link to a recently compiled list of commercial simulation software packages. Some software packages also have the capacity to fit historical data to probability distributions, to help you work through the task of coming up with appropriate distributions for the simulation. The data plan simulation example we looked at is small: it has one random input and one random output. In practice, a simulation model can have many random input factors and many random outputs whose distributions are of interest to decision makers.
In week two, when discussing the optimization toolkit, we mentioned a journal called Interfaces. This journal is also a great source of examples of successful use of the simulation toolkit in numerous industries. Here, we mention two such examples. The first one describes how Intel, a company that does not really need an introduction, uses simulation, in addition to a number of other analytics tools, to support the process of purchasing its manufacturing equipment. The second article is about the implementation of a simulation-based approach to making the best pharmacy inventory decisions at Kroger, the largest US grocery retailer and the world's fifth-largest retailer. As you can infer from these articles, analytics approaches in practice are rarely used in isolation. Most of the time, they are combined together, like optimization and simulation, to form an effective solution tool. We have reached the end of week three of the Operations Analytics course. During this week, we have looked at how to use the simulation toolkit to evaluate and compare alternatives in settings where our managerial decisions result in a distribution of outcomes, rather than one certain outcome. In particular, we used an approach for comparing decisions under uncertainty that involved identifying reward and risk measures and using simulation to obtain estimates of those measures. Similar to what we had prepared for you in week two, we have put together an optional video review session and two practice problems to help you master the simulation toolkit and get ready for the homework questions. Please keep in mind that if you feel comfortable with the simulation concepts you have seen so far, you can skip the review session and try the practice problems. Just as in week 2, the questions we're asking you on the practice problems for week 3 are very similar to the questions you will see on week 3's homework. Now, where do we go from here?
Once we know how to estimate reward and risk, we can compare alternative decisions using, for example, reward as the objective function and a risk measure or measures as constraints. This opens up a way to combine optimization and simulation in the search for the best decision. This will be one of the topics of week four of our course. Enjoy!