We now enter that part of the course where we're going to start talking about bias. No, I don't mean bias like treating someone differently because of their race, ethnicity, gender, or sexual orientation. I mean bias in the statistical sense, which is when something in the design of your study affects the results in a way that makes it difficult to know what the real results are supposed to be. We're going to start by talking about a form of bias called selection bias during this lecture. But briefly, let's give you a sense of this idea of bias in general. I'm going to introduce you to bias, a systematic error that distorts study findings, explain how it differs from confounding, describe what bias does to the interpretation of a study's results, and give you some specific examples.

I'm going to start with one that actually came across my television the other night. It was a commercial for an automobile insurance company. They said that the average customer saves $500 by switching to Geico. I thought, that's pretty neat, and you might interpret that as meaning that, on average, the Geico policy is $500 cheaper than the competitor policy. But that's not quite what they said. They said the average person who switches to Geico saves $500. Well, what's wrong with that analysis? What's wrong is that people don't switch unless they're going to pay less anyway. It may be that the average person who switches away from Geico also saves money. So that is a form of bias. In this case, it's a form of selection bias: we're selecting a population of people that is no longer representative of the real population we're interested in.

So before we delve too deep into this, we need to understand the difference between bias and confounding. This often confuses people. Remember, confounding has a very specific definition. A confounder is a variable, a third thing, that is linked to both the exposure and the outcome and as such induces a spurious connection between them. We can do a lot with confounders. We can adjust for confounders, like we talked about in the prior lecture. Bias is a bit more insidious than that. The rule of thumb is that confounding can be adjusted for, however imperfectly, assuming you measure the confounder, but bias is forever. It is part of the design of the study. Take that Geico example: if you had that dataset sitting in front of you, there would be no way to fix it, because the only people who appear are people who switched to Geico. You can't get the results for people at large because the data just aren't there; the bias is built into the study. Bias arises whenever you treat the exposed and unexposed groups differently, or the group with the outcome differently from the group without the outcome.

Now, what is the effect of bias overall? Well, it depends a little bit. We typically say that a bias that creeps into a study is going to push our results either towards the null or away from the null. What that means is we say, okay, we acknowledge that there's this bias in the study design, and oftentimes the authors will write in the paper itself, oh, there's this limitation, we only included people who switched to Geico, and we admit that this introduces a bias, but there was no way we could avoid it. We think this will bias our results towards the null, meaning that the results we're presenting to you would, in reality, be even more extreme, even more impressive, if we didn't have this bias built in. The bias pushes the results towards the null effect.
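If you want to convince yourself of how much work that selection step can do on its own, here's a minimal simulation sketch in Python. It isn't part of the original lecture materials, and the premiums are completely made up, not real insurer data: we pretend both companies quote from the same price distribution, so neither is cheaper on average, and then we only count the people who would actually switch.

```python
# Minimal sketch: selection bias from only analyzing "switchers".
# All premiums are made up for illustration; this is not real insurer data.
import random

random.seed(0)

n_drivers = 100_000
savings_among_switchers = []

for _ in range(n_drivers):
    # Both companies quote from the same distribution, so on average
    # neither one is cheaper than the other.
    current_premium = random.gauss(1500, 400)
    competitor_quote = random.gauss(1500, 400)

    # The selection step: people only switch when the new quote is lower.
    if competitor_quote < current_premium:
        savings_among_switchers.append(current_premium - competitor_quote)

print(f"Drivers who switch: {len(savings_among_switchers) / n_drivers:.0%}")
print(f"Average saving among switchers: "
      f"${sum(savings_among_switchers) / len(savings_among_switchers):.0f}")
# Roughly half the drivers switch, and their "average saving" comes out to
# several hundred dollars, even though the two insurers are identical on
# average. The selection into the analysis created the entire effect.
```

Run something like this and you get an impressive-sounding average saving out of two companies that are, by construction, identical. That's the whole trick, and no amount of analysis after the fact can undo it.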
And by the way, don't just take the authors' word for it when they claim a bias points towards the null. In about 50 percent of the cases I've looked at, their assessment that the acknowledged bias points towards the null is incorrect and, in fact, the bias points away from the null. So think through it yourself: what is actually happening here, and what would the results be if that bias weren't present?

So let's give some examples. Remember, there are basically two broad types of bias in the world. The one we're talking about right now is called selection bias, which is when the population you're studying is the wrong population for some reason. You'll remember from when we talked about generalizability that it's really important to recognize who's in the population. Selection bias often arises when you're differentially including people in your study for one reason or another. The other broad class of bias is called information bias, which arises when you measure something improperly. We're going to talk about that in the next lecture.

So, talking about selection bias: it's the error that results when the ascertainment or participation of study subjects is not reflective of the population you meant to sample. You're interested in a particular population, but you're selecting people in or out in a biased, unfair way.

So let's take a quick example. I saw this poll on foxnews.com. It says, "Do you think Trump's plan to relinquish control of his business empire goes far enough?" It's almost funny to read that now, because it feels like it was 30 years ago that we were even talking about this. But at the time, this poll was on foxnews.com, 29,000 people answered, and 70 percent said, "Yes, he's doing fine. We're not worried about it." I'd be really interested to see what they think now, but we'll leave that for another day. What's the bias here? Well, this is a form of selection bias, because we are selecting a very specific group of people to answer this survey. Not only is it people who happen to be browsing foxnews.com, but it's people who have the wherewithal to see an online survey and decide, "Yes, I want to participate." Their willingness to participate in the study is actually associated with their outcome, because they feel strongly enough to actually do that. That is the essential selection bias that makes a result like this unbelievable. You might have seen it and thought, "Yeah, that doesn't make sense. That's an online poll. I just throw those things away." Well, the reason you throw those online polls out is selection bias. So you've seen it all along.

Now, this happens in the real medical literature too, not just on websites. Here's a study published in a very prestigious journal, the Journal of the American Medical Association, looking at changes in insurance coverage after the institution of the Affordable Care Act, also known as Obamacare. If you look at the graph, that vertical dotted line is when Obamacare was introduced. You can follow the line for uninsured status and see that it went down a bit. In fact, this was a survey of over 500,000 people in the United States. That's a pretty good sample size, and you'd get pretty accurate estimates from it, assuming there's no selection bias. But there's a problem here. If you read the methods, you see it. They say, "The survey's primary limitation is its low response rate, between five and 10 percent, similar to other household telephone polls without financial incentives for participation."
In other words, they called a bunch of people, only 5-10 percent gave them an answer to their survey, and we're asked to believe that this 5-10 percent is somehow reflective of the broader population. Is that true? Well, probably not. This is a special group of people who are willing to answer a long survey when a stranger calls them at home. I probably wouldn't do that. You might not do that either. So when we interpret these results, we have to assume that they are biased. Which way? Are these responders more or less likely to be uninsured? It can be hard to tell. You don't always know the direction of the bias, but we know the bias is there.

Here's another one: a study looking at outcomes in breast cancer among insured versus uninsured women. What you can see at the top here is that survival was much better in women who had private insurance, while those who were uninsured or who had Medicaid had worse survival. So you might look at that and conclude, oh, well, private insurance is better and people live longer. But this is also a form of selection bias, because who gets private insurance? People who have jobs. This is called the healthy worker effect. If you're healthy enough to work a full-time job and be entitled to health insurance, you probably don't have a lot of chronic conditions that are going to wear you down and that might lead to earlier death. So that's another form of selection bias, and in this case it's going to bias away from the null. It's going to make people with private insurance look even better than they should, because this is a selected population. So be very careful.

Now, another place this concern comes up is loss to follow-up. Let's say you have a study where you are following people for a long period of time. In fact, here's a study where they took 122 children who were randomized at age three to a childhood intervention: developmental support in preschool and all sorts of good things. They followed them out for 30 years and looked at their health status at age 30, their blood pressure, their cholesterol, and so on. Really cool study. Should we be investing in preschool for these three-year-olds? Will it pay dividends down the road? Very interesting question. And in fact, the findings were that it worked: the kids who got randomized to the intervention had lower blood pressure, and so on, when they were older. But the caveat was that there was 30 percent loss to follow-up in the treatment group and 50 percent in the control group. Now, loss to follow-up happens. You're going to lose track of people over time. They move out of the city, you try to call them, the phone doesn't work anymore. How are you supposed to find these people for 30 years? But loss to follow-up is a form of bias, a form of selection bias in fact, because the people you can continue to follow are generally healthier than the people you lose track of. It's not random who you lose track of; the good citizens who put down roots are easier to keep track of than others. There's differential loss to follow-up in this study, and how important is that? It is super important. We actually don't need to do the math ourselves, because the statisticians have worked it out for us, but I'm going to blow up this chart here and walk you through it. On the x-axis, you see the overall attrition rate. That's the loss to follow-up in a study. You can see it ranges from zero up to about 65 percent.
On the y-axis, you have the differential attrition rate, the difference in attrition between the two arms of the study. You can see that range is much narrower: it goes from 0 to 11 percent. If you look across the x-axis, you can see that as you lose more and more people over time, while you're trying to make inferences about the truth, you're at greater and greater risk of bias. Green means you're okay, not too much risk of bias; yellow is a moderate risk of bias; and red is basically throw the study out, there's way too much bias to have any idea what's happening. The second thing to take home from this graph is that the overall loss to follow-up is not nearly as important as the differential loss to follow-up. If you lose people in one arm of the study more than in the other, you're in big trouble, because the chances are that the people who stay in the study are healthier, and that's going to bias your results towards whichever arm you happen to follow people better in. As you can see from this graph, even a 7 to 8 percent difference in loss to follow-up between the two arms puts you in the range of an unacceptable amount of bias. In that study I just told you about, the cute kids, the three-year-olds who got randomized to the preschool intervention, the difference in follow-up was 20 percent, which is off the scale here. So despite the fact that this is a cool study, and it would be great to be able to say that if we put kids in preschool they'll be healthier at age 30, the risk of bias is so high in this study that we can't say anything about that. We just don't know, because of the differential loss to follow-up.

Another way that selection bias creeps in is in the analysis itself, in how you follow people as part of a randomized trial. One way to ensure that everyone is balanced coming into a study is to randomize. You're not selecting so that healthier people get treatment A and sicker people get treatment B if you randomize. But that's not good enough, because you have to follow them appropriately too, to avoid further selection. So let me give you an example. Here's a study of sitagliptin, which is an anti-diabetes medication. The question was, would this drug reduce heart attacks in people with diabetes? Pretty straightforward. If you look at the table, in the intention-to-treat column, you can see that there are about 7,300 people in each arm, and around 11.5 percent of people in both arms had a heart attack by the end of the study. So that's fine. But right below that I gave you the per-protocol analysis. Now, what's that? Well, you randomize someone to a medication. Do they actually take it? You might imagine that some people take the first dose, it doesn't sit well with them, and they never take it again. Should you still consider them part of your treatment group if they're not even taking the medication anymore? A lot of people might say, "Oh, no, drop them, because they're not taking the medication, and I want to know whether the medication works." Be careful: you're introducing selection bias, because in this case your medication is selecting for the group that can handle it. You're selecting for the people who aren't going to have as many side effects, who are going to do okay, and so that's a form of selection bias that is going to bias your results. A per-protocol analysis only looks at the people who keep taking the drug for the entire duration of the protocol.
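Before we look at the real numbers, here's a minimal simulation sketch of the per-protocol problem, again in Python and again with entirely hypothetical numbers rather than the actual sitagliptin trial data. In this sketch the drug does nothing at all, but sicker patients are more likely to stop taking their pills, so the per-protocol analysis quietly drops them.

```python
# Minimal sketch: how a per-protocol analysis selects healthier patients.
# All numbers are hypothetical; this is not the actual sitagliptin trial data.
import random

random.seed(1)

def simulate_arm(n_patients):
    """One trial arm with an inert drug: returns (ITT, per-protocol) event rates."""
    itt_events = 0
    pp_events = 0
    pp_n = 0
    for _ in range(n_patients):
        # Each patient carries some underlying risk of a cardiovascular event.
        risk = random.uniform(0.02, 0.22)
        # Sicker patients are more likely to stop the study drug early, so
        # "completed the protocol" is itself related to underlying health.
        completed_protocol = random.random() > risk * 3
        # The drug does nothing in this sketch, so events depend only on risk.
        had_event = random.random() < risk
        itt_events += had_event
        if completed_protocol:
            pp_n += 1
            pp_events += had_event
    return itt_events / n_patients, pp_events / pp_n

for arm in ("placebo", "treatment"):
    itt_rate, pp_rate = simulate_arm(7_300)
    print(f"{arm:9s}  ITT event rate: {itt_rate:.1%}   "
          f"per-protocol event rate: {pp_rate:.1%}")
# The two arms are identical by construction, yet the per-protocol rates come
# out lower than the ITT rates in BOTH arms, because restricting to protocol
# completers quietly selects the healthier patients.
```

The gap between the two rows has nothing to do with the drug; it comes entirely from who stayed in the analysis.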
Back in the actual trial, you can see that in both the placebo and the treatment arms the per-protocol rate of cardiovascular events is lower, around 9.6 percent. Why? Because healthier people take their medicine. We're selecting out the sicker people, the ones who, even on placebo, take it, think they're getting side effects, and just decide to stop. This is why the best practice for analyzing a randomized trial is to use the intention-to-treat principle, because it does not induce this bias. Once someone is randomized to the treatment arm, they're in the treatment arm, and that's how you analyze them, even if they stop taking the drug. Now you might say, "Well, if everyone stops taking the drug because of horrible side effects, your results are basically going to look the same, because the treatment arm is effectively another placebo arm; no one's taking any real medication no matter what." Isn't that biasing towards the null? I would argue not really, because in the real world that's exactly what would happen: if you make a drug with so many side effects that no one can take it, it's not going to work, and it's important to know that.

This is the first lecture on selection bias. It can sneak into studies in very insidious ways. You have to think this one through. You have to ask yourself whether it could be present. It's not always clear which direction this bias will move your results, sometimes towards the null, sometimes away from the null. It'll wrinkle your brain a little bit to figure it out. You can't always figure it out; you might just know that it's there. Randomization helps to prevent bias at enrollment, but it's the intention-to-treat principle that helps to prevent bias during follow-up. Both of these need to be present to have a study that is really high-quality and free of this type of bias. Next time, information bias.