In this section, let's look at some real data examples and interpret the results of estimated simple Cox regression models when the predictor is binary or categorical. After viewing this section, hopefully you'll continue to build your understanding of how Cox regression relates a function of the hazard, or risk, of a binary outcome occurring over time to a predictor via a linear equation; that function, of course, is the natural log. You'll be able to interpret the resulting intercept, which is a function of time, and the slope or slopes from a Cox regression model in which the predictor of interest is binary or categorical. And you'll reinforce the interpretation of slopes from simple Cox regression models as log hazard ratios, and of their exponentiated versions as hazard ratio estimates.

Let's go back to a study we looked at multiple times in the first term of this course, the seminal study on primary biliary cirrhosis done at the Mayo Clinic. It had a 10-year study period: it began on January 1, 1974, and patient accrual was terminated in December 1983. During that 10-year period, 422 patients with primary biliary cirrhosis satisfied the entry criteria and 309 entered the trial. They were randomized to receive either a drug, D-penicillamine (DPCA), or a placebo. Patients were followed from the time of randomization until death or censoring, and the longest any patient was followed was 12 years.

Here are the Kaplan-Meier curves for time to death, tracking the proportion who had not yet died as a function of follow-up time in both the placebo and DPCA groups. You can see that we'd like these curves to be higher.
The higher they are, the better, because a smaller proportion has died by any given follow-up time. The curves cross over a fair amount, but it looks like the placebo group has slightly better survival than the drug group. There is very little visual evidence, though, to indicate that the drug and placebo groups have distinctly different survival trajectories, so a quantitative measure of the association, with confidence limits, will add depth to this analysis. In the previous term we showed that we could estimate an incidence rate ratio and a confidence interval if we had the number of events in each group and the cumulative follow-up time. We can also use Cox regression, given the raw data, to estimate a hazard ratio, or incidence rate ratio, and ultimately its confidence limits; we'll get to confidence limits in another section later in this lecture set.

The Cox regression model that we would fit to these data looks something like this. The log hazard of death at time t, which we can write as the log of lambda for a group with a given x1 value at a given time t, is equal to an intercept, the log of the baseline risk at time t, plus a slope times the value of x1. We only have two groups here, the DPCA group and the placebo group, so we can create x1 equal to 1 if the patient was randomized to the drug group and 0 if the patient was randomized to the placebo group. So we are estimating the log hazard for only two groups. For the drug group, with x1 = 1, the log hazard at any given time in the follow-up period is equal to the log of the baseline, or intercept, risk evaluated at that specific time plus the slope times 1, because those in the drug group have an x1 value of 1. So we would evaluate this at a given time t by taking the log of the baseline risk function evaluated at time t plus the slope, beta 1 hat.
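In symbols, the model just described can be sketched as follows. The lecture states the equation verbally, so the specific symbols here (lambda hat for the estimated hazard, lambda naught hat for the baseline hazard, beta 1 hat for the slope) are a transcription choice rather than notation taken from a slide.

```latex
\begin{align*}
\ln \hat{\lambda}(t \mid x_1) &= \ln \hat{\lambda}_0(t) + \hat{\beta}_1 x_1,
  \qquad x_1 = \begin{cases} 1 & \text{DPCA} \\ 0 & \text{placebo} \end{cases} \\
\ln \hat{\lambda}(t \mid x_1 = 1) &= \ln \hat{\lambda}_0(t) + \hat{\beta}_1
\end{align*}
```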
At the same time t, for the placebo group, the estimate is just the intercept, the log of the baseline hazard, evaluated at that time. As we've seen in all other regression examples, the slope is the estimated difference in the left-hand side for two values of x that differ by one unit. We only have two possible values of x here: 1 for those in the drug group and 0 for those in the placebo group. So the slope, beta 1 hat, is the log hazard at a specific time in the follow-up period for those on the drug minus the log hazard for those on the placebo at that same time. Again, a difference in logs should scream out log of a ratio, so this can be expressed as the log of the ratio of the hazard of death for DPCA at a specific time in the follow-up period relative to the hazard of death for placebo at the same time. The model estimates this hazard ratio as constant, regardless of the time at which the groups are compared. So if we exponentiate the slope, we get the estimated hazard ratio of death in the follow-up period for those in the drug group compared to those in the placebo group.

The results are in, and here's what we get for this analysis. The log of the hazard for a given x1 value at a specific time t is, again, the intercept, the log of the baseline hazard at time t, plus 0.057 times x1. So the slope is 0.057. If we exponentiate it, we get the estimated hazard ratio of death for those in the drug group compared to those in the placebo group at any given time in the follow-up period; this estimated constant hazard ratio is 1.06. So at any given point in the follow-up period, those who received the drug have an estimated 6% greater risk, or hazard, of dying than those who got the placebo. So what is this intercept, the log of the baseline risk as a function of time?
Well, it quantifies how the baseline risk varies over time. We could extract it from the raw data, but we'll just talk about it generically here. If the horizontal axis is time, this function gives us the estimated log of the hazard of death as a function of time for those in the group with x1 = 0. And the slope, beta 1 hat equals 0.057, is the difference in this log hazard for those who have an x1 value of 1 versus 0. Regardless of where we look on the time axis, this estimated constant difference is 0.057. So, again, the intercept is the time piece: the variation in the baseline hazard over time, on the log scale, for the group that received the placebo.

Consider another example, infant mortality and prenatal vitamins, which we also looked at extensively in the first term. This is from the Nepali maternal-infant child mortality vitamin study, where pregnant women were randomized to receive either vitamin A, placebo, or beta carotene in the third trimester of pregnancy, and we were looking at infant mortality outcomes in the first six months after birth. Here are the survival curves for children born to the three groups of mothers. The vertical axis has been rescaled to run only from 90% to 100%, and the curves are still tightly wound together, so there is very little visual evidence of a difference in survival outcomes between these three groups of children. You may recall from the first term that there was very little numerical evidence as well.

Let's try to get some numerical evidence. Before, we estimated incidence rate ratios from the total counts: the number of deaths in each of the three vitamin groups and the total follow-up time for infants in each of those groups. Now let's use the raw data and the computer to estimate this with a Cox regression model.
We've got three groups, a nominal categorical predictor, so the drill is the same as always: we designate one group as the reference and then make indicators for membership in each of the other groups. I'm going to designate the placebo group as the reference group, with x1 = 0 and x2 = 0, where x1 and x2 are the indicators for beta carotene and vitamin A, respectively. So those in the placebo group have both values equal to 0, those in the beta carotene group have an x1 value of 1 and an x2 value of 0, and those in the vitamin A group have an x1 value of 0 and an x2 value of 1.

The estimated log hazard at any given time in the follow-up period is as follows for each of the three vitamin groups. For the reference group, it's simply equal to the intercept, which varies over time. To get the estimated log hazard at any given time in the follow-up period for those whose mothers were given beta carotene, we take the starting value, the intercept evaluated at time t, which is the value at that time for the placebo group, and add beta 1 hat, the difference in the log hazard between the beta carotene and placebo groups at any given time in the follow-up period. We do the same for the vitamin A group: we start with the same intercept, the value for the placebo group at that time, and add beta 2 hat. So beta 2 hat is the difference in the log hazard between the vitamin A group and the placebo group at any given time in the follow-up period.

Each of these slopes is an estimated log hazard ratio. When we run the numbers, beta 1 hat equals 0.02, and if we exponentiate that, we get a hazard ratio that compares the hazard of mortality at any point in the six-month follow-up period for children born to mothers who got beta carotene during pregnancy with the hazard for children born to mothers who got the placebo.
This estimated hazard ratio is 1.02. So these children have a slightly higher estimated risk of dying in the follow-up period than children whose mothers got the placebo. And beta 2 hat equals 0.06; if we exponentiate that, we get the hazard ratio of death for the vitamin A group compared to the placebo group, again at any time in the follow-up period, and that's equal to 1.06. So again, there is a slightly higher estimated risk, relative to the placebo group, for those whose mothers were given vitamin A.

So what about this underlying log lambda naught hat of t? What does that quantify? Well, again, I'm trying to put three dimensions into two here. The horizontal axis is time and the vertical axis is the log hazard of mortality, over a six-month period, or 180 days. The function is a baseline function estimated for the placebo group, where x1 = 0 and x2 = 0. At any given point in the follow-up period, to get the log hazard for the beta carotene group (and I'm not going to draw this to scale), we start with the estimated log hazard for the placebo group and add 0.02 to it. So the difference in these log hazards at any given point in the follow-up period is 0.02. If we do the same thing for the vitamin A group (and I'll try to draw this roughly to scale), the log hazard over time has exactly the same shape, but the difference between that group and the same reference, the placebo group, is beta 2 hat, 0.06. So, on the log hazard scale, the difference between the vitamin A group and the beta carotene group at any point in the follow-up period is the difference in these two betas, 0.06 - 0.02 = 0.04.

Let's look at one more example. A 2011 article presents the results of a randomized trial of a home-based intervention on early feeding practices, an attempt to increase the likelihood of successful breastfeeding.
As per the authors, this intervention consisted of five or six home visits from a specially trained research nurse delivering a staged home-based intervention in the antenatal period and at 1, 3, 5, 9, and 12 months. The outcome, y = 1, in this case is stopping breastfeeding. This is a cohort of women who were initially breastfeeding upon the birth of their child, and the hope was that the intervention would increase retention of breastfeeding over a 60-week follow-up period, a little over a year.

These are the Kaplan-Meier curves of the proportion who are still breastfeeding (the event is stopping breastfeeding) in the group of mothers given the intervention, which is the curve on top, versus the group of mothers randomized to control. You can see that the proportion continuing to breastfeed is higher in the group that got the intervention, but we'd like to quantify that beyond the descriptive Kaplan-Meier curves.

The results from the Cox regression, as per the authors, were as follows: compared with the control group, the hazard ratio for stopping breastfeeding in the intervention group was 0.82. The underlying Cox model that produced this result has the following form: the log hazard of stopping breastfeeding, given group membership, at any point in the follow-up period is equal to the baseline hazard intercept, the log of lambda naught hat of t evaluated at that specific time t, plus a slope of -0.20 times x1, where x1 equals 1 for the intervention group and 0 for the control group. This shows us that those who received the intervention had a lower log hazard of stopping breastfeeding, and hence a lower hazard, and the resulting hazard ratio estimate, which is what we get when we exponentiate that slope, is 0.82.
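Written out, the fitted model and the exponentiation step look like this. The symbols are a transcription choice for what the lecture states verbally, and the coefficient and hazard ratio are the values quoted above.

```latex
\begin{align*}
\ln \hat{\lambda}(t \mid x_1) &= \ln \hat{\lambda}_0(t) - 0.20\, x_1,
  \qquad x_1 = \begin{cases} 1 & \text{intervention} \\ 0 & \text{control} \end{cases} \\
\widehat{HR} &= \frac{\hat{\lambda}(t \mid x_1 = 1)}{\hat{\lambda}(t \mid x_1 = 0)}
  = e^{-0.20} \approx 0.82
\end{align*}
```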
So that's the hazard ratio of stopping breastfeeding for the intervention group compared to the control group: an 18% lower hazard at any given point in the follow-up period.

To summarize, the slopes from Cox regression models with a binary or categorical predictor compare the ln(hazard) of the time-to-event outcome between two groups at the same time in the follow-up period, regardless of what that time of comparison is. These slopes can be exponentiated to get the estimated hazard ratio for the two groups being compared by the slope. The intercept, the log of lambda naught hat of t, tracks the natural log of the baseline hazard in the reference group as a function of time, where the reference group is the group for which all x values are equal to 0. In the next section, we'll look at some examples where the predictor is continuous, but the general results and interpretations will be the same.
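The arithmetic behind these interpretations can be checked with a short Python sketch. It is only an illustration: it does not fit a Cox model, it simply exponentiates the slopes quoted in this section (0.057 for DPCA, -0.20 for the breastfeeding intervention, and the 0.02 and 0.06 used in the graph discussion of the Nepal example). The function names, dictionary keys, and group labels are hypothetical and chosen for readability.

```python
import math

# Slopes (log hazard ratios) quoted in the three examples of this section.
SLOPES = {
    "PBC: DPCA vs placebo": 0.057,
    "Nepal: beta carotene vs placebo": 0.02,
    "Nepal: vitamin A vs placebo": 0.06,
    "Breastfeeding: intervention vs control": -0.20,
}


def hazard_ratio(slope):
    """Exponentiate a Cox slope (a log hazard ratio) to get a hazard ratio."""
    return math.exp(slope)


def indicators(group, others=("beta carotene", "vitamin A"), reference="placebo"):
    """Return the 0/1 indicators (x1, x2) for a three-level nominal predictor.

    The reference group gets all zeros; labels here are illustrative.
    """
    if group != reference and group not in others:
        raise ValueError(f"unknown group: {group!r}")
    return tuple(int(group == g) for g in others)


def log_hazard_shift(x, betas):
    """ln(hazard at time t) minus ln(baseline hazard at time t).

    Under the model this shift is the same at every time t.
    """
    return sum(b * xi for b, xi in zip(betas, x))


# Hazard ratios, rounded to two decimals as in the lecture.
hazard_ratios = {name: round(hazard_ratio(b), 2) for name, b in SLOPES.items()}
for name, hr in hazard_ratios.items():
    print(f"{name}: estimated hazard ratio = {hr:.2f}")

# Comparing two non-reference groups: the difference of their slopes.
nepal_betas = (0.02, 0.06)  # (beta 1 hat, beta 2 hat)
shift_vit_a = log_hazard_shift(indicators("vitamin A"), nepal_betas)
shift_beta_carotene = log_hazard_shift(indicators("beta carotene"), nepal_betas)
vit_a_vs_beta_carotene = round(shift_vit_a - shift_beta_carotene, 2)
print(f"vitamin A vs beta carotene, log hazard difference = {vit_a_vs_beta_carotene}")
```

Because the baseline log hazard cancels in every comparison, none of these calculations needs the intercept; that is why a single number per comparison summarizes the whole follow-up period.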