In this section we'll move into regression for time-to-event data where we actually have the individual event and censoring times. This is called Cox proportional hazards regression, and we'll start with the simple case where we have only one predictor. So, let's first give an overview of this regression. It's very similar conceptually to the other types we've done, with one exception: the intercept here will no longer be a static number but will depend on another dimension of the data, the dimension of time. So in this set of lectures we will develop a framework for simple Cox proportional hazards regression: a method for relating a time-to-event outcome, when we know the event and censoring times, to a single predictor that can be binary, continuous or categorical, as we've seen with the other types of regression. The Cox part of this method's name is for its inventor, Sir David Cox. He was knighted in 1985, in no small part due to the invention of this method, which has changed the way we do survival analysis in modern public health, but also for his other contributions to the statistical sciences. So Cox is a knight. It's every statistician's dream to be knighted. I'm not a British citizen, so it won't happen for me, but imagine what it would be like to be asked by Sir Elton John and Sir Mick Jagger about your contributions to statistics. He is still actively involved in the creation and application of statistical methodologies. The proportional hazards part of the name refers to a major assumption of this regression model, which will be explained throughout this set of lectures. So for Cox proportional hazards regression, the resulting equation differs slightly from the previous regression methods.
It certainly has a different left-hand side, because that's what has defined the different types we've had, whether linear, logistic or otherwise, but here on the right-hand side we're also going to see something different from the previous types. What Cox regression does is model, again on the natural log scale, the hazard of a binary outcome, where we also take into consideration the time at which the outcome occurs or the person is censored; it models this time-to-event outcome as a function of a predictor x1. So the equation looks like this: we estimate the log hazard, or risk, of the outcome Y equaling one, and we use lambda to represent the hazard. This is a function of both our predictor x1 and the follow-up time t, because we have time-to-event outcomes, and it includes an intercept. This is perhaps the biggest difference between Cox regression and the other methods we've looked at: the intercept here is no longer a static piece; it varies as a function of time. So the intercept changes depending on the follow-up time, but once we have this starting point for a given follow-up time, we add the slope times the value of x1 as usual. We'll see that although the intercept changes over time, the slope still compares two groups on the outcome: it is the difference in the log hazard, or risk, of the outcome occurring for two groups who differ by one unit in x1, regardless of what time in the follow-up period they're being compared at. Again, x1, our predictor, can be binary, nominal categorical, or continuous, like we've seen with all other regression types. So what does this log hazard of Y equals one mean? Well, for example, if Y equals one if a cancer patient in remission has a relapse, and zero if they're censored, then the log of the hazard that Y equals one over the follow-up period is the log hazard of relapsing.
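To make the structure of that equation concrete, here is a minimal sketch in Python. The baseline hazard function and the slope value are hypothetical, invented purely for illustration; they are not estimates from any real data.

```python
import math

# Hypothetical baseline hazard function lambda_0(t): the hazard over
# follow-up time t for the group with x1 = 0 (illustrative values only).
def baseline_hazard(t):
    return 0.02 * math.exp(-0.01 * t)

beta1 = 0.5  # hypothetical slope: log hazard ratio per one-unit difference in x1

def log_hazard(t, x1):
    # ln(lambda(t, x1)) = ln(lambda_0(t)) + beta1 * x1
    return math.log(baseline_hazard(t)) + beta1 * x1

# The intercept term changes with t, but the slope's contribution does not:
print(log_hazard(30, 0))  # baseline log hazard at day 30
print(log_hazard(30, 1))  # the same starting point, shifted up by beta1
```

Note how the time-varying part lives entirely in the intercept, while the slope adds the same shift at any follow-up time.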
If Y equals one if a subject quits smoking in the follow-up period after, for example, being exposed to some sort of treatment, and zero if they drop out or make it to the end of the study without having the event (in other words, are censored), then the log hazard that Y equals one is synonymous with the log hazard of quitting smoking, which we'll model as a function of the follow-up time and the predictor x1. If we're looking at survival, and Y equals one if a person dies at a given time in the follow-up period, and zero if they're still alive either when they were lost to follow-up or when they made it to the end of the study, and were hence censored, then the log hazard that Y equals one is the log hazard of death. Let's just re-familiarize ourselves with the term hazard. We saw it before in term one, when we looked at Kaplan-Meier survival curve estimation. But what is this hazard whose log we're estimating as a linear function of x1? Well, technically speaking, the hazard is the instantaneous risk of having the outcome of interest at a given time in the follow-up period, conditional on making it to that point in the follow-up period without previously having the event. So it's the risk of having the event among those who are still at risk of having the event at that time in the follow-up period. For interpretation purposes we can think of the hazard as the time-specific risk of having the event of interest, or the time-specific incidence. So we will use hazard and incidence pretty much interchangeably. As with everything else we've done thus far, when we have a sample of time-to-event data we will only be able to estimate the regression equation from the sample data, so to indicate that our intercept, which is now a function of time, and our slope, which is a single value, are estimates based on the sample, we will put hats on top of them. Why do we estimate things on the log hazard scale?
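The conditional, time-specific nature of the hazard can be illustrated with a toy discrete-time calculation: at each time, the estimated hazard is the fraction having the event among those still at risk (event-free and uncensored) just before that time. All counts below are hypothetical, made up for illustration.

```python
# Hypothetical risk-set sizes and event counts at three follow-up times.
at_risk = {30: 100, 60: 80, 90: 50}  # subjects still event-free and uncensored
events = {30: 5, 60: 4, 90: 5}       # events occurring at each time

def estimated_hazard(t):
    # Time-specific risk: events at t among those still at risk at t.
    return events[t] / at_risk[t]

for t in sorted(at_risk):
    print(f"estimated hazard at t={t}: {estimated_hazard(t):.3f}")
```

Notice the denominator shrinks over time as subjects have the event or are censored, which is exactly the conditioning described above.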
Well, the reason for this choice of scaling, on the log scale, is the same as with logistic regression. In order to do an unconstrained estimation of the slope, the left-hand side needs to have unrestricted values, and it turns out that the hazard can only be positive: it can only run from zero to infinity. But on the log scale, the log hazard can run from negative infinity to positive infinity, taking all real number values, so there doesn't have to be any constraint on what the value of the slope could be to make our outcome conform to a restricted range. So, for a given value of x1 and a specific follow-up time t, the resulting Cox regression equation can be used to estimate the log hazard of our binary outcome Y for a group of subjects with the same value of x1. If you give me the equation and a computer (which I'll need to evaluate the function that defines the intercept), and you give me a value of t and a value of x1, we can plug those in to get an estimated value of the log hazard for that value of x1 at that specific time t. We'll see later in this lecture set that any estimated log hazard value can be back-converted into an estimated survival probability, S hat of t, and we will discuss how this is done in the last section of this lecture set. So again, for any given simple Cox proportional hazards regression model, in generic formulation, the log hazard that Y equals one, a function of both our predictor x1 and our follow-up time t, is equal to the log of some intercept risk at the specific time t, plus a slope (a constant value that does not depend on time) times our predictor x1. The log of lambda naught of t here, the log of the hazard that Y equals one, is the estimated function, the log hazard over time, at the value x1 equals zero.
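Two of the points above can be checked numerically: exponentiating any real-valued log hazard gives a valid (positive) hazard, which is why estimation on the log scale is unconstrained; and a hazard function can be back-converted toward a survival probability through the standard relationship S(t) = exp(-Lambda(t)), where Lambda(t) is the cumulative hazard up to time t. The constant hazard value and time step below are hypothetical, chosen only for illustration.

```python
import math

# Point 1: any real-valued log hazard maps back to a strictly positive hazard,
# so the slope can be estimated without range restrictions.
for log_haz in (-3.0, 0.0, 2.5):
    assert math.exp(log_haz) > 0

# Point 2: sketch of converting a hazard into a survival probability via the
# cumulative hazard, S(t) = exp(-Lambda(t)). The hazard here is a hypothetical
# constant of 0.01 events per day.
hazard_per_day = 0.01

def survival(t_days, step=0.1):
    # Approximate the cumulative hazard by summing over small time steps.
    cumulative = sum(hazard_per_day * step for _ in range(int(t_days / step)))
    return math.exp(-cumulative)

print(round(survival(30), 3))  # exp(-0.3), about 0.741
```

The lecture's last section covers the actual back-conversion used with a fitted Cox model; this sketch just shows the general hazard-to-survival relationship.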
So what beta one does, like in any other regression we've seen before, except that we now have to fix a value of time, is estimate the difference in the log hazard of the outcome for two groups whose x1 values differ by one unit. So it's a difference in log hazards, and as we'll quickly show (and as you're probably already thinking), a difference in logs can be re-expressed as the log of a ratio. So, what do I mean by a difference in log hazards for a one-unit difference in x1 at a specific time t? This is going to look very similar to the parsing we've done with all other types of regression. Let's compare the estimated log hazard for two groups: those with an x1 value of a plus one, generically speaking, and those with an x1 value of a. So, these two groups differ by one unit of x1. We're going to compare their log hazards at the same time in the follow-up period, which I'll just generically call t. The estimated log hazard for the first group is equal to the log of the intercept evaluated at time t, plus the slope, beta one hat, times the x1 value of a plus one. For the other group, with an x1 value equal to a, it's the same intercept, the log of the risk at the same time t, but we add beta one hat times just a. In the top one, beta one hat times a plus one can be re-expressed as beta one hat times a, plus beta one hat. So, if we take the difference in these two quantities, the intercepts cancel because they're both being evaluated at the same time t, the beta one hat times a terms cancel, and what we're left with is just one occurrence of the slope. So, again, beta one hat can be generically expressed as the difference in the log hazard of the outcome occurring at a specific time for two groups with x1 values that differ by one unit, and as you all know, as soon as you hear log minus log, your ears perk up and you go, "John, that's the log of a ratio".
So, another way to express this, and to get it close to a measure of association we're familiar with: it can be re-expressed as the log of the ratio of the hazards of the outcome occurring for the group with the x1 value of a plus one compared to the group with the x1 value one unit less, equal to a. So, beta one hat is the log of the hazard ratio: the ratio of the hazards of the outcome for two groups who differ by one unit in x1 at a given time in the follow-up period. Again, the slope is interpretable as the difference in log hazards per unit difference in x1 at any given time in the follow-up period. As long as we're comparing the hazards at the same time in the follow-up period for two groups who differ by one unit of x1, the difference is always this constant slope, beta one hat. If we exponentiate that, we get the estimated hazard ratio of the outcome for two groups who differ by one unit in x1 at any point in time t in the follow-up period. So, what do I mean by that? Let's just fix t at two different values, and I'll show you that the difference is always beta one hat if we compare two groups who differ by one unit in our predictor. If we fix the time at 30 days of follow-up, and we compare the log hazards when x1 equals a plus one versus x1 equals a, the intercept for both these computations is the same: it's the log of the estimated risk function at 30 days for both groups. That will cancel out in the comparison, and the only thing that drops through and remains after taking this difference is the single occurrence of beta one hat. Similarly, if we looked at another value, 275 days, the same thing is true.
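The cancellation argument at the two fixed times can be verified numerically. The baseline log-hazard function and the slope below are hypothetical, invented for illustration; the point is that the difference in log hazards is the same at 30 days as at 275 days, and exponentiating it gives the hazard ratio.

```python
import math

# Hypothetical baseline log-hazard function (varies with follow-up time t).
def log_baseline(t):
    return math.log(0.02) - 0.01 * t

beta1_hat = 0.4  # hypothetical slope, constant over time

def log_hazard(t, x1):
    return log_baseline(t) + beta1_hat * x1

a = 3.0
# Difference in log hazards for x1 = a + 1 versus x1 = a, at two times:
diff_30 = log_hazard(30, a + 1) - log_hazard(30, a)
diff_275 = log_hazard(275, a + 1) - log_hazard(275, a)
print(diff_30, diff_275)   # both equal beta1_hat: the intercepts cancel
print(math.exp(diff_30))   # the hazard ratio, exp(beta1_hat)
```

Whatever value t takes, the time-varying intercept appears in both log hazards and drops out of the difference.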
The intercept is fixed for both log hazard estimates: the starting point is the log of the baseline risk function evaluated at 275 days for both, and if we look at the comparison and take the difference, that cancels out and the only thing that carries through is the slope. So, again, regardless of the time we're making this comparison, beta one hat is the difference in the log hazard between two groups who differ by one unit in x1, and when we exponentiate that, we get the hazard ratio of the outcome for two groups who differ by one unit in x1, regardless of the point in time at which we're making the comparison. So, what is this intercept, this log lambda naught of t? Well, this is a function that varies over time. It encapsulates the baseline shape of the log hazard that y equals one over time, for the group defined by an x1 value of zero. The log hazard that y equals one shifts up or down from this baseline function by the slope for each one-unit increment in x1. So, what do I mean by this? Let me try and draw what we're modeling with this; it's going to be a crude drawing, but hopefully it will be helpful. There are three dimensions to this, and I want to try and reduce it to a two-dimensional graph. On the vertical axis, as in our regression equation, is the log of the hazard of the outcome. The dimension being encapsulated by the intercept is time, the follow-up time, which starts at time zero and runs for however long the study goes. So, there's a general shape to this hazard, estimated by the model (I'll talk more about that later in these lecture sets), and the log of this hazard can vary as a function of time. Let's assume the shape of the hazard on the log scale is something like this: it goes up, it comes down, it's very complicated as a function of time. When x1 equals zero, this is the log of what we might call a baseline hazard function, lambda hat naught of t.
What this model requires is that the hazards for all other groups, defined by different values of x1, shift up or down from this general shape by a common difference: that difference of beta one hat, the slope. So, if we were to look at the hazard when x1 equals one, and the slope is positive, it would look something like this: a function parallel to this baseline function, and at any point in the follow-up period on the log scale, the difference between these two values, when x1 equals one versus x1 equals zero, would be the slope. If x1 equals two and the slope is positive, it would be up here: the same shape, differing from the curve where x1 equals one by beta one hat, and from the baseline by beta one hat plus beta one hat. So, the idea is that these functions can vary over time; the risks can vary over time for any of the groups we're looking at. This model allows for that, but the difference in the estimated log hazard between any two groups defined by different x1 values is simply a function of this slope, beta one hat. Now, if you think about it, when we exponentiate something that has a constant difference on the log scale, it turns out that the difference is no longer constant, but the ratio is constant on the exponentiated scale. So, if we were to regraph these things (and I'm going to have a lot of trouble drawing this) not on the log hazard scale but on the hazard scale, as a function of time, then these curves would be the exponentiated versions of the previous curves. This isn't quite the same as what I drew before, but this is the baseline, x1 equals zero: the exponentiated intercept, the baseline hazard function itself, no longer on the log scale. The function for x1 equals one on the hazard scale would not be parallel anymore, but at any given time the estimated hazards would be in a constant proportion.
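The drawing can be checked numerically. With a hypothetical time-varying baseline hazard and a hypothetical slope (both invented for illustration), exponentiating the model gives hazards whose ratio between the x1 = 1 and x1 = 0 groups is the same constant, exp(beta one hat), at every follow-up time, even though the hazards themselves rise and fall over time.

```python
import math

# Hypothetical time-varying baseline hazard (illustrative shape only).
def baseline_hazard(t):
    return 0.02 * math.exp(math.sin(t / 50.0))  # rises and falls over time

beta1_hat = 0.4  # hypothetical slope

def hazard(t, x1):
    # Exponentiated model: lambda(t, x1) = lambda_0(t) * exp(beta1 * x1)
    return baseline_hazard(t) * math.exp(beta1_hat * x1)

for t in (10, 100, 250):
    ratio = hazard(t, 1) / hazard(t, 0)
    print(f"t={t}: hazard ratio = {ratio:.4f}")  # exp(0.4) every time
```

On the log scale the curves are parallel (constant difference); on the hazard scale they are proportional (constant ratio), which is where the model gets its name.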
So, the ratio of the hazards at any given time would be equal to the exponentiated slope. This is the property of proportional hazards. So long as we're willing to assume that the relative hazard of the outcome as a function of x1, our predictor, is constant regardless of the time we're looking at, we can fit this model. We'll look at that again in more detail with real data examples. But in summary, just to start: Cox proportional hazards regression allows for the estimation of a log hazard ratio, and hence a hazard ratio, comparing the hazard or risk of an outcome over the follow-up period between any two groups who differ by one unit in our predictor x1. This model assumes that the relative hazard, or hazard ratio, of the outcome for any two groups being compared is constant across the entire follow-up. Not that the risks or hazards themselves are constant over time, but that their ratio is.