Train/test split in survival modelling and evaluating model predictions #1233
Unanswered

konradsemsch asked this question in Q&A

Replies: 1 comment 3 replies
-
Let me try to help:

3 replies
-
Hi @CamDavidsonPilon! I have doubts about evaluating my model on train/test data, as well as about the appropriate usage of conditional_after in that process. Let me explain:
In our projects we're seeking to predict customer churn. Our starting hypothesis is that the mean predicted lifetime should have a particular, linear relationship with a particular explanatory variable. At least, such a relationship is observable in the lifetime values of the non-censored data (duration from contract start to contract end). As a side note, the level of censoring is around 60%.
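For reference, this is roughly how I observe that relationship today, on the non-censored rows only (the column names here are simplified placeholders, not our actual schema):

```python
import pandas as pd

# Simplified placeholder columns: "duration" = lifetime from contract
# start to contract end, "churned" = event indicator (1 = contract ended),
# "usage_level" = the explanatory variable in question.
df = pd.read_csv("customers.csv")

# Only the non-censored rows have a fully observed lifetime.
observed = df[df["churned"] == 1]

# Mean observed lifetime per level of the explanatory variable is
# where the roughly linear relationship shows up.
print(observed.groupby("usage_level")["duration"].mean())
```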
After training the model I would like to evaluate whether the model's predicted expectations (using predict_expectation) on the train/test data line up with our initial hypothesis. A couple of questions come up here, and apologies if these are really basic, but I'm relatively new to survival modelling:
- should I interpret the result of predict_expectation with conditional_after on censored data as the "additional lifetime on top of their current lifetime" or as the "total lifetime including their current lifetime"?
- running predict_expectation on censored data without conditional_after produces expected lifetimes as if such customers had in fact already churned, correct? As such, these estimates are not really correct?
- when evaluating predictions on a test set where censored and uncensored observations are mixed together, I should run predict_expectation without conditional_after on the non-censored observations, and with conditional_after on the censored ones, correct?
- and a bit of a side question: when forming my train and test sets, should I ensure, similarly to typical classification problems, that my target variable (churn) is balanced across these sets? Or should the test set include only the non-censored observations to truly evaluate my predictions?

A rough sketch of the scheme I have in mind follows below. While I really like the package's comprehensive documentation, I'm missing a bit more detail on evaluating whether the predicted expected lifetimes are truly correct. Some more insight into best practices there would be great (although I know that's not its main purpose 😉).
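To make the questions concrete, here is a minimal sketch of the evaluation scheme I'm describing; the fitter choice (WeibullAFTFitter), the split ratio, and the column names (duration, churned) are placeholders rather than our actual setup:

```python
import pandas as pd
from lifelines import WeibullAFTFitter
from sklearn.model_selection import train_test_split

# Placeholder data and columns: "duration" = lifetime observed so far,
# "churned" = event indicator (1 = contract ended, 0 = censored).
df = pd.read_csv("customers.csv")

# Stratify on the event indicator so the ~60% censoring rate is
# similar in both splits (the "balanced target" side question).
train, test = train_test_split(
    df, test_size=0.25, stratify=df["churned"], random_state=0
)

# WeibullAFTFitter is just one possible choice of regression fitter.
aft = WeibullAFTFitter()
aft.fit(train, duration_col="duration", event_col="churned")

# Non-censored test rows: plain expectations, directly comparable
# to their fully observed lifetimes.
observed_test = test[test["churned"] == 1]
pred_observed = aft.predict_expectation(observed_test)

# Censored test rows: condition on the lifetime accumulated so far.
censored_test = test[test["churned"] == 0]
pred_censored = aft.predict_expectation(
    censored_test, conditional_after=censored_test["duration"].values
)
```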