Train/test split in survival modelling and evaluating model predictions #1233
Unanswered

konradsemsch asked this question in Q&A

Replies: 1 comment 3 replies
-
Let me try to help:

3 replies
-
Hi @CamDavidsonPilon! I have doubts about evaluating my model on train/test data, as well as about the appropriate usage of conditional_after in that process. Let me explain:
In our projects we're seeking to predict customer churn. Our starting hypothesis is that the mean predicted lifetime should have a particular, linear relationship with a particular explanatory variable. At least, such a relationship is observable in the lifetime values of the non-censored data (duration from contract start to contract end). As a side note, the level of censoring is around 60%.
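For reference, this is roughly how I observe that relationship today, on the non-censored rows only (the column names here are simplified placeholders, not our actual schema):

```python
import pandas as pd

# Simplified placeholder columns: "duration" = lifetime from contract
# start to contract end, "churned" = event indicator (1 = contract ended),
# "usage_level" = the explanatory variable in question.
df = pd.read_csv("customers.csv")

# Only the non-censored rows have a fully observed lifetime.
observed = df[df["churned"] == 1]

# Mean observed lifetime per level of the explanatory variable is
# where the roughly linear relationship shows up.
print(observed.groupby("usage_level")["duration"].mean())
```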
After training the model I would like to evaluate whether the model's predicted expectations (using predict_expectation) on the train/test data line up with our initial hypothesis. A couple of questions come up here, and apologies if these are really basic, but I'm relatively new to survival modelling:
- should I interpret the result of predict_expectation with conditional_after on censored data as the "additional lifetime on top of their current lifetime" or as the "total lifetime including their current lifetime"?
- running predict_expectation on censored data without conditional_after produces expected lifetimes as if such customers had in fact already churned, correct? As such, these estimates are not really correct?
- when evaluating predictions on a test set where censored and uncensored observations are mixed together, I should run predict_expectation without conditional_after on the non-censored observations, and with conditional_after on the censored ones, correct?
- and a bit of a side question: when forming my train and test sets, should I ensure, similarly to typical classification problems, that my target variable (churn) is balanced across these sets? Or should the test set include only the non-censored observations to truly evaluate my predictions?

A rough sketch of the scheme I have in mind follows below. While I really like the package's comprehensive documentation, I'm missing a bit more detail on evaluating whether the predicted expected lifetimes are truly correct. Some more insight into best practices there would be great (although I know that's not its main purpose 😉).
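To make the questions concrete, here is a minimal sketch of the evaluation scheme I'm describing; the fitter choice (WeibullAFTFitter), the split ratio, and the column names (duration, churned) are placeholders rather than our actual setup:

```python
import pandas as pd
from lifelines import WeibullAFTFitter
from sklearn.model_selection import train_test_split

# Placeholder data and columns: "duration" = lifetime observed so far,
# "churned" = event indicator (1 = contract ended, 0 = censored).
df = pd.read_csv("customers.csv")

# Stratify on the event indicator so the ~60% censoring rate is
# similar in both splits (the "balanced target" side question).
train, test = train_test_split(
    df, test_size=0.25, stratify=df["churned"], random_state=0
)

# WeibullAFTFitter is just one possible choice of regression fitter.
aft = WeibullAFTFitter()
aft.fit(train, duration_col="duration", event_col="churned")

# Non-censored test rows: plain expectations, directly comparable
# to their fully observed lifetimes.
observed_test = test[test["churned"] == 1]
pred_observed = aft.predict_expectation(observed_test)

# Censored test rows: condition on the lifetime accumulated so far.
censored_test = test[test["churned"] == 0]
pred_censored = aft.predict_expectation(
    censored_test, conditional_after=censored_test["duration"].values
)
```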