Manuscript: Can-SAVE: Mass Cancer Risk Prediction via Survival Analysis Variables and EHR
This repository contains the source code that implements the feature engineering step of the Can-SAVE method.
git clone https://github.com/sb-ai-lab/CanSave.git
cd CanSave
pip install -r requirements.txt
pandas==1.5.3
numpy==1.23.2
lifelines==0.27.4
scikit-learn==1.1.3
scipy==1.10.0
PyYAML==6.0
openpyxl==3.0.10
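Because the dependencies above are pinned to exact versions, it can help to confirm that the active environment matches them before running the scripts. The following is a minimal sketch that uses only the standard library; the `PINNED` dictionary and the `check_pins` helper are illustrative names and are not part of the repository.

```python
from importlib import metadata

# Pins copied from the requirements listed above.
PINNED = {
    "pandas": "1.5.3",
    "numpy": "1.23.2",
    "lifelines": "0.27.4",
    "scikit-learn": "1.1.3",
    "scipy": "1.10.0",
    "PyYAML": "6.0",
    "openpyxl": "3.0.10",
}


def check_pins(pins):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is installed at the expected version.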
- Can-SAVE/: Core implementation
- EHR/: Simulated sample of EHR data
- survival_models/: Output directory for fitted models (Kaplan-Meier estimators and AFT model)
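For readers unfamiliar with the Kaplan-Meier estimator mentioned above, the sketch below shows the product-limit calculation in plain Python. It is an illustration of the general technique only: it is not the code in `KaplanMeierEstimator.py`, and the function name `kaplan_meier` and the toy data are assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit estimate of the survival function.

    durations: observed time for each subject (event or censoring time)
    events:    1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Subjects censored at t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Toy data: five subjects, two of them censored.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(f"t={t}: S(t)={s:.3f}")
```

Fitting separate curves for males, females, and both sexes, as the file names in `survival_models/` suggest, would amount to calling such an estimator on the corresponding subsets of the training data.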
Can-SAVE/
├── EHR/
│ └── id_26.csv
├── survival_models/
│ ├── kaplan_meier_both.pkl
│ ├── kaplan_meier_males.pkl
│ ├── kaplan_meier_females.pkl
│ └── aft.pkl
├── CanSave.py
├── Example_How_To_Train_Survival_Models.py
├── KaplanMeierEstimator.py
├── CONFIG_CanSave.yaml
├── icd10_groups.xlsx
├── requirements.txt
├── LICENSE
└── README.md
$ python Example_How_To_Train_Survival_Models.py
$ python CanSave.py
# Required libraries
import numpy as np
import pandas as pd
from CanSave import CanSave

# Entry point
if __name__ == '__main__':
    # Create the feature engineering object
    config_path = './CONFIG_CanSave.yaml'
    cs = CanSave(CONFIG_PATH=config_path)
    help(cs)  # help() prints the documentation itself and returns None

    # Load the patient's EHR
    path_ehr = './EHR/id_26.csv'
    ehr = pd.read_csv(path_ehr, sep=';').set_index('patient_id')
    sex = ehr['sex'].iloc[0]
    birth_date = ehr['birth_date'].iloc[0]

    # Run feature engineering for the risk prediction
    features = cs.feature_engineering(
        sex=sex,                 # sex of the patient
        birth_date=birth_date,   # birth date of the patient
        ehr=ehr,                 # Electronic Health Records of the patient
        date_pred='2022-01-01',  # date of the risk estimation
        deep_weeks=108           # depth of the EHR history to use (in weeks)
    )
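To make the `date_pred` and `deep_weeks` arguments concrete, the sketch below computes the look-back window they imply. It is only an illustration: the helper names `history_window` and `in_window` are hypothetical, and treating the window as the half-open interval from `date_pred - deep_weeks` up to (but not including) `date_pred` is an assumption, not a description of what `CanSave.feature_engineering` does internally.

```python
from datetime import date, timedelta


def history_window(date_pred, deep_weeks):
    """Return (start, end) of the look-back window as date objects."""
    end = date.fromisoformat(date_pred)
    start = end - timedelta(weeks=deep_weeks)
    return start, end


def in_window(record_date, start, end):
    """True if an ISO-formatted record date falls in [start, end)."""
    return start <= date.fromisoformat(record_date) < end


if __name__ == "__main__":
    start, end = history_window("2022-01-01", 108)
    print(f"EHR records considered from {start} up to {end}")
```

With the values used in the example, 108 weeks corresponds to 756 days before the prediction date.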