sb-ai-lab/Can-SAVE

Can-SAVE

Manuscript: Can-SAVE: Mass Cancer Risk Prediction via Survival Analysis Variables and EHR

This repository contains the source code for the feature engineering step of the Can-SAVE method.

Installation

git clone https://github.com/sb-ai-lab/CanSave.git
cd CanSave
pip install -r requirements.txt

requirements.txt

pandas==1.5.3
numpy==1.23.2
lifelines==0.27.4
scikit-learn==1.1.3
scipy==1.10.0
PyYAML==6.0
openpyxl==3.0.10
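
Because every dependency is pinned to an exact version, it can help to confirm that the active environment matches the pins before running the scripts. The following is a minimal sketch using only the standard library; the helper names `parse_pins` and `find_mismatches` are hypothetical and are not part of this repository.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (wanted, installed)} for packages that differ from the pin.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# The pins listed in requirertments.txt above, inlined so the sketch is self-contained.
REQUIREMENTS = """\
pandas==1.5.3
numpy==1.23.2
lifelines==0.27.4
scikit-learn==1.1.3
scipy==1.10.0
PyYAML==6.0
openpyxl==3.0.10
"""

pins = parse_pins(REQUIREMENTS)
for name, (wanted, installed) in find_mismatches(pins).items():
    print(f"{name}: pinned {wanted}, installed {installed}")
```

An empty output means the environment matches all seven pins.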

Repository Structure

  • Can-SAVE/: Core implementation
  • EHR/: Simulated sample of EHR data
  • survival_models/: Output directory for fitted models (Kaplan-Meier estimators and the accelerated failure time (AFT) model)

Can-SAVE/
├── EHR/
│   └── id_26.csv
├── survival_models/
│   ├── kaplan_meier_both.pkl
│   ├── kaplan_meier_males.pkl
│   ├── kaplan_meier_females.pkl
│   └── aft.pkl
├── CanSave.py
├── Example_How_To_Train_Survival_Models.py
├── KaplanMeierEstimator.py
├── CONFIG_CanSave.yaml
├── icd10_groups.xlsx
├── requirements.txt
├── LICENSE
└── README.md

Quick Start

1) How to Train Survival Models

$ python Example_How_To_Train_Survival_Models.py
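
The training script is not reproduced here. As a rough illustration of what a Kaplan-Meier estimator computes, the sketch below implements the standard product-limit formula in plain Python. It is not the code from `KaplanMeierEstimator.py`, which may differ; the function name `kaplan_meier` and the toy data are assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time.

    durations: follow-up time per subject
    events:    1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # observed events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: follow-up in weeks, 1 = event observed, 0 = censored
durations = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"S({t}) = {s:.2f}")
```

Censored subjects lower the number at risk without lowering the survival estimate, which is why the curve only steps down at times where an event was observed.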

2) How to Do Feature Engineering for Can-SAVE

Terminal

$ python CanSave.py

Python

# required libraries
import numpy as np
import pandas as pd

from CanSave import CanSave

# entry point
if __name__ == '__main__':
    # Create the feature engineering object from the YAML config
    config_path = './CONFIG_CanSave.yaml'
    cs = CanSave(CONFIG_PATH=config_path)
    help(cs)  # help() prints the documentation itself and returns None

    # Load the patient's EHR
    path_ehr = './EHR/id_26.csv'
    ehr = pd.read_csv(path_ehr, sep=';').set_index('patient_id')
    sex = ehr['sex'].iloc[0]
    birth_date = ehr['birth_date'].iloc[0]

    # Run feature engineering for the risk prediction
    features = cs.feature_engineering(
        sex         = sex,              # sex of the patient
        birth_date  = birth_date,       # birth date of the patient
        ehr         = ehr,              # electronic health records (EHR) of the patient
        date_pred   = '2022-01-01',     # date of the risk estimation
        deep_weeks  = 108               # depth of the EHR history (in weeks)
    )
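
One plausible reading of `date_pred` and `deep_weeks` is that only records from the `deep_weeks` weeks before `date_pred` contribute to the features. The sketch below illustrates that lookback window in plain Python. It is an assumption about the semantics, not the logic in `CanSave.py`; the helper `history_window`, the `date` and `icd10` field names, and the sample records are hypothetical.

```python
from datetime import date, timedelta


def history_window(records, date_pred, deep_weeks):
    """Keep records dated within `deep_weeks` weeks before `date_pred`.

    Assumed convention: the window start is inclusive and `date_pred` itself
    is excluded, so nothing from the prediction date or later leaks in.
    """
    end = date.fromisoformat(date_pred)
    start = end - timedelta(weeks=deep_weeks)
    return [r for r in records if start <= date.fromisoformat(r["date"]) < end]


# Hypothetical records; the real EHR column names may differ
records = [
    {"date": "2019-06-01", "icd10": "J06"},  # older than the window
    {"date": "2020-03-15", "icd10": "K29"},
    {"date": "2021-11-30", "icd10": "I10"},
    {"date": "2022-02-01", "icd10": "E11"},  # after the prediction date
]

recent = history_window(records, date_pred="2022-01-01", deep_weeks=108)
print([r["icd10"] for r in recent])
```

With `deep_weeks=108` the window spans 756 days, so it opens on 2019-12-07 and only the two middle records are kept.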
