A machine learning pipeline for predicting lung cancer severity using patient lifestyle and health data, including data cleaning, visualization, and classification with Decision Tree, Random Forest, and Logistic Regression

RushiChinagounolla/lung-cancer-prediction

🫁 Lung Cancer Prediction with Machine Learning

A machine learning project that predicts lung cancer severity from patient lifestyle and health factors. It covers data cleaning, visualization, and classification with Decision Tree, Random Forest, and Logistic Regression models.


📁 Files

  • lung_cancer_dataset.csv: Source dataset from Kaggle
  • lung_cancer_prediction.ipynb: Main Jupyter notebook with full analysis

Dataset Source

Kaggle: Cancer Patients and Air Pollution


🔍 Techniques Used

  • Data cleaning and preprocessing with pandas
  • Data visualization using matplotlib and seaborn
  • Feature engineering and correlation analysis
  • Classification models:
    • Decision Tree
    • Random Forest
    • Logistic Regression
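
As a rough illustration of the classification step listed above, the sketch below fits the three model types on synthetic data. It is not the notebook's code: the data comes from `make_classification` rather than `lung_cancer_dataset.csv`, and the three-class target, the 80/20 split, and all hyperparameters are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Kaggle data (assumption: a three-class target).
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_informative=6,
    n_classes=3,
    random_state=42,
)

# Hold out 20% of the rows for testing, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The three model families named in this README, with illustrative settings.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model and record its accuracy on the held-out rows.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: {test_accuracy[name]:.3f}")
```

Scores on this synthetic data say nothing about the real dataset; the point is only the shared fit-and-score loop that makes the three models directly comparable.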

📊 Project Highlights

  • Explores the relationship between air pollution, smoking, and lung disease severity
  • Visualizes gender distribution, disease severity levels, and risk factors
  • Compares model performance through accuracy, confusion matrices, and classification reports
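
The three evaluation tools mentioned above can be sketched with scikit-learn's metrics module. The labels below are made up for illustration (the `Low`/`Medium`/`High` names are an assumption, and none of these values come from the notebook).

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true and predicted severity labels, not notebook results.
labels = ["Low", "Medium", "High"]
y_true = ["Low", "Low", "Medium", "Medium", "High", "High", "High", "Low"]
y_pred = ["Low", "Medium", "Medium", "Medium", "High", "High", "Low", "Low"]

# Fraction of predictions that match the true label.
acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Per-class precision, recall, and F1 as a formatted string.
report = classification_report(y_true, y_pred, labels=labels, zero_division=0)

print(f"Accuracy: {acc:.2f}")
print(cm)
print(report)
```

Accuracy alone can hide which classes are confused with each other; the confusion matrix and per-class report make that visible, which is presumably why the project reports all three.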

📈 Results

  • Achieved accuracy up to 87% using the Decision Tree model
  • Decision Tree outperformed Random Forest and Logistic Regression overall
  • Identified strong correlations between chronic lung disease and factors like smoking, air pollution, and occupational hazards
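
A correlation analysis of this kind might look like the following pandas sketch. The column names and values are hypothetical placeholders, not taken from `lung_cancer_dataset.csv`, and Pearson correlation is assumed since it is the `DataFrame.corr` default.

```python
import pandas as pd

# Hypothetical columns and made-up values, for illustration only.
df = pd.DataFrame(
    {
        "smoking": [1, 2, 3, 4, 5, 6, 7, 8],
        "air_pollution": [2, 1, 4, 3, 6, 5, 8, 7],
        "occupational_hazards": [3, 1, 2, 5, 4, 6, 8, 7],
        "chronic_lung_disease": [1, 2, 2, 4, 5, 5, 7, 7],
    }
)

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Rank the other factors by their correlation with the target column.
ranked = (
    corr["chronic_lung_disease"]
    .drop("chronic_lung_disease")
    .sort_values(ascending=False)
)
print(ranked)
```

With the real data, the same matrix could be passed to `seaborn.heatmap` to visualize which factors move together.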

🧰 Tech Stack

  • Python
  • Pandas, NumPy
  • Matplotlib, Seaborn
  • Scikit-learn
  • Jupyter Notebook
