GitHub - VioletGo319/Cervical-Cancer-Classification: Machine Learning Project using R

Prediction of Cervical Cancer Risk using Machine Learning

Task: Classification

Topic: Machine Learning, Cervical Cancer Diagnosis

Language: R

Project Overview

This project, developed by the VioletGo group, focuses on harnessing machine learning to create a predictive model for cervical cancer likelihood, grounded on variables related to medical and sexual history.

Objective

To predict the risk of cervical cancer using variables such as age, sexual partners, and diagnostic test outcomes.

Background

Cervical cancer stands as a prominent global health concern. In 2008, it was responsible for 275,000 deaths, of which 88% occurred in developing countries1. A primary cause is persistent infections linked to high-risk HPV2. Emphasizing the critical nature of early detection, particularly in areas with limited resources.

Dataset

Collated data from 858 patients. Features: 36 in total, encompassing demographic details, medical history, and lifestyle choices. Target Variables: 4 in total, signaling the presence of cervical intraepithelial neoplasia or cervical cancer.

Approach

Data Preprocessing

Imputed mean values for absent data in continuous variables. Adopted random sampling for imputation in categorical variables. Undertook feature standardization. Computed risk scores rooted in four test results. Divided data into: 80% training and 20% for testing.

Exploratory Modeling

Employed LASSO regression to select pivotal features. Established baseline models, including: Linear Regression, Random Forest, K-NN, SVM, and GBM. Detected a skewed class distribution, prompting corrective measures in subsequent phases.

Downsampling

Rectified class imbalances via downsampling. Conducted sensitivity analyses to verify the resilience of downsampling.

Final Modeling

Assessed the influence of downsampling on feature choices. Gauged the efficacy of the five models over diverse downsampling ratios.

Evaluation

Adopted Metrics: MSE, RMSE, MAE, CV_MSE, CV_RMSE, CV_MAE, and Accuracy. Outstanding performance by GBM with an accuracy peak at 0.835, using a 1:4 case-control dataset. SVM followed closely, registering an accuracy of 0.804 for a 1:3 case-control dataset. Linear Regression lagged behind in efficacy. Random Forest and KNN displayed average outcomes. Sensitivity tests affirmed unwavering performance across different case-control datasets.

Conclusion

The Gradient Boosting Machine (GBM) model emerged as the top performer in predicting cervical cancer risks, regardless of dataset variations. Potential future explorations could delve into fine-tuning model parameters, exploring innovative feature engineering methods, or testing advanced machine learning algorithms.

Usage

Install the necessary libraries and dependencies. Fetch the dataset and place it in the appropriate directory. Execute the preprocessing script for data refinement and setup. Run the model script for training and evaluation purposes. Utilize the refined model for predictions on fresh data.

References

World Health Organization's Statistics on Cervical Cancer, 2008

Role of HPV in Cervical Cancer by National Institute of Health, 2012

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
Cervical_cancer_violetgo.Rmd		Cervical_cancer_violetgo.Rmd
Cervical_cancer_violetgo.pdf		Cervical_cancer_violetgo.pdf
Final report (1) (1).pdf		Final report (1) (1).pdf
README.md		README.md
risk_factors_cervical_cancer.csv		risk_factors_cervical_cancer.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Prediction of Cervical Cancer Risk using Machine Learning

Task: Classification

Topic: Machine Learning, Cervical Cancer Diagnosis

Language: R

Project Overview

Objective

Background

Dataset

Approach

Data Preprocessing

Exploratory Modeling

Downsampling

Final Modeling

Evaluation

Conclusion

Usage

References

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

VioletGo319/Cervical-Cancer-Classification

Folders and files

Latest commit

History

Repository files navigation

Prediction of Cervical Cancer Risk using Machine Learning

Task: Classification

Topic: Machine Learning, Cervical Cancer Diagnosis

Language: R

Project Overview

Objective

Background

Dataset

Approach

Data Preprocessing

Exploratory Modeling

Downsampling

Final Modeling

Evaluation

Conclusion

Usage

References

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Packages