CIS4930: Introduction to Machine Learning Final Project - Analysis of characteristics for diabetes detection
- David Visbal Gomez
- Joshua Delamater
- Jesus Lopez
- Christian Rodriguez
- Vance Boudreau
We will use a diabetes prediction dataset sourced from Kaggle. The dataset has 9 columns:
- Gender
- Age
- Hypertension status
- Heart disease status
- Smoking history
- Body Mass Index (BMI)
- Hemoglobin A1c levels
- Blood glucose levels
- Diabetes status
The dataset contains integer, decimal, and string data types. It is central to our project because it pairs demographic, medical-history, and measurement columns with a diabetes status column that we can use as the prediction target.
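To make the mix of data types concrete, here is a minimal sketch of how one row could be represented as a typed record. The field names and the type assigned to each column are our assumptions for illustration, not confirmed against the CSV header, and the sample row is made up.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # Field names and types are assumptions for illustration; they should be
    # checked against the actual CSV header before use.
    gender: str               # string
    age: float                # decimal
    hypertension: int         # integer flag (0 or 1)
    heart_disease: int        # integer flag (0 or 1)
    smoking_history: str      # string
    bmi: float                # decimal
    hba1c_level: float        # decimal
    blood_glucose_level: int  # integer
    diabetes: int             # integer flag (0 or 1), the prediction target


def parse_row(row: dict) -> PatientRecord:
    """Convert one CSV row (all values read as strings) into a typed record."""
    return PatientRecord(
        gender=row["gender"],
        age=float(row["age"]),
        hypertension=int(row["hypertension"]),
        heart_disease=int(row["heart_disease"]),
        smoking_history=row["smoking_history"],
        bmi=float(row["bmi"]),
        hba1c_level=float(row["hba1c_level"]),
        blood_glucose_level=int(row["blood_glucose_level"]),
        diabetes=int(row["diabetes"]),
    )


# Made-up row for demonstration only; it is not taken from the dataset.
sample_row = {
    "gender": "Female",
    "age": "54.0",
    "hypertension": "0",
    "heart_disease": "0",
    "smoking_history": "never",
    "bmi": "27.3",
    "hba1c_level": "6.1",
    "blood_glucose_level": "140",
    "diabetes": "0",
}
record = parse_row(sample_row)
print(record)
```

Parsing every row into a typed record like this would surface malformed values early, since `int()` and `float()` raise `ValueError` on bad input.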
- We plan to identify the underlying factors that correlate strongly with diabetes.
- We plan to combine a linear regression model for the numerical features with a classification tree for the string (categorical) features.
- We will distribute the workload among the group members. We will also clean and scale the data so that the model converges more quickly.
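The correlation and scaling steps in the plan above can be sketched in plain Python. This is a minimal sketch under stated assumptions: we use the Pearson coefficient as the correlation measure and z-score standardization as the scaling method, neither of which the plan fixes yet, and the toy values are made up rather than taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def standardize(xs):
    """Rescale values to zero mean and unit (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if std == 0:
        raise ValueError("cannot standardize a constant column")
    return [(x - mean) / std for x in xs]


# Made-up toy values for demonstration only; not taken from the dataset.
glucose = [85, 90, 140, 160, 200, 130]
has_diabetes = [0, 0, 0, 1, 1, 0]

print("correlation:", pearson(glucose, has_diabetes))
print("scaled glucose:", standardize(glucose))
```

Standardizing a column does not change its Pearson correlation with the label, so the two steps can be run in either order. Scaling matters mainly for models trained by gradient-based optimization, where features on very different scales can slow convergence.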