Business Problem

The goal is to develop a machine learning model that predicts whether an individual has diabetes, given their characteristics. The necessary data analysis and feature engineering steps are expected to be completed before the model is developed.

Data Story

The dataset is part of a larger dataset maintained by the National Institute of Diabetes and Digestive and Kidney Diseases in the United States. It contains data collected for a diabetes study of Pima Indian women aged 21 and over living in Phoenix, Arizona, the fifth-largest city in the United States. The target variable is "Outcome", where 1 indicates a positive diabetes test result and 0 indicates a negative one.

Pregnancies	Number of pregnancies
Glucose	2-hour plasma glucose concentration in an oral glucose tolerance test
BloodPressure	Diastolic blood pressure (mm Hg)
SkinThickness	Triceps skinfold thickness (mm)
Insulin	2-hour serum insulin (mu U/ml)
DiabetesPedigreeFunction	Score for the likelihood of diabetes based on family history
BMI	Body mass index
Age	Age (years)
Outcome	Has the disease (1) or not (0)

Project Tasks

Task 1: Exploratory Data Analysis

Step 1: Examine the big picture.
Step 2: Identify numerical and categorical variables.
Step 3: Analyze numerical and categorical variables.
Step 4: Analyze the target variable (mean of the target by each categorical variable, and mean of each numerical variable by target class).
Step 5: Perform outlier analysis.
Step 6: Carry out missing data analysis.
Step 7: Perform correlation analysis.
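
The seven steps above can be sketched with pandas as follows. This is a minimal illustration, not the repository's own implementation: the rows are invented for the example (they are not the study data), only a subset of the documented columns is used, and the helper names `split_columns`, `iqr_bounds`, and `outlier_counts` are hypothetical. The IQR rule for outliers is one common choice among several.

```python
import pandas as pd

# Invented illustrative rows using documented column names (NOT the study data).
df = pd.DataFrame({
    "Pregnancies": [2, 0, 5, 1, 3, 7, 4, 1],
    "Glucose": [120, 95, 170, 88, 140, 155, 0, 101],
    "BloodPressure": [70, 64, 80, 60, 76, 82, 68, 72],
    "Insulin": [0, 90, 0, 75, 160, 0, 110, 85],
    "BMI": [30.1, 24.5, 35.2, 22.8, 33.0, 36.4, 28.7, 26.0],
    "Age": [35, 24, 48, 22, 41, 52, 30, 27],
    "Outcome": [1, 0, 1, 0, 1, 1, 0, 0],
})


def split_columns(frame, cat_threshold=3):
    """Hypothetical helper: non-numeric or low-cardinality columns count as categorical."""
    cat_cols = [
        c for c in frame.columns
        if not pd.api.types.is_numeric_dtype(frame[c])
        or frame[c].nunique() < cat_threshold
    ]
    num_cols = [c for c in frame.columns if c not in cat_cols]
    return cat_cols, num_cols


def iqr_bounds(series, low_q=0.25, high_q=0.75):
    """Hypothetical helper: lower and upper fences from the interquartile range."""
    q1, q3 = series.quantile(low_q), series.quantile(high_q)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outlier_counts(frame, cols):
    """Hypothetical helper: number of values outside the IQR fences per column."""
    counts = {}
    for col in cols:
        low, high = iqr_bounds(frame[col])
        counts[col] = int(((frame[col] < low) | (frame[col] > high)).sum())
    return counts


# Step 1: big picture (shape, dtypes, summary statistics).
print(df.shape)
print(df.dtypes)
print(df.describe().T)

# Step 2: identify categorical and numerical variables.
cat_cols, num_cols = split_columns(df)

# Step 3: analyze each group (class counts, distribution summaries).
print(df["Outcome"].value_counts())
print(df[num_cols].describe().T)

# Step 4: mean of each numerical variable by target class.
target_means = df.groupby("Outcome")[num_cols].mean()

# Step 5: outlier analysis with the IQR rule.
outliers = outlier_counts(df, num_cols)

# Step 6: missing data analysis (zeros are not yet treated as missing here).
missing = df.isnull().sum()

# Step 7: pairwise correlation of the numerical variables.
corr = df[num_cols].corr()
```

The `cat_threshold=3` default is deliberately low because the sample has only eight rows; on the full dataset a larger value would likely be more appropriate.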

Task 2: Feature Engineering

Step 1: Handle missing and outlier values. The dataset contains no explicitly missing observations, but zeros in variables such as Glucose or Insulin likely stand for missing measurements, since a person's glucose or insulin level cannot be 0. Consider replacing these zeros with NaN and then applying the usual missing-value handling.
Step 2: Create new features.
Step 3: Perform encoding operations.
Step 4: Standardize numerical variables.
Step 5: Build a model.
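
The feature engineering steps above might look roughly like the sketch below. Several choices here are assumptions that the task description does not prescribe: median imputation, IQR-based capping, the hypothetical `BMI_CAT` feature with conventional BMI cutoffs, one-hot encoding, and logistic regression as the model. The rows are invented for illustration and are not the study data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Invented illustrative rows using documented column names (NOT the study data).
df = pd.DataFrame({
    "Glucose": [120, 95, 170, 88, 140, 155, 0, 101],
    "Insulin": [0, 90, 0, 75, 160, 0, 110, 85],
    "BMI": [30.1, 24.5, 35.2, 22.8, 33.0, 36.4, 28.7, 26.0],
    "Age": [35, 24, 48, 22, 41, 52, 30, 27],
    "Outcome": [1, 0, 1, 0, 1, 1, 0, 0],
})
num_cols = ["Glucose", "Insulin", "BMI", "Age"]

# Step 1a: treat impossible zeros as missing (other columns may qualify too).
zero_as_missing = ["Glucose", "Insulin"]
df[zero_as_missing] = df[zero_as_missing].replace(0, np.nan)
n_missing_after_replace = int(df.isnull().sum().sum())

# Step 1b: impute with the column median (an assumption; other strategies exist).
df[zero_as_missing] = df[zero_as_missing].fillna(df[zero_as_missing].median())

# Step 1c: cap outliers at the IQR fences (also an assumption).
for col in num_cols:
    q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
    iqr = q3 - q1
    df[col] = df[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Step 2: create a new feature. BMI_CAT is a hypothetical example feature.
df["BMI_CAT"] = pd.cut(
    df["BMI"],
    bins=[0, 18.5, 25, 30, np.inf],
    labels=["underweight", "normal", "overweight", "obese"],
    right=False,
)

# Step 3: one-hot encode the new categorical feature.
df = pd.get_dummies(df, columns=["BMI_CAT"], drop_first=True, dtype=int)

# Step 4: standardize the numerical variables (zero mean, unit variance).
df[num_cols] = StandardScaler().fit_transform(df[num_cols])

# Step 5: fit a simple baseline model (evaluated on the training rows only,
# which is not a valid performance estimate).
X = df.drop(columns="Outcome")
y = df["Outcome"]
model = LogisticRegression(max_iter=1000).fit(X, y)
pred = model.predict(X)
```

On the full dataset, a held-out test split or cross-validation would be needed before drawing any conclusion about model quality.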


syatagan/FeaturedEngineering
