syatagan/FeaturedEngineering

Business Problem

The goal is to build a machine learning model that predicts whether a person has diabetes, given their recorded characteristics. The necessary data analysis and feature engineering steps must be carried out before the model is developed.

Data Story

The dataset is part of a larger dataset maintained by the National Institute of Diabetes and Digestive and Kidney Diseases in the United States. It contains data collected for a diabetes study conducted on Pima Indian women aged 21 and over, residing in Phoenix, Arizona, the fifth-largest city in the United States. The target variable is "Outcome", where 1 indicates a positive diabetes test result and 0 indicates a negative test result.
Pregnancies: Number of pregnancies
Glucose: 2-hour plasma glucose concentration in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Skin thickness
Insulin: 2-hour serum insulin (mu U/ml)
DiabetesPedigreeFunction: Diabetes pedigree function
BMI: Body mass index
Age: Age
Outcome: Having the disease (1) or not (0)

Project Tasks

Task 1: Exploratory Data Analysis

Step 1: Examine the big picture.
Step 2: Identify numerical and categorical variables.
Step 3: Analyze numerical and categorical variables.
Step 4: Conduct target variable analysis (Mean of the target variable by categorical variables, mean of numerical variables by the target variable).
Step 5: Perform outlier analysis.
Step 6: Carry out missing data analysis.
Step 7: Perform correlation analysis.
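
The EDA steps above can be sketched roughly as follows. This is a minimal illustration with pandas, not the repository's actual code: the rows in `df` are invented toy values (not taken from the real dataset), and `grab_col_names`, `outlier_thresholds`, and the `cat_th` cutoff are hypothetical helpers and parameters introduced here. The outlier rule used is the common 1.5 * IQR rule, which is one possible choice.

```python
import pandas as pd

# Toy rows invented for illustration only; NOT taken from the real dataset.
df = pd.DataFrame({
    "Pregnancies": [6, 1, 8, 1, 0, 5, 3, 10],
    "Glucose": [148, 85, 183, 89, 137, 116, 78, 115],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, 35.3],
    "Age": [50, 31, 32, 21, 33, 30, 26, 29],
    "Outcome": [1, 0, 1, 0, 1, 0, 1, 0],
})


def grab_col_names(frame, cat_th=3):
    """Hypothetical helper: split columns into categorical and numerical.

    Non-numeric columns, and numeric columns with fewer than `cat_th`
    unique values, are treated as categorical (an assumed convention).
    """
    cat_cols = [
        col for col in frame.columns
        if not pd.api.types.is_numeric_dtype(frame[col])
        or frame[col].nunique() < cat_th
    ]
    num_cols = [col for col in frame.columns if col not in cat_cols]
    return cat_cols, num_cols


def outlier_thresholds(frame, col, q1=0.25, q3=0.75):
    """Hypothetical helper: lower and upper limits from the 1.5 * IQR rule."""
    low_q, high_q = frame[col].quantile(q1), frame[col].quantile(q3)
    iqr = high_q - low_q
    return low_q - 1.5 * iqr, high_q + 1.5 * iqr


# Step 1: big picture (shape, dtypes, summary statistics).
print(df.shape)
print(df.dtypes)
print(df.describe().T)

# Step 2: identify numerical and categorical variables.
cat_cols, num_cols = grab_col_names(df)

# Step 4: mean of each numerical variable by the target variable.
target_means = df.groupby("Outcome")[num_cols].mean()

# Step 5: count observations outside the IQR limits for each numerical column.
outlier_counts = {}
for col in num_cols:
    low, high = outlier_thresholds(df, col)
    outlier_counts[col] = int(((df[col] < low) | (df[col] > high)).sum())

# Step 6: missing values per column.
missing_counts = df.isnull().sum()

# Step 7: pairwise correlation between numerical variables.
corr = df[num_cols].corr()
print(target_means, outlier_counts, missing_counts, corr, sep="\n\n")
```

On the real data, the quantiles passed to `outlier_thresholds` and the `cat_th` cutoff would need to be chosen after looking at the distributions; the defaults here are only placeholders.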

Task 2: Feature Engineering

Step 1: Handle missing and outlier values. The dataset contains no explicitly missing observations, but a value of 0 in variables such as Glucose or Insulin is physiologically impossible and most likely marks a missing measurement. Consider replacing these 0 values with NaN and then applying the usual missing-value operations.
Step 2: Create new features.
Step 3: Perform encoding operations.
Step 4: Standardize numerical variables.
Step 5: Build a model.
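
A rough sketch of Steps 1 to 4 is shown below. It is an assumption-laden illustration with pandas and NumPy, not the repository's actual code: the rows are invented toy values, median imputation is just one possible way to fill the gaps, and the `BMI_CAT` feature with its cut points is a hypothetical example of a new feature. Standardization is written out by hand as a z-score so the block needs no extra libraries.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration only; NOT taken from the real dataset.
df = pd.DataFrame({
    "Glucose": [148, 0, 183, 89, 137, 116],
    "Insulin": [0, 94, 0, 168, 88, 0],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, 25.6],
    "Age": [50, 31, 32, 21, 33, 30],
    "Outcome": [1, 0, 1, 0, 1, 0],
})

# Step 1: treat impossible zeros as missing, then impute.
# Which columns to treat this way is an assumption based on the task text.
zero_as_missing = ["Glucose", "Insulin"]
df[zero_as_missing] = df[zero_as_missing].replace(0, np.nan)
missing_counts = df.isnull().sum()

# Median imputation is one simple option among several.
df[zero_as_missing] = df[zero_as_missing].fillna(df[zero_as_missing].median())

# Step 2: create a new feature. BMI_CAT and its cut points are hypothetical.
df["BMI_CAT"] = pd.cut(
    df["BMI"],
    bins=[0, 18.5, 25, 30, 100],
    labels=["underweight", "normal", "overweight", "obese"],
    right=False,  # intervals are closed on the left: [0, 18.5), [18.5, 25), ...
)

# Step 3: one-hot encode the new categorical feature.
# drop_first=True drops the first category to avoid a redundant column.
df = pd.get_dummies(df, columns=["BMI_CAT"], drop_first=True, dtype=int)

# Step 4: standardize numerical variables (z-score, population std).
num_cols = ["Glucose", "Insulin", "BMI", "Age"]
df[num_cols] = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std(ddof=0)

# Step 5 would start from this feature matrix and target vector.
X = df.drop(columns="Outcome")
y = df["Outcome"]
print(missing_counts)
print(X.head())
```

The resulting `X` and `y` can be passed to any classifier. With a real train/test split, the imputation medians and the standardization statistics should be computed on the training part only and then applied to the test part, so that no information leaks from the test set.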

About

This repository contains the project code for the Feature Engineering assignment of a Data Science Bootcamp.
