Submission 1 for the "Machine Learning Operations (MLOps)" course from Dicoding Indonesia: creating a machine learning pipeline using TensorFlow Extended (TFX).

bayu-siddhi/tfx-obesity-detection

Obesity Level Prediction Based on Physical Condition and Lifestyle Using TensorFlow-Extended Pipeline


This training project develops a machine learning model that predicts obesity levels from individuals' physical condition and lifestyle habits, using a TensorFlow Extended (TFX) pipeline.

| Section | Description |
| --- | --- |
| Dataset | The dataset used for this project is "Estimation of Obesity Levels Based On Eating Habits and Physical Condition" from the UCI Machine Learning Repository. |
| Problem | Obesity is a serious global health issue with significant impacts on individual health and healthcare systems. Unhealthy eating habits and lifestyles are major contributors to obesity risk. The dataset provides data on the eating habits and physical condition of individuals from Mexico, Peru, and Colombia. This project aims to develop a machine learning model that predicts an individual's obesity level from eating habits, physical activity, and other related factors. By identifying the most influential factors, the project seeks to provide insight into the relationship between lifestyle and obesity and so support more effective public health strategies. |
| Machine Learning Solution | The solution is a multi-class classification model. It is trained to classify an individual's obesity level into seven categories (Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II, and Obesity Type III) based on the features in the dataset. The model is built with a neural network architecture. |
| Preprocessing Method | The dataset is split into training and evaluation sets. It contains numerical and categorical features, and the categorical features include both nominal and ordinal types. Ordinal features are converted to numbers with label encoding, and nominal features are converted with one-hot encoding. Numerical features are scaled to a common range to make model training easier. |
| Model Architecture | The model is a simple neural network built with TensorFlow, inside an end-to-end pipeline built with TensorFlow Extended (TFX). It consists of several dense (fully connected) layers with ReLU activations, followed by a dense output layer with softmax activation for multi-class classification (7 obesity level classes). |
| Evaluation Metrics | Model performance is evaluated with several classification metrics: AUC, precision, recall, false positives, true positives, false negatives, true negatives, accuracy, and F1-score. The F1-score is the primary metric for deciding whether a change in the model is large enough to count as a real improvement. |
| Model Performance | The neural network performs well on the 7 obesity level classes, with accuracy and F1-score both at 0.965. However, validation metrics are still somewhat unstable during training, which indicates room for further improvement. |
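
The three encodings named under Preprocessing Method can be illustrated with a minimal plain-Python sketch. This is not the project's Transform code: in a TFX pipeline this logic would normally live in the Transform component, and the sketch only shows what each encoding does to a value. The column names (`Age`, `CAEC`, `MTRANS`), the category lists, the choice of `CAEC` as the ordinal feature, and the use of z-score scaling are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Assumed category lists (hypothetical; check them against the real dataset).
CAEC_ORDER = ["no", "Sometimes", "Frequently", "Always"]  # ordinal: order matters
MTRANS_VALUES = ["Automobile", "Bike", "Motorbike",
                 "Public_Transportation", "Walking"]      # nominal: no order


def label_encode(value, ordered_categories):
    """Map an ordinal category to its rank (0 = lowest)."""
    return ordered_categories.index(value)


def one_hot(value, categories):
    """Map a nominal category to a 0/1 indicator vector."""
    return [1 if value == c else 0 for c in categories]


def z_score(values):
    """Scale a numerical column to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid division by zero on constant columns
    return [(v - mu) / sigma for v in values]


# Three made-up rows, not taken from the dataset.
rows = [
    {"Age": 21.0, "CAEC": "Sometimes", "MTRANS": "Walking"},
    {"Age": 23.0, "CAEC": "Frequently", "MTRANS": "Automobile"},
    {"Age": 25.0, "CAEC": "no", "MTRANS": "Public_Transportation"},
]

ages = z_score([r["Age"] for r in rows])
encoded = [
    [age, label_encode(r["CAEC"], CAEC_ORDER)] + one_hot(r["MTRANS"], MTRANS_VALUES)
    for age, r in zip(ages, rows)
]
```

Each encoded row holds one scaled number, one ordinal rank, and a five-element indicator vector. Label encoding keeps the order of an ordinal feature in a single column, while one-hot encoding avoids implying an order among nominal categories.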

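Because the F1-score is the primary metric here, it may help to see how it is computed for a multi-class problem. The sketch below is plain Python and is not the pipeline's evaluation code. Macro averaging (the unweighted mean of per-class F1) is an assumption, since the averaging method is not stated above, and the `is_improvement` helper and its `min_delta` parameter are hypothetical.

```python
def f1_for_class(y_true, y_pred, label):
    """F1-score for one class, treating it as positive and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


def is_improvement(candidate_f1, baseline_f1, min_delta):
    """Hypothetical rule: accept a model only if F1 rises by at least min_delta."""
    return candidate_f1 - baseline_f1 >= min_delta


# Toy example with 3 classes instead of 7, to keep the numbers easy to follow.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred, labels=[0, 1, 2])
```

In the toy example the per-class F1-scores are 0.5, 0.8, and about 0.667, so the macro F1 is about 0.656. Macro averaging weights every class equally, so it differs from accuracy when classes are imbalanced.
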
Note

This project was developed using Python 3.9.13. A complete list of dependencies can be found in the requirements.txt file.
