Scalable Machine Learning and Deep Learning - Lab1: Titanic Dataset Classifier

GitHub repository:

https://github.com/daniel-rdt/serverless_ml_titanic_dr

File running sequence for titanic survival prediction:

run: python titanic-feature-pipeline.py (once)
run: python titanic-training-pipeline.py (once)
upload the files in the huggingface-spaces-titanic folder to Hugging Face Spaces

Hugging Face Spaces public URL: https://huggingface.co/spaces/daniel-rdt/titanic

File running sequence for titanic survival prediction for most recent passenger added:

run: python titanic-feature-pipeline-daily.py (set LOCAL=FALSE to start scheduled daily run on MODAL)
run: python titanic-batch-inference-pipeline.py (set LOCAL=FALSE to start scheduled daily run on MODAL)
upload the files in the huggingface-spaces-titanic-monitor folder to Hugging Face Spaces

Hugging Face Spaces public URL: https://huggingface.co/spaces/daniel-rdt/titanic_monitor

File explanation:

titanic_feature_prep.py:

contains the function "titanic_prep()";
downloads the data from the GitHub repository;
feature engineering:
1. removes the "PassengerId", "Name", "Ticket" and "Cabin" columns;
2. rounds all ages up to the next integer, so an age below 1 becomes 1;
3. fills missing "Age" values with random ages drawn from the range between the minimum and maximum age;
4. fills missing "Embarked" values with a new "Unknown" category;
5. turns the categorical variables ("Embarked" and "Sex") into binary numerical variables;
6. drops the old categorical columns "Sex" and "Embarked";
returns a clean, prepped Titanic data set;
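
The feature-engineering steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: the function name `prep_sketch`, the fixed seed, the tiny inline sample and the use of `pd.get_dummies` for the binary columns are all hypothetical choices, and the download step is left out.

```python
import numpy as np
import pandas as pd


def prep_sketch(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Hypothetical re-implementation of the listed feature-engineering steps."""
    rng = np.random.default_rng(seed)
    out = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])

    # Round known ages up to the next integer (0.42 -> 1); NaN stays NaN.
    out["Age"] = np.ceil(out["Age"])

    # Fill missing ages with random integers between the observed min and max.
    missing = out["Age"].isna()
    lo, hi = int(out["Age"].min()), int(out["Age"].max())
    out.loc[missing, "Age"] = rng.integers(lo, hi + 1, size=int(missing.sum()))
    out["Age"] = out["Age"].astype(int)

    # Missing ports of embarkation become their own category.
    out["Embarked"] = out["Embarked"].fillna("Unknown")

    # One 0/1 column per category; get_dummies also drops the original columns.
    return pd.get_dummies(out, columns=["Sex", "Embarked"], dtype=int)


# Made-up rows that only mimic the layout of the Titanic CSV.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Name": ["A", "B", "C"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 0.42, None],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Ticket": ["t1", "t2", "t3"],
    "Fare": [7.25, 71.28, 7.92],
    "Cabin": [None, "C85", None],
    "Embarked": ["S", "C", None],
})
prepped = prep_sketch(sample)
print(prepped.columns.tolist())
```

Because missing ages are drawn at random, fixing the seed is what makes the output reproducible in this sketch; whether the original function does so is not stated.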

titanic-feature-pipeline.py:

can be run either locally or with MODAL;
applies titanic_prep function to yield the prepped titanic data set as titanic_df;
creates Hopsworks feature group and stores features and labels of titanic_df into Hopsworks;

titanic-training-pipeline.py:

can be run either locally or with MODAL;
loads features and labels from the Hopsworks feature store, either from an existing feature view or by creating a feature view from the existing feature group;
splits the data 80/20 into train and test sets;
trains a K-nearest-neighbours classifier on the train split;
evaluates the model on the test split and visualizes the result as a confusion matrix;
uploads the model to the Hopsworks Model Registry;
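
The split, fit and evaluation steps can be illustrated with scikit-learn alone. The sketch below is an assumption-laden stand-in: it uses randomly generated features instead of the Hopsworks feature view, `n_neighbors=5` is simply scikit-learn's default rather than a value taken from the repository, and the model-registry upload is omitted.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the prepped features (X) and the survival label (y).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit K-nearest-neighbours on the train split only.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Evaluate on the held-out split; rows are true labels, columns are predictions.
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
```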

huggingface-spaces-titanic folder:

folder containing main app for user interface and requirements;
the app lets the user input feature values and returns a prediction;

titanic-feature-pipeline-daily.py:

same as titanic-feature-pipeline.py when BACKFILL=TRUE;
when BACKFILL=FALSE: generates a new random passenger (either a survivor or a victim) and adds it to the Hopsworks feature group;
the feature ranges for survivors and victims were taken from the original Titanic data set filtered by the respective label: min/max values for numerical features and the existing categories for categorical ones;
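
One way to picture the per-label range idea is the dependency-free sketch below. Everything in it is hypothetical: the rows in `ROWS` are made up for illustration, the helper names `label_ranges` and `random_passenger` do not come from the repository, only four features are shown, and the step that writes to Hopsworks is omitted.

```python
import random

# Made-up rows standing in for the original Titanic data.
ROWS = [
    {"Survived": 1, "Pclass": 1, "Sex": "female", "Age": 38, "Fare": 71.28},
    {"Survived": 1, "Pclass": 3, "Sex": "female", "Age": 26, "Fare": 7.92},
    {"Survived": 0, "Pclass": 3, "Sex": "male", "Age": 22, "Fare": 7.25},
    {"Survived": 0, "Pclass": 2, "Sex": "male", "Age": 54, "Fare": 26.0},
]
NUMERIC = ("Age", "Fare")        # sampled between the observed min and max
CATEGORICAL = ("Pclass", "Sex")  # sampled from the categories that occur


def label_ranges(rows, label):
    """Min/max or category list per feature, using only rows with this label."""
    subset = [r for r in rows if r["Survived"] == label]
    ranges = {}
    for col in NUMERIC:
        values = [r[col] for r in subset]
        ranges[col] = (min(values), max(values))
    for col in CATEGORICAL:
        ranges[col] = sorted({r[col] for r in subset})
    return ranges


def random_passenger(rows, rng):
    """Pick survivor or victim at random, then sample each feature in range."""
    label = rng.choice([0, 1])
    ranges = label_ranges(rows, label)
    passenger = {"Survived": label}
    for col in NUMERIC:
        lo, hi = ranges[col]
        if isinstance(lo, int) and isinstance(hi, int):
            passenger[col] = rng.randint(lo, hi)  # integer feature, e.g. Age
        else:
            passenger[col] = rng.uniform(lo, hi)  # continuous feature, e.g. Fare
    for col in CATEGORICAL:
        passenger[col] = rng.choice(ranges[col])
    return passenger


passenger = random_passenger(ROWS, random.Random(0))
print(passenger)
```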

titanic-batch-inference-pipeline.py:

can be run either locally or with MODAL on a daily schedule;
gets model from Hopsworks registry and batch data from feature view;
predicts whether the synthetic passengers survived;
downloads the image matching each prediction and creates an up-to-date confusion matrix once 2 predictions have been made;
stores predictions to Hopsworks;
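
The confusion-matrix step can be pictured as a running tally over the prediction history. The sketch below is a guess at the logic, not the pipeline's code: it assumes the threshold means "at least two logged predictions", and the names `MIN_PREDICTIONS`, `confusion_counts` and `history` are hypothetical.

```python
MIN_PREDICTIONS = 2  # assumed reading of the threshold described above


def confusion_counts(history):
    """Return [[TN, FP], [FN, TP]] for (label, prediction) pairs.

    Returns None while fewer than MIN_PREDICTIONS pairs have been logged.
    """
    if len(history) < MIN_PREDICTIONS:
        return None
    matrix = [[0, 0], [0, 0]]
    for label, prediction in history:
        # Row index is the true label, column index is the prediction.
        matrix[label][prediction] += 1
    return matrix


# Hypothetical history of (true label, predicted label) pairs.
history = [(1, 1), (0, 1), (0, 0)]
too_early = confusion_counts(history[:1])
current = confusion_counts(history)
print(too_early, current)
```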

huggingface-spaces-titanic-monitor folder:

folder containing main app for user interface and requirements;
app shows the most recent synthetic passenger prediction and real label, and a confusion matrix with historical prediction performance;

General notes

everyone younger than 1 year is treated as 1 year old;
the feature ranges of randomly generated victims and survivors are very similar, which may lead to ambiguous predictions.
