This project trains and compares machine learning models (Logistic Regression and Decision Tree) to predict whether a person earns more than $50K/year using the UCI Adult Income dataset. It features:
- Modularized ML pipeline (data → training → evaluation → logging)
- Data preprocessing (cleaning, encoding, scaling)
- Model training and evaluation
- Confusion matrix visualization for each model
- MLflow logging and model registry integration
- Custom named MLflow experiment (instead of the default experiment)
- CI/CD with GitHub Actions
- Hosted MLflow tracking server (via Railway)
Clone the repository, create a virtual environment, and install the dependencies:
git clone https://github.com/<YOUR_GITHUB_WORKSPACE>/JNations-2025.git
cd JNations-2025
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
The GitHub Actions workflow reads the tracking URI from `secrets.MLFLOW_TRACKING_URI`. For local runs, it is loaded from a `.env` file with `python-dotenv`:
echo "MLFLOW_TRACKING_URI=https://<YOUR-MLFLOW-SERVER>.up.railway.app" > .env
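A minimal sketch of how a script might pick up this variable in both settings. The helper name `resolve_tracking_uri` and the localhost fallback are illustrative assumptions, not taken from the project's code:

```python
import os

try:
    # python-dotenv is optional here; in CI the variable comes from the job environment.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def resolve_tracking_uri(default="http://127.0.0.1:8080"):
    """Return MLFLOW_TRACKING_URI from the environment, or a local fallback."""
    if load_dotenv is not None:
        # Reads key=value pairs from .env; variables already set in the environment win.
        load_dotenv()
    return os.environ.get("MLFLOW_TRACKING_URI", default)
```

Because `load_dotenv()` does not override variables that are already set, the same code works unchanged in GitHub Actions, where the secret is injected as an environment variable.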
- Go to your repository > Settings
- Open Secrets and variables
- Click Actions
- Click New repository secret and add `MLFLOW_TRACKING_URI` with your URI
python main.py
This will:
- Load and preprocess data
- Train Logistic Regression and Decision Tree models
- Evaluate each model, save F1 scores and confusion matrices
- Create or use a named MLflow experiment (`AdultIncomeExperiment`)
- Log both models to MLflow
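To make the evaluation step concrete, here is a small dependency-free sketch of how a binary confusion matrix and the F1 score relate. The labels are made up for illustration and the function names are hypothetical; the project itself may compute these with a library instead:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Made-up labels: 1 means income > $50K, 0 means income <= $50K.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)        # 3 1 1 3
print(f1_score(tp, fp, fn))  # 0.75
```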
Outputs:
- `outputs/metrics.txt`: F1 scores for each model
- `outputs/confusion_matrix_logistic_regression.png`
- `outputs/confusion_matrix_decision_tree.png`
- MLflow-registered models with version and stage
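The MLflow side of this could look roughly like the sketch below. `mlflow.set_experiment` selects the named experiment and creates it if it does not exist yet. The function name `log_f1_scores`, the metric key `f1_score`, and the `mlflow_module` parameter (a hook for passing a stand-in during tests) are assumptions for illustration; model logging and registration are omitted:

```python
def log_f1_scores(scores, experiment_name="AdultIncomeExperiment", mlflow_module=None):
    """Log one MLflow run per model, each with its F1 score.

    scores: dict mapping a model name to its F1 score.
    mlflow_module: optional stand-in for the mlflow module (hypothetical test hook).
    """
    if mlflow_module is None:
        # Imported lazily so the sketch can be loaded without MLflow installed.
        import mlflow as mlflow_module

    # Use the named experiment instead of the default one, creating it if needed.
    mlflow_module.set_experiment(experiment_name)

    for model_name, f1 in scores.items():
        # One run per model keeps the two models easy to compare in the UI.
        with mlflow_module.start_run(run_name=model_name):
            mlflow_module.log_param("model", model_name)
            mlflow_module.log_metric("f1_score", f1)
```

With `MLFLOW_TRACKING_URI` set, the runs are sent to that server; otherwise MLflow falls back to its default local storage.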
You can simply run MLflow locally using the following command:
mlflow server --host 0.0.0.0 --port 8080
For local persistence, point the backend store at a SQLite database. You can also specify where artifacts are stored:
mlflow server --host 0.0.0.0 --port 8080 --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns
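Once the server is running, a quick way to confirm it is reachable is to query its health endpoint. This sketch uses only the standard library. The `/health` path is what current MLflow servers expose, but treat it as an assumption and verify it against your MLflow version; the function name is hypothetical:

```python
import urllib.error
import urllib.request


def is_server_healthy(base_url="http://127.0.0.1:8080", timeout=5.0):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

The same check works for a hosted server by passing its URI as `base_url`.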
To make your MLflow server accessible remotely, you can host it on Railway (for demo purposes).
- Go to Railway and register an account (signing in with GitHub is convenient)
- You will get a free plan with $5, enough for this workshop :)
- Create a project
- Click the `+ Create` button at the top right
- Choose Template and look for `MLflow Tracking`
- Deploy; you will get a URI (your `MLFLOW_TRACKING_URI`)
On every push to `main`:
- `main.py` is executed
- Metrics and models are logged to MLflow
- Confusion matrices and metrics are uploaded as artifacts
env:
  MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
Artifacts:
- 📄 `metrics.txt`
- 📊 confusion matrices
- 📦 model `.pkl` files (from MLflow artifacts)