Welcome to the House Price Predictor project! This is a real-world, end-to-end MLOps use case designed to help you master the art of building and operationalizing machine learning pipelines.
You'll start from raw data and move through data preprocessing, feature engineering, experimentation, model tracking with MLflow, and optionally using Jupyter for exploration – all while applying industry-grade tooling.
🚀 Want to master MLOps from scratch?
Check out the MLOps Bootcamp at School of DevOps to level up your skills.
```
house-price-predictor/
├── configs/                # YAML-based configuration for models
├── data/                   # Raw and processed datasets
├── deployment/
│   └── mlflow/             # Docker Compose setup for MLflow
├── models/                 # Trained models and preprocessors
├── notebooks/              # Optional Jupyter notebooks for experimentation
├── src/
│   ├── data/               # Data cleaning and preprocessing scripts
│   ├── features/           # Feature engineering pipeline
│   ├── models/             # Model training and evaluation
├── requirements.txt        # Python dependencies
└── README.md               # You're here!
```
To begin, ensure the following tools are installed on your system:
- Python 3.11
- Git
- Visual Studio Code or your preferred editor
- UV – Python package and environment manager
- Docker Desktop or Podman Desktop
1. Fork this repo on GitHub.

2. Clone your forked copy:

   ```bash
   # Replace xxxxxx with your GitHub username or org
   git clone https://github.com/xxxxxx/house-price-predictor.git
   cd house-price-predictor
   ```

3. Set up a Python virtual environment using UV:

   ```bash
   uv venv --python python3.11
   source .venv/bin/activate
   ```

4. Install dependencies:

   ```bash
   uv pip install -r requirements.txt
   ```
To track experiments and model runs:
```bash
cd deployment/mlflow
docker compose -f mlflow-docker-compose.yml up -d
docker compose ps
```
🐧 Using Podman? Use this instead:
```bash
podman compose -f mlflow-docker-compose.yml up -d
podman compose ps
```
Access the MLflow UI at http://localhost:5555
If you prefer an interactive experience, launch JupyterLab with:
```bash
uv run python -m jupyterlab
# or
python -m jupyterlab
```
Clean and preprocess the raw housing dataset:
```bash
python src/data/run_processing.py \
  --input data/raw/house_data.csv \
  --output data/processed/cleaned_house_data.csv
```
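The exact cleaning logic lives in `src/data/run_processing.py`. Purely as an illustration (the column names `price` and `sqft` and the thresholds here are assumptions, not the script's actual rules), a pandas cleaning step might look like:

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning: drop missing rows and obvious outliers."""
    df = df.dropna()                          # remove rows with missing values
    df = df[df["price"] > 0]                  # discard non-positive prices
    df = df[df["sqft"].between(100, 20000)]   # filter implausible sizes
    return df.reset_index(drop=True)

# Usage with the repo's paths:
# raw = pd.read_csv("data/raw/house_data.csv")
# clean(raw).to_csv("data/processed/cleaned_house_data.csv", index=False)
```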
Apply transformations and generate features:
```bash
python src/features/engineer.py \
  --input data/processed/cleaned_house_data.csv \
  --output data/processed/featured_house_data.csv \
  --preprocessor models/trained/preprocessor.pkl
```
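As a rough idea of what a feature-engineering step can look like (these derived columns are hypothetical examples, not necessarily what `src/features/engineer.py` produces):

```python
import pandas as pd

def engineer(df: pd.DataFrame, current_year: int = 2024) -> pd.DataFrame:
    """Illustrative feature engineering: derive age and ratio features."""
    out = df.copy()
    out["house_age"] = current_year - out["year_built"]   # age of the property
    out["price_per_sqft"] = out["price"] / out["sqft"]    # unit-price feature
    out["bed_bath_ratio"] = out["bedrooms"] / out["bathrooms"]
    return out
```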
Train your model and log everything to MLflow:
```bash
python src/models/train_model.py \
  --config configs/model_config.yaml \
  --data data/processed/featured_house_data.csv \
  --models-dir models \
  --mlflow-tracking-uri http://localhost:5555
```
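As a hedged sketch of how a training script can log to MLflow (the model choice, metric name, and column names here are illustrative — see `src/models/train_model.py` for the real logic):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def train(df: pd.DataFrame, target: str = "price"):
    """Fit a simple model and return it with its training RMSE."""
    X = df.drop(columns=[target])
    y = df[target]
    model = LinearRegression().fit(X, y)
    rmse = mean_squared_error(y, model.predict(X)) ** 0.5
    return model, rmse

def log_run(model, rmse, tracking_uri="http://localhost:5555"):
    """Log the run to the MLflow server started earlier."""
    import mlflow  # imported here so train() works without MLflow installed
    mlflow.set_tracking_uri(tracking_uri)
    with mlflow.start_run():
        mlflow.log_metric("rmse", rmse)
        mlflow.sklearn.log_model(model, "model")
```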
The code for both apps is already available in `src/api` and `streamlit_app`. To build and launch them:

- Add a `Dockerfile` at the root of the source code to build the FastAPI app.
- Add `streamlit_app/Dockerfile` to package and build the Streamlit app.
- Add a `docker-compose.yaml` in the root path to launch both apps. Be sure to provide `API_URL=http://fastapi:8000` in the Streamlit app's environment.
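A minimal `docker-compose.yaml` sketch for this step. The service names, build contexts, and ports here are assumptions (they presume the FastAPI app listens on 8000 and Streamlit on 8501), so adapt them to your Dockerfiles:

```yaml
services:
  fastapi:
    build: .                      # Dockerfile at the repo root
    ports:
      - "8000:8000"
  streamlit:
    build: streamlit_app          # streamlit_app/Dockerfile
    ports:
      - "8501:8501"
    environment:
      - API_URL=http://fastapi:8000   # service name resolves on the compose network
    depends_on:
      - fastapi
```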
Once you have launched both apps, you should be able to access the Streamlit web UI and make predictions. You can also test predictions with FastAPI directly using:
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
        "sqft": 1500,
        "bedrooms": 3,
        "bathrooms": 2,
        "location": "suburban",
        "year_built": 2000,
        "condition": "fair"
      }'
```
Be sure to replace `http://localhost:8000/predict` with the actual endpoint, based on where it's running.
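The same request can be issued from Python using only the standard library. The field names mirror the curl example above; the endpoint URL is an assumption about where you run the API:

```python
import json
import urllib.request

def build_payload() -> bytes:
    """JSON body matching the curl example above."""
    return json.dumps({
        "sqft": 1500,
        "bedrooms": 3,
        "bathrooms": 2,
        "location": "suburban",
        "year_built": 2000,
        "condition": "fair",
    }).encode("utf-8")

def predict(url: str = "http://localhost:8000/predict") -> dict:
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=build_payload(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# predict()  # uncomment once the FastAPI container is up
```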
This project is part of the MLOps Bootcamp at School of DevOps, where you'll learn how to:
- Build and track ML pipelines
- Containerize and deploy models
- Automate training workflows using GitHub Actions or Argo Workflows
- Apply DevOps principles to Machine Learning systems
We welcome contributions, issues, and suggestions to make this project even better. Feel free to fork, explore, and raise PRs!
Happy Learning!
— Team School of DevOps