Welcome to the "Engineering and MLOps practices for Modern AI" course starter project! This repository provides a simple, end-to-end Machine Learning pipeline using the well-known Iris dataset. It's designed to be a practical starting point for you to explore and apply fundamental MLOps patterns and modern development tools.
This project emphasizes solid Software Engineering practices as the foundation for effective MLOps, including:
- ✨ Automated Code Quality: Linting (Ruff), formatting (Ruff), and static type checking (Mypy).
- 🧪 Robust Testing: Unit tests with Pytest and code coverage reporting.
- 🛡️ Pre-commit Hooks: Automating code quality checks before every commit.
- 🔄 Continuous Integration (CI): Automated validation of your code with GitHub Actions.
- ⚙️ Build Automation: Using `make` to streamline common development tasks.
- 📦 Dependency Management: Consistent and reproducible environments managed by `uv`.
- 🏗️ Code Modularity: An organized structure for your source code.
- ✍️ Readability & Maintainability: Enhanced through type annotations and docstrings.
The core objective is to train a classifier on the Iris dataset. However, the real learning comes from understanding the MLOps practices applied: versioning data and models (conceptually, for now), tracking experiments, automating the pipeline, and ensuring code quality and reproducibility throughout the development lifecycle.
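Conceptually, the four scripts in `src/` add up to something like the following condensed sketch. This is a simplification under assumed choices (scikit-learn, joblib, a logistic regression model); the real scripts split these steps across files and may differ in model, parameters, and paths:

```python
# Condensed, illustrative pipeline; the actual src/ scripts split these steps up.
import json

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load: fetch the Iris features and labels.
X, y = load_iris(return_X_y=True)

# Split: hold out a test set for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train: fit a simple classifier and persist it (assumes data/ exists, as in this repo).
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
joblib.dump(model, "data/model.joblib")

# Evaluate: score on the held-out set and write metrics as JSON.
metrics = {"accuracy": float(accuracy_score(y_test, model.predict(X_test)))}
with open("data/eval.json", "w") as f:
    json.dump(metrics, f)
```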
The repository is organized as follows:

```
mlops-get-started-iris/
├── .github/                  # GitHub-specific configurations (e.g., workflows for CI)
├── data/
│   ├── .gitignore            # Specifies data files not to be tracked by Git
│   ├── features_iris.csv     # Processed features (generated by src/load_data.py)
│   ├── train.csv             # Training dataset (generated by src/split_dataset.py)
│   ├── test.csv              # Test dataset (generated by src/split_dataset.py)
│   ├── model.joblib          # Trained model artifact (generated by src/train.py)
│   └── eval.json             # Evaluation metrics (generated by src/evaluate.py)
├── docs/
│   └── DEVELOPMENT.md        # Detailed guide for developers on coding standards and tools
├── models/                   # Intended for trained models (currently a placeholder; see notes below)
├── src/
│   ├── load_data.py          # Script for loading and initial preprocessing of Iris data
│   ├── split_dataset.py      # Script for splitting data into training and testing sets
│   ├── train.py              # Script for training the machine learning model
│   └── evaluate.py           # Script for evaluating the trained model's performance
├── tests/
│   ├── __init__.py           # Makes 'tests' a Python package
│   └── test_load_data.py     # Example unit tests for the data loading script
├── .gitignore                # Global Git ignore patterns for the project
├── .pre-commit-config.yaml   # Configuration for pre-commit hooks (Ruff, Mypy, etc.)
├── .python-version           # Specifies the preferred Python version (e.g., for pyenv or uv)
├── .secrets.baseline         # Baseline file for detect-secrets (prevents committing secrets)
├── Makefile                  # Defines useful development commands (e.g., make lint, make test)
├── pyproject.toml            # Project configuration, dependencies, and tool settings (PEP 621)
└── uv.lock                   # Lock file for reproducible Python dependencies (generated by uv)
```
Key Structure Notes:

- `data/` Directory: In this initial setup, `data/` holds both input (implicitly, as `load_data.py` likely fetches it) and output (processed data, model, metrics) files. In more advanced MLOps scenarios, you'd typically separate raw data, processed data, and model artifacts, often versioning them with tools like DVC.
- `models/` Directory: While present, it's not explicitly used by the scripts in `src/`, which save `model.joblib` to `data/`. This directory is a placeholder for when you start versioning models more formally (e.g., with DVC or the MLflow Model Registry).
- `pyproject.toml`: This is your central hub for defining project dependencies (managed by `uv`) and configuring tools like Ruff, Mypy, and Pytest (a sketch of what such a file can look like follows this list).
- `Makefile`: Your friend for running common tasks like linting, testing, and executing the pipeline with simple commands.
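As an illustration only (not this repo's actual file), a `pyproject.toml` for a setup like this might combine dependencies and tool settings along these lines; names, versions, and options here are assumptions:

```toml
# Illustrative sketch; see the repository's pyproject.toml for the real settings.
[project]
name = "mlops-get-started-iris"
requires-python = ">=3.12"
dependencies = ["scikit-learn", "pandas", "joblib"]

[dependency-groups]
dev = ["ruff", "mypy", "pytest", "pytest-cov", "pre-commit"]

[tool.ruff]
line-length = 88

[tool.pytest.ini_options]
addopts = "--cov=src --cov-report=html"
```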
Ensure you have the following installed on your system:
- Python 3.12+: We recommend using the version specified in the `.python-version` file.
- `uv`: A fast Python package installer and project manager. See the uv installation guide.
- Git: For version control.
- Make: Optional, but highly recommended for convenience.
  - macOS/Linux: Usually pre-installed.
  - Windows: Consider installing via Chocolatey (`choco install make`) or using Windows Subsystem for Linux (WSL).
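A quick way to confirm the prerequisites are on your `PATH` (on some systems the interpreter is `python3` rather than `python`):

```bash
python --version   # should report 3.12 or newer
uv --version
git --version
make --version     # optional, if you installed Make
```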
1. Clone the Repository:

   ```bash
   git clone <YOUR_REPOSITORY_URL>
   cd mlops-get-started-iris
   ```

2. Set Up the Python Environment:

   ```bash
   # Create a virtual environment (e.g., named .venv) using the project's Python version
   uv venv .venv --python 3.12

   # Activate the virtual environment:
   # On macOS and Linux:
   source .venv/bin/activate
   # On Windows (PowerShell):
   # .\.venv\Scripts\Activate.ps1
   # On Windows (Command Prompt):
   # .\.venv\Scripts\activate.bat
   ```

3. Install Dependencies: We'll use `uv` to install the project dependencies into the environment.

   ```bash
   # Install project dependencies, including development tools:
   uv sync --dev

   # Initialize pre-commit hooks:
   uv run pre-commit install
   ```

   Alternatively, the `Makefile` provides a shortcut:

   ```bash
   make install  # This target in the Makefile should execute the uv commands above
   ```
You're now ready to start!
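As a quick sanity check that dependencies landed in the environment (assuming scikit-learn is among them, as the training scripts imply):

```bash
uv run python -c "import sklearn; print(sklearn.__version__)"
```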
The pipeline consists of data loading, splitting, model training, and evaluation.

Option 1: Using `make` (Recommended for Simplicity)

The `Makefile` includes a target to run the entire pipeline:

```bash
make run-pipeline
```

This will execute the Python scripts in `src/` in the correct order. Check the `Makefile` to see the exact commands if you're curious!
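If you'd like a mental model before opening the file, here is a sketch of what such a target might look like; the actual target names and commands in your `Makefile` may differ:

```makefile
# Hypothetical sketch; consult the repository's Makefile for the real target.
# (Recipe lines in a real Makefile must be indented with a tab.)
.PHONY: run-pipeline
run-pipeline:
	uv run python src/load_data.py
	uv run python src/split_dataset.py
	uv run python src/train.py
	uv run python src/evaluate.py
```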
Option 2: Manual Execution of Python Scripts

If you want to run each step individually to understand the flow:

```bash
# Ensure your virtual environment is activated:
source .venv/bin/activate

python src/load_data.py
python src/split_dataset.py --test_size 0.2  # Example of passing an argument
python src/train.py
python src/evaluate.py
```

After running, you should find `features_iris.csv`, `train.csv`, `test.csv`, `model.joblib`, and `eval.json` inside the `data/` directory.
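To sanity-check the artifacts, you can load them back from a Python shell. A minimal sketch, assuming the model is a scikit-learn estimator saved with joblib and trained on the four raw Iris measurements (adapt if `features_iris.csv` applies extra preprocessing):

```python
import json

import joblib

# Load the trained model artifact (assumes src/train.py saved it with joblib).
model = joblib.load("data/model.joblib")

# Inspect the evaluation metrics written by src/evaluate.py.
with open("data/eval.json") as f:
    print(json.load(f))

# Predict on a single Iris-like sample: sepal length/width, petal length/width (cm).
print(model.predict([[5.1, 3.5, 1.4, 0.2]]))
```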
This project is set up with tools to help you write high-quality code efficiently.

- Makefile: Your primary interface for common tasks. Run `make help` to see all available commands. This will list targets like `lint`, `format-check`, `test`, `clean`, etc.
- Code Formatting (Ruff Formatter):
  - Check if files need formatting: `make format-check` (uses `uv run ruff format --check .`)
  - Automatically format files: `make format` (uses `uv run ruff format .`)
- Linting (Ruff Linter):
  - Check for style and logical errors: `make lint` (uses `uv run ruff check .`)
  - Attempt to auto-fix lint issues: `make lint-fix` (uses `uv run ruff check --fix .`), if this target exists in your Makefile
- Type Checking (Mypy):
  - Verify type annotations: `make mypy` (uses `uv run mypy src tests`)
- Testing (Pytest):
  - Run all tests: `make test` (uses `uv run pytest`)
  - This command also generates a code coverage report, viewable by opening `htmlcov/index.html` in your browser. (A small example of the typed, tested code these checks exercise follows this list.)
- Pre-commit Hooks: These run automatically on `git commit`. If they find issues (e.g., formatting errors, lint problems), the commit is stopped and you'll see messages about what to fix. After fixing (or if the hooks fix things automatically), `git add` the changed files and commit again.
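To make the type-checking and testing story concrete, here is a small hypothetical example of the style these tools encourage: a typed, documented function plus a Pytest test. The module and file names are illustrative, not part of this repo:

```python
# src/metrics.py (hypothetical module, shown for illustration)
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

```python
# tests/test_metrics.py (hypothetical test, runnable with `uv run pytest`)
import pytest

from src.metrics import accuracy


def test_accuracy_perfect_match() -> None:
    assert accuracy([0, 1, 2], [0, 1, 2]) == 1.0


def test_accuracy_rejects_mismatched_lengths() -> None:
    with pytest.raises(ValueError):
        accuracy([0, 1], [0])
```

Mypy verifies the annotations (e.g., it would flag passing a `list[str]` to `accuracy`), while Pytest exercises both the happy path and the error case.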
For a deeper dive into the development workflow, tool configurations, and contribution guidelines, please consult the `docs/DEVELOPMENT.md` file.
Happy MLOps journey! We hope this starter project helps you learn and experiment. 🚀