AI MLOps Project

✨ Latest Updates (June 2025)

✅ Advanced time-based feature engineering (Spark-driven)
✅ train_sklearn.py: Calibrated + threshold-optimized RandomForest
✅ train_spark.py: Distributed GBTClassifier + full evaluation
✅ Precision and F1 improvements through GridSearch and calibration
✅ MLflow artifact tracking fully structured and consistent
✅ YAML-driven hyperparameter config with rolling metric logs
✅ Added model signature, input examples, confusion matrices, ROC, PR curves
✅ CI/CD improved with Buildx caching, protected branches, and PAT for semantic-release
✅ Docker build hardened with retry logic and compressed layer caching
✅ Multiple PR templates + branch rules enforced in GitHub Actions
✅ OpenAI-powered LLM reporting integrated into CI pipeline
✅ Full Code Quality & Static Analysis (Black, Flake8, Isort, MyPy, Bandit)
✅ Pre-commit hooks enabled to catch issues before commits
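
The threshold optimization listed for train_sklearn.py can be illustrated with a minimal sketch. It assumes the objective is to maximize F1 over candidate decision thresholds on a validation set; the function names and the sample data are hypothetical and are not taken from the repository.

```python
from typing import Sequence, Tuple


def f1_at_threshold(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """F1 score when probabilities >= threshold are predicted as the positive class."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true: Sequence[int], y_prob: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, f1) maximizing F1 over the distinct predicted probabilities."""
    best_threshold, best_score = 0.5, 0.0
    for candidate in sorted(set(y_prob)):
        score = f1_at_threshold(y_true, y_prob, candidate)
        if score > best_score:
            best_threshold, best_score = candidate, score
    return best_threshold, best_score


# Hypothetical calibrated probabilities for an imbalanced validation set.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80]
threshold, f1 = best_f1_threshold(y_true, y_prob)
default_f1 = f1_at_threshold(y_true, y_prob, 0.5)
```

On this toy data the tuned threshold scores higher than the default cutoff of 0.5, which is the usual motivation for tuning the threshold after calibration on imbalanced data.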

🔍 Project Overview

AI MLOps Platform for behavioral purchase prediction, equipped with:

🚚 Data ingestion (local + AWS S3)
🔀 Spark-based feature engineering (time, recency, behavioral signals)
⚖️ Class imbalance handling (SMOTETomek, dynamic class weights)
🧠 Dual model engines: Scikit-learn (dev) & Spark MLlib (prod)
🔍 GridSearch & CrossValidator support
📈 MLflow experiment tracking + threshold-based metrics
📦 Docker-ready + CI/CD (GitHub Actions)
🤖 OpenAI API integration for automated AI reporting, with dynamic prompts and structured analysis of model results and business KPIs.
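
The README does not say how the dynamic class weights are computed. One common choice, shown here as an assumption, is inverse-frequency weighting (the same heuristic scikit-learn applies for `class_weight="balanced"`): each class receives `n_samples / (n_classes * class_count)`. The function name and the label distribution are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def balanced_class_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution: 90 non-purchases, 10 purchases.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
```

Because the weights are recomputed from whatever labels are passed in, they follow the class ratio of each training split instead of being hard-coded.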

📃 Semantic Release

Uses semantic-release to automatically version and tag builds from main
Authenticates with a personal access token (PAT) so the release job can bypass branch protection rules
Changelog and release notes are updated by semantic-release plugins

Command for dry-run:

npx semantic-release --dry-run
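
To make the automatic versioning concrete, the sketch below mimics how a next version can be derived from commit messages. It assumes the project keeps semantic-release's default Angular-style commit convention (`feat` gives a minor bump, `fix` and `perf` give a patch bump, a `BREAKING CHANGE:` footer gives a major bump). The repository's actual release configuration is not shown here, and the helper names are hypothetical. The real tool reads Git tags and history; this is only an illustration of the rule.

```python
import re
from typing import Iterable, Optional

# "type(scope): subject" header; the scope is optional.
HEADER = re.compile(r"^(\w+)(?:\([^)]*\))?: ")
RELEASE_BY_TYPE = {"feat": "minor", "fix": "patch", "perf": "patch"}
RANK = {None: 0, "patch": 1, "minor": 2, "major": 3}


def release_type(message: str) -> Optional[str]:
    """Map one commit message to 'major', 'minor', 'patch', or None (no release)."""
    if "BREAKING CHANGE:" in message:
        return "major"
    match = HEADER.match(message)
    return RELEASE_BY_TYPE.get(match.group(1)) if match else None


def next_version(current: str, messages: Iterable[str]) -> str:
    """Apply the highest-ranked bump found in the commit messages."""
    major, minor, patch = (int(part) for part in current.split("."))
    bump = max((release_type(m) for m in messages), key=RANK.get, default=None)
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current


# Hypothetical commit history since the last release.
commits = ["fix(ingest): handle empty batch", "feat(train): add calibration step", "docs: update README"]
version = next_version("1.4.2", commits)
```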

🔢 GitHub Workflows Summary

Main CI/CD file: .github/workflows/ci-cd.yml

Includes jobs:

build-and-test: linting, type-checking, static analysis, unit tests, and pipeline logic
docker-build-and-push: buildx + GHCR
model-smoke-test: simple inference test
semantic-release: versioning and release

Protected branches supported via PAT: secrets.PAT_GITHUB

🔹 Run Full Project (DEV & PROD)

# Create and activate environment
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# DEVELOPMENT
make ingest-dev-local
make process-dev-local
make train-dev-local
make test-dev

# PRODUCTION
make ingest-prod-local
make process-prod-local
make train-prod-local
make test-prod

# Launch MLflow UI
make mlflow-local

🔢 Model Pipelines

| Mode | Engine | Pipeline Summary |
| --- | --- | --- |
| dev | Scikit-learn | SMOTETomek + RF + Calibration + GridSearchCV |
| prod | Spark MLlib | GBTClassifier + VectorAssembler + CrossValidator |

Dispatcher:

python models/train.py --env dev
python models/train.py --env prod
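
A dispatcher of this kind could be structured roughly as follows. This is a sketch under stated assumptions, not the contents of models/train.py: only the `--env` flag and its `dev`/`prod` values come from the commands above, while the `train_sklearn` and `train_spark` functions are hypothetical placeholders for the two training pipelines.

```python
import argparse
from typing import Callable, Dict, List, Optional


def train_sklearn() -> str:
    # Placeholder (assumption): would run the Scikit-learn pipeline.
    return "sklearn"


def train_spark() -> str:
    # Placeholder (assumption): would run the Spark MLlib pipeline.
    return "spark"


# Map each environment name to the engine that handles it.
ENGINES: Dict[str, Callable[[], str]] = {"dev": train_sklearn, "prod": train_spark}


def main(argv: Optional[List[str]] = None) -> str:
    """Parse --env and call the matching training entry point."""
    parser = argparse.ArgumentParser(description="Dispatch training by environment")
    parser.add_argument("--env", choices=sorted(ENGINES), required=True)
    args = parser.parse_args(argv)
    return ENGINES[args.env]()


# Equivalent to running the script with `--env dev`.
engine_used = main(["--env", "dev"])
```

Keeping the mapping in a dictionary means another environment can be added with one entry, and argparse rejects any unknown `--env` value before training starts.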

🏗️ Project Structure

ai-mlops-project/
├── configs/           # Environment YAMLs, training parameters, MLflow config
├── data/              # Datasets (raw, intermediate, processed) 
├── data_ingestion/    # Scripts for data download, loading, and storage
├── data_processing/   # Spark-based feature engineering and preprocessing pipelines
├── docs/              # Markdown documentation, diagrams, architecture notes
├── generative_ai/     # LLM integrations, prompts, embedding pipelines, NLP tools
├── mlruns/            # MLflow experiment logs and artifacts (auto-generated)
├── models/            # Sklearn & Spark training pipelines, model classes, training logic
├── scripts/           # CLI scripts for training, evaluation, deployment
├── tests/             # Pytest-based unit and integration tests
├── Makefile           # Full pipeline automation: make train, make deploy, make test, etc.
└── .github/           # CI/CD workflows, CODEOWNERS, and GitHub Actions

📘️ Documentation

🔄 CI/CD Pipeline

🧪 Unit tests and validations (local + Docker-based)
📦 Model artifact verification
🔀 MLflow logging consistency checks
🐳 Docker image with Buildx caching
🌐 Semantic Release with PAT (protected branch support)
📌 PR Templates and branch rules enforced
🤖 OpenAI-integrated reporting (Generative AI)
✅ Code linting, static analysis & type-checking enforced via CI & pre-commit

🔮 Roadmap

⏳ Optuna optimization support
☁️ Cloud deployment (EMR, SageMaker)
🌐 MLflow + FastAPI model serving
🤖 LLM-powered monitoring

Maintainer: Erick Jara — CTO & AI/Data Engineer
📧 erick.jara@hotmail.it | 🌐 GitHub: Impesud
