An end-to-end machine learning system for detecting and analyzing food adulteration risks
The Food Adulteration Detection System is a comprehensive data science and machine learning platform designed to identify, analyze, and predict health risks associated with food product adulteration. This system combines state-of-the-art machine learning techniques with a modern web API to provide real-time analysis and prediction capabilities.
The project implements a complete MLOps workflow covering data ingestion, validation, transformation, model training, evaluation, and deployment, all wrapped in a user-friendly FastAPI interface for seamless interaction.
- **Advanced ML Models**: Utilizes ensemble learning techniques to predict adulteration risks
- **End-to-End Pipelines**: Automated workflows from data ingestion to prediction
- **Modern API**: FastAPI-powered interface with automatic OpenAPI documentation
- **Interactive UI**: Clean, responsive web interface for submitting samples and viewing results
- **Containerization**: Docker support for consistent deployment across environments
- **Comprehensive Testing**: Pytest-based test suite with coverage reporting
- **Detailed Logging**: Structured logging for monitoring and debugging
```mermaid
graph TD
    A[Data Sources] --> B[Data Ingestion]
    B --> C[Data Validation]
    C --> D[Data Transformation]
    D --> E[Model Training]
    E --> F[Model Evaluation]
    F --> G[Model Deployment]
    G --> H[FastAPI Service]
    H --> I[User Interface]
```
```
.
├── .github/                     # GitHub Actions workflows
├── .git/                        # Git repository
├── artifacts/                   # Generated model artifacts
│   ├── data_transformation/     # Preprocessors and encoders
│   └── model_trainer/           # Trained models
├── config/                      # Configuration files
│   └── config.yaml              # Main configuration
├── research/                    # Research notebooks and experiments
│   ├── data_exploration.ipynb
│   ├── model_experimentation.ipynb
│   └── feature_engineering.ipynb
├── src/                         # Source code
│   └── datascience/             # Main package
│       ├── components/          # Core ML components
│       │   ├── data_ingestion.py
│       │   ├── data_validation.py
│       │   ├── data_transformation.py
│       │   ├── model_trainer.py
│       │   └── model_evaluation.py
│       ├── config/              # Configuration handlers
│       │   ├── __init__.py
│       │   └── configuration.py
│       ├── constants/           # Constants and defaults
│       │   └── __init__.py
│       ├── entity/              # Data entities
│       │   ├── __init__.py
│       │   └── config_entity.py
│       ├── pipeline/            # Pipeline orchestration
│       │   ├── data_ingestion_pipeline.py
│       │   ├── data_validation_pipeline.py
│       │   ├── data_transformation_pipeline.py
│       │   ├── model_trainer_pipeline.py
│       │   ├── model_evaluation_pipeline.py
│       │   └── prediction_pipeline.py
│       ├── utils/               # Utility functions
│       │   ├── __init__.py
│       │   └── common.py
│       ├── __init__.py
│       ├── exception.py         # Custom exception handling
│       └── logger.py            # Logging configuration
├── static/                      # Static assets for web UI
│   ├── css/
│   ├── js/
│   └── images/
├── templates/                   # HTML templates
│   ├── index.html               # Main UI
│   ├── results.html             # Results display
│   └── error.html               # Error page
├── tests/                       # Test suite
│   ├── test_app.py              # API tests
│   └── test_components.py       # Component tests
├── .gitignore                   # Git ignore file
├── app.py                       # FastAPI application
├── Dockerfile                   # Docker configuration
├── LICENSE                      # License information
├── main.py                      # Application entry point
├── metrics.json                 # Model metrics
├── params.yaml                  # Model parameters
├── pyproject.toml               # Python project configuration
├── pytest.ini                   # Pytest configuration
├── README.md                    # Project documentation
├── requirements.txt             # Dependencies
├── schema.yaml                  # Data schema definition
├── setup.py                     # Package setup script
└── template.py                  # Template generator
```
- Python 3.8 or later
- Docker (optional for containerization)
- Git
1. Clone the repository
   ```bash
   git clone https://github.com/austinLorenzMccoy/CompleteDSproject.git
   cd CompleteDSproject
   ```
2. Create and activate a virtual environment
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
3. Install dependencies
   ```bash
   pip install -r requirements.txt
   ```
4. Run the application
   ```bash
   uvicorn app:app --host 0.0.0.0 --port 8081 --reload
   ```
5. Access the application
   - Web UI: http://localhost:8081
   - API Documentation: http://localhost:8081/docs
   - Health Check: http://localhost:8081/health
1. Build the Docker image
   ```bash
   docker build -t food-adulteration-detection .
   ```
2. Run the Docker container
   ```bash
   docker run -p 8081:8081 food-adulteration-detection
   ```
3. Access the application
   - Web UI: http://localhost:8081
   - API Documentation: http://localhost:8081/docs
Run the test suite with pytest:

```bash
python -m pytest tests/ -v
```

For test coverage:

```bash
python -m pytest tests/ --cov=src --cov-report=term-missing
```
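As a sketch of what an API test in `tests/test_app.py` might look like (illustrative, not the actual suite; it assumes `app.py` exposes a FastAPI instance named `app` and that `/health` returns HTTP 200):

```python
# tests/test_app.py (illustrative sketch)
from fastapi.testclient import TestClient

from app import app  # the FastAPI application defined in app.py

client = TestClient(app)


def test_health_returns_ok():
    """The /health endpoint should respond with HTTP 200."""
    response = client.get("/health")
    assert response.status_code == 200
```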
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Web UI for the application |
| `/predict` | POST | API endpoint for prediction (JSON) |
| `/predict-form` | POST | Form submission endpoint |
| `/health` | GET | Health check endpoint |
| `/docs` | GET | Interactive API documentation |
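For example, the JSON prediction endpoint can be called from Python as below. The field names in `sample` are placeholders; the actual request schema comes from `schema.yaml` and is documented at `/docs`:

```python
import requests

# Hypothetical payload; replace the keys with the real feature names.
sample = {"feature_1": 0.42, "feature_2": 1.7}

response = requests.post("http://localhost:8081/predict", json=sample)
response.raise_for_status()
print(response.json())  # model prediction returned by the API
```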
- **Data Ingestion**: Handles the collection, import, and initial storage of raw data from various sources.
- **Data Validation**: Verifies data quality and schema compliance, and performs drift detection.
- **Data Transformation**: Preprocesses data through cleaning, feature engineering, scaling, and encoding.
- **Model Trainer**: Trains machine learning models using optimized hyperparameters.
- **Model Evaluation**: Assesses model performance using various metrics and validation techniques.
- **Prediction Pipeline**: Orchestrates the end-to-end process from data input to prediction output (see the sketch below).
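A minimal sketch of how the training stages can be chained, assuming each pipeline module under `src/datascience/pipeline/` is runnable as a script (they are invoked the same way later in this README):

```python
# run_stages.py (illustrative): execute each training stage in order.
import subprocess
import sys

STAGE_SCRIPTS = [
    "src/datascience/pipeline/data_ingestion_pipeline.py",
    "src/datascience/pipeline/data_validation_pipeline.py",
    "src/datascience/pipeline/data_transformation_pipeline.py",
    "src/datascience/pipeline/model_trainer_pipeline.py",
    "src/datascience/pipeline/model_evaluation_pipeline.py",
]

for script in STAGE_SCRIPTS:
    print(f">>> running {script}")
    # check=True stops the chain if any stage fails
    subprocess.run([sys.executable, script], check=True)
```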
This project uses MLflow for experiment tracking and model management. MLflow helps track experiments, compare model versions, and manage the model lifecycle.
The project is configured to use DAGsHub as the MLflow tracking server:
```python
import os

os.environ["MLFLOW_TRACKING_URI"] = "https://dagshub.com/austinLorenzMccoy/CompleteDSproject.mlflow"
os.environ["MLFLOW_TRACKING_USERNAME"] = "austinLorenzMccoy"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "<your-dagshub-token>"  # keep real tokens out of source control
```
There are two ways to access MLflow tracking information:
1. **DAGsHub MLflow Dashboard**: Access experiments directly through the DAGsHub website at https://dagshub.com/austinLorenzMccoy/CompleteDSproject.mlflow. You'll need to log in with your DAGsHub credentials.
2. **Python Script**: Use the provided Python script to view experiments programmatically:
   ```bash
   python view_mlflow_experiments.py
   ```
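A sketch of what such a script might do, assuming the tracking URI and credentials above are set (`mlflow.search_runs` is standard MLflow API and returns the runs as a pandas DataFrame):

```python
import mlflow

mlflow.set_tracking_uri(
    "https://dagshub.com/austinLorenzMccoy/CompleteDSproject.mlflow"
)

# Fetch runs from the default experiment (id "0") as a DataFrame.
runs = mlflow.search_runs(experiment_ids=["0"])
print(runs[["run_id", "status", "start_time"]])
```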
The following metrics are tracked for each model:
- Accuracy
- Precision
- Recall
- F1 Score
Key model parameters tracked in MLflow:
- layer1_units: Number of units in the first layer
- layer2_units: Number of units in the second layer
- learning_rate: Learning rate for model training
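A minimal sketch of how these parameters and metrics could be logged during training (`mlflow.log_param` and `mlflow.log_metric` are standard MLflow calls; all values below are placeholders):

```python
import mlflow

with mlflow.start_run():
    # Hyperparameters (placeholder values)
    mlflow.log_param("layer1_units", 128)
    mlflow.log_param("layer2_units", 64)
    mlflow.log_param("learning_rate", 1e-3)

    # ... train and evaluate the model here ...

    # Evaluation metrics (placeholder values)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_metric("precision", 0.94)
    mlflow.log_metric("recall", 0.93)
    mlflow.log_metric("f1_score", 0.94)
```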
The project includes a comprehensive Dockerfile that:
- Uses Python 3.9 as the base image
- Sets up a secure non-root user
- Configures health checks
- Optimizes layer caching
- Implements best practices for Python containerization
We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
- Name: Chibueze Augustine Chidera
- Email: chibuezeaugustine23@gmail.com
- GitHub: austinLorenzMccoy
- **Data Ingestion** (`data_ingestion.py`): Manages data collection and storage.
- **Data Transformation** (`data_transformation.py`): Converts raw data into a format suitable for analysis and modeling.
- **Model Trainer** (`model_trainer.py`): Handles the training of machine learning models.
- **Training Pipeline** (`training_pipeline.py`): Orchestrates the training process end-to-end.
- **Prediction Pipeline** (`prediction_pipeline.py`): Handles prediction tasks on new data.
- **Common Utilities** (`common.py`): Includes helper functions used throughout the project (see the sketch below).
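As an illustration of the kind of helper `common.py` typically provides, here is a minimal YAML reader (a sketch; the actual function names and signatures in `common.py` may differ):

```python
from pathlib import Path

import yaml


def read_yaml(path: Path) -> dict:
    """Load a YAML file (e.g. config.yaml or params.yaml) into a dict."""
    with path.open("r") as handle:
        return yaml.safe_load(handle)


# Usage: load the training hyperparameters
params = read_yaml(Path("params.yaml"))
print(params)
```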
- **Config Module** (`configuration.py`): Loads and parses configuration settings from `config.yaml`.
- **Params** (`params.yaml`): Stores hyperparameters for model training.
- **Schema** (`schema.yaml`): Defines the data structure and constraints.
- **Logger** (`logger.py`): Centralized logging for debugging and monitoring.
- **Custom Exceptions** (`exception.py`): Handles errors in a standardized manner (see the sketch below).
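A minimal sketch of the pattern such a module often follows (illustrative; the real `exception.py` may capture different context):

```python
import sys


class CustomException(Exception):
    """Wraps an underlying error with file name and line number context."""

    def __init__(self, error: Exception):
        _, _, tb = sys.exc_info()
        location = (
            f"{tb.tb_frame.f_code.co_filename}:{tb.tb_lineno}" if tb else "unknown"
        )
        super().__init__(f"{error} (raised at {location})")


# Usage
try:
    1 / 0
except Exception as exc:
    print(CustomException(exc))
```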
- **Entity** (`config_entity.py`): Defines structured configurations such as data paths, model settings, and more to ensure clean and maintainable code (see the sketch below).
- **Constants** (`constants/__init__.py`): Stores immutable values like file paths and default parameters, reducing hardcoding and simplifying updates.
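For example, a config entity is usually a frozen dataclass like the following (field names are illustrative; the real entities live in `config_entity.py`):

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataIngestionConfig:
    """Typed, immutable settings for the data ingestion stage."""

    root_dir: Path         # where stage artifacts are written
    source_url: str        # where the raw data comes from
    local_data_file: Path  # where the downloaded data is stored


config = DataIngestionConfig(
    root_dir=Path("artifacts/data_ingestion"),
    source_url="https://example.com/data.csv",  # placeholder URL
    local_data_file=Path("artifacts/data_ingestion/data.csv"),
)
print(config)
```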
- **Dockerfile**: Ensures a consistent runtime environment for deployment.
- **Research Notebook** (`research.ipynb`): Used for exploratory data analysis and prototyping.
- **HTML Template** (`index.html`): Provides a simple user interface for interacting with the project.
- Python 3.8 or later
- Docker (optional for containerization)
1. Clone the repository:
   ```bash
   git clone https://github.com/austinLorenzMccoy/CompleteDSproject.git
   cd CompleteDSproject
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Set up the project:
   ```bash
   python setup.py install
   ```
1. To execute the training pipeline:
   ```bash
   python src/datascience/pipeline/training_pipeline.py
   ```
2. To run the prediction pipeline:
   ```bash
   python src/datascience/pipeline/prediction_pipeline.py
   ```
1. Build the Docker image:
   ```bash
   docker build -t datascience-project .
   ```
2. Run the Docker container:
   ```bash
   docker run -it datascience-project
   ```