πŸ§ͺ Food Adulteration Detection System πŸ§ͺ

Badges: FastAPI · Python · Docker · scikit-learn · TensorFlow · License

An end-to-end machine learning system for detecting and analyzing food adulteration risks

πŸ” Project Overview

The Food Adulteration Detection System is a comprehensive data science and machine learning platform designed to identify, analyze, and predict health risks associated with food product adulteration. This system combines state-of-the-art machine learning techniques with a modern web API to provide real-time analysis and prediction capabilities.

The project implements a complete MLOps workflow including data ingestion, validation, transformation, model training, evaluation, and deployment - all wrapped in a user-friendly FastAPI interface for seamless interaction.

✨ Key Features

  • πŸ€– Advanced ML Models: Utilizes ensemble learning techniques to predict adulteration risks
  • πŸ”„ End-to-End Pipelines: Automated workflows from data ingestion to prediction
  • 🌐 Modern API: FastAPI-powered interface with automatic OpenAPI documentation
  • πŸ“Š Interactive UI: Clean, responsive web interface for submitting samples and viewing results
  • 🐳 Containerization: Docker support for consistent deployment across environments
  • πŸ§ͺ Comprehensive Testing: Pytest-based test suite with coverage reporting
  • πŸ“ Detailed Logging: Structured logging for monitoring and debugging

πŸ—οΈ Project Architecture

graph TD
    A[Data Sources] --> B[Data Ingestion]
    B --> C[Data Validation]
    C --> D[Data Transformation]
    D --> E[Model Training]
    E --> F[Model Evaluation]
    F --> G[Model Deployment]
    G --> H[FastAPI Service]
    H --> I[User Interface]

πŸ“‚ Project Structure

.
β”œβ”€β”€ πŸ“ .github/                      # GitHub Actions workflows
β”œβ”€β”€ πŸ“ .git/                         # Git repository
β”œβ”€β”€ πŸ“ artifacts/                    # Generated model artifacts
β”‚   β”œβ”€β”€ πŸ“ data_transformation/      # Preprocessors and encoders
β”‚   └── πŸ“ model_trainer/            # Trained models
β”œβ”€β”€ πŸ“ config/                       # Configuration files
β”‚   └── πŸ“„ config.yaml               # Main configuration
β”œβ”€β”€ πŸ“ research/                     # Research notebooks and experiments
β”‚   β”œβ”€β”€ πŸ“„ data_exploration.ipynb
β”‚   β”œβ”€β”€ πŸ“„ model_experimentation.ipynb
β”‚   └── πŸ“„ feature_engineering.ipynb
β”œβ”€β”€ πŸ“ src/                          # Source code
β”‚   └── πŸ“ datascience/             # Main package
β”‚       β”œβ”€β”€ πŸ“ components/           # Core ML components
β”‚       β”‚   β”œβ”€β”€ πŸ“„ data_ingestion.py
β”‚       β”‚   β”œβ”€β”€ πŸ“„ data_validation.py
β”‚       β”‚   β”œβ”€β”€ πŸ“„ data_transformation.py
β”‚       β”‚   β”œβ”€β”€ πŸ“„ model_trainer.py
β”‚       β”‚   └── πŸ“„ model_evaluation.py
β”‚       β”œβ”€β”€ πŸ“ config/               # Configuration handlers
β”‚       β”‚   β”œβ”€β”€ πŸ“„ __init__.py
β”‚       β”‚   └── πŸ“„ configuration.py
β”‚       β”œβ”€β”€ πŸ“ constants/            # Constants and defaults
β”‚       β”‚   └── πŸ“„ __init__.py
β”‚       β”œβ”€β”€ πŸ“ entity/               # Data entities
β”‚       β”‚   β”œβ”€β”€ πŸ“„ __init__.py
β”‚       β”‚   └── πŸ“„ config_entity.py
β”‚       β”œβ”€β”€ πŸ“ pipeline/             # Pipeline orchestration
β”‚       β”‚   β”œβ”€β”€ πŸ“„ data_ingestion_pipeline.py
β”‚       β”‚   β”œβ”€β”€ πŸ“„ data_validation_pipeline.py
β”‚       β”‚   β”œβ”€β”€ πŸ“„ data_transformation_pipeline.py
β”‚       β”‚   β”œβ”€β”€ πŸ“„ model_trainer_pipeline.py
β”‚       β”‚   β”œβ”€β”€ πŸ“„ model_evaluation_pipeline.py
β”‚       β”‚   └── πŸ“„ prediction_pipeline.py
β”‚       β”œβ”€β”€ πŸ“ utils/                # Utility functions
β”‚       β”‚   β”œβ”€β”€ πŸ“„ __init__.py
β”‚       β”‚   └── πŸ“„ common.py
β”‚       β”œβ”€β”€ πŸ“„ __init__.py
β”‚       β”œβ”€β”€ πŸ“„ exception.py          # Custom exception handling
β”‚       └── πŸ“„ logger.py             # Logging configuration
β”œβ”€β”€ πŸ“ static/                       # Static assets for web UI
β”‚   β”œβ”€β”€ πŸ“ css/
β”‚   β”œβ”€β”€ πŸ“ js/
β”‚   └── πŸ“ images/
β”œβ”€β”€ πŸ“ templates/                    # HTML templates
β”‚   β”œβ”€β”€ πŸ“„ index.html               # Main UI
β”‚   β”œβ”€β”€ πŸ“„ results.html             # Results display
β”‚   └── πŸ“„ error.html               # Error page
β”œβ”€β”€ πŸ“ tests/                        # Test suite
β”‚   β”œβ”€β”€ πŸ“„ test_app.py              # API tests
β”‚   └── πŸ“„ test_components.py       # Component tests
β”œβ”€β”€ πŸ“„ .gitignore                    # Git ignore file
β”œβ”€β”€ πŸ“„ app.py                        # FastAPI application
β”œβ”€β”€ πŸ“„ Dockerfile                    # Docker configuration
β”œβ”€β”€ πŸ“„ LICENSE                       # License information
β”œβ”€β”€ πŸ“„ main.py                       # Application entry point
β”œβ”€β”€ πŸ“„ metrics.json                  # Model metrics
β”œβ”€β”€ πŸ“„ params.yaml                   # Model parameters
β”œβ”€β”€ πŸ“„ pyproject.toml                # Python project configuration
β”œβ”€β”€ πŸ“„ pytest.ini                    # Pytest configuration
β”œβ”€β”€ πŸ“„ README.md                     # Project documentation
β”œβ”€β”€ πŸ“„ requirements.txt              # Dependencies
β”œβ”€β”€ πŸ“„ schema.yaml                   # Data schema definition
β”œβ”€β”€ πŸ“„ setup.py                      # Package setup script
└── πŸ“„ template.py                   # Template generator

πŸš€ Getting Started

Prerequisites

  • Python 3.8 or later
  • Docker (optional for containerization)
  • Git

Installation

Local Development

  1. Clone the repository

    git clone https://github.com/austinLorenzMccoy/CompleteDSproject.git
    cd CompleteDSproject
  2. Create and activate a virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Run the application

    uvicorn app:app --host 0.0.0.0 --port 8081 --reload
  5. Access the application at http://localhost:8081

Using Docker

  1. Build the Docker image

    docker build -t food-adulteration-detection .
  2. Run the Docker container

    docker run -p 8081:8081 food-adulteration-detection
  3. Access the application at http://localhost:8081

πŸ§ͺ Testing

Run the test suite with pytest:

python -m pytest tests/ -v

For test coverage:

python -m pytest tests/ --cov=src --cov-report=term-missing

πŸ“Š API Endpoints

Endpoint        Method   Description
/               GET      Web UI for the application
/predict        POST     API endpoint for prediction (JSON)
/predict-form   POST     Form submission endpoint
/health         GET      Health check endpoint
/docs           GET      Interactive API documentation
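
As a quick illustration, the endpoints above can be exercised with any HTTP client. The sketch below uses Python's requests package against a locally running instance; the payload field names are hypothetical placeholders, since the actual input schema is defined in schema.yaml.

import requests

BASE_URL = "http://localhost:8081"

# Verify the service is up before submitting a sample
print(requests.get(f"{BASE_URL}/health").json())

# NOTE: these field names are hypothetical placeholders; consult
# schema.yaml for the real input schema
sample = {
    "product_name": "milk",
    "adulterant": "starch",
    "detection_method": "spectroscopy",
}
response = requests.post(f"{BASE_URL}/predict", json=sample)
print(response.status_code, response.json())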

πŸ”„ ML Pipeline Components

Data Ingestion

Handles the collection, import, and initial storage of raw data from various sources.

Data Validation

Verifies data quality, schema compliance, and performs drift detection.

Data Transformation

Preprocesses data through cleaning, feature engineering, scaling, and encoding.

Model Training

Trains machine learning models using optimized hyperparameters.

Model Evaluation

Assesses model performance using various metrics and validation techniques.

Prediction Pipeline

Orchestrates the end-to-end process from data input to prediction output.
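
Taken together, the stages run in sequence, each consuming the artifacts of the previous one. The sketch below shows the orchestration pattern only; the stage functions are hypothetical no-op stand-ins for the real modules under src/datascience/pipeline/.

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def run_stage(name, stage_fn):
    """Run one pipeline stage with uniform logging around it."""
    logger.info(">>> Stage started: %s", name)
    stage_fn()
    logger.info(">>> Stage completed: %s", name)

# Hypothetical stand-ins for the real stage pipelines
def ingest(): pass
def validate(): pass
def transform(): pass
def train(): pass
def evaluate(): pass

for name, fn in [
    ("Data Ingestion", ingest),
    ("Data Validation", validate),
    ("Data Transformation", transform),
    ("Model Training", train),
    ("Model Evaluation", evaluate),
]:
    run_stage(name, fn)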

πŸ“ˆ MLflow Experiment Tracking

This project uses MLflow for experiment tracking and model management. MLflow helps track experiments, compare model versions, and manage the model lifecycle.

MLflow Configuration

The project is configured to use DAGsHub as the MLflow tracking server:

os.environ["MLFLOW_TRACKING_URI"] = "https://dagshub.com/austinLorenzMccoy/CompleteDSproject.mlflow"
os.environ["MLFLOW_TRACKING_USERNAME"] = "austinLorenzMccoy"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "1d06b3f1dc94bb2bb3ed0960c7d406847b9d362d"

Accessing MLflow Tracking

There are two ways to access MLflow tracking information:

  1. DAGsHub MLflow Dashboard: Access experiments directly through the DAGsHub website at:

    https://dagshub.com/austinLorenzMccoy/CompleteDSproject.mlflow
    

    You'll need to log in with your DAGsHub credentials.

  2. Python Script: Use the provided Python script to view experiments programmatically:

    python view_mlflow_experiments.py
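
The contents of view_mlflow_experiments.py are not reproduced here, but a minimal equivalent using the public MLflow client API might look like this (assuming the tracking URI and credentials from the configuration above are set, and that runs live in the default experiment "0"):

import mlflow

# Point the client at the DAGsHub tracking server used by this project
mlflow.set_tracking_uri(
    "https://dagshub.com/austinLorenzMccoy/CompleteDSproject.mlflow"
)

# search_runs returns a pandas DataFrame with one row per run
runs = mlflow.search_runs(experiment_ids=["0"])
print(runs[["run_id", "status", "start_time"]].head())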

Tracked Metrics

The following metrics are tracked for each model:

  • Accuracy
  • Precision
  • Recall
  • F1 Score

Model Parameters

Key model parameters tracked in MLflow:

  • layer1_units: Number of units in the first layer
  • layer2_units: Number of units in the second layer
  • learning_rate: Learning rate for model training
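
For reference, parameters and metrics like these are recorded with MLflow's standard logging calls. A minimal sketch follows; the numeric values are purely illustrative, not results from this project:

import mlflow

with mlflow.start_run():
    # Model parameters (illustrative values only)
    mlflow.log_param("layer1_units", 64)
    mlflow.log_param("layer2_units", 32)
    mlflow.log_param("learning_rate", 0.001)

    # Evaluation metrics (illustrative values only)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_metric("precision", 0.94)
    mlflow.log_metric("recall", 0.93)
    mlflow.log_metric("f1_score", 0.94)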

🐳 Docker Configuration

The project includes a comprehensive Dockerfile that:

  • Uses Python 3.9 as the base image
  • Sets up a secure non-root user
  • Configures health checks
  • Optimizes layer caching
  • Implements best practices for Python containerization

πŸ‘₯ Contributing

We welcome contributions! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“œ License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

πŸ“ž Contact


🌟 Star this repository if you find it useful! 🌟

Key Features and Modules

1. Components

  • Data Ingestion (data_ingestion.py): Manages data collection and storage.
  • Data Transformation (data_transformation.py): Converts raw data into a format suitable for analysis and modeling.
  • Model Trainer (model_trainer.py): Handles the training of machine learning models.

2. Pipeline

  • Training Pipeline (training_pipeline.py): Orchestrates the training process end-to-end.
  • Prediction Pipeline (prediction_pipeline.py): Handles prediction tasks on new data.

3. Utilities

  • Common Utilities (common.py): Includes helper functions used throughout the project.

4. Configuration

  • Config Module (configuration.py): Loads and parses configuration settings from config.yaml.
  • Params (params.yaml): Stores hyperparameters for model training.
  • Schema (schema.yaml): Defines the data structure and constraints.
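
A minimal sketch of how these files might be loaded (using PyYAML; the project's actual loader in configuration.py may differ):

import yaml
from pathlib import Path

def read_yaml(path: Path) -> dict:
    """Load a YAML file into a plain dictionary."""
    with open(path) as f:
        return yaml.safe_load(f)

config = read_yaml(Path("config/config.yaml"))
params = read_yaml(Path("params.yaml"))
schema = read_yaml(Path("schema.yaml"))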

5. Logging and Exception Handling

  • Logger (logger.py): Centralized logging for debugging and monitoring.
  • Custom Exceptions (exception.py): Handles errors in a standardized manner.
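
The pattern these two modules implement looks roughly like the sketch below; the exception class name is hypothetical, and the real implementations in logger.py and exception.py may differ:

import logging
from typing import Optional

# Centralized logging setup, mirroring the role of logger.py
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s] %(levelname)s %(name)s - %(message)s",
)
logger = logging.getLogger("datascience")

class ProjectError(Exception):
    """Hypothetical custom exception wrapping lower-level errors."""
    def __init__(self, message: str, cause: Optional[Exception] = None):
        super().__init__(message)
        self.cause = cause

try:
    raise ValueError("bad input")
except ValueError as exc:
    logger.error("Caught error: %s", exc)
    logger.error("Standardized as: %s", ProjectError("ingestion failed", exc))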

6. Entity and Constants

  • Entity (config_entity.py): Defines structured configurations such as data paths, model settings, and more to ensure clean and maintainable code.
  • Constants (constants/__init__.py): Stores immutable values like file paths and default parameters, reducing hardcoding and simplifying updates.
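
For illustration, a config entity of the kind config_entity.py defines could be a frozen dataclass like the one below (the field names are hypothetical):

from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class DataIngestionConfig:
    # Hypothetical fields; the real entity defines the project's
    # actual paths and settings
    root_dir: Path
    source_url: str
    local_data_file: Path
    unzip_dir: Path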

7. Docker Support

  • Dockerfile: Ensures a consistent runtime environment for deployment.

8. Research and Development

  • Research Notebooks (research/): Notebooks such as data_exploration.ipynb, model_experimentation.ipynb, and feature_engineering.ipynb, used for exploratory data analysis and prototyping.

9. Web Interface

  • HTML Template (index.html): Provides a simple user interface for interacting with the project.

How to Run the Project

Prerequisites

  • Python 3.8 or later
  • Docker (optional for containerization)

Installation

  1. Clone the repository:

    git clone https://github.com/austinLorenzMccoy/CompleteDSproject.git
    cd CompleteDSproject
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up the project:

    python setup.py install

Running the Project

  1. To execute the training pipeline:

    python src/datascience/pipeline/training_pipeline.py
  2. To run the prediction pipeline:

    python src/datascience/pipeline/prediction_pipeline.py

Using Docker

  1. Build the Docker image:

    docker build -t datascience-project .
  2. Run the Docker container:

    docker run -it datascience-project

Contribution

We welcome contributions! Feel free to fork this repository, make improvements, and submit a pull request.

Contact

For inquiries or feedback, please contact:
