

A comprehensive project for prompt-guided food segmentation using state-of-the-art pre-trained models. It combines GroundingDINO for text-prompted object detection with MobileSAM for precise segmentation, and provides both a Google Colab notebook for experimentation and a Flask web application for easy deployment.
Before diving into the implementation, we strongly recommend reading the research papers in the `research/` folder to understand the theoretical foundations and capabilities of the pre-trained models used in this project:
- GroundingDINO Research: Understanding prompt-guided object detection
- Guided Diffusion Model for Adversarial Purification: Advanced model techniques
- Image Segmentation Using Text and Image Prompts: Core segmentation concepts
These papers provide valuable insights into how the models work, their limitations, and best practices for optimal results.
- Prompt-guided segmentation: Upload an image and provide a text prompt to segment specific food items
- Real-time processing: Fast inference using pre-trained models
- Multiple interfaces:
  - Google Colab notebook for experimentation and analysis
  - Flask web application for easy deployment
- User-friendly interface: Clean, responsive web interface
- Transparent background: Segmented objects are saved with transparent backgrounds
- Comprehensive testing: Built-in test suite to verify functionality
- Multiple model support: Includes both MobileSAM and MobileSAMv2 for enhanced performance
- Automatic model download: Models are automatically downloaded if not present
- Error handling: Robust error handling and validation for various edge cases
- Health check endpoint: Built-in health monitoring for production deployment
Food-segmentation-with-pre-trained-model/
├── Food_Segmentation.ipynb    # Google Colab notebook for experimentation
├── webapp/                    # Flask web application
│   ├── app.py                 # Main Flask application with segmentation logic
│   ├── model_loader.py        # Model loading and initialization with auto-download
│   ├── test_app.py            # Test script for validation
│   ├── requirements.txt       # Python dependencies
│   ├── static/                # Static files (generated images)
│   │   ├── images/            # Uploaded and processed images
│   │   └── GeneratedImages/   # Segmentation results with transparent backgrounds
│   ├── GroundingDINO/         # GroundingDINO model files
│   ├── MobileSAM/             # MobileSAM and MobileSAMv2 model files
│   ├── MobileSAMv2/           # Enhanced MobileSAMv2 implementation
│   └── weights/               # Model weights
├── images/                    # Sample food images for testing (40+ images)
├── Results/                   # Segmentation results and analysis
│   ├── accurateresults/       # Successful segmentation results
│   ├── inaccuracies/          # Failed segmentation cases
│   └── result.json            # Detailed results data (7,000+ lines)
└── readme.md                  # This file
Make sure you have the following installed:
- Python 3.7+
- PyTorch
- OpenCV
- Flask
- Google Colab (for notebook experimentation)
- Other dependencies listed in `webapp/requirements.txt`
1. Open the Jupyter notebook in Google Colab:
   - Click the "Open in Colab" button in the notebook
   - Or manually upload `Food_Segmentation.ipynb` to Google Colab
2. The notebook will automatically (the setup cells are sketched below):
   - Set up the environment
   - Clone the required model repositories
   - Install all dependencies
   - Download model weights
   - Load the models for experimentation
3. Run the cells sequentially to perform food segmentation experiments
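For orientation, the setup cells boil down to roughly the following. This is a sketch only; the repositories are the upstream projects listed later in this README, but the exact commands and checkpoint URLs the notebook uses are assumptions:

```python
# Colab cell sketch — illustrative, not a verbatim copy of the notebook.
# Clone and install the two model repositories.
!git clone https://github.com/IDEA-Research/GroundingDINO.git
!pip install -q -e ./GroundingDINO
!git clone https://github.com/ChaoningZhang/MobileSAM.git
!pip install -q -e ./MobileSAM

# Fetch the checkpoints (URLs assumed from the upstream projects' releases).
!wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
!wget -q https://github.com/ChaoningZhang/MobileSAM/raw/master/weights/mobile_sam.pt
```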
- Navigate to the webapp directory:
  cd webapp
- Install dependencies:
  pip install -r requirements.txt
- The application will automatically download the model files if they don't exist (a sketch of this logic follows the list):
  - GroundingDINO checkpoint: `GroundingDINO/groundingdino_swint_ogc.pth`
  - MobileSAM checkpoint: `MobileSAM/weights/mobile_sam.pt`
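The auto-download step in `model_loader.py` presumably amounts to something like the following sketch. The helper name is hypothetical and the URLs are assumed from the upstream repositories, not taken from the actual implementation:

```python
import os
import urllib.request

# Hypothetical helper: fetch a checkpoint only if it is not already on disk.
def ensure_checkpoint(path: str, url: str) -> str:
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        print(f"Downloading {os.path.basename(path)} ...")
        urllib.request.urlretrieve(url, path)
    return path

# URLs assumed from the upstream projects' release/weights pages.
ensure_checkpoint(
    "GroundingDINO/groundingdino_swint_ogc.pth",
    "https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth",
)
ensure_checkpoint(
    "MobileSAM/weights/mobile_sam.pt",
    "https://github.com/ChaoningZhang/MobileSAM/raw/master/weights/mobile_sam.pt",
)
```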
1. Start the web application:
   cd webapp
   python app.py
2. Click one of the localhost URLs shown in the console (both point at the same server; see the sketch after these steps) to open the app in your default web browser:
   * http://127.0.0.1:5001
   * http://192.168.0.181:5001
3. Upload an image and enter a prompt describing the food item you want to segment (e.g., "Banku", "Jollof Rice", "Tomato Stew")
4. Click "Segment Food" to process the image
5. View the results showing both the original image and the segmented object
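As referenced in step 2, the two URLs are the loopback and LAN addresses of one server. A minimal sketch of the Flask entry point, assuming `app.py` binds to all interfaces on port 5001 (the exact options are an assumption):

```python
if __name__ == "__main__":
    # 0.0.0.0 exposes the app on 127.0.0.1 and on the machine's LAN IP.
    app.run(host="0.0.0.0", port=5001)
```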
1. Open `Food_Segmentation.ipynb` in Google Colab
2. Run the cells sequentially to:
   - Set up the environment (automatic in Colab)
   - Clone and install model repositories
   - Download and load models
   - Perform segmentation on sample images
   - Analyze results
- `GET /`: Main web interface
- `POST /segment`: Process image segmentation (expects multipart form data with `image_file` and `prompt`)
- `GET /health`: Health check endpoint for monitoring
- `GET /static/<filename>`: Serve static files (images)
- `GET /static/images/<filename>`: Serve processed images
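For example, the `/segment` endpoint can be called programmatically. This is a sketch: the sample file name is illustrative, and the response format is whatever `app.py` returns:

```python
import requests

# Hypothetical client call against a locally running instance.
with open("../images/sample.jpg", "rb") as f:  # any image from the images/ folder
    resp = requests.post(
        "http://127.0.0.1:5001/segment",
        files={"image_file": f},
        data={"prompt": "Jollof Rice"},
    )
print(resp.status_code, resp.headers.get("Content-Type"))
```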
- GroundingDINO: Used for object detection based on text prompts
  - Repository: https://github.com/IDEA-Research/GroundingDINO
  - Detects objects in images based on natural language descriptions
  - Automatically downloaded if not present
- MobileSAM: Used for precise segmentation of detected objects
  - Repository: https://github.com/ChaoningZhang/MobileSAM
  - Lightweight version of SAM (Segment Anything Model) for mobile deployment
- MobileSAMv2: Enhanced version with object-aware prompt sampling
  - Available in the MobileSAM directory
  - Faster segmentation with improved accuracy
- Both models run on CPU by default (GPU support is available if CUDA is installed)
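To make the two-stage flow concrete, here is a minimal sketch of detection followed by segmentation. Paths, thresholds, and the sample image name are assumptions; the web app's actual logic lives in `webapp/app.py`:

```python
import cv2
import numpy as np
import torch
from groundingdino.util.inference import load_model, load_image, predict
from mobile_sam import sam_model_registry, SamPredictor

# 1. Detect the prompted food item with GroundingDINO (CPU by default).
dino = load_model(
    "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",  # config path assumed
    "GroundingDINO/groundingdino_swint_ogc.pth",
    device="cpu",
)
image_source, image = load_image("images/sample.jpg")  # hypothetical sample image
boxes, logits, phrases = predict(
    model=dino, image=image, caption="jollof rice",
    box_threshold=0.35, text_threshold=0.25, device="cpu",
)

# 2. GroundingDINO returns normalized cxcywh boxes; convert the first to pixel xyxy.
h, w, _ = image_source.shape
cx, cy, bw, bh = (boxes[0] * torch.tensor([w, h, w, h])).tolist()
box_xyxy = np.array([cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2])

# 3. Segment inside the box with MobileSAM (the "vit_t" variant).
sam = sam_model_registry["vit_t"](checkpoint="MobileSAM/weights/mobile_sam.pt")
predictor = SamPredictor(sam.eval())  # move to "cuda" instead if a GPU is available
predictor.set_image(image_source)     # RGB uint8 array from load_image
masks, scores, _ = predictor.predict(box=box_xyxy, multimask_output=False)

# 4. Save the cutout as an RGBA PNG with a transparent background.
rgba = cv2.cvtColor(image_source, cv2.COLOR_RGB2BGRA)
rgba[..., 3] = masks[0].astype(np.uint8) * 255
cv2.imwrite("segmented.png", rgba)
```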
Run the test script to verify everything is working:
cd webapp
python test_app.py
- Use the sample images in the `images/` directory (40+ food images available)
- Try different prompts to test segmentation accuracy
- Check the `Results/` directory for example outputs
The project includes comprehensive testing results:
- Sample Results: Check `Results/accurateresults/` for successful segmentations
- Analysis: Review `Results/inaccuracies/` for cases where segmentation failed
- Data: Detailed results in `Results/result.json` (7,000+ lines of analysis data)
- Generated Images: Processed images in `webapp/static/GeneratedImages/`
- Successfully tested on 40+ food images
- Supports various food types: burgers, pizza, fruits, vegetables, etc.
- Real-time processing with automatic error handling
- Import errors: Make sure all dependencies are installed
- Model loading errors: Models are automatically downloaded if missing
- CUDA errors: The app defaults to CPU mode. For GPU acceleration, ensure CUDA is properly installed
- Memory issues: Large images may require more RAM. Consider resizing images if needed
- Health check: Use the `/health` endpoint to verify application status (example below)
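A quick liveness probe, assuming the default host and port (the response payload format is defined by the app):

```python
import requests

# Expect HTTP 200 from a healthy instance.
print(requests.get("http://127.0.0.1:5001/health").status_code)
```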
Key dependencies include:
torch>=1.9.0
torchvision>=0.10.0
supervision>=0.3.0
opencv-python>=4.5.0
numpy>=1.21.0
flask>=2.0.0
transformers>=4.20.0
ultralytics>=8.0.0
gradio>=3.0.0
streamlit>=1.20.0
For a complete list, see `webapp/requirements.txt`.
- Enhanced Model Support: Added MobileSAMv2 for improved segmentation performance
- Automatic Model Download: Models are automatically downloaded if not present
- Improved Error Handling: Better validation and error messages
- Health Monitoring: Added health check endpoint for production deployment
- Transparent Background: Segmented objects are saved with transparent backgrounds
- Comprehensive Testing: Extensive testing on 40+ food images with detailed results
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project uses pre-trained models from:
- GroundingDINO: https://github.com/IDEA-Research/GroundingDINO
- MobileSAM: https://github.com/ChaoningZhang/MobileSAM
Please refer to their respective licenses for model usage terms.
✅ Completed Features:
- GroundingDINO integration for object detection
- MobileSAM integration for segmentation
- Flask web application with user interface
- Google Colab for experimentation
- Automatic model downloading
- Error handling and validation
- Health check endpoint
- Comprehensive testing suite
- MobileSAMv2 support
- Enhanced UI/UX improvements
🚧 In Progress:
- Performance optimization for large images
- Additional model fine-tuning options
- Food Nutritional Content Analysis
Note: This project is designed for experimental purposes. The models are pre-trained and may not work perfectly on all types of food images. For production use, consider fine-tuning the models on your specific dataset.