Automatically generate captions for images using a deep learning model.
🔗 Live App: https://image-caption-generator-using-cnn-lstm-nddimension.streamlit.app/
📓 Notebook: https://www.kaggle.com/code/nddimension/image-captioning-using-cnn-rnn
🚀 Model: https://drive.google.com/file/d/1d-qOyZaU34_N-cxEDtFPG9iApCbLrlmu/view?usp=drive_link
🗣️ Dataset: https://drive.google.com/file/d/1QNCjQCsQBoxlMyc9WFM_fg2LU5NEh5iJ/view?usp=drive_link
Image Caption Generator is an AI-powered web app that generates natural language descriptions for images using a deep learning model. It combines a CNN for image feature extraction and an LSTM decoder to produce coherent captions.
📷 Upload or select a sample image
🧠 AI generates descriptive captions
🗣️ Powered by a pre-trained CNN-LSTM model
🚀 Interactive and educational experience
✅ Pre-trained models loaded automatically
✅ Sample images included for quick testing
✅ Supports image uploads (JPG, PNG, JPEG)
✅ Built with Streamlit for ease of use
Feature | Description |
---|---|
🖼️ Image Upload | Upload your own image or select from sample images |
🧠 AI Captioning | Generate natural-language captions using deep learning |
📝 Caption Display | Clean, styled caption output with real-time preview |
⚙️ Model Caching | Speeds up inference using Streamlit caching |
📖 Educational Sections | Learn how the model architecture works |
🔍 Debug Mode | Optional debug panel for technical details |
- Load Pre-trained Models
- Image Preprocessing
  - Resize, normalize, and format the image for the CNN
- Feature Extraction
  - A CNN (e.g., ResNet or Inception) extracts image features
- Caption Generation
  - The LSTM decoder predicts words one at a time (autoregressive)
- Display Output
  - The caption is cleaned and shown in real time
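The Image Preprocessing step can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the app's actual code: the 224×224 target size, nearest-neighbour resizing, and plain scaling to [0, 1] are placeholders, and resize_nearest and preprocess are hypothetical helpers. A real ResNet or Inception pipeline would normally use the backbone's own input size and its Keras preprocess_input function (Inception V3, for example, expects 299×299 inputs).

```python
import numpy as np

# Assumption: target size and [0, 1] scaling are placeholders; the app's
# actual backbone may need a different size and its own preprocess_input.
TARGET_SIZE = (224, 224)


def resize_nearest(image: np.ndarray, size) -> np.ndarray:
    """Nearest-neighbour resize of an H x W x C array to size=(height, width)."""
    h, w = image.shape[:2]
    out_h, out_w = size
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return image[rows[:, None], cols[None, :]]


def preprocess(image: np.ndarray, size=TARGET_SIZE) -> np.ndarray:
    """Resize, scale pixels to [0, 1], and add a batch dimension."""
    if image.ndim == 2:  # grayscale -> 3 identical channels
        image = np.stack([image] * 3, axis=-1)
    image = image[..., :3]  # drop the alpha channel that PNGs may carry
    resized = resize_nearest(image, size)
    scaled = resized.astype(np.float32) / 255.0
    return scaled[np.newaxis, ...]  # shape (1, height, width, 3)


# Usage with a random RGBA array standing in for an uploaded PNG
upload = np.random.randint(0, 256, size=(480, 640, 4), dtype=np.uint8)
batch = preprocess(upload)
print(batch.shape)  # (1, 224, 224, 3)
```

The batch dimension is added because Keras models predict on batches, even when only one image is captioned.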
- Architecture
  - A CNN (e.g., ResNet) extracts image features
  - A pre-trained LSTM model takes these features and generates a caption word by word
- Tokenizer & Sequence
  - A tokenizer encodes caption text into integer sequences and decodes predicted ids back into words
  - Input sequences are padded to a fixed maximum length
- Inference
  - Generation starts with the startseq token
  - The next word is predicted from a softmax over the vocabulary
  - Generation ends at endseq or when the maximum length is reached
- Interface
  - The Streamlit UI lets users upload images or choose from samples
  - Captions are generated and displayed on the same page
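The tokenizer, padding, and inference steps above can be sketched in plain Python. This is an illustrative greedy-decoding loop, not the app's code: build_vocab, pad, and greedy_caption are hypothetical helpers, and predict(features, seq) is an assumed stand-in for the trained CNN-LSTM that returns a softmax distribution indexed by word id, with id 0 reserved for padding. Left (pre) padding mirrors the default behaviour of Keras pad_sequences.

```python
from typing import Callable, Dict, List, Sequence

START, END = "startseq", "endseq"


def build_vocab(captions: Sequence[str]) -> Dict[str, int]:
    """Hypothetical tokenizer: map each word to an id, reserving 0 for padding."""
    vocab: Dict[str, int] = {}
    for caption in captions:
        for word in caption.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def pad(seq: List[int], max_len: int) -> List[int]:
    """Left-pad with zeros to max_len, keeping the last max_len ids if too long."""
    if len(seq) > max_len:
        seq = seq[len(seq) - max_len:]
    return [0] * (max_len - len(seq)) + seq


def greedy_caption(
    predict: Callable[[object, List[int]], List[float]],
    features: object,
    vocab: Dict[str, int],
    max_len: int,
) -> str:
    """Generate a caption word by word, always taking the most probable next word."""
    id_to_word = {i: w for w, i in vocab.items()}
    words = [START]
    for _ in range(max_len):  # stop when the maximum length is reached
        seq = pad([vocab[w] for w in words], max_len)
        probs = predict(features, seq)  # softmax over word ids
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        word = id_to_word.get(next_id)  # id 0 (padding) maps to None
        if word is None or word == END:
            break
        words.append(word)
    return " ".join(words[1:])  # drop startseq


# Usage with a scripted stand-in for the trained model
vocab = build_vocab(["startseq a dog runs endseq", "startseq a cat sleeps endseq"])
script = ["a", "dog", "runs", END]


def toy_predict(features, seq):
    step = sum(1 for t in seq if t != 0) - 1  # words generated so far
    probs = [0.0] * (len(vocab) + 1)
    probs[vocab[script[min(step, len(script) - 1)]]] = 1.0
    return probs


print(greedy_caption(toy_predict, None, vocab, max_len=10))  # a dog runs
```

Greedy decoding commits to the single most likely word at each step, which is cheap but can lock in a weak prefix early.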
Install everything using:
pip install -r requirements.txt
1️⃣ Clone the repository
git clone https://github.com/NDDimension/Image-Caption-Generator-using-CNN-LSTM.git
cd Image-Caption-Generator-using-CNN-LSTM
2️⃣ Install Dependencies
pip install -r requirements.txt
3️⃣ Run the Streamlit App
streamlit run app.py
✅ Automatic download of pre-trained models and tokenizer
✅ Streamlit-based interactive interface
✅ Works with sample and user-uploaded images
✅ Educational explanations included
✅ Debug mode for inspecting internals
🧠 Add beam search for more accurate caption generation
🌐 Deploy to Hugging Face Spaces
📤 Allow batch caption generation
🗂️ Add support for custom training datasets
🎯 Add attention visualization for interpretability
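Beam search, listed above as a planned improvement, keeps the beam_width highest-scoring partial captions at each step instead of only the single best one. The following is a minimal sketch of how it could work, not part of the current app: beam_search_caption is a hypothetical helper, predict(features, seq) is an assumed stand-in for the trained model that returns softmax probabilities indexed by word id (0 = padding), the toy probabilities are made up, and length normalization is omitted.

```python
import math
from typing import Callable, Dict, List, Tuple

START, END = "startseq", "endseq"
Beam = Tuple[List[int], float]  # (token ids, summed log-probability)


def beam_search_caption(
    predict: Callable[[object, List[int]], List[float]],
    features: object,
    vocab: Dict[str, int],
    max_len: int,
    beam_width: int = 3,
) -> str:
    """Keep the beam_width best partial captions per step (no length normalization)."""
    id_to_word = {i: w for w, i in vocab.items()}
    beams: List[Beam] = [([vocab[START]], 0.0)]
    finished: List[Beam] = []
    for _ in range(max_len):
        candidates: List[Beam] = []
        for seq, score in beams:
            padded = [0] * (max_len - len(seq)) + seq  # left-pad with zeros
            probs = predict(features, padded)  # softmax over word ids
            for token_id, p in enumerate(probs):
                if token_id != 0 and p > 0.0:  # skip padding and impossible words
                    candidates.append((seq + [token_id], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            # Captions ending in endseq are complete; the rest keep expanding
            if id_to_word.get(seq[-1]) == END:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    pool = finished or beams
    if not pool:
        return ""
    best, _ = max(pool, key=lambda c: c[1])
    words = [id_to_word[t] for t in best[1:]]  # drop startseq
    if words and words[-1] == END:
        words.pop()
    return " ".join(words)


# Usage with made-up next-word probabilities keyed by the unpadded prefix
vocab = {"startseq": 1, "a": 2, "dog": 3, "cat": 4, "endseq": 5}
table = {
    (1,): {2: 0.6, 3: 0.4},
    (1, 2): {4: 0.4, 3: 0.3, 5: 0.3},
    (1, 3): {5: 0.9, 4: 0.1},
}


def toy_predict(features, seq):
    probs = [0.0] * (len(vocab) + 1)
    prefix = tuple(t for t in seq if t != 0)
    for token_id, p in table.get(prefix, {5: 1.0}).items():  # default: endseq
        probs[token_id] = p
    return probs


print(beam_search_caption(toy_predict, None, vocab, max_len=5, beam_width=1))  # a cat
print(beam_search_caption(toy_predict, None, vocab, max_len=5, beam_width=2))  # dog
```

With beam_width=1 the search behaves like greedy decoding and returns "a cat" (0.6 × 0.4 × 1.0 = 0.24), while beam_width=2 finds "dog" (0.4 × 0.9 = 0.36). Summed log-probabilities favour shorter captions, which is why practical implementations often add a length penalty.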
Notebook Revamped & Curated by: NISHTHA SHARMA
📌 GitHub: https://github.com/711nishtha
📌 Kaggle: https://www.kaggle.com/nishtha711
App and Training by: DHANRAJ SHARMA
📌 GitHub: https://github.com/NDDimension
Inspired by:
- Show and Tell Model (Google)
- Flickr8k Dataset
- TensorFlow & Keras captioning tutorials
- Streamlit open-source community
Licensed under the MIT License.
Image Caption Generator — AI that sees and speaks. ❤️ Made with love by Dhanraj Sharma.