A deep-learning-based web app that generates image captions using a pre-trained CNN-LSTM model. Upload your own image or pick a sample image to see the model describe it in natural language. Built with TensorFlow and trained on Flickr8k, it combines computer vision with NLP.

NDDimension/Image-Caption-Generator-using-CNN-LSTM


🖼️ Image Caption Generator

Automatically generate captions for images using a deep learning model.

🔗 Live App: https://image-caption-generator-using-cnn-lstm-nddimension.streamlit.app/

📓 Notebook: https://www.kaggle.com/code/nddimension/image-captioning-using-cnn-rnn

🚀 Model: https://drive.google.com/file/d/1d-qOyZaU34_N-cxEDtFPG9iApCbLrlmu/view?usp=drive_link

🗣️ Dataset: https://drive.google.com/file/d/1QNCjQCsQBoxlMyc9WFM_fg2LU5NEh5iJ/view?usp=drive_link


🎯 Project Overview

Image Caption Generator is a web app that generates natural-language descriptions of images with a deep learning model. A CNN extracts features from the image, and an LSTM decoder turns those features into a caption one word at a time.

📷 Upload or select a sample image
🧠 AI generates descriptive captions
🗣️ Powered by a pre-trained CNN-LSTM model
🚀 Interactive and educational experience

✅ Pre-trained models loaded automatically
✅ Sample images included for quick testing
✅ Supports image uploads (JPG, PNG, JPEG)
✅ Built with Streamlit for ease of use


🔍 Features

Feature | Description
🖼️ Image Upload | Upload your own image or select from sample images
🧠 AI Captioning | Generate natural-language captions using deep learning
📝 Caption Display | Clean, styled caption output with real-time preview
⚙️ Model Caching | Speeds up inference using Streamlit caching
📖 Educational Sections | Learn how the model architecture works
🔍 Debug Mode | Optional debug panel for technical details

📌 Workflow

  1. Load Pre-trained Models

  2. Image Preprocessing

    • Resize, normalize, and format image for the CNN
  3. Feature Extraction

    • CNN extracts image features (e.g., ResNet, Inception)
  4. Caption Generation

    • LSTM decoder predicts words one by one (auto-regressive)
  5. Display Output

    • Caption is cleaned and shown in real time
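
The last step mentions that the caption is cleaned before display. As a rough illustration only, the sketch below assumes that cleaning means removing the `startseq` and `endseq` markers used during decoding, then capitalizing the sentence and adding a period. The function name `clean_caption` and the exact formatting rules are hypothetical and are not taken from the app's code.

```python
def clean_caption(raw: str, start: str = "startseq", end: str = "endseq") -> str:
    """Turn raw decoder output into a display-ready sentence (hypothetical helper)."""
    words = raw.split()
    # Drop the start marker if the decoder output begins with it.
    if words and words[0] == start:
        words = words[1:]
    # Cut everything from the first end marker onward.
    if end in words:
        words = words[: words.index(end)]
    if not words:
        return ""
    sentence = " ".join(words)
    # Capitalize the first letter and close the sentence with a period.
    return sentence[0].upper() + sentence[1:] + "."


print(clean_caption("startseq a dog runs across the grass endseq"))
# A dog runs across the grass.
```

Keeping this step separate from decoding means the raw token sequence stays available, for example for a debug panel.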

⚙️ How It Works

  1. Architecture

    • A CNN (e.g., ResNet) is used to extract image features
    • A pre-trained LSTM model takes these features and generates a caption word-by-word
  2. Tokenizer & Sequence

    • A tokenizer encodes/decodes the text data
    • Input sequences are padded to a fixed max length
  3. Inference

    • Starts with the token startseq
    • Predicts next word using softmax
    • Ends at endseq or when max length is reached
  4. Interface

    • Streamlit UI allows users to upload images or choose from samples
    • Captions are generated and displayed on the same page
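
The padding and inference steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the app's implementation: `pad_sequence`, `greedy_caption`, `toy_predict`, and the five-word vocabulary are all hypothetical, and `toy_predict` merely stands in for the trained CNN-LSTM model. Padding is assumed to add zeros on the left, which matches the default behaviour of Keras `pad_sequences`.

```python
from typing import Callable, Dict, List


def pad_sequence(seq: List[int], max_length: int) -> List[int]:
    """Left-pad with zeros, keeping only the last max_length ids if too long."""
    seq = seq[-max_length:]
    return [0] * (max_length - len(seq)) + seq


def greedy_caption(
    predict: Callable[[List[float], List[int]], List[float]],
    features: List[float],
    word_to_id: Dict[str, int],
    max_length: int,
) -> str:
    """Generate a caption one word at a time, always taking the most likely word."""
    id_to_word = {i: w for w, i in word_to_id.items()}
    words = ["startseq"]
    for _ in range(max_length):
        ids = pad_sequence([word_to_id[w] for w in words], max_length)
        probs = predict(features, ids)  # softmax output over the vocabulary
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        word = id_to_word.get(next_id)
        if word is None:  # e.g. id 0 is padding and maps to no word
            break
        words.append(word)
        if word == "endseq":
            break
    return " ".join(words)


# Hypothetical vocabulary; id 0 is reserved for padding.
vocab = {"startseq": 1, "a": 2, "dog": 3, "runs": 4, "endseq": 5}


def toy_predict(features: List[float], ids: List[int]) -> List[float]:
    """Stand-in for the model: favours the id that follows the last input id."""
    probs = [0.02] * 6
    probs[min(ids[-1] + 1, 5)] = 0.9
    return probs


print(pad_sequence([1, 2], 5))                    # [0, 0, 0, 1, 2]
print(greedy_caption(toy_predict, [], vocab, 6))  # startseq a dog runs endseq
```

With a smaller `max_length` the loop stops before `endseq` is produced, which is the second stopping condition listed under Inference.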

🎹 App Preview

🧠 Image + Caption

[Screenshot: Main]


📦 Requirements

Install everything using:

pip install -r requirements.txt

🚀 Getting Started

1️⃣ Clone the repository

git clone https://github.com/NDDimension/Image-Caption-Generator-using-CNN-LSTM.git
cd Image-Caption-Generator-using-CNN-LSTM

2️⃣ Install Dependencies

pip install -r requirements.txt

3️⃣ Run the Streamlit App

streamlit run app.py

✨ Highlights

✅ Automatic download of pre-trained models and tokenizer
✅ Streamlit-based interactive interface
✅ Works with sample and user-uploaded images
✅ Educational explanations included
✅ Debug mode for inspecting internals

🔮 Future Improvements

🧠 Add beam search for more accurate caption generation
🌐 Deploy to HuggingFace Spaces
📤 Allow batch caption generation
🗂️ Add support for custom training datasets
🎯 Add attention visualization for interpretability
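
To make the beam search item concrete, here is a small sketch of how it differs from picking the single most likely word at each step. It is only an illustration under assumptions: `beam_search`, `toy_next_probs`, and the probability table are hypothetical, and a real version would call the trained model instead of a lookup table.

```python
import math
from typing import Callable, Dict, List, Tuple


def beam_search(
    next_probs: Callable[[List[str]], Dict[str, float]],
    beam_width: int,
    max_length: int,
) -> List[str]:
    """Keep the beam_width best partial captions at every step."""
    # Each beam entry is (cumulative log-probability, words so far).
    beams: List[Tuple[float, List[str]]] = [(0.0, ["startseq"])]
    for _ in range(max_length):
        candidates: List[Tuple[float, List[str]]] = []
        for score, words in beams:
            if words[-1] == "endseq":
                candidates.append((score, words))  # finished captions carry over
                continue
            for word, p in next_probs(words).items():
                if p > 0:
                    candidates.append((score + math.log(p), words + [word]))
        # Keep only the highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(words[-1] == "endseq" for _, words in beams):
            break
    return beams[0][1]


# Hypothetical next-word distributions, keyed by the last word generated.
TABLE = {
    "startseq": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"endseq": 1.0},
    "cat": {"endseq": 1.0},
}


def toy_next_probs(words: List[str]) -> Dict[str, float]:
    return TABLE[words[-1]]


print(beam_search(toy_next_probs, 1, 5))  # ['startseq', 'a', 'dog', 'endseq']
print(beam_search(toy_next_probs, 2, 5))  # ['startseq', 'the', 'dog', 'endseq']
```

A beam width of 1 reduces to greedy decoding and commits to "a" (0.6 × 0.55 = 0.33 overall). With a width of 2, "the dog" (0.4 × 0.9 = 0.36) is found even though "the" was not the best first word.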


🙌 Credits & Contributors

Notebook Revamped & Curated by: NISHTHA SHARMA

📌 GitHub: https://github.com/711nishtha

📌 Kaggle: https://www.kaggle.com/nishtha711

App and Training by: DHANRAJ SHARMA

📌 GitHub: https://github.com/NDDimension

Inspired by:


📜 License

Licensed under the MIT License.

Image Caption Generator: AI that sees and speaks. ❤️ Made with love by Dhanraj Sharma.
