
ECHO is a multimodal AI system designed to enhance the accessibility of online podcasts for the hearing impaired. It identifies active speakers, synchronizes subtitles, and provides Hindi translations, making video content more inclusive.

ECHO: Enhanced Communication for Hearing Impaired in Online Podcasts

ECHO is a groundbreaking system designed to make online podcasts accessible for hearing-impaired individuals. This project identifies and visualizes the active speaker 🗣️, synchronizes subtitles 📝, and even provides Hindi translations 🇮🇳 for enhanced inclusivity.

🖋️ Presented at:
ACM 8th International Conference on Data Science and Management of Data (CODS COMAD)
📅 Date: December 18–21, 2024
📍 Location: IIT Jodhpur, India


✨ Features

  • 🗣️ Speaker Identification: Detects the active speaker in a video using lip movement and audio analysis.
  • 📝 Subtitle Synchronization: Automatically generates synchronized subtitles.
  • 🎥 Multimodal Integration: Combines video (face and lip movement detection) with audio (speech transcription and speaker classification).
  • 🌐 Hindi Translations: Automatically translates English subtitles into Hindi for better accessibility.
  • 📂 Benchmark Dataset: Includes 500 annotated videos, diverse in accents and genres, tailored for speaker identification.

🛠️ Architecture Overview

ECHO integrates state-of-the-art models for seamless functionality:

📹 Video Processing

  • Face Detection: MTCNN detects faces with high accuracy.
  • Lip Movement Detection: LipNet analyzes lip movements for speaker identification.
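The actual system uses LipNet for lip analysis; as a toy stand-in for the underlying idea, the sketch below flags a face as "speaking" when its mouth-opening measure (e.g. a normalized lip-landmark distance) varies enough over a sliding window. All names and thresholds here are illustrative assumptions:

```python
def lip_activity(mouth_openings, window=5, threshold=0.01):
    """Flag frames where lip movement suggests active speech.

    `mouth_openings` is one mouth-opening measurement per frame; a frame
    counts as 'speaking' when the variance of the measure over the last
    `window` frames exceeds `threshold` (a still mouth has near-zero variance).
    """
    flags = []
    for i in range(len(mouth_openings)):
        w = mouth_openings[max(0, i - window + 1): i + 1]
        mean = sum(w) / len(w)
        var = sum((x - mean) ** 2 for x in w) / len(w)
        flags.append(var > threshold)
    return flags
```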

🎙️ Audio Processing

  • Speech Transcription: Powered by OpenAI’s Whisper.
  • Speaker Embeddings: Extracted with Wav2Vec 2.0.
  • Clustering: Groups speaker data for robust classification.
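To show what the clustering stage does with speaker embeddings (Wav2Vec 2.0 vectors in the real pipeline), here is a minimal greedy sketch in pure Python: each embedding joins the first existing cluster whose centroid it is cosine-similar to, otherwise it starts a new cluster. The threshold and function names are assumptions for illustration only:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cluster_speakers(embeddings, threshold=0.8):
    """Greedily assign each embedding to the most similar existing cluster
    (if similarity >= threshold), else open a new cluster.
    Returns one integer cluster label per embedding."""
    centroids, labels = [], []
    for emb in embeddings:
        best, best_sim = None, threshold
        for i, c in enumerate(centroids):
            sim = cosine(emb, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(emb))
            best = len(centroids) - 1
        labels.append(best)
    return labels
```

A production system would instead use a standard algorithm such as agglomerative clustering over all embeddings, but the input/output contract is the same: embeddings in, speaker labels out.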

🔗 Integration

  • Syncs video and audio streams in real time for seamless output.
  • Bounding boxes and color-coded speakers make the experience intuitive.
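The sync step boils down to answering, for each video frame timestamp, "who is speaking right now, and what color is their box?" A minimal sketch of that lookup, assuming diarized `(start, end, speaker)` segments (the segment shape and color palette are illustrative, not from the ECHO codebase):

```python
def active_speaker(diarized_segments, t):
    """Return the speaker label active at time t (seconds), given
    diarized (start, end, speaker) segments; None during silence."""
    for start, end, speaker in diarized_segments:
        if start <= t < end:
            return speaker
    return None

# Fixed palette so each speaker keeps a consistent color across the video.
SPEAKER_COLORS = ["red", "green", "blue", "yellow"]

def speaker_color(label):
    """Map a speaker label to a stable display color."""
    return SPEAKER_COLORS[label % len(SPEAKER_COLORS)]
```

In the full pipeline this lookup would drive the bounding-box overlay (e.g. via OpenCV) for each decoded frame.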

*Figure: Block diagram of the ECHO architecture.*


📊 Evaluation

The system excels across multiple benchmarks:

  • 🏆 Word Error Rate (WER): 5.3% – delivering accurate transcriptions.
  • 🏆 Speaker Error Rate (SER): 9.2% – ensuring precise speaker classification.
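Word Error Rate is the word-level Levenshtein (edit) distance between hypothesis and reference, divided by the number of reference words. A self-contained sketch of the metric (not the evaluation code used in the paper):

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, `wer("the cat sat", "the bat sat")` is 1/3: one substitution against three reference words.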

🔍 Ablation Studies

  • Integration of audio and video models improves accuracy by 8%.
  • Incorporating lip detection reduces synchronization errors by 4%.

📂 Dataset

The benchmark dataset includes 500 annotated videos, diverse in accents and genres (talk shows, interviews, and political debates), tailored for speaker identification.

🚀 Usage

⚙️ Prerequisites

  • Python 3.8 or higher
  • Required libraries: torch, transformers, opencv-python, librosa, scikit-learn

🔧 Installation

  1. Clone the repository:

     ```bash
     git clone https://github.com/your-repo/echo.git
     cd echo
     ```

  2. Install dependencies:

     ```bash
     pip install -r requirements.txt
     ```

▶️ Running the Model

  1. Place input videos in the data/input directory.
  2. Process the videos:

     ```bash
     python main.py --input data/input --output data/output
     ```

  3. Output videos with subtitles will be saved in the data/output directory. 🎉

🏅 Results

Performance was evaluated across multiple video genres:

  • 🎙️ Talk shows: WER 9.0%, SER 10.3%
  • 🎤 Interviews: WER 8.5%, SER 9.5%
  • 🗳️ Political debates: WER 5.1%, SER 9.0%

📜 Citation

If you use this work in your research, please cite:

```bibtex
@article{godhala2025echo,
  title={ECHO: enhanced communication for hearing impaired in online video podcasts},
  author={Godhala, Gouthami and Asam, Vijayasree and Sanyal, Samriddha},
  journal={Discover Data},
  volume={3},
  number={1},
  pages={32},
  year={2025},
  publisher={Springer}
}
```

🙏 Acknowledgments

This project was supported by the Centre for Interdisciplinary Artificial Intelligence (CAI), FLAME University.

