ECHO is a system designed to make online video podcasts accessible to hearing-impaired individuals. It identifies and visualizes the active speaker 🗣️, synchronizes subtitles 📝, and provides Hindi translations 🇮🇳 for enhanced inclusivity.
🖋️ Presented at:
ACM 8th International Conference on Data Science and Management of Data (CODS COMAD)
📅 Date: December 18–21, 2024
📍 Location: IIT Jodhpur, India
- 🗣️ Speaker Identification: Detects the active speaker in a video using lip movement and audio analysis.
- 📝 Subtitle Synchronization: Automatically generates synchronized subtitles.
- 🎥 Multimodal Integration: Combines video (face and lip movement detection) with audio (speech transcription and speaker classification).
- 🌐 Hindi Translations: Automatically translates English subtitles into Hindi for better accessibility (see the translation sketch after this list).
- 📂 Benchmark Dataset: Includes 500 annotated videos, diverse in accents and genres, tailored for speaker identification.
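The README does not name the translation model, so the following is only a minimal sketch of the English→Hindi subtitle pass; the `Helsinki-NLP/opus-mt-en-hi` checkpoint and the helper function are assumptions for illustration, not ECHO's actual implementation.

```python
# Hypothetical sketch of the English -> Hindi subtitle translation step.
# The MarianMT checkpoint below is an assumption; ECHO's real model may differ.
from transformers import MarianMTModel, MarianTokenizer

MODEL_NAME = "Helsinki-NLP/opus-mt-en-hi"  # assumed public en->hi checkpoint

tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
model = MarianMTModel.from_pretrained(MODEL_NAME)

def translate_to_hindi(lines):
    """Translate a list of English subtitle lines into Hindi."""
    batch = tokenizer(lines, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

print(translate_to_hindi(["Welcome back to the show.", "Thanks for having me."]))
```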
ECHO integrates state-of-the-art models for seamless functionality (illustrative sketches of the audio and video stages follow the architecture figure below):
- Face Detection: MTCNN detects faces with high accuracy.
- Lip Movement Detection: LipNet analyzes lip movements for speaker identification.
- Speech Transcription: Powered by OpenAI’s Whisper.
- Speaker Embeddings: Extracted with Wav2Vec 2.0.
- Clustering: Groups speaker data for robust classification.
- Synchronization: Syncs video and audio streams in real time for seamless output.
- Visualization: Bounding boxes and color-coded speakers make the experience intuitive.
Figure: Block diagram of the ECHO architecture.
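As a rough illustration of the audio stage, the sketch below transcribes an audio track with Whisper, embeds each transcribed segment with Wav2Vec 2.0, and clusters the segments into speakers with scikit-learn. The file path, the fixed speaker count, and the use of the `openai-whisper` package (not in the requirements list) are assumptions; the actual logic in `main.py` may differ.

```python
# Illustrative audio pipeline: transcribe, embed each segment, cluster into speakers.
# Mean-pooled Wav2Vec 2.0 features and a known speaker count are simplifying assumptions.
import librosa
import torch
import whisper  # openai-whisper, assumed installed in addition to the listed requirements
from sklearn.cluster import AgglomerativeClustering
from transformers import Wav2Vec2Model, Wav2Vec2Processor

AUDIO_PATH = "data/input/podcast.wav"  # hypothetical input file
NUM_SPEAKERS = 2                       # assumed known for this sketch

# 1) Transcribe with Whisper; each segment carries start/end times and text.
segments = whisper.load_model("base").transcribe(AUDIO_PATH)["segments"]

# 2) Embed each segment with Wav2Vec 2.0 (mean-pooled hidden states).
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
w2v = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
audio, sr = librosa.load(AUDIO_PATH, sr=16000)

kept, embeddings = [], []
for seg in segments:
    clip = audio[int(seg["start"] * sr): int(seg["end"] * sr)]
    if len(clip) < sr // 10:  # skip clips too short to embed (< 0.1 s)
        continue
    inputs = processor(clip, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = w2v(**inputs).last_hidden_state  # shape: (1, frames, 768)
    embeddings.append(hidden.mean(dim=1).squeeze(0).numpy())
    kept.append(seg)

# 3) Cluster segment embeddings into speaker groups.
labels = AgglomerativeClustering(n_clusters=NUM_SPEAKERS).fit_predict(embeddings)

for seg, speaker in zip(kept, labels):
    print(f"[{seg['start']:7.1f}s - {seg['end']:7.1f}s] speaker {speaker}: {seg['text'].strip()}")
```

On the video side, face detection and color-coded bounding boxes could be sketched as follows. The `mtcnn` package is an assumed dependency on top of the listed `opencv-python`, the speaker assignment is a placeholder, and the LipNet lip-movement step is omitted entirely.

```python
# Illustrative video pass: detect faces per frame and draw a color-coded box.
import cv2
from mtcnn import MTCNN  # assumed MTCNN implementation; not in the requirements list

SPEAKER_COLORS = [(0, 200, 0), (0, 0, 220)]  # one BGR color per speaker

detector = MTCNN()
cap = cv2.VideoCapture("data/input/podcast.mp4")  # hypothetical input file
ok, frame = cap.read()
while ok:
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    for face in detector.detect_faces(rgb):
        x, y, w, h = face["box"]
        # ECHO picks the active speaker from lip movement + audio; this sketch
        # simply uses speaker 0 as a placeholder for the color choice.
        cv2.rectangle(frame, (x, y), (x + w, y + h), SPEAKER_COLORS[0], 2)
    # ... write `frame` to the output video here ...
    ok, frame = cap.read()
cap.release()
```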
The system excels across multiple benchmarks:
- 🏆 Word Error Rate (WER): 5.3% – delivering accurate transcriptions.
- 🏆 Speaker Error Rate (SER): 9.2% – ensuring precise speaker classification.
- Integration of audio and video models improves accuracy by 8%.
- Incorporating lip detection reduces synchronization errors by 4%.
The dataset includes:
- 🎥 500 conversation videos, annotated with English subtitles in `.srt` format (an example entry is shown after this list).
- 🌐 Designed for diverse accents and genres.
- 📥 Download a sample of the dataset here.
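For reference, each entry in a SubRip `.srt` file has an index, a time range, and one or more lines of text; the snippet below is a made-up illustration of the format, not a line from the dataset.

```
1
00:00:01,000 --> 00:00:04,200
Welcome back to the show.

2
00:00:04,400 --> 00:00:07,000
Thanks for having me.
```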
- Python 3.8 or higher
- Required libraries: `torch`, `transformers`, `opencv-python`, `librosa`, `scikit-learn`
- Clone the repository:
```bash
git clone https://github.com/your-repo/echo.git
cd echo
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Place input videos in the `data/input` directory.
- Process the videos:
```bash
python main.py --input data/input --output data/output
```
- Output videos with subtitles will be saved in the `data/output` directory. 🎉
The system supports multiple video genres:
| Genre | WER | SER |
|---|---|---|
| 🎙️ Talk shows | 9.0% | 10.3% |
| 🎤 Interviews | 8.5% | 9.5% |
| 🗳️ Political debates | 5.1% | 9.0% |
If you use this work in your research, please cite:
```bibtex
@article{godhala2025echo,
  title={ECHO: enhanced communication for hearing impaired in online video podcasts},
  author={Godhala, Gouthami and Asam, Vijayasree and Sanyal, Samriddha},
  journal={Discover Data},
  volume={3},
  number={1},
  pages={32},
  year={2025},
  publisher={Springer}
}
```
This project was supported by the Centre for Interdisciplinary Artificial Intelligence (CAI), FLAME University.
- Gouthami Godhala
- Vijayasree Asam
- Samriddha Sanyal