A browser-based application that captures microphone input, transcribes speech using Whisper AI, and translates English to Hindi in real-time.

AkankshRakesh/Realtime-audiobridge

🎙️ Real-Time Speech Transcriber & Translator

Docker FastAPI React

A browser-based application that captures microphone input, transcribes speech using Whisper AI, and translates English to Hindi in real-time. Fully containerized with Docker for easy deployment.

🌟 Features

  • 🎤 Live microphone recording via browser
  • ✍️ Accurate transcription using Whisper (tiny model)
  • 🌍 English → Hindi translation via Argos Translate
  • 5-second processing cycles for near real-time results
  • 🐳 Full Docker integration (frontend + backend)
  • 🚀 Modern stack (React, FastAPI, Tailwind CSS)
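The transcription and translation features above could be wired together roughly as follows. This is a minimal sketch, not the repository's actual `main.py`: the function names (`join_segments`, `load_models`, `process_chunk`), the `device`/`compute_type` settings, and the choice to pass the recording straight to faster-whisper are all assumptions made for illustration. Only the library names, the tiny model, the en → hi direction, and the model file path come from this README.

```python
from typing import Iterable

# Path taken from the installation step in this README.
MODEL_PATH = "backend/translate-en_hi.argosmodel"


def join_segments(texts: Iterable[str]) -> str:
    """Merge Whisper segment texts into one whitespace-trimmed transcript."""
    return " ".join(part.strip() for part in texts if part.strip())


def load_models():
    """Load Whisper (tiny) and install the Argos English-Hindi package once."""
    # Imported lazily so the pure helper above works without these packages.
    from faster_whisper import WhisperModel
    import argostranslate.package

    argostranslate.package.install_from_path(MODEL_PATH)
    # device and compute_type are assumptions suited to a CPU-only container.
    return WhisperModel("tiny", device="cpu", compute_type="int8")


def process_chunk(whisper, audio_path: str) -> dict:
    """Transcribe one roughly 5-second recording and translate it to Hindi."""
    import argostranslate.translate

    segments, _info = whisper.transcribe(audio_path, language="en")
    transcript = join_segments(segment.text for segment in segments)
    translation = (
        argostranslate.translate.translate(transcript, "en", "hi")
        if transcript
        else ""
    )
    return {"transcript": transcript, "translation": translation}
```

Loading the models once at startup and reusing them for every chunk keeps the per-request cost close to the inference time alone, which matters when a new recording arrives every five seconds.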

📦 Prerequisites

  • Docker & Docker Compose
  • Git (optional: Git LFS for large model files)

🛠️ Installation

1. Clone the Repository

git clone https://github.com/AkankshRakesh/realtime-transcriber.git
cd realtime-transcriber

2. Download Translation Model

Download the English → Hindi model from Argos Open Tech, then move it into the backend folder as translate-en_hi.argosmodel:

mv translate-en_hi-1_1.argosmodel backend/translate-en_hi.argosmodel

Note: If the model file exceeds 100 MB (GitHub's per-file limit), track it with Git LFS.

Running the application

docker-compose up --build

Access the application:

  • Frontend: http://localhost:5173
  • Backend API: http://localhost:8000
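To confirm that both containers are reachable after startup, a small check like the one below may help. It is only a sketch using the Python standard library; the helper name `is_up` is hypothetical, and any HTTP response (even a 404) is treated as "up" because the README does not say which routes exist besides `/transcribe`.

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at url, whatever the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered, e.g. with 404 on "/"
    except OSError:
        return False  # connection refused, timeout, or DNS failure
```

Calling `is_up("http://localhost:5173")` and `is_up("http://localhost:8000")` once `docker-compose up --build` has finished should return `True` for both services.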

📚 API Documentation

POST /transcribe

Upload WebM audio for transcription and translation.

Content-Type          File             Response
multipart/form-data   recording.webm   transcript, translation
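A client call to this endpoint might look like the sketch below, which uses only the Python standard library. Two details are assumptions, since the README does not state them: the form field name (`"file"` here) and that the response is a JSON object with `transcript` and `translation` keys. The helper names `build_multipart` and `transcribe` are hypothetical.

```python
import json
import urllib.request
import uuid


def build_multipart(field_name: str, filename: str, payload: bytes,
                    content_type: str = "audio/webm") -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, url: str = "http://localhost:8000/transcribe",
               field_name: str = "file") -> dict:
    """POST a recording to the backend and return the decoded JSON reply."""
    # field_name is an assumption; adjust it to match the backend's parameter.
    with open(path, "rb") as fh:
        body, header = build_multipart(field_name, "recording.webm", fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the stack running, `transcribe("recording.webm")` would return the parsed response, from which the `transcript` and `translation` values can be read if the backend uses those key names.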

🏗️ Project Structure

.
├── backend/
│   ├── main.py                # FastAPI application
│   ├── Dockerfile             # Backend container config
│   ├── requirements.txt       # Python dependencies
│   └── translate-en_hi.argosmodel  # Translation model
├── frontend/
│   ├── src/components/        # React UI components
│   ├── Dockerfile             # Frontend container config
│   ├── package.json
│   └── tailwind.config.js     # Tailwind CSS config
└── docker-compose.yml         # Multi-container orchestration

Architecture Diagram

Tech Stack

Frontend: React, Tailwind CSS, Vite

Backend: FastAPI, Python

Transcription: faster-whisper

Translation: Argos Translate

Audio Processing: FFmpeg

Containerization: Docker, Docker Compose
