Interpreto

Interpreto is a system for transcribing audio and video files. It provides an easy-to-use web interface for uploading media files and automatically generates timestamped transcriptions. Thanks to the project's modular architecture, new functionality, such as another interface (for example, a Telegram bot), is easy to add.

Interface

Features

Upload audio and video files for transcription
Real-time transcription updates via server-sent events
Display of transcription alongside media player
Automatic subtitle generation in WebVTT format
Language detection for transcripts
Efficient audio processing with voice activity detection
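Each transcribed segment carries a start time, an end time and its text, which is all a WebVTT subtitle file needs. The following is a minimal sketch of that conversion, assuming segments are available with times in seconds. The Segment class, format_timestamp and to_webvtt are hypothetical names, not taken from the Interpreto codebase.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed segment (hypothetical shape); start/end are in seconds."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments: list[Segment]) -> str:
    """Build a WebVTT document with one cue per transcribed segment."""
    lines = ["WEBVTT", ""]  # mandatory header followed by a blank line
    for seg in segments:
        lines.append(f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}")
        lines.append(seg.text.strip())
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)
```

For example, a segment running from 2.5 s to 61.25 s becomes the cue timing `00:00:02.500 --> 00:01:01.250`.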

Architecture

Interpreto is built using a microservices architecture:

Frontend: React-based web interface for uploading files and viewing transcriptions
Backend API: FastAPI service that handles file uploads and job management
Worker Service: Processes media files using Whisper models and voice activity detection
Storage:
- MongoDB for storing file metadata and transcriptions
- MinIO for storing media files
- Redis for real-time communication between services
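The README does not document the messages exchanged between the worker, Redis and the API. As a sketch of how a progress update could reach the browser, the function below encodes a payload in the server-sent events wire format: optional `id:` and `event:` fields, one `data:` line per payload line, and a blank line that ends the event. The `format_sse` name, the `progress` event name and the payload fields are hypothetical. In a real service the payload would arrive from a Redis subscription rather than being built inline.

```python
from __future__ import annotations

import json
from typing import Any


def format_sse(data: dict[str, Any], event: str | None = None, event_id: str | None = None) -> str:
    """Encode one server-sent event frame carrying a JSON payload."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, but split defensively:
    # every line of the payload needs its own "data:" prefix.
    for chunk in json.dumps(data).splitlines():
        lines.append(f"data: {chunk}")
    # A blank line marks the end of the event.
    return "\n".join(lines) + "\n\n"


# Hypothetical progress message a worker might publish on a Redis channel.
message = b'{"file_id": "abc123", "status": "transcribing"}'
frame = format_sse(json.loads(message), event="progress")
```

Encoding the payload as JSON keeps the frame format independent of what the worker reports, so new fields can be added without changing the relay.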

Installation

Prerequisites

Docker and Docker Compose
Python 3.8+
Node.js 14+
CUDA-capable GPU (optional, for faster transcription)

Setup

Clone the repository:

git clone https://github.com/mperalsapa/interpreto.git
cd interpreto

Set up the environment variables, or rely on the defaults defined in the code. The repository includes an example.env file that can serve as a starting point:

# MongoDB config
MONGO_HOST=localhost
MONGO_PORT=27017
MONGO_USER=frontend-service
MONGO_PASS=frontend-service
MONGO_DB=media_service

# MinIO config
MINIO_HOST=localhost
MINIO_PORT=9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin

# Redis config
REDIS_HOST=localhost
REDIS_PORT=6379
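The setup step says the code falls back to defaults when these variables are unset. Below is a minimal sketch of that pattern, assuming the defaults equal the values listed above. Settings, load_settings and the derived mongo_uri and minio_endpoint properties are hypothetical and are not Interpreto's actual configuration module. How the real services build their connection strings may differ.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass
from urllib.parse import quote_plus


@dataclass(frozen=True)
class Settings:
    mongo_host: str
    mongo_port: int
    mongo_user: str
    mongo_pass: str
    mongo_db: str
    minio_host: str
    minio_port: int
    minio_access_key: str
    minio_secret_key: str
    redis_host: str
    redis_port: int

    @property
    def mongo_uri(self) -> str:
        # Percent-encode credentials so characters such as "@" stay valid in the URI.
        user, password = quote_plus(self.mongo_user), quote_plus(self.mongo_pass)
        return f"mongodb://{user}:{password}@{self.mongo_host}:{self.mongo_port}/{self.mongo_db}"

    @property
    def minio_endpoint(self) -> str:
        return f"{self.minio_host}:{self.minio_port}"


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Read settings from the environment, falling back to the README's values."""
    env = os.environ if env is None else env
    get = env.get
    return Settings(
        mongo_host=get("MONGO_HOST", "localhost"),
        mongo_port=int(get("MONGO_PORT", "27017")),
        mongo_user=get("MONGO_USER", "frontend-service"),
        mongo_pass=get("MONGO_PASS", "frontend-service"),
        mongo_db=get("MONGO_DB", "media_service"),
        minio_host=get("MINIO_HOST", "localhost"),
        minio_port=int(get("MINIO_PORT", "9000")),
        minio_access_key=get("MINIO_ACCESS_KEY", "minioadmin"),
        minio_secret_key=get("MINIO_SECRET_KEY", "minioadmin"),
        redis_host=get("REDIS_HOST", "localhost"),
        redis_port=int(get("REDIS_PORT", "6379")),
    )
```

Passing an explicit mapping (for example `load_settings({"REDIS_PORT": "6380"})`) makes the fallback logic easy to test without touching the process environment.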

Start the services using Docker Compose:

docker-compose up -d

Usage

Open the web application in your browser at http://localhost:8080
Upload an audio or video file using the upload form
Wait for the transcription to finish; progress updates appear in real time
View the transcription alongside the media player
The transcription is also displayed automatically as subtitles

Technologies

Frontend: React
Backend: FastAPI (Python)
Speech Recognition: Whisper, faster-whisper
Voice Activity Detection: Silero VAD
Storage: MongoDB, MinIO, Redis
Processing: PyTorch, torchaudio
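The README lists voice activity detection as a way to make processing efficient but does not describe the algorithm. One common approach is to skip silence and merge nearby speech regions into chunks before transcription. The sketch below illustrates that approach and is not necessarily what the worker does. Its input mimics the list of `{"start", "end"}` dictionaries, in samples, returned by Silero VAD's `get_speech_timestamps` helper. `merge_speech_regions` and its thresholds are assumptions.

```python
from __future__ import annotations

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def merge_speech_regions(
    regions: list[dict[str, int]],
    sample_rate: int = SAMPLE_RATE,
    max_gap_s: float = 0.5,
    max_chunk_s: float = 30.0,
) -> list[tuple[float, float]]:
    """Merge VAD speech regions (in samples) into (start, end) chunks in seconds.

    Regions separated by at most max_gap_s are joined, as long as the merged
    chunk stays within max_chunk_s (Whisper works on 30-second windows).
    A single region longer than max_chunk_s is kept as-is.
    """
    chunks: list[tuple[float, float]] = []
    for region in sorted(regions, key=lambda r: r["start"]):
        start = region["start"] / sample_rate
        end = region["end"] / sample_rate
        if chunks:
            prev_start, prev_end = chunks[-1]
            # Extend the previous chunk if the gap is short and the result fits.
            if start - prev_end <= max_gap_s and end - prev_start <= max_chunk_s:
                chunks[-1] = (prev_start, max(prev_end, end))
                continue
        chunks.append((start, end))
    return chunks
```

Only the returned chunks would then be passed to the speech recognition model, so long silences cost no transcription time. faster-whisper also offers a built-in `vad_filter` option on `WhisperModel.transcribe` that applies Silero VAD internally.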
