BanglaSpeech2Text: An open-source offline speech-to-text package for the Bangla language. Fine-tuned on the latest Whisper speech-to-text model for optimal performance.
Updated Mar 1, 2025 · Python
🔊😊 A FastAPI voice-assistant framework for prototyping LLM-powered voice assistants in under 5 minutes.
High-performance Google Colab Notebook for fast & accurate audio transcription/translation using OpenAI Whisper. Accelerated on TPUs with PyTorch/XLA. Features an interactive UI for model selection, multi-language support, and long-form audio processing.
French audio transcription using Gradio
A real-time voice-to-text and text-to-speech AI pipeline using Whisper, an LLM, and Edge-TTS with tunable parameters for low-latency audio processing and response generation.
Free macOS Whisper dictation app
📝 Turn audio into text effortlessly. Audio transcription powered by OpenAI's Whisper API.
Subtitles Generator: An automatic subtitle generator for videos with support for translation into various languages, using OpenAI's Whisper model.
MinutesOfMeeting and Gmail is a collaborative crew of AI agents that autonomously understands audio, transcribes it, summarizes it, and drafts an email in a Gmail account.
Convert YouTube videos to text files. Why spend 30 minutes watching a video when you can skim the transcript in a couple minutes?
The Whisper Subtitle Generator leverages OpenAI's Whisper model to generate subtitles from audio and video files. This Python-based tool supports multiple languages and employs advanced audio processing techniques to ensure high accuracy in transcription.
Generates subtitles from the speech in a video (OpenAI Whisper) or extracts existing subtitles, translates them into a different language using a Mistral LLM, and adds them to the video. Uses ffmpeg for extraction and encoding.
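As a rough illustration of the subtitle-generation step that several of these projects describe, here is a minimal sketch that turns timed transcript segments into SRT text. It assumes each segment is a dict with `start` and `end` offsets in seconds and a `text` string; the function names `srt_timestamp` and `segments_to_srt` are hypothetical and are not taken from any of the listed repositories.

```python
def srt_timestamp(seconds):
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text from segments (assumed shape: start, end, text)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)
```

The resulting string can be written to a `.srt` file and handed to a tool such as ffmpeg to be added to the video.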
This model predicts grammar scores (1–5) from audio files. It uses Whisper to transcribe speech to text, cleans the text, and extracts features with TF-IDF. A Random Forest Regressor is trained to learn grammar score patterns. Evaluation via Pearson Correlation showed good results.
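The text-cleaning, TF-IDF, and Pearson-correlation steps in the description above can be sketched in plain Python as follows. This is an illustration under assumptions, not the repository's code: the helper names are hypothetical, the TF-IDF weighting is one common smoothed variant, and the Random Forest regressor itself is left out because it would normally come from a library such as scikit-learn.

```python
import math
import re
from collections import Counter


def clean_text(text):
    """Lowercase the transcript and keep letters, digits, and apostrophes as tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using a smoothed TF-IDF weighting."""
    tokenized = [clean_text(doc) for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty transcripts
        vectors.append([
            (counts[term] / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term in vocabulary
        ])
    return vocabulary, vectors


def pearson(xs, ys):
    """Pearson correlation between predicted and reference scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

In a pipeline like the one described, the vectors would be the regressor's input features and `pearson` would compare its predictions against held-out grammar scores.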
This repository contains a notebook that shows how to fine-tune OpenAI's Whisper model on a custom Hindi dataset.
Offline-friendly backend POC to transcribe YouTube videos and chat with video content using Whisper (no cloud required) and local LLMs such as Mistral or LLaMA2 via Ollama. Built with Flask and PostgreSQL, fully open source with Swagger APIs. Easily connect any frontend.
A real-time chat application using Next, Redis, Pub/Sub, an audio-to-text LLM, and Next-auth. Still a work in progress.