This repository documents my process of learning how to use OpenAI's Whisper, a powerful and open-source tool for automatic speech recognition (ASR).
The primary goal of this project was to understand the fundamentals of Whisper and use it to transcribe audio files into text. This work was completed by following an excellent YouTube tutorial from Kevin Stratvert, which provided a clear, step-by-step guide.
To avoid the need for powerful local hardware, this entire project was developed and executed using Google Colaboratory. This platform provides free access to GPUs in the cloud, making it an ideal environment for running machine learning models like Whisper directly in the browser.
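Once the notebook is created, the GPU is enabled under Runtime > Change runtime type; a quick sanity check that an accelerator is actually attached is to run the following in a cell:

```python
# Lists the attached NVIDIA GPU (model, memory, driver version).
# If this errors out, the runtime is still CPU-only.
!nvidia-smi
```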
Whisper is an automatic speech recognition system developed by OpenAI. It was trained on a massive and diverse dataset of audio, enabling it to accurately transcribe speech from various languages, accents, and even noisy environments.
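For context, Whisper can also be driven from Python rather than the command line. Here is a minimal sketch (assuming Whisper is already installed and an `audio.mp3` file exists in the working directory; the file name is just a placeholder):

```python
import whisper

# Load one of the pretrained checkpoints; larger models are more
# accurate but slower (e.g., "tiny", "base", "small", "medium", "large").
model = whisper.load_model("base")

# Transcribe the file; Whisper detects the spoken language automatically.
result = model.transcribe("audio.mp3")
print(result["text"])
```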
The key steps involved in this project were (a sketch of the corresponding Colab cells follows the list):
- Setting up a new notebook in Google Colaboratory.
- Connecting Google Colab to Google Drive and enabling the GPU hardware accelerator.
- Installing Whisper and its dependency, FFmpeg, using `pip` and `apt`.
- Uploading an audio file to the Colab environment.
- Running the Whisper command with the desired model (e.g., `base`, `small`, `medium`) to generate the transcription.
- Downloading the resulting text (`.txt`) and caption (`.srt`, `.vtt`) files.
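Put together, the Colab cells for these steps look roughly like the sketch below. The file name `audio.mp3` is a placeholder for whatever you upload; by default the `whisper` CLI writes several output formats, including `.txt`, `.srt`, and `.vtt`.

```python
# Cell 1: mount Google Drive (optional, for reading/writing files there)
from google.colab import drive
drive.mount('/content/drive')

# Cell 2: install FFmpeg (via apt) and Whisper (via pip)
!sudo apt update && sudo apt install -y ffmpeg
!pip install -U openai-whisper

# Cell 3: transcribe an uploaded audio file with the "medium" model;
# the output files (audio.txt, audio.srt, audio.vtt, ...) are written
# to the current directory
!whisper "audio.mp3" --model medium

# Cell 4: download a result to the local machine
from google.colab import files
files.download("audio.txt")
```

Larger models take noticeably longer to run but tend to be more accurate, which is why enabling the GPU runtime matters.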
Thank you for visiting my repository!