This project converts audio files to text using the speech_recognition
library and the Google Web Speech API. The recognized text can optionally be formatted to improve readability. The project runs in Docker.
- Supports multiple recognition languages (English, Ukrainian, Slovak, etc.)
- Convert audio files to text
- Improved formatting of recognized text
- Runs with Docker and Docker Compose
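As a rough illustration of the recognition step, the sketch below sends one audio file to the Google Web Speech API through speech_recognition. The names `LANGUAGE_CODES`, `language_code`, and `transcribe` are hypothetical and not taken from this project's source, and the language tags are assumed values for the languages listed above.

```python
# Sketch only: these names are illustrative, not the project's actual code.
LANGUAGE_CODES = {
    "english": "en-US",    # assumed tags for the languages listed above
    "ukrainian": "uk-UA",
    "slovak": "sk-SK",
}


def language_code(name: str) -> str:
    """Map a human-readable language name to a language tag."""
    try:
        return LANGUAGE_CODES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {name!r}") from None


def transcribe(path: str, language: str = "english") -> str:
    """Send one audio file to the Google Web Speech API and return the text."""
    # Imported here so the helpers above work without the library installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # sr.AudioFile reads WAV, AIFF, and FLAC; other formats need converting first.
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    try:
        return recognizer.recognize_google(audio, language=language_code(language))
    except sr.UnknownValueError:
        return ""  # the speech could not be understood
    # sr.RequestError (network or API failure) is left to propagate.
```

Calling `transcribe("speech.wav", "slovak")` would return the recognized text, or an empty string if nothing intelligible was found. The call needs network access, since recognition happens on Google's side.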
Before you start, make sure that Docker and Docker Compose are installed:
🔹 On Linux (Arch, Ubuntu, Debian):
sudo pacman -S docker docker-compose # For Arch
sudo apt install docker.io docker-compose -y # For Ubuntu/Debian
🔹 On macOS and Windows: install Docker Desktop from the official Docker website.
Make sure Docker is working:
docker --version
docker-compose --version
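The same check can be scripted. The sketch below is a minimal example, assuming both tools are expected on `PATH` under the names `docker` and `docker-compose`; the helper names `missing_tools` and `tool_version` are hypothetical.

```python
import shutil
import subprocess


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def tool_version(tool):
    """Return the first line printed by `<tool> --version`, or '' if none."""
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""
```

For example, `missing_tools(["docker", "docker-compose"])` returns an empty list when both commands are available, and `tool_version("docker")` returns the same line that `docker --version` prints.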
- Clone this repository (make sure Git is installed):
git clone https://github.com/Klipar/speech_to_text_convertor.git
cd speech_to_text_convertor
- Build a Docker image
This command builds the project image from the Dockerfile:
docker-compose build
- If the build finishes without errors, the project is ready to use.
- Start the container and open a shell inside it to work with the program:
docker-compose run --rm speech-to-text bash
- Start the script with the following command:
python main.py
You can also pass the path of the file to transcribe directly at startup:
python main.py Path/to/your/audio.file
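One way a script can accept the path either as an argument or interactively is sketched below with `argparse`. This is an assumption about how such handling could look, not the project's actual `main.py`; `parse_args` and `resolve_audio_path` are hypothetical names.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; the audio path is an optional positional."""
    parser = argparse.ArgumentParser(description="Convert an audio file to text.")
    parser.add_argument("audio_path", nargs="?", default=None,
                        help="path to the audio file (asked for if omitted)")
    return parser.parse_args(argv)


def resolve_audio_path(argv=None, ask=input):
    """Return the path from the command line, or prompt for it if missing."""
    args = parse_args(argv)
    if args.audio_path is not None:
        return args.audio_path
    return ask("Path to the audio file: ").strip()
```

With this shape, `python main.py Path/to/your/audio.file` skips the prompt, while plain `python main.py` falls back to asking for the path.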
- Select the recognition language.
- Specify the path to the audio file (if you did not pass it at startup).
- Optionally format the recognized text for easier reading, or skip this step.
- Get the text as a .txt file.
Supported audio formats:
- .mp3
- .wav
- .flac
- .ogg
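A script can reject unsupported files early by checking the extension against this list. The sketch below is a minimal example; `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names, and the check looks only at the file name, not at the file contents.

```python
from pathlib import Path

# Extensions taken from the list above; the constant name is illustrative.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg"}


def is_supported(path: str) -> bool:
    """Return True if the file extension is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so `Talk.MP3` is accepted while `notes.txt` is not.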
This project is licensed under the MIT License.