SpeechKitty is a wrapper for two Automatic Speech Recognition (ASR) services: Yandex SpeechKit and whisperX (powered by OpenAI's Whisper). It is designed to asynchronously transcribe audio recordings.
Note: This is an early version of the package. It works reliably with Asterisk recordings in my setup, but it has not been extensively tested with other use cases or audio formats, so you may encounter bugs or limitations that have not yet been identified.
- Recursively scans directories for .wav files.
- Supports regex patterns to include or exclude specific files.
- Skips files that have already been transcribed.
- Handles intermediate tasks such as audio conversion and uploading to object storage.
- Generates transcription outputs as .json and .html files, saved alongside the audio files.
- Offers the option to obfuscate .html file names using a hash function for added privacy.
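SpeechKitty's own scanning code is not shown here. The following is a hypothetical sketch of how recursive scanning, regex filtering and skipping already-transcribed files can fit together. It assumes that patterns are matched against the bare file name with re.search and that a recording counts as transcribed once a .json file with the same stem exists next to it; the function names matches_filters and find_pending_wavs are illustrative, not part of the package.

```python
import re
import tempfile
from pathlib import Path
from typing import Iterator, Optional


def matches_filters(name: str, include: Optional[str] = None,
                    exclude: Optional[str] = None) -> bool:
    """Return True if `name` passes the include/exclude regex filters."""
    # Hypothetical semantics: re.search against the bare file name.
    if include is not None and re.search(include, name) is None:
        return False
    if exclude is not None and re.search(exclude, name) is not None:
        return False
    return True


def find_pending_wavs(root, include: Optional[str] = None,
                      exclude: Optional[str] = None) -> Iterator[Path]:
    """Yield .wav files under `root` (recursively) that still need a transcript."""
    for wav in sorted(Path(root).rglob("*.wav")):
        if not matches_filters(wav.name, include, exclude):
            continue
        # Assumption: an existing sibling .json means "already transcribed".
        if wav.with_suffix(".json").exists():
            continue
        yield wav


if __name__ == "__main__":
    # Build a throwaway directory tree to show the behaviour.
    with tempfile.TemporaryDirectory() as tmp:
        day = Path(tmp) / "2024" / "05" / "01"
        day.mkdir(parents=True)
        for name in ("in-100.wav", "out-200.wav", "out-200.json", "test-call.wav"):
            (day / name).touch()
        # out-200.wav is skipped (has a .json), test-call.wav is excluded,
        # so only the in-100.wav path is printed.
        for wav in find_pending_wavs(tmp, exclude=r"^test"):
            print(wav.relative_to(tmp))
```

Checking for the output file rather than keeping a separate state database makes re-runs idempotent: an interrupted run can simply be started again on the same directory.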
You can use SpeechKitty either as a Python package or within a Docker container, depending on your preference and setup requirements.
To use Yandex SpeechKit, you need:
- A Yandex Cloud account.
- A bucket in Object Storage.
- A static access key for Object Storage.
- An API key for SpeechKit.
Or, to use whisperX instead:
- A running whisperX-REST instance.
To install and run SpeechKitty as a Python package:
- Install ffmpeg, which the package requires.
- Create a virtual environment (recommended) and install the package:
pip install speechkitty
- Download the scripts from the sample directory on the project page:
  - .env-example (rename it to .env)
  - transcribe_directory.py
- Fill in your credentials in .env.
- Start transcribing a directory (/mnt/Records in the example below):
python transcribe_directory.py /mnt/Records
To run SpeechKitty in Docker:
- Install Docker Engine.
- Download the project code from the project page on GitHub.
- Put your credentials into the .env file.
- Build the Docker image. Open the project directory in your terminal and run the following command:
docker build -t speechkitty .
Building the image may take a while. After it finishes:
- Run the container. Assuming your recordings are in /mnt/Records and/or its subdirectories, your terminal is in the project directory, and the .env file is in the sample directory, the command looks like this:
docker run -i --rm --env-file sample/.env -v /mnt/Records:/mnt/Records \
speechkitty /bin/bash -c "python sample/transcribe_directory.py /mnt/Records"
Alternatively, use the shell script:
source sample/transcribe_directory.sh /mnt/Records
To name the HTML files after a hash of the audio file names, pass the name of the hash function as the second argument:
source sample/transcribe_directory.sh /mnt/Records md5
This is particularly useful if the records directory is served by a web server (with directory listing disabled for security). Obfuscated HTML file names improve privacy because transcripts can no longer be opened through links that are predictable from the recording names. To look up the obfuscated HTML file name for a recording, you can use a database query such as:
SELECT CONCAT(TO_HEX(MD5(recordingfile)), ".html") AS transcript
FROM your_table_name;
Replace your_table_name with the appropriate table name in your database, and recordingfile with the column that stores the recording file name if yours differs.
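To compute the same name outside the database (for example, in a script that builds links to transcripts), here is a minimal Python sketch. It assumes the hash is taken over the bare file name exactly as stored in recordingfile, which is what the query above implies, and that the hash name given to the shell script is one that Python's hashlib accepts; obfuscated_html_name is a hypothetical helper, not part of SpeechKitty.

```python
import hashlib


def obfuscated_html_name(audio_file_name: str, hash_name: str = "md5") -> str:
    """Return '<hex digest>.html' for a recording's file name.

    Assumptions (not confirmed by the package docs): the digest covers the
    bare file name as stored in the CDR `recordingfile` column, UTF-8 encoded,
    and `hash_name` is any algorithm hashlib supports (e.g. "md5", "sha1").
    """
    digest = hashlib.new(hash_name, audio_file_name.encode("utf-8")).hexdigest()
    return f"{digest}.html"


if __name__ == "__main__":
    # Hypothetical recording name; substitute a real one from your CDR table.
    print(obfuscated_html_name("out-100-200-20240501-120000-1714564800.1.wav"))
```

Python's hexdigest() returns lowercase hex. If your database's hex function returns uppercase, compare the names case-insensitively or lowercase the SQL result.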
Transcription jobs may take some time to complete. You can check that the process is running by watching for new .json and .html files in the records directory: each new file means another transcription result has been written.
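This check can be scripted. The sketch below is hypothetical (progress_report and describe are illustrative names, not part of SpeechKitty): it counts the .json and .html files under the records directory and reports how recently the newest one was written.

```python
import time
from pathlib import Path
from typing import NamedTuple


class Progress(NamedTuple):
    json_files: int
    html_files: int
    newest_mtime: float  # 0.0 if no output files were found


def progress_report(root) -> Progress:
    """Count .json/.html outputs under `root` and find the newest one."""
    json_files = html_files = 0
    newest = 0.0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix == ".json":
            json_files += 1
        elif path.suffix == ".html":
            html_files += 1
        else:
            continue  # audio and other files are ignored
        newest = max(newest, path.stat().st_mtime)
    return Progress(json_files, html_files, newest)


def describe(progress: Progress) -> str:
    """Human-readable one-line summary of a Progress snapshot."""
    if progress.newest_mtime == 0.0:
        return "no transcripts yet"
    age = time.time() - progress.newest_mtime
    return (f"{progress.json_files} .json, {progress.html_files} .html; "
            f"newest written {age:.0f} s ago")
```

For example, run print(describe(progress_report("/mnt/Records"))) twice a few minutes apart: growing counts mean jobs are completing. Walking a large records tree with rglob can be slow, so prefer checking the subdirectory you expect to change.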