SpeechKitty

SpeechKitty is a wrapper for two Automatic Speech Recognition (ASR) services: Yandex SpeechKit and whisperX (powered by OpenAI's Whisper). It is designed to asynchronously transcribe audio recordings.

NOTE

This is an early version of the package. While it works reliably with Asterisk recordings in my setup, it has not been extensively tested with other use cases or audio formats. You may encounter bugs or limitations that have not yet been identified.

Key features

Recursively scans directories for .wav files.
Supports regex patterns to include or exclude specific files.
Skips files that have already been transcribed.
Handles intermediate tasks such as audio conversion and uploading to object storage.
Generates transcription outputs as .json and .html files, saved alongside the audio files.
Offers the option to obfuscate .html file names using a hash function for added privacy.
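
The scanning behaviour listed above can be illustrated with a minimal sketch. This is not SpeechKitty's actual code: the function name find_pending_wavs and its parameters are hypothetical, and it assumes a file counts as "already transcribed" when a .json file with the same base name sits next to it.

```python
import re
from pathlib import Path


def find_pending_wavs(root, include=None, exclude=None):
    """Return .wav files under root that still lack a .json transcript.

    Hypothetical helper for illustration only; include and exclude are
    optional regex patterns matched against the path relative to root.
    """
    root = Path(root)
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    pending = []
    # rglob walks the directory tree recursively
    for wav in sorted(root.rglob("*.wav")):
        rel = wav.relative_to(root).as_posix()
        if include_re and not include_re.search(rel):
            continue  # not selected by the include pattern
        if exclude_re and exclude_re.search(rel):
            continue  # explicitly excluded
        # Assumption: a sibling .json file means the record was transcribed
        if wav.with_suffix(".json").exists():
            continue
        pending.append(wav)
    return pending
```

For example, find_pending_wavs("/mnt/Records", exclude=r"^tmp/") would list every untranscribed recording outside a tmp subdirectory.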

Usage

You can use SpeechKitty either as a Python package or within a Docker container, depending on your preference and setup requirements.

Prerequisites

A Yandex Cloud account.
A bucket in Object Storage.
A static access key for Object Storage.
An API key for SpeechKit.

-OR-

A running whisperX-REST instance.

Python Package

Install the required ffmpeg library.
Create a virtual environment (recommended) and install the package:

pip install speechkitty

Download the following files from the sample directory on the project page:

.env-example (rename it to .env)
transcribe_directory.py

Fill in your credentials in .env.
Start transcribing a directory (/mnt/Records in the example below):

python transcribe_directory.py /mnt/Records
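
Because ffmpeg is a prerequisite, it may be worth checking that it is reachable before starting a long run. The sketch below uses only the Python standard library; the helper name tool_available is hypothetical and is not part of SpeechKitty.

```python
import shutil


def tool_available(name):
    """Return True if an executable called name is found on PATH."""
    # shutil.which returns the full path to the executable, or None
    return shutil.which(name) is not None


if not tool_available("ffmpeg"):
    print("ffmpeg was not found on PATH; install it before transcribing.")
```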

Docker Container

Install Docker Engine.
Download the project code from the project page on GitHub.
Put your credentials into the .env file.
Build the Docker image. Open the project directory in your terminal and run the following command:

docker build -t speechkitty .

Building the image may take a while. After it finishes:

Run the container. Assuming your recordings are in /mnt/Records and/or its subdirectories, the terminal's current directory is the project directory, and the .env file is in the sample directory, the command looks like this:

docker run -i --rm --env-file sample/.env -v /mnt/Records:/mnt/Records \
speechkitty /bin/bash -c "python sample/transcribe_directory.py /mnt/Records"

Another option is to use the shell script:

source sample/transcribe_directory.sh /mnt/Records

To name the .html files after a hash of the audio file names, pass the name of the hash function as a second parameter:

source sample/transcribe_directory.sh /mnt/Records md5

This approach is particularly useful if the records directory is hosted on a web server (with directory listing disabled for security). By obfuscating the HTML file names, you enhance privacy by preventing direct access to the audio files via predictable links. To retrieve the obfuscated HTML file names, you can use a database query like the following:

SELECT CONCAT(TO_HEX(MD5(recordingfile)), ".html") AS transcript
FROM your_table_name;

Replace your_table_name with the appropriate table name in your database.
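
If the lookup has to happen in application code rather than in the database, the same mapping can be sketched in Python. This is an illustration under assumptions, not SpeechKitty's own function: it assumes, mirroring the query above, that the hash is taken over the audio file name exactly as stored (UTF-8 encoded) and that ".html" is appended to the hex digest. The name hashed_html_name is hypothetical.

```python
import hashlib


def hashed_html_name(audio_name, hash_func="md5"):
    """Return the obfuscated transcript name for a given audio file name.

    Assumption: the digest is computed over the file name string itself,
    as in MD5(recordingfile) from the SQL example.
    """
    digest = hashlib.new(hash_func, audio_name.encode("utf-8")).hexdigest()
    return f"{digest}.html"
```

Compare the result for one known recording with the file actually produced on disk before relying on it, since the exact input to the hash (for example, with or without the directory part) is an assumption here.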

Transcribing jobs may take some time to complete. You can verify that the process is running successfully by checking for the creation of new .json and .html files in the records directory. These files indicate that transcription results are being generated.
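
The check described above can be automated with a small script. This is a minimal sketch, assuming each finished transcription leaves a .json file with the same base name next to its .wav file; the function name transcription_progress is hypothetical.

```python
from pathlib import Path


def transcription_progress(root):
    """Return (done, total): .wav files with a sibling .json, and all .wav files."""
    wavs = list(Path(root).rglob("*.wav"))
    # Assumption: a .json next to the audio marks a finished transcription
    done = sum(1 for wav in wavs if wav.with_suffix(".json").exists())
    return done, len(wavs)
```

Calling transcription_progress("/mnt/Records") a few minutes apart should show the first number growing while jobs are running.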
