This project is designed to provide a high-performance, GPU-accelerated environment for generating transcripts from audio files using AMD hardware and the ROCm platform. The project includes multiple scripts for different use cases, including an automatic file monitoring service (`main.py`) and a web-based user interface (`app.py`) built with Gradio. The setup is containerized using Docker and Docker Compose, ensuring a consistent and isolated environment optimized for ROCm.
- Docker: Ensure that Docker (version 20.10 or newer) is installed and running on your system.
- Docker Compose: Ensure that Docker Compose is installed (it is bundled with Docker Desktop on Windows and Mac, and can be installed separately on Linux).
- ROCm: This project requires an AMD GPU that is compatible with ROCm 6.1.2.
- Clone the repository:

      git clone https://github.com/beecave-homelab/insanely-fast-whisper-rocm.git
      cd insanely-fast-whisper-rocm
- Create a `.env` file:
  - Create a `.env` file in the root directory of the project with the necessary configuration. Example:

        # Default values for main.py
        UPLOADS="uploads"
        TRANSCRIPTS="transcripts"
        LOGS="logs"
        BATCH_SIZE=6
        VERBOSE=true
        MODEL=distil-whisper/distil-large-v3

        # Default values for convert_output.py
        CONVERT_OUTPUT_FORMATS="txt,srt"
        CONVERT_CHECK_INTERVAL=120
        PROCESSED_TXT_DIR="transcripts-txt"
        PROCESSED_SRT_DIR="transcripts-srt"
- Build the Docker image:

      docker-compose build
- Run the Docker container:

      docker-compose up -d
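To illustrate how the `main.py` values from the `.env` example might be consumed, here is a minimal Python sketch that reads them from the environment and falls back to the defaults listed above. The `Settings` class and `load_settings` function are hypothetical names used for illustration; the project's own scripts may load their configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the main.py values from the .env example."""
    uploads: str
    transcripts: str
    logs: str
    batch_size: int
    verbose: bool
    model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of variables, using the README defaults."""

    def get(name: str, default: str) -> str:
        # Strip optional surrounding quotes in case they are passed through verbatim.
        return env.get(name, default).strip().strip('"')

    return Settings(
        uploads=get("UPLOADS", "uploads"),
        transcripts=get("TRANSCRIPTS", "transcripts"),
        logs=get("LOGS", "logs"),
        batch_size=int(get("BATCH_SIZE", "6")),
        # Assumption: "1", "true", and "yes" (any case) count as enabled.
        verbose=get("VERBOSE", "true").lower() in {"1", "true", "yes"},
        model=get("MODEL", "distil-whisper/distil-large-v3"),
    )
```

Calling `load_settings()` with no arguments reads from the process environment, while passing a plain dictionary makes the parsing easy to check in isolation.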
The `app.py` script provides a web interface for uploading files and generating transcripts.
- Access the web interface:
  - Navigate to http://localhost:7862 in your web browser.
- Upload an audio file:
  - Use the provided interface to upload an audio file. The file will be processed, and the transcript will be generated and displayed in the interface.
- View logs:
  - Real-time logs are displayed in the web interface, and you can also find them in the `/logs` directory.
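If the page does not load, it can help to check whether anything is answering on the port at all. The following is a small sketch using only the Python standard library; `web_ui_is_up` is a hypothetical helper, not part of the project, and it only confirms that some HTTP server responds at the address, not that it is the Gradio interface.

```python
import urllib.error
import urllib.request


def web_ui_is_up(url: str = "http://localhost:7862", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url` with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

A `False` result right after `docker-compose up -d` may simply mean the container is still starting, so retrying after a short wait is reasonable.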
The `docker-compose.yaml` file allows you to specify which script from the `/src` folder should be run by modifying the `command` line. By default, it runs the `/src/app.py` script (Gradio Web UI). To run a different script, change the `command` section in `docker-compose.yaml` accordingly. For example, to use the automatic file monitoring service (`main.py`):
command: ["src/main.py"] # For automatically processing files in the uploads directory.
The `main.py` script monitors a specified directory for new files and automatically generates transcripts. Follow these steps to use this feature:
- Start the service:
  - Ensure the Docker container is running (`docker-compose up -d`) with the `command` in `docker-compose.yaml` set to run `src/main.py`.
- Place files in the `/uploads` directory:
  - Any files added to this directory will be automatically processed, and the transcripts will be placed in the `/transcripts` directory.
- Check logs:
  - Logs for the processing will be stored in the `/logs` directory.
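The monitoring behaviour described above can be pictured as a simple polling loop. The sketch below is an illustration only: `find_new_files` and `watch` are hypothetical names, and the actual `main.py` may detect new files by a different mechanism. It assumes plain polling of the directory and leaves the transcription step to a caller-supplied `handle` function.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Set


def find_new_files(uploads: Path, seen: Set[Path]) -> List[Path]:
    """Return files in `uploads` not seen before, and record them in `seen`."""
    new_files = sorted(
        p for p in uploads.iterdir() if p.is_file() and p not in seen
    )
    seen.update(new_files)
    return new_files


def watch(
    uploads: Path,
    handle: Callable[[Path], None],
    interval: float = 5.0,
    max_polls: Optional[int] = None,
) -> None:
    """Poll `uploads` and call `handle` once for each newly discovered file."""
    seen: Set[Path] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(uploads, seen):
            handle(path)  # e.g. transcribe the file and write the result
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

For example, `watch(Path("uploads"), print)` would print each new file path as it appears. A polling loop like this can pick up a file that is still being copied, so a real service would likely want to wait until the file size stops changing before processing it.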
This project is licensed under the MIT license. See LICENSE for more information.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.