From 6c1d21c8bfd542a902fe494aa3b508dec6713ec3 Mon Sep 17 00:00:00 2001 From: SpirusNox <78000963+SpirusNox@users.noreply.github.com> Date: Tue, 8 Apr 2025 10:23:52 -0500 Subject: [PATCH 1/7] Update version-and-release.yml Updated release workflow --- .github/workflows/version-and-release.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/version-and-release.yml b/.github/workflows/version-and-release.yml index 416a313..d87318d 100644 --- a/.github/workflows/version-and-release.yml +++ b/.github/workflows/version-and-release.yml @@ -39,7 +39,7 @@ jobs: if [[ "$commit" == *"BREAKING CHANGE"* || "$commit" == *"!:"* ]]; then MAJOR_CHANGE=true break - elif [[ "$commit" =~ ^feat(\([^)]+\))?:.* ]]; then + elif echo "$commit" | grep -Eq "^feat(\([^)]+\))?:.*"; then MINOR_CHANGE=true fi done @@ -76,4 +76,4 @@ jobs: Release ${{ steps.version.outputs.release_tag }} Changes in this release: - ${{ github.event.pull_request.title }} (#${{ github.event.pull_request.number }}) \ No newline at end of file + ${{ github.event.pull_request.title }} (#${{ github.event.pull_request.number }}) From 802c259baed73c70e0894a9cd08d14ecf216ee3b Mon Sep 17 00:00:00 2001 From: SpirusNox <78000963+SpirusNox@users.noreply.github.com> Date: Tue, 8 Apr 2025 10:29:26 -0500 Subject: [PATCH 2/7] Update README.md Updated badges --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 34a115c..6088475 100644 --- a/README.md +++ b/README.md @@ -9,8 +9,8 @@ Scriberr is a self-hostable AI audio transcription app. 
It leverages the open-so ### Build Status **Main Branch:** -[![Main Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/main-docker.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/main-docker.yml) -[![Main CUDA Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/main-cuda-docker.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/main-cuda-docker.yml) +[![Main Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/Main_Docker_Build.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/Main_Docker_Build.yml) +[![Main CUDA Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/Main_CUDA_Docker_Build.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/Main_CUDA_Docker_Build.yml) **Nightly Branch:** [![Nightly Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/nightly-docker.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/nightly-docker.yml) @@ -226,4 +226,4 @@ This project is licensed under the MIT License. See the [LICENSE](LICENSE) file --- -*Thank you for your patience, support, and interest in the project. Looking forward to any and all feedback.* \ No newline at end of file +*Thank you for your patience, support, and interest in the project. 
Looking forward to any and all feedback.* From 4eb07827f495be20bea0767b27004a4bc75cc64e Mon Sep 17 00:00:00 2001 From: SpirusNox <78000963+SpirusNox@users.noreply.github.com> Date: Tue, 8 Apr 2025 10:30:09 -0500 Subject: [PATCH 3/7] Update README.md Updated badges again From 8f9e0cc3cdc508aef1b745337d48781c4b648507 Mon Sep 17 00:00:00 2001 From: SpirusNox <78000963+SpirusNox@users.noreply.github.com> Date: Tue, 8 Apr 2025 10:45:26 -0500 Subject: [PATCH 4/7] Update README.md Updated workflow badges and updated diarization model information --- README.md | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 6088475..48ff4da 100644 --- a/README.md +++ b/README.md @@ -5,16 +5,14 @@ Scriberr is a self-hostable AI audio transcription app. It leverages the open-so **Note**: This app is under active development, and this release includes **breaking changes**. You will lose your old data. Please read the installation instructions carefully. -** DIARIZATION UPDATE **: Diarization is under heavy development and will be disabled until an appropriate implementation is able to be handled. Currently it does not perform to expectation and is being disabled. Hoping to have this released as a full feature in 0.5.0. 
- ### Build Status **Main Branch:** -[![Main Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/Main_Docker_Build.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/Main_Docker_Build.yml) -[![Main CUDA Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/Main_CUDA_Docker_Build.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/Main_CUDA_Docker_Build.yml) +[![Main Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/Main%20Docker%20Build.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/Main%20Docker%20Build.yml) +[![Main CUDA Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/Main%20Cuda%20Docker%20Build.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/Main%20Cuda%20Docker%20Build.yml) **Nightly Branch:** -[![Nightly Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/nightly-docker.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/nightly-docker.yml) -[![Nightly CUDA Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/nightly-cuda-docker.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/nightly-cuda-docker.yml) +[![Nightly Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/Nightly%20Docker%20Build.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/Nightly%20Docker%20Build.yml) +[![Nightly CUDA Docker](https://github.com/rishikanthc/Scriberr/actions/workflows/Nightly%20Cuda%20Docker%20Build.yml/badge.svg)](https://github.com/rishikanthc/Scriberr/actions/workflows/Nightly%20Cuda%20Docker%20Build.yml) ## Table of Contents @@ -158,14 +156,23 @@ The application can be customized using the following environment variables in y #### Speaker Diarization Setup -Scriberr uses the PyAnnote speaker diarization model from HuggingFace, which requires an API key for download. 
During the initial setup process:
+##### Required Models
+The application requires access to the following Hugging Face models:
+
+* pyannote/speaker-diarization-3.1
+* pyannote/segmentation-3.0
+##### Setup Steps
+1. Create a free account at [HuggingFace](https://huggingface.co/) if you don’t already have one.
+2. Generate an API token at https://huggingface.co/settings/tokens.
+3. Accept the user conditions for the required models on Hugging Face:
+   - Visit pyannote/speaker-diarization-3.1 and accept the conditions.
+   - Visit pyannote/segmentation-3.0 and accept the conditions.
+4. Enter the API token in the setup wizard when prompted. The token is only used during initial setup and is not stored permanently.
+##### Storage and Usage
+
-1. Create a free account at [HuggingFace](https://huggingface.co/)
-2. Generate an API token at https://huggingface.co/settings/tokens
-3. Enter this token in the setup wizard when prompted
-4. The token is only used during initial setup and is not stored permanently
+The diarization models are downloaded once and stored locally, so you won’t need to provide the API key again after the initial setup.
 
-The diarization model is downloaded once and stored locally, so you won't need to provide the API key again after setup.
 
 ### Updating from Previous Versions
 
From 881e047a3a215e490f82d98d2b5b55e8b82d7f0d Mon Sep 17 00:00:00 2001
From: SpirusNox <78000963+SpirusNox@users.noreply.github.com>
Date: Fri, 11 Apr 2025 14:04:21 -0500
Subject: [PATCH 5/7] chore: Update env.example to fix formatting issues.
Also updated the .*ignore files with the necessary changes --- .dockerignore | 3 +++ .gitignore | 2 -- env.example | 15 +++++++++------ 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/.dockerignore b/.dockerignore index 53393ba..f6dcb26 100644 --- a/.dockerignore +++ b/.dockerignore @@ -8,4 +8,7 @@ scriberr install_aw.sh .prettier* venv +.env +env.example + diff --git a/.gitignore b/.gitignore index 224844b..342bf2a 100644 --- a/.gitignore +++ b/.gitignore @@ -13,8 +13,6 @@ Thumbs.db # Env .env .env.* -!.env.example -!.env.test # Vite vite.config.js.timestamp-* diff --git a/env.example b/env.example index 5a344d6..a678b8d 100644 --- a/env.example +++ b/env.example @@ -10,6 +10,7 @@ POSTGRES_PASSWORD=mysecretpassword # Password for PostgreSQL database POSTGRES_DB=local # Database name DATABASE_URL=postgres://root:mysecretpassword@db:5432/local # Database URL for connection to PostgreSQL database with credentials from above + # Application configuration ADMIN_USERNAME=admin # Username for admin user in web interface ADMIN_PASSWORD=password # Password for admin user in web interface @@ -33,19 +34,21 @@ OLLAMA_BASE_URL="" # If using OpenAI, you must set these to your API keys # If using a custom API compatible server, you must set these to your API keys OPENAI_API_KEY="" # Needed for retrieving models from OpenAI, for Ollama connections, this can be left blank or set to a dummy value -HF_API_KEY="" # Needed for retrieving models from HuggingFace for Diarization # Diarization configuration -# Default Model to use for Diarization, can be set to any HuggingFace model that supports diarization +# Default Model to use for Diarization, can be set to any compatible model that supports diarization # NOTE: This model will be downloaded automatically if it is not already present in the models directory -# NOTE: You can use any model that supports diarization, but the default model is pyannote/speaker-diarization -# NOTE: You can find a list of models that support 
diarization here: https://huggingface.co/models?other=speaker-diarization -DIARIZATION_MODEL=pyannote/speaker-diarization +# NOTE: You MUST provide a valid HuggingFace API token with access to pyannote/speaker-diarization models +DIARIZATION_MODEL=pyannote/speaker-diarization@3.0 +HUGGINGFACE_TOKEN="" # Required for accessing speaker diarization models from HuggingFace +# Paths +# These almost never need to be changed. They are the paths to the directories where the models and audio files are stored MODELS_DIR=/scriberr/models WORK_DIR=/scriberr/temp AUDIO_DIR=/scriberr/uploads # Server configuration BODY_SIZE_LIMIT=1G -HARDWARE_ACCEL=cpu # Set to 'gpu' if you have a Nvidia GPU \ No newline at end of file +HARDWARE_ACCEL=cpu # Set to 'gpu' if you have a Nvidia GPU +USE_WORKER=true # Enable background processing of transcription jobs \ No newline at end of file From 1948c8b1a75df07d46f835cac1c8e404b0413562 Mon Sep 17 00:00:00 2001 From: SpirusNox <78000963+SpirusNox@users.noreply.github.com> Date: Fri, 11 Apr 2025 14:19:48 -0500 Subject: [PATCH 6/7] chore: Update env.example to clear the issues with the formatting. 
Now using Linux line endings --- .env | 51 -------------------- docker-compose.gpu.yml | 36 +++++++------- env.example | 104 ++++++++++++++++++++--------------------- 3 files changed, 69 insertions(+), 122 deletions(-) delete mode 100644 .env diff --git a/.env b/.env deleted file mode 100644 index b164c66..0000000 --- a/.env +++ /dev/null @@ -1,51 +0,0 @@ -# .env file -# Docker image configuration -IMAGE_TAG=main # Docker image tag to use for building the Docker image -PORT=3000 # Port to use for running the web interface - -# Database configuration -POSTGRES_PORT=5432 # Port to use for PostgreSQL database -POSTGRES_USER=root # Username for PostgreSQL database -POSTGRES_PASSWORD=mysecretpassword # Password for PostgreSQL database -POSTGRES_DB=local # Database name -DATABASE_URL=postgres://root:mysecretpassword@db:5432/local # Database URL for connection to PostgreSQL database with credentials from above - -# Application configuration -ADMIN_USERNAME=admin # Username for admin user in web interface -ADMIN_PASSWORD=password # Password for admin user in web interface - -# AI configuration -# Default Model to use for transcription, can be set to any OpenAI model or Ollama model -# For ollama connections, enter the model name and version number. EG: llama3.2:latest -AI_MODEL="gpt-3.5-turbo" - -# Leave blank to use default (OpenAI API), otherwise set to the base URL of your OpenAI API compatible server -# For ollama connections, enter the IP of the Ollama server, and then the port it is running on. 
-# Include the /v1/ or /api/v1/ path if needed (OpenWeb UI uses /api/ and ollama uses /v1/ -# Example: http://192.168.1.5:11434 or http://host.docker.internal:11434 -# NOTE: host.docker.internal is only available on Windows and MacOS, use the IP address of the host machine on Linux -# NOTE: localhost and 127.0.0.1 will not work, as they refer to the container itself, not the host machine -OLLAMA_BASE_URL="" - -# API Keys -# NOTE: -# If using Ollama, you can leave these blank or set to a dummy value -# If using OpenAI, you must set these to your API keys -# If using a custom API compatible server, you must set these to your API keys -OPENAI_API_KEY="" # Needed for retrieving models from OpenAI, for Ollama connections, this can be left blank or set to a dummy value -HF_API_KEY="" # Needed for retrieving models from HuggingFace for Diarization - -# Diarization configuration -# Default Model to use for Diarization, can be set to any HuggingFace model that supports diarization -# NOTE: This model will be downloaded automatically if it is not already present in the models directory -# NOTE: You can use any model that supports diarization, but the default model is pyannote/speaker-diarization -# NOTE: You can find a list of models that support diarization here: https://huggingface.co/models?other=speaker-diarization -DIARIZATION_MODEL=pyannote/speaker-diarization - -MODELS_DIR=/scriberr/models -WORK_DIR=/scriberr/temp -AUDIO_DIR=/scriberr/uploads - -# Server configuration -BODY_SIZE_LIMIT=1G -HARDWARE_ACCEL=cpu # Set to 'gpu' if you have a Nvidia GPU diff --git a/docker-compose.gpu.yml b/docker-compose.gpu.yml index d70e130..04be4d9 100644 --- a/docker-compose.gpu.yml +++ b/docker-compose.gpu.yml @@ -1,19 +1,19 @@ -# This can be added when running the main docker-compose.yml file to add gpu support -# add this in your command line: docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up -services: - app: - build: - context: . 
- dockerfile: Dockerfile-cuda128 - # You can find your architecture by running: nvidia-smi if on linux - # You can find your architecture by running: system_profiler SPDisplaysDataType if on mac - # You can find your architecture by running: wmic path win32_videocontroller get name if on windows - # You will need to change the image to match your architecture, E.G. "main-cuda-11" - image: ghcr.io/rishikanthc/scriberr:${IMAGE_TAG:-main-gpu} - deploy: - resources: - reservations: - devices: - - driver: nvidia - count: all +# This can be added when running the main docker-compose.yml file to add gpu support +# add this in your command line: docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up +services: + app: + build: + context: . + dockerfile: Dockerfile-gpu + # You can find your architecture by running: nvidia-smi if on linux + # You can find your architecture by running: system_profiler SPDisplaysDataType if on mac + # You can find your architecture by running: wmic path win32_videocontroller get name if on windows + # You will need to change the image to match your architecture, E.G. 
"main-cuda-11" + image: ghcr.io/rishikanthc/scriberr:main-cuda-11 + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: all capabilities: [gpu] \ No newline at end of file diff --git a/env.example b/env.example index 782cf4f..8eeaae7 100644 --- a/env.example +++ b/env.example @@ -1,54 +1,52 @@ -# .env file -# Docker image configuration -IMAGE_TAG=main # Docker image tag to use for building the Docker image -PORT=3000 # Port to use for running the web interface - -# Database configuration -POSTGRES_PORT=5432 # Port to use for PostgreSQL database -POSTGRES_USER=root # Username for PostgreSQL database -POSTGRES_PASSWORD=mysecretpassword # Password for PostgreSQL database -POSTGRES_DB=local # Database name -DATABASE_URL=postgres://root:mysecretpassword@db:5432/local # Database URL for connection to PostgreSQL database with credentials from above - -# Application configuration -ADMIN_USERNAME=admin # Username for admin user in web interface -ADMIN_PASSWORD=password # Password for admin user in web interface - -# AI configuration -# Default Model to use for transcription, can be set to any OpenAI model or Ollama model -# For ollama connections, enter the model name and version number. EG: llama3.2:latest -AI_MODEL="gpt-3.5-turbo" - -# Leave blank to use default (OpenAI API), otherwise set to the base URL of your OpenAI API compatible server -# For ollama connections, enter the IP of the Ollama server, and then the port it is running on. 
-# Include the /v1/ or /api/v1/ path if needed (OpenWeb UI uses /api/ and ollama uses /v1/ -# Example: http://192.168.1.5:11434 or http://host.docker.internal:11434 -# NOTE: host.docker.internal is only available on Windows and MacOS, use the IP address of the host machine on Linux -# NOTE: localhost and 127.0.0.1 will not work, as they refer to the container itself, not the host machine -OLLAMA_BASE_URL="" - -# API Keys -# NOTE: -# If using Ollama, you can leave these blank or set to a dummy value -# If using OpenAI, you must set these to your API keys -# If using a custom API compatible server, you must set these to your API keys -OPENAI_API_KEY="" # Needed for retrieving models from OpenAI, for Ollama connections, this can be left blank or set to a dummy value - -# Diarization configuration -# Default Model to use for Diarization, can be set to any compatible model that supports diarization -# NOTE: This model will be downloaded automatically if it is not already present in the models directory -# NOTE: You MUST provide a valid HuggingFace API token with access to pyannote/speaker-diarization models - -DIARIZATION_MODEL=pyannote/speaker-diarization@3.0 -HUGGINGFACE_TOKEN="" # Required for accessing speaker diarization models from HuggingFace - -# Paths -# These almost never need to be changed. 
They are the paths to the directories where the models and audio files are stored -MODELS_DIR=/scriberr/models -WORK_DIR=/scriberr/temp -AUDIO_DIR=/scriberr/uploads - -# Server configuration -BODY_SIZE_LIMIT=1G -HARDWARE_ACCEL=cpu # Set to 'gpu' if you have a Nvidia GPU +# .env file +# Docker image configuration +IMAGE_TAG=main # Docker image tag to use for building the Docker image +PORT=3000 # Port to use for running the web interface + +# Database configuration +POSTGRES_PORT=5432 # Port to use for PostgreSQL database +POSTGRES_USER=root # Username for PostgreSQL database +POSTGRES_PASSWORD=mysecretpassword # Password for PostgreSQL database +POSTGRES_DB=local # Database name +DATABASE_URL=postgres://root:mysecretpassword@db:5432/local # Database URL for connection to PostgreSQL database with credentials from above + +# Application configuration +ADMIN_USERNAME=admin # Username for admin user in web interface +ADMIN_PASSWORD=password # Password for admin user in web interface + +# AI configuration +# Default Model to use for transcription, can be set to any OpenAI model or Ollama model +# For ollama connections, enter the model name and version number. EG: llama3.2:latest +AI_MODEL="gpt-3.5-turbo" +# Leave blank to use default (OpenAI API), otherwise set to the base URL of your OpenAI API compatible server +# For ollama connections, enter the IP of the Ollama server, and then the port it is running on. 
+# Include the /v1/ or /api/v1/ path if needed (OpenWeb UI uses /api/ and ollama uses /v1/)
+# Example: http://192.168.1.5:11434 or http://host.docker.internal:11434
+# NOTE: host.docker.internal is only available on Windows and macOS, use the IP address of the host machine on Linux
+# NOTE: localhost and 127.0.0.1 will not work, as they refer to the container itself, not the host machine
+OLLAMA_BASE_URL=""
+
+# API Keys
+# NOTE:
+# If using Ollama, you can leave these blank or set to a dummy value
+# If using OpenAI, you must set these to your API keys
+# If using a custom API compatible server, you must set these to your API keys
+OPENAI_API_KEY="" # Needed for retrieving models from OpenAI, for Ollama connections, this can be left blank or set to a dummy value
+
+# Diarization configuration
+# Default Model to use for Diarization, can be set to any compatible model that supports diarization
+# NOTE: This model will be downloaded automatically if it is not already present in the models directory
+# NOTE: You MUST provide a valid HuggingFace API token with access to pyannote/speaker-diarization models
+DIARIZATION_MODEL=pyannote/speaker-diarization@3.0
+HUGGINGFACE_TOKEN="" # Required for accessing speaker diarization models from HuggingFace
+
+# Paths
+# These almost never need to be changed. They are the paths to the directories where the models and audio files are stored
+MODELS_DIR=/scriberr/models
+WORK_DIR=/scriberr/temp
+AUDIO_DIR=/scriberr/uploads
+
+# Server configuration
+BODY_SIZE_LIMIT=1G
+HARDWARE_ACCEL=cpu # Set to 'gpu' if you have an Nvidia GPU
 USE_WORKER=true # Enable background processing of transcription jobs
\ No newline at end of file

From dc2916da54477a092e5c54ee33d6ff3247529efd Mon Sep 17 00:00:00 2001
From: SpirusNox <78000963+SpirusNox@users.noreply.github.com>
Date: Fri, 11 Apr 2025 14:20:07 -0500
Subject: [PATCH 7/7] chore: Update env.example to fix formatting issues.
Now using Linux line endings --- .idea/codeStyles/codeStyleConfig.xml | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 .idea/codeStyles/codeStyleConfig.xml diff --git a/.idea/codeStyles/codeStyleConfig.xml b/.idea/codeStyles/codeStyleConfig.xml new file mode 100644 index 0000000..a55e7a1 --- /dev/null +++ b/.idea/codeStyles/codeStyleConfig.xml @@ -0,0 +1,5 @@ + + + + \ No newline at end of file
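A note on the workflow change in PATCH 1/7: the bash `[[ "$commit" =~ … ]]` test was replaced with `grep -Eq`, presumably because grep's quoted extended-regex pattern behaves predictably everywhere, whereas a quoted right-hand side of bash's `=~` is matched literally rather than as a regex. The loop can be exercised standalone; the commit subjects below are invented for illustration and are not from the repository's history:

```shell
# Sketch of the version-bump detection loop from PATCH 1/7.
# Sample commit subjects (illustrative only):
commits="feat(api): add diarization endpoint
fix: correct badge links
refactor!: drop legacy env vars"

MAJOR_CHANGE=false
MINOR_CHANGE=false
while IFS= read -r commit; do
  # A breaking change bumps the major version and ends the scan.
  if [[ "$commit" == *"BREAKING CHANGE"* || "$commit" == *"!:"* ]]; then
    MAJOR_CHANGE=true
    break
  # grep -Eq: quoted ERE pattern, quiet mode (exit status only).
  elif echo "$commit" | grep -Eq "^feat(\([^)]+\))?:.*"; then
    MINOR_CHANGE=true
  fi
done <<< "$commits"

echo "major=$MAJOR_CHANGE minor=$MINOR_CHANGE"
```

With the sample subjects above, the `feat(api):` line sets the minor flag and the `refactor!:` line sets the major flag and stops the loop.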