Ideas & improvements #4
Replies: 12 comments 19 replies
-
In my opinion, the Wyoming protocol is best suited for communication between satellites and similar devices, while the OpenAI API is more widely accepted for AI backends. That's why I took this direction: in my lab, I stopped using wyoming_piper and replaced it with FastAPI-Kokoro (though there are Piper options as well, such as the Speaches container). With this setup, I can share Piper or Kokoro with both Open WebUI and Home Assistant.
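To illustrate the idea, here is a minimal compose sketch of that kind of shared setup; the service name, image tag, and port are assumptions rather than anything quoted from this thread:

```yaml
# Hypothetical sketch: one OpenAI-compatible speech server shared by
# Open WebUI and Home Assistant. Image tag and port are assumptions.
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest   # or a FastAPI-Kokoro image
    ports:
      - "8000:8000"   # serves the OpenAI-style /v1/audio/* endpoints
  # Open WebUI can point its speech settings at http://speaches:8000/v1,
  # while Home Assistant reaches the same instance through a Wyoming
  # bridge such as this project.
```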
-
Highly recommend replacing wyoming_piper with "Speaches".
-
I think your suggestion would be a good improvement. It's just not supported yet, which is why I suggested the other method.
-
Thanks @roryeckel, I'll investigate the options. I think there's no way to integrate STT into HA without using Wyoming; I don't see such an integration option. The same applies to TTS, so a solution that provides multiple APIs to access the service is a great idea. At the end of the day, all I'm trying to achieve is a single point of failure... I mean, a single instance of a service, to save on VRAM, since the GPU is also loaded up with Ollama and Frigate TensorRT. 😁
-
This project is your "missing piece" to get a Wyoming server out of an OpenAI-compliant endpoint. Speaches and FastAPI-Kokoro can both serve this type of OpenAI endpoint. Then you would put the IP of this wyoming_openai container into Home Assistant to access it via the proxy. I think I need to improve the documentation on how to use it with Home Assistant, and I am open to contributions as well. Thanks!
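For example, the wiring could look roughly like this (a sketch only; the port and the environment variable names are assumptions, not confirmed project configuration):

```yaml
# Hypothetical wiring: wyoming_openai bridges an OpenAI-compatible speech
# server to a Wyoming port that Home Assistant can consume.
services:
  wyoming_openai:
    build: .            # or a published image, once one exists
    ports:
      - "10300:10300"   # Wyoming port (assumed) that Home Assistant connects to
    environment:
      # Variable names below are illustrative, not confirmed project config.
      TTS_OPENAI_URL: "http://speaches:8000/v1"
      STT_OPENAI_URL: "http://speaches:8000/v1"
# In Home Assistant, add the Wyoming Protocol integration and point it at
# <docker-host-ip>:10300.
```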
-
Trying it now, first little docker compose fix already in PR ;)
-
All is working initially.
-
Mind moving this into a discussion?
-
I set my TTS_VOICES to "af af_bella af_sarah am_adam am_michael bf_emma bf_isabella bm_george bm_lewis af_nicole af_sky", which corresponds to the options I get in the Speaches web UI. Are you seeing the same?
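For reference, this is how that setting could look in compose (the TTS_VOICES value is quoted from above; the surrounding service block is only illustrative):

```yaml
services:
  wyoming_openai:
    environment:
      # Voice list copied from this comment; matches what the Speaches web UI listed.
      TTS_VOICES: "af af_bella af_sarah am_adam am_michael bf_emma bf_isabella bm_george bm_lewis af_nicole af_sky"
```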
-
Looks like I copied the voice list from Kokoro 1.0, but Speaches is on 0.19 at the moment. I'll downgrade / make something to fix that.
-
I am hesitant to add the unofficial /v1/audio/speech/voices endpoint, as it may throw off the official OpenAI support. I hope OpenAI improves this voice listing capability in the future. For now, I've just added a comment pointing to where the voices can be found.
-
One extra thing I think would be beneficial is a pre-built container image, since there's a Dockerfile already. Would you mind adding an action for building a container image and rolling out v0.1 (or v0.0.1) at some point? No rush with it, just wondering.
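Something along these lines could work as a starting point (a sketch only; the registry, tag scheme, and action versions are assumptions):

```yaml
# Hypothetical GitHub Actions workflow: build the existing Dockerfile and
# push the image to GHCR whenever a version tag (e.g. v0.1.0) is pushed.
name: Build container image

on:
  push:
    tags:
      - "v*"

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.ref_name }}
```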
-
I came across this project when looking for a proxy that can expose Wyoming piper/whisper to an application that expects an OpenAI endpoint.
In other words, I'd like to connect open-webui to my whisper instance, which is hardware accelerated, rather than using OpenAI.
It seems this project proxies in the opposite direction, though.
Do you plan on adding functionality to proxy from Wyoming to an OpenAI-style endpoint as well?
Thanks!