Text-To-Speech - ElevenLabs API ? #707

kaloprojects · 2024-03-23T14:46:54Z

kaloprojects
Mar 23, 2024

Hi schreibfaul1, first of all I just want to say thank you! Your library is awesome, it's exactly what I was looking for. I'm building an ESP32 device with STT (using Google Cloud) and TTS (using your 'audio.openai_speech' function), inspired by this project (techiesms GPT Voice assistant, which uses your library): https://www.youtube.com/watch?v=gGcskjKtArU.

I was searching for the most professional TTS solution (background: I need several self-defined custom voices, played on the ESP32 via I2S with your library). ElevenLabs might be the leader here (even more powerful than OpenAI). So I was wondering if you plan to add this API? Otherwise I might try to clone your 'Audio::openai_speech()' in Audio.cpp myself and use it as a template for an additional 'Audio::elevenlabs_speech(..)'. But I am concerned that I don't have the skills needed to do it myself. :|

Have you ever seen this awesome API? https://elevenlabs.io/docs/api-reference/text-to-speech (I also found this video, which describes the capabilities of this TTS: https://www.youtube.com/watch?v=z0sD2BvUfM0).

Do you plan to add this call to your Audio.cpp? (Or should I first test the URL and tags via Postman myself, then build a function inside a local Audio.cpp and come back with questions in case I struggle?)

Any feedback is much appreciated, thank you!

schreibfaul1 · 2024-03-24T17:56:16Z

schreibfaul1
Mar 24, 2024
Maintainer

Hello @kalodj, I have watched the YouTube video. To use the service, you have to register there first. If you do this, you can proceed as described in the video. I would prefer a voice service that can be used completely independently.

1 reply

kaloprojects Mar 24, 2024
Author

Hi ;) .. thanks for the fast response. Yes, you always need to register to get the API key (but that is the same as with OpenAI and the Audio::openai_speech() in your library, right?). But you do not need to pay anything for the basic features (which are already far beyond Google Cloud, OpenAI and any other professional services I checked). Just register for free to get the key, then you can access most Pro features in the API reference (voice library, voice customizing, multilingual etc.). Btw: this surprised me (because all other vendors ask for payment, and are often expensive). I will test it next with a free account and can let you know, ok?

Add-on: But I totally get your point: company-independent solutions are always best, and users who do not want to register at all can still use the limited audio.connecttospeech()

simon-archer · 2024-04-23T11:49:41Z

simon-archer
Apr 23, 2024

Hey, I'm working on this now but not using this library. I'm still curious whether there are any plans to add support for streaming a chunked response from the ElevenLabs API, streamed back as raw 16 kHz PCM data. I'm trying to implement it myself and will share here if I succeed.
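
To illustrate what handling a chunked response involves, below is a minimal sketch of an incremental decoder in standard C++. It assumes the server answers with HTTP/1.1 `Transfer-Encoding: chunked` and that the response headers have already been consumed. Whether ElevenLabs actually delivers its stream this way is not confirmed in this thread. The class name `ChunkedDecoder` is invented for the example; trailers after the final chunk are simply ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Incremental decoder for an HTTP/1.1 chunked body.
// Bytes can be fed in arbitrary pieces, as they arrive from the socket.
class ChunkedDecoder {
public:
    // Appends decoded payload bytes to 'out'. Returns false on malformed input.
    bool feed(const char* data, std::size_t len, std::vector<std::uint8_t>& out) {
        for (std::size_t i = 0; i < len; ++i) {
            const char c = data[i];
            switch (state_) {
                case State::Size:      // collecting the hex size line
                    if (c == '\n') {
                        if (!parseSize()) return false;
                        state_ = (remaining_ == 0) ? State::Done : State::Data;
                    } else if (c != '\r') {
                        sizeLine_ += c;
                    }
                    break;
                case State::Data:      // copying chunk payload
                    out.push_back(static_cast<std::uint8_t>(c));
                    if (--remaining_ == 0) state_ = State::DataEnd;
                    break;
                case State::DataEnd:   // CRLF that terminates a chunk
                    if (c == '\n') state_ = State::Size;
                    else if (c != '\r') return false;
                    break;
                case State::Done:      // zero-size chunk seen; ignore trailers
                    break;
            }
        }
        return true;
    }

    bool done() const { return state_ == State::Done; }

private:
    enum class State { Size, Data, DataEnd, Done };

    // Parses the collected size line; chunk extensions after ';' are ignored.
    bool parseSize() {
        const std::string hex = sizeLine_.substr(0, sizeLine_.find(';'));
        sizeLine_.clear();
        if (hex.empty()) return false;
        std::size_t value = 0;
        for (char h : hex) {
            int digit;
            if (h >= '0' && h <= '9') digit = h - '0';
            else if (h >= 'a' && h <= 'f') digit = h - 'a' + 10;
            else if (h >= 'A' && h <= 'F') digit = h - 'A' + 10;
            else return false;
            value = value * 16 + static_cast<std::size_t>(digit);
        }
        remaining_ = value;
        return true;
    }

    State state_ = State::Size;
    std::string sizeLine_;
    std::size_t remaining_ = 0;
};
```

The decoded bytes would then be interpreted as audio samples (for raw 16 kHz PCM, presumably 16-bit values) and passed on to the I2S output. That part depends on the chosen output format and is left out here.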

3 replies

kaloprojects Apr 23, 2024
Author

Cool :) .. it would be awesome if you succeed. I am currently using the OpenAI call in schreibfaul1's library (audio.openai_speech()), but ElevenLabs would be the winner for sure, it is so much more flexible (because I can create my own custom voices, which OpenAI still does not support).

Btw: I stopped using 'audio.connecttospeech()' via Google TTS altogether. The voice is very robotic and, the main reason, it always fails on larger texts (I tested for hours). So OpenAI is my approach at the moment ..

.. but I'm excited and looking forward to your ElevenLabs solution (whatever library you use). Keep us posted and let us know! Thanks in advance :)

simon-archer Apr 23, 2024

Yeah, managing the buffers is more challenging than I thought. I also tried with OpenAI, but did not get the streaming to work. Do you have your code published somewhere I could view it?
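
One common way to decouple a bursty network stream from a steady audio consumer is a fixed-size ring buffer. The sketch below is only an illustration of that idea in standard C++, not the approach used by either poster or by the library. `ByteRingBuffer` is a made-up name, and the class is not safe to share between tasks or interrupts without extra locking.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-capacity FIFO for audio bytes: the network side writes,
// the playback side reads. Not thread-safe on its own.
class ByteRingBuffer {
public:
    explicit ByteRingBuffer(std::size_t capacity) : buf_(capacity) {}

    std::size_t available() const { return size_; }               // bytes ready to read
    std::size_t freeSpace() const { return buf_.size() - size_; } // bytes that still fit

    // Copies up to 'len' bytes in; returns how many were actually stored.
    std::size_t write(const std::uint8_t* src, std::size_t len) {
        const std::size_t n = std::min(len, freeSpace());
        for (std::size_t i = 0; i < n; ++i) {
            buf_[(head_ + size_ + i) % buf_.size()] = src[i];
        }
        size_ += n;
        return n;
    }

    // Copies up to 'len' bytes out in FIFO order; returns how many were read.
    std::size_t read(std::uint8_t* dst, std::size_t len) {
        const std::size_t n = std::min(len, size_);
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = buf_[(head_ + i) % buf_.size()];
        }
        if (n > 0) head_ = (head_ + n) % buf_.size();
        size_ -= n;
        return n;
    }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t head_ = 0;  // index of the oldest byte
    std::size_t size_ = 0;  // number of bytes currently stored
};
```

When `write` returns less than `len`, the caller has to hold back the rest and stop reading from the socket until the player has drained some data. When `read` returns 0, playback has underrun and should pause rather than output stale samples.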

kaloprojects Apr 25, 2024
Author

Hi Simon, I hope you will manage it in the end (crossing my fingers ;).
To your question (my code): You can even use the example code from schreibfaul1's main page (1:1): https://github.com/schreibfaul1/ESP32-audioI2S. Just replace the Google TTS call with the OpenAI one:
' audio.connecttospeech("Wenn die Hunde ...", "de");' -->
' audio.openai_speech("your OPENAI_KEY", "tts-1", "Wenn die Hunde ...", "alloy", "mp3", "1");'

But let me post a snippet from my code below. Insert your 3 credentials in the header and compile.
Your ESP32 will then speak "OK" on each RESET.

FYI, maybe of interest (to understand my code): I am using a more tricky workflow. My project (an AI assistant) uses two ESP32s (based on the techiesms project):

one handles all the logic of the user communication (enter question -> call the OpenAI ChatGPT API -> receive answer) and
the 2nd ESP32 (the one we are talking about here) is used as a pure (and flexible) Text -> Speech device. They communicate via Serial2 Tx/Rx, which works great (it allows me to keep things clean and working fast in parallel). Btw: the first ESP32 (the OpenAI ChatGPT user part) needs a Speech-To-Text component next, which is currently in testing. I would love it if schreibfaul1's library offered such an OpenAI STT API call, usable on the ESP32 with an I2S microphone/stream, because all the existing STT solutions I found are old-fashioned, based on the years-old Google Cloud STT API and no longer compatible with current ESP32 2.0.14 and later libraries. But that's a different topic for later, let's realize the ElevenLabs TTS call first (most important :).

Last but not least, I learned about 2 issues while getting everything to work, maybe this helps:

for whatever reason some users (including me, using a current 38-pin NodeMCU ESP32) got a lot of compile errors with schreibfaul1's library. The solution: try the ESP32 library 2.0.14 !! (maybe try this once if you have the same issue, see here: Audio.h #669)
avoid any \r (CR) in the string sent to OpenAI, otherwise there is no audio at all (see my code)
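
The second point, removing \r before sending, could look roughly like this in standard C++. This is a sketch with std::string, not the code from the zip mentioned in this thread, and `stripCarriageReturns` is a name made up for the example. It would typically be applied to text received over a serial line, where lines often end in \r\n.

```cpp
#include <algorithm>
#include <string>

// Returns a copy of 'text' with every carriage return ('\r') removed.
// Line feeds ('\n') and all other characters are left untouched.
std::string stripCarriageReturns(std::string text) {
    text.erase(std::remove(text.begin(), text.end(), '\r'), text.end());
    return text;
}
```

The cleaned string is then what gets passed as the input text of the TTS call.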

kaloprojects · 2024-04-25T09:49:19Z

kaloprojects
Apr 25, 2024
Author

Here is my code in case you are interested (I do not know how to add an .ino code snippet here, so I am uploading it as a zip):
kalo_TTS_Demo.zip

0 replies

kaloprojects · 2024-05-07T18:18:19Z

kaloprojects
May 7, 2024
Author

Hi Simon, minor update: I did a lot of research recently, and I might not use ElevenLabs after all as it requires a monthly subscription payment. One of the best alternatives I have found in the meantime is SpeechGenIO, pretty unknown but with awesome voices. It also needs payment, but as a kind of one-off purchase for a defined volume, which makes sense IMHO. I will use this for sure (in parallel to the 6 predefined OpenAI voices). In case you are still interested, here is the API link, maybe it is possible to add it (similar to the 'Audio::openai_speech()' in Audio.cpp).

0 replies
