
Added support for OpenAI Text to Audio (Speech API) #317


Closed
hemeda3 wants to merge 1 commit into main from the speech branch

Conversation

@hemeda3 hemeda3 commented Feb 12, 2024

How to use it:

    @Autowired
    public OpenAiAudioSpeechClient openAiAudioSpeechClient;

    byte[] responseAsBytes = openAiAudioSpeechClient.call("Hello, world!");
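
For example, the returned bytes can be written straight to an audio file. A minimal sketch using the JDK's Files API; the output path is illustrative:

    import java.nio.file.Files;
    import java.nio.file.Path;

    // Write the synthesized speech to disk; the .mp3 extension matches the
    // response format configured below.
    byte[] speech = openAiAudioSpeechClient.call("Hello, world!");
    Files.write(Path.of("speech.mp3"), speech);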
      

Config:

          # OpenAI API configuration
          spring.ai.openai.api-key=your_api_key
          spring.ai.openai.base-url=https://api.openai.com
          
          # Speech synthesis options
          spring.ai.openai.audio.speech.options.model=tts-1
          spring.ai.openai.audio.speech.options.voice=alloy
          spring.ai.openai.audio.speech.options.response-format=mp3
          spring.ai.openai.audio.speech.options.speed=0.75

Manual options with metadata/rate-limit info and prompt style:

    OpenAiAudioSpeechOptions options = OpenAiAudioSpeechOptions.builder()
        .withSpeed(0.25f)
        .withModel(OpenAiAudioApi.TtsModel.TTS_1.value)
        .build();
    SpeechPrompt speechPrompt = new SpeechPrompt("Hello, world!", options);
    SpeechResponse responseWithMetadata = openAiAudioSpeechClient.call(speechPrompt);
    OpenAiAudioSpeechResponseMetadata metadata = responseWithMetadata.getMetadata(); // rate-limit info

    byte[] responseAsBytes = responseWithMetadata.getResult().getOutput();
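
A hedged sketch of reading the rate-limit headers off that metadata, assuming OpenAiAudioSpeechResponseMetadata exposes a RateLimit accessor analogous to the chat client's metadata (the method names are assumptions, not confirmed by this PR):

    // Assumption: metadata.getRateLimit() returns Spring AI's RateLimit view
    // over the x-ratelimit-* response headers.
    RateLimit rateLimit = metadata.getRateLimit();
    System.out.println("Requests limit: " + rateLimit.getRequestsLimit());
    System.out.println("Requests remaining: " + rateLimit.getRequestsRemaining());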

Streaming speech audio directly from the OpenAI API:

    OpenAiAudioSpeechOptions speechOptions = OpenAiAudioSpeechOptions.builder()
        .withVoice(OpenAiAudioApi.SpeechRequest.Voice.ALLOY)
        .withSpeed(SPEED) // a float speed constant, e.g. 1.0f
        .withResponseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat.MP3)
        .withModel(OpenAiAudioApi.TtsModel.TTS_1.value)
        .build();
    SpeechPrompt speechPrompt = new SpeechPrompt("Today is a wonderful day to build something people love!",
            speechOptions);
    Flux<SpeechResponse> response = openAiAudioSpeechClient.stream(speechPrompt);
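
Each SpeechResponse in the Flux carries a chunk of the audio. One way to consume the stream is to accumulate the chunks back into a single byte array; a sketch using plain Reactor operators, with the terminal block() for illustration only:

    import java.io.ByteArrayOutputStream;

    // Concatenate the streamed MP3 chunks into one byte array.
    byte[] allBytes = response
        .map(speechResponse -> speechResponse.getResult().getOutput())
        .reduce(new ByteArrayOutputStream(), (baos, chunk) -> {
            baos.writeBytes(chunk); // ByteArrayOutputStream.writeBytes, Java 11+
            return baos;
        })
        .map(ByteArrayOutputStream::toByteArray)
        .block();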

@hemeda3 hemeda3 force-pushed the speech branch 3 times, most recently from d234402 to 43d2390 on February 12, 2024 08:33
@hemeda3 hemeda3 force-pushed the speech branch 2 times, most recently from 8a79494 to 2ce21fb on February 12, 2024 08:38
@markpollack markpollack added this to the 0.9.0 milestone Feb 12, 2024
@hemeda3 hemeda3 force-pushed the speech branch 2 times, most recently from 8039a5f to 4e4989c on February 15, 2024 00:42
@markpollack markpollack modified the milestones: 0.9.0, 0.8.1 Feb 29, 2024
@markpollack markpollack self-assigned this Feb 29, 2024
@tzolov tzolov added the enhancement (New feature or request) and model client labels Mar 1, 2024
@hemeda3 hemeda3 force-pushed the speech branch 3 times, most recently from 9198015 to eb766b1 on March 5, 2024 21:57
@hemeda3 hemeda3 commented Mar 5, 2024

@tzolov it seems you worked on the speech API code and merged it to main. Should I close this PR?

@tzolov tzolov commented Mar 6, 2024

Hi @hemeda3, thanks for reaching out.
I've worked on and merged the #300 PR.
In the process I realised that the low-level API is scattered between the #300 and your #317 PRs. Additionally, it was not completely covering the underlying OpenAI Audio API spec.
So I decided to adopt an old implementation I did for my Assistant AI explorations.

Next, I realised that until we have at least two text-to-speech and speech-to-text client implementations from different AI vendors, it is premature to create common model abstractions under spring-ai-core/model. The latter are meant to facilitate portability between vendors, but with a single implementation there is not enough data to decide what the common abstractions should look like.
Therefore I've moved the Audio Transcription prompt/response and the like inside the spring-ai-openai project (under the audio/transcription package). Later, when we have more audio clients, we can decide how to abstract those back into the core.
Finally, as you can see, #300 implements a client for the transcription endpoint, while your PR adds a speech generation client.

Having said this, would you be interested in re-working your PR after the refactoring I did? You would have to base your client on the OpenAiAudioApi low-level client and move the code from spring-ai-core to the spring-ai-openai .../audio/speech package (e.g. next to .../audio/transcription).
I would really appreciate your help. Do not hesitate to ask or suggest improvements.

@hemeda3 hemeda3 commented Mar 6, 2024

> Having said this, would you be interested in re-working your PR after the refactoring I did? You would have to base your client on the OpenAiAudioApi low-level client and move the code from spring-ai-core to the spring-ai-openai .../audio/speech package (e.g. next to .../audio/transcription). I would really appreciate your help. Do not hesitate to ask or suggest improvements.

@tzolov, thanks for the explanation 🙏. Actually, I was a bit confused, since both APIs (speech + transcription) share the same OpenAI audio API at a low level, but your changes have clarified things for me. I'm happy to re-work my PR based on your updates. If I have any questions, I'll reach out. Thanks for the opportunity to contribute and learn.

@hemeda3 hemeda3 force-pushed the speech branch 8 times, most recently from 4d6a718 to 684c81a on March 7, 2024 04:46
@hemeda3 hemeda3 commented Mar 7, 2024

  • Rebased the client on the OpenAiAudioApi low-level client
  • Moved the code from spring-ai-core to the spring-ai-openai .../audio/speech package
  • Added stream support (the OpenAI speech response can be received as a stream; see the sketch after this list)
  • Added metadata/rate-limit support using WebClient
  • Added tests for the speech properties and the stream/call methods
  • Updated the issue description with the new usage
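
For reference, streaming the speech bytes with WebClient could look roughly like this. An illustrative sketch only, not the exact code in this PR; the endpoint path and request body shape follow the public OpenAI speech API, and the variable names are assumptions:

    import java.util.Map;
    import org.springframework.web.reactive.function.client.WebClient;
    import reactor.core.publisher.Flux;

    // Sketch: stream the OpenAI /v1/audio/speech response body as byte chunks
    // using Spring's reactive WebClient.
    WebClient webClient = WebClient.builder()
        .baseUrl("https://api.openai.com")
        .defaultHeader("Authorization", "Bearer " + apiKey) // apiKey assumed in scope
        .build();

    Flux<byte[]> audioChunks = webClient.post()
        .uri("/v1/audio/speech")
        .bodyValue(Map.of(
            "model", "tts-1",
            "voice", "alloy",
            "input", "Hello, world!"))
        .retrieve()
        .bodyToFlux(byte[].class);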

@hemeda3 hemeda3 force-pushed the speech branch 3 times, most recently from b971a8e to 3a2edc0 on March 7, 2024 22:06
@hemeda3 hemeda3 commented Mar 11, 2024

Hi @tzolov, should I add the documentation to the same PR or open a new one?

@tzolov tzolov commented Mar 12, 2024

Hi @hemeda3, thanks for asking.
Sure, it would be nice to add the docs too.

@markpollack markpollack modified the milestones: 0.8.1, 1.0.0-M1 Mar 14, 2024
@hemeda3 hemeda3 force-pushed the speech branch 3 times, most recently from 14849b6 to 5de7b4a on March 17, 2024 00:11
@hemeda3 hemeda3 commented Mar 17, 2024

Added the speech API adoc:

  • Updated nav.adoc
  • Added speech.adoc


Review comment from a Member on this code:

    private OpenAiAudioSpeechOptions speechOptions;

    private final List<SpeechMessage> messages;

The underlying API only accepts a single string input, not a collection. So I think this should be ModelRequest<SpeechMessage> and not ModelRequest<List<SpeechMessage>>.
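
For illustration, the shape the reviewer suggests could look like this. A hypothetical sketch: SpeechMessage, OpenAiAudioSpeechOptions, and the ModelRequest contract come from the surrounding discussion; everything else is assumed, not the code merged in this PR:

    import org.springframework.ai.model.ModelOptions;
    import org.springframework.ai.model.ModelRequest;

    // Hypothetical sketch of the reviewer's suggestion: wrap a single
    // SpeechMessage, since the underlying endpoint accepts one string input.
    public class SpeechPrompt implements ModelRequest<SpeechMessage> {

        private final SpeechMessage message;
        private final OpenAiAudioSpeechOptions speechOptions;

        public SpeechPrompt(String instructions, OpenAiAudioSpeechOptions options) {
            this.message = new SpeechMessage(instructions);
            this.speechOptions = options;
        }

        @Override
        public SpeechMessage getInstructions() {
            return this.message;
        }

        @Override
        public ModelOptions getOptions() {
            return this.speechOptions;
        }
    }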

@markpollack markpollack commented

It took a while to get to, but it is now merged. Thanks, this was a great contribution!

Merged as 766b420
