Commit 4fd15b1

Add documentation for OpenAI Transcription
Fixes #401
1 parent ed5eef4 commit 4fd15b1

11 files changed

+200
-51
lines changed

models/spring-ai-openai/src/main/java/org/springframework/ai/openai/OpenAiAudioTranscriptionClient.java

Lines changed: 24 additions & 0 deletions
@@ -68,6 +68,11 @@ public class OpenAiAudioTranscriptionClient

     private final OpenAiAudioApi audioApi;

+    /**
+     * OpenAiAudioTranscriptionClient is a client class used to interact with the OpenAI
+     * Audio Transcription API.
+     * @param audioApi The OpenAiAudioApi instance to be used for making API calls.
+     */
     public OpenAiAudioTranscriptionClient(OpenAiAudioApi audioApi) {
         this(audioApi,
                 OpenAiAudioTranscriptionOptions.builder()
@@ -78,6 +83,25 @@ public OpenAiAudioTranscriptionClient(OpenAiAudioApi audioApi) {
                 RetryUtils.DEFAULT_RETRY_TEMPLATE);
     }

+    /**
+     * OpenAiAudioTranscriptionClient is a client class used to interact with the OpenAI
+     * Audio Transcription API.
+     * @param audioApi The OpenAiAudioApi instance to be used for making API calls.
+     * @param options The OpenAiAudioTranscriptionOptions instance for configuring the
+     * audio transcription.
+     */
+    public OpenAiAudioTranscriptionClient(OpenAiAudioApi audioApi, OpenAiAudioTranscriptionOptions options) {
+        this(audioApi, options, RetryUtils.DEFAULT_RETRY_TEMPLATE);
+    }
+
+    /**
+     * OpenAiAudioTranscriptionClient is a client class used to interact with the OpenAI
+     * Audio Transcription API.
+     * @param audioApi The OpenAiAudioApi instance to be used for making API calls.
+     * @param options The OpenAiAudioTranscriptionOptions instance for configuring the
+     * audio transcription.
+     * @param retryTemplate The RetryTemplate instance for retrying failed API calls.
+     */
     public OpenAiAudioTranscriptionClient(OpenAiAudioApi audioApi, OpenAiAudioTranscriptionOptions options,
             RetryTemplate retryTemplate) {
         Assert.notNull(audioApi, "OpenAiAudioApi must not be null");
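
Reviewer note: the three constructors in this hunk form a telescoping chain in which the shorter overloads delegate to the three-argument one. The sketch below illustrates only that delegation pattern. `AudioApi`, `Options`, `Retry`, and the default values are hypothetical stand-ins, not the Spring AI types or their real defaults.

```java
import java.util.Objects;

public class TranscriptionClientSketch {

    // Hypothetical stand-ins for OpenAiAudioApi, OpenAiAudioTranscriptionOptions and RetryTemplate.
    public static final class AudioApi {
        public final String apiKey;
        public AudioApi(String apiKey) { this.apiKey = apiKey; }
    }

    public static final class Options {
        public final String model;
        public final String responseFormat;
        public Options(String model, String responseFormat) {
            this.model = model;
            this.responseFormat = responseFormat;
        }
    }

    public static final class Retry {
        public final int maxAttempts;
        public Retry(int maxAttempts) { this.maxAttempts = maxAttempts; }
    }

    // Placeholder defaults chosen for illustration only.
    public static final Options DEFAULT_OPTIONS = new Options("placeholder-model", "json");
    public static final Retry DEFAULT_RETRY = new Retry(10);

    private final AudioApi audioApi;
    private final Options options;
    private final Retry retry;

    // One argument: fall back to default options and default retry behaviour.
    public TranscriptionClientSketch(AudioApi audioApi) {
        this(audioApi, DEFAULT_OPTIONS, DEFAULT_RETRY);
    }

    // Two arguments: caller-supplied options, default retry behaviour.
    public TranscriptionClientSketch(AudioApi audioApi, Options options) {
        this(audioApi, options, DEFAULT_RETRY);
    }

    // Three arguments: the only constructor that validates and assigns fields.
    public TranscriptionClientSketch(AudioApi audioApi, Options options, Retry retry) {
        this.audioApi = Objects.requireNonNull(audioApi, "audioApi must not be null");
        this.options = Objects.requireNonNull(options, "options must not be null");
        this.retry = Objects.requireNonNull(retry, "retry must not be null");
    }

    public AudioApi audioApi() { return audioApi; }
    public Options options() { return options; }
    public Retry retry() { return retry; }

    public static void main(String[] args) {
        TranscriptionClientSketch client = new TranscriptionClientSketch(
                new AudioApi("test-key"), new Options("placeholder-model", "vtt"));
        System.out.println(client.options().responseFormat + " / " + client.retry().maxAttempts);
    }
}
```

Keeping validation in the widest constructor means every overload gets the same null checks, which matches where `Assert.notNull` sits in the hunk above.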

models/spring-ai-openai/src/main/java/org/springframework/ai/openai/OpenAiAudioTranscriptionOptions.java

Lines changed: 1 addition & 2 deletions
@@ -38,8 +38,7 @@ public class OpenAiAudioTranscriptionOptions implements ModelOptions {
     private @JsonProperty("model") String model;

     /**
-     * An object specifying the format that the model must output. Setting to { "type":
-     * "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON.
+     * The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.
      */
     private @JsonProperty("response_format") TranscriptResponseFormat responseFormat;
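
Reviewer note: the corrected Javadoc lists five transcript formats. A minimal sketch of how such a closed set might be modelled and looked up by wire value follows. `Format` and `fromValue` are hypothetical names for illustration and are not claimed to match `TranscriptResponseFormat`.

```java
import java.util.Arrays;
import java.util.Optional;

public class ResponseFormatSketch {

    // Hypothetical stand-in enum; the wire values are the five named in the Javadoc above.
    public enum Format {
        JSON("json"),
        TEXT("text"),
        SRT("srt"),
        VERBOSE_JSON("verbose_json"),
        VTT("vtt");

        public final String value;

        Format(String value) {
            this.value = value;
        }

        // Look up a format by its wire value; empty when the value is not one of the five.
        public static Optional<Format> fromValue(String value) {
            return Arrays.stream(values())
                    .filter(f -> f.value.equals(value))
                    .findFirst();
        }
    }

    public static void main(String[] args) {
        System.out.println(Format.fromValue("verbose_json").isPresent());
        // "json_object" belongs to the chat JSON mode described by the removed Javadoc,
        // so it is not a transcript format in this sketch.
        System.out.println(Format.fromValue("json_object").isPresent());
    }
}
```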

spring-ai-docs/src/main/antora/modules/ROOT/nav.adoc

Lines changed: 13 additions & 11 deletions
@@ -2,17 +2,6 @@
 * xref:concepts.adoc[AI Concepts]
 * xref:getting-started.adoc[Getting Started]
 * xref:api/index.adoc[]
-** xref:api/embeddings.adoc[]
-*** xref:api/embeddings/openai-embeddings.adoc[OpenAI]
-*** xref:api/embeddings/ollama-embeddings.adoc[Ollama]
-*** xref:api/embeddings/azure-openai-embeddings.adoc[Azure OpenAI]
-*** xref:api/embeddings/postgresml-embeddings.adoc[PostgresML]
-*** xref:api/embeddings/vertexai-embeddings.adoc[Google VertexAI PaLM2]
-*** xref:api/bedrock.adoc[Amazon Bedrock]
-**** xref:api/embeddings/bedrock-cohere-embedding.adoc[Cohere]
-**** xref:api/embeddings/bedrock-titan-embedding.adoc[Titan]
-*** xref:api/embeddings/onnx.adoc[Transformers (ONNX)]
-*** xref:api/embeddings/mistralai-embeddings.adoc[Mistral AI]
 ** xref:api/chatclient.adoc[]
 *** xref:api/clients/openai-chat.adoc[OpenAI]
 **** xref:api/clients/functions/openai-chat-functions.adoc[Function Calling]
@@ -31,9 +20,22 @@
 ***** xref:api/clients/functions/vertexai-gemini-chat-functions.adoc[Function Calling]
 *** xref:api/clients/mistralai-chat.adoc[Mistral AI]
 **** xref:api/clients/functions/mistralai-chat-functions.adoc[Function Calling]
+** xref:api/embeddings.adoc[]
+*** xref:api/embeddings/openai-embeddings.adoc[OpenAI]
+*** xref:api/embeddings/ollama-embeddings.adoc[Ollama]
+*** xref:api/embeddings/azure-openai-embeddings.adoc[Azure OpenAI]
+*** xref:api/embeddings/postgresml-embeddings.adoc[PostgresML]
+*** xref:api/embeddings/vertexai-embeddings.adoc[Google VertexAI PaLM2]
+*** xref:api/bedrock.adoc[Amazon Bedrock]
+**** xref:api/embeddings/bedrock-cohere-embedding.adoc[Cohere]
+**** xref:api/embeddings/bedrock-titan-embedding.adoc[Titan]
+*** xref:api/embeddings/onnx.adoc[Transformers (ONNX)]
+*** xref:api/embeddings/mistralai-embeddings.adoc[Mistral AI]
 ** xref:api/imageclient.adoc[]
 *** xref:api/clients/image/openai-image.adoc[OpenAI]
 *** xref:api/clients/image/stabilityai-image.adoc[Stability]
+** xref:api/transcriptions.adoc[]
+*** xref:api/transcriptions/openai-transcriptions.adoc[OpenAI]
 ** xref:api/vectordbs.adoc[]
 *** xref:api/vectordbs/azure.adoc[]
 *** xref:api/vectordbs/chroma.adoc[]

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/clients/bedrock/bedrock-titan.adoc

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 = Titan Chat

-link:https://aws.amazon.com/bedrock/titan/[Amazon Titan] foundation models (FMs) provide customers with a breadth of high-performing image, multimodal, and text model choices, via a fully managed API.
+link:https://aws.amazon.com/bedrock/titan/[Amazon Titan] foundation models (FMs) provide customers with a breadth of high-performing image, multimodal embeddings, and text model choices, via a fully managed API.
 Amazon Titan models are created by AWS and pretrained on large datasets, making them powerful, general-purpose models built to support a variety of use cases, while also supporting the responsible use of AI.
 Use them as is or privately customize them with your own data.

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/clients/image/openai-image.adoc

Lines changed: 28 additions & 25 deletions
@@ -1,7 +1,7 @@
 = OpenAI Image Generation


-Spring AI supports ChatGPT's DALL-E, the Image generation model from OpenAI.
+Spring AI supports DALL-E, the Image generation model from OpenAI.

 == Prerequisites

@@ -41,20 +41,24 @@ TIP: Refer to the xref:getting-started.adoc#dependency-management[Dependency Man

 === Image Generation Properties

-==== Retry Properties

-The prefix `spring.ai.retry` is used as the property prefix that lets you configure the retry mechanism for the OpenAI Image client.
+The prefix `spring.ai.openai.image` is the property prefix that lets you configure the `ImageClient` implementation for OpenAI.

 [cols="3,5,1"]
 |====
 | Property | Description | Default
-
-| spring.ai.retry.max-attempts | Maximum number of retry attempts. | 10
-| spring.ai.retry.backoff.initial-interval | Initial sleep duration for the exponential backoff policy. | 2 sec.
-| spring.ai.retry.backoff.multiplier | Backoff interval multiplier. | 5
-| spring.ai.retry.backoff.max-interval | Maximum backoff duration. | 3 min.
-| spring.ai.retry.on-client-errors | If false, throw a NonTransientAiException, and do not attempt retry for `4xx` client error codes | false
-| spring.ai.retry.exclude-on-http-codes | List of HTTP status codes that should not trigger a retry (e.g. to throw NonTransientAiException). | empty
+| spring.ai.openai.image.enabled | Enable OpenAI image client. | true
+| spring.ai.openai.image.base-url | Optional overrides the spring.ai.openai.base-url to provide chat specific url | -
+| spring.ai.openai.image.api-key | Optional overrides the spring.ai.openai.api-key to provide chat specific api-key | -
+| spring.ai.openai.image.options.n | The number of images to generate. Must be between 1 and 10. For dall-e-3, only n=1 is supported. | -
+| spring.ai.openai.image.options.model | The model to use for image generation. | OpenAiImageApi.DEFAULT_IMAGE_MODEL
+| spring.ai.openai.image.options.quality | The quality of the image that will be generated. HD creates images with finer details and greater consistency across the image. This parameter is only supported for dall-e-3. | -
+| spring.ai.openai.image.options.response_format | The format in which the generated images are returned. Must be one of URL or b64_json. | -
+| `spring.ai.openai.image.options.size` | The size of the generated images. Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2. Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models. | -
+| `spring.ai.openai.image.options.size_width` | The width of the generated images. Must be one of 256, 512, or 1024 for dall-e-2. | -
+| `spring.ai.openai.image.options.size_height`| The height of the generated images. Must be one of 256, 512, or 1024 for dall-e-2. | -
+| `spring.ai.openai.image.options.style` | The style of the generated images. Must be one of vivid or natural. Vivid causes the model to lean towards generating hyper-real and dramatic images. Natural causes the model to produce more natural, less hyper-real looking images. This parameter is only supported for dall-e-3. | -
+| `spring.ai.openai.image.options.user` | A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. | -
 |====

 ==== Connection Properties
@@ -70,32 +74,31 @@ The prefix `spring.ai.openai` is used as the property prefix that lets you conne

 ==== Configuration Properties

-The prefix `spring.ai.openai.image` is the property prefix that lets you configure the `ImageClient` implementation for OpenAI.
+
+==== Retry Properties
+
+The prefix `spring.ai.retry` is used as the property prefix that lets you configure the retry mechanism for the OpenAI Image client.

 [cols="3,5,1"]
 |====
 | Property | Description | Default
-| spring.ai.openai.image.enabled | Enable OpenAI image client. | true
-| spring.ai.openai.image.base-url | Optional overrides the spring.ai.openai.base-url to provide chat specific url | -
-| spring.ai.openai.image.api-key | Optional overrides the spring.ai.openai.api-key to provide chat specific api-key | -
-| spring.ai.openai.image.options.n | The number of images to generate. Must be between 1 and 10. For dall-e-3, only n=1 is supported. | -
-| spring.ai.openai.image.options.model | The model to use for image generation. | OpenAiImageApi.DEFAULT_IMAGE_MODEL
-| spring.ai.openai.image.options.quality | The quality of the image that will be generated. HD creates images with finer details and greater consistency across the image. This parameter is only supported for dall-e-3. | -
-| spring.ai.openai.image.options.response_format | The format in which the generated images are returned. Must be one of URL or b64_json. | -
-| `spring.ai.openai.image.options.size` | The size of the generated images. Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2. Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models. | -
-| `spring.ai.openai.image.options.size_width` | The width of the generated images. Must be one of 256, 512, or 1024 for dall-e-2. | -
-| `spring.ai.openai.image.options.size_height`| The height of the generated images. Must be one of 256, 512, or 1024 for dall-e-2. | -
-| `spring.ai.openai.image.options.style` | The style of the generated images. Must be one of vivid or natural. Vivid causes the model to lean towards generating hyper-real and dramatic images. Natural causes the model to produce more natural, less hyper-real looking images. This parameter is only supported for dall-e-3. | -
-| `spring.ai.openai.image.options.user` | A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. | -
+
+| spring.ai.retry.max-attempts | Maximum number of retry attempts. | 10
+| spring.ai.retry.backoff.initial-interval | Initial sleep duration for the exponential backoff policy. | 2 sec.
+| spring.ai.retry.backoff.multiplier | Backoff interval multiplier. | 5
+| spring.ai.retry.backoff.max-interval | Maximum backoff duration. | 3 min.
+| spring.ai.retry.on-client-errors | If false, throw a NonTransientAiException, and do not attempt retry for `4xx` client error codes | false
+| spring.ai.retry.exclude-on-http-codes | List of HTTP status codes that should not trigger a retry (e.g. to throw NonTransientAiException). | empty
 |====

-=== Image Options [[image-options]]
+
+== Runtime Options [[image-options]]

 The https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-openai/src/main/java/org/springframework/ai/openai/OpenAiImageOptions.java[OpenAiImageOptions.java] provides model configurations, such as the model to use, the quality, the size, etc.

 On start-up, the default options can be configured with the `OpenAiImageClient(OpenAiImageApi openAiImageApi)` constructor and the `withDefaultOptions(OpenAiImageOptions defaultOptions)` method. Alternatively, use the `spring.ai.openai.image.options.*` properties described previously.

-At run-time you can override the default options by adding new, request specific, options to the `ImagePrompt` call.
+At runtime you can override the default options by adding new, request specific, options to the `ImagePrompt` call.
 For example to override the OpenAI specific options such as quality and the number of images to create, use the following code example:

 [source,java]

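Reviewer note: the retry table moved in this file lists an initial interval of 2 sec., a multiplier of 5, a maximum interval of 3 min., and 10 maximum attempts. The sketch below works through the wait schedule those defaults would imply, assuming a plain capped exponential backoff in which the first call counts as an attempt. `BackoffScheduleSketch` and `delays` are hypothetical names, and the actual retry implementation may differ in detail.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class BackoffScheduleSketch {

    // Returns the wait before each retry, assuming the first call counts as attempt 1,
    // so maxAttempts attempts imply at most (maxAttempts - 1) waits.
    public static List<Duration> delays(int maxAttempts, Duration initial, double multiplier, Duration max) {
        List<Duration> waits = new ArrayList<>();
        double nextMillis = initial.toMillis();
        double maxMillis = max.toMillis();
        for (int retry = 1; retry < maxAttempts; retry++) {
            waits.add(Duration.ofMillis((long) Math.min(nextMillis, maxMillis)));
            // Grow the interval, but never beyond the configured cap.
            nextMillis = Math.min(nextMillis * multiplier, maxMillis);
        }
        return waits;
    }

    public static void main(String[] args) {
        // Defaults taken from the retry properties table: 10 attempts, 2 s, x5, 3 min cap.
        List<Duration> waits = delays(10, Duration.ofSeconds(2), 5.0, Duration.ofMinutes(3));
        for (Duration wait : waits) {
            System.out.println(wait.getSeconds() + " s");
        }
    }
}
```

Under these assumptions the waits grow 2 s, 10 s, 50 s and then stay at the 180 s cap, because the next step (250 s) would exceed the 3 min. maximum.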
spring-ai-docs/src/main/antora/modules/ROOT/pages/api/clients/image/stabilityai-image.adoc

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ The https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-stab

 On start-up, the default options can be configured with the `StabilityAiImageClient(StabilityAiApi stabilityAiApi, StabilityAiImageOptions options)` constructor. Alternatively, use the `spring.ai.openai.image.options.*` properties described previously.

-At run-time you can override the default options by adding new, request specific, options to the `ImagePrompt` call.
+At runtime, you can override the default options by adding new, request specific, options to the `ImagePrompt` call.
 For example to override the Stability AI specific options such as quality and the number of images to create, use the following code example:

 [source,java]

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/clients/openai-chat.adoc

Lines changed: 1 addition & 1 deletion
@@ -262,7 +262,7 @@ Flux<ChatCompletionChunk> streamResponse = openAiApi.chatCompletionStream(

 Follow the https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-openai/src/main/java/org/springframework/ai/openai/api/OpenAiApi.java[OpenAiApi.java]'s JavaDoc for further information.

-==== OpenAiApi Samples
+== Example Code
 * The link:https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-openai/src/test/java/org/springframework/ai/openai/chat/api/OpenAiApiIT.java[OpenAiApiIT.java] test provides some general examples how to use the lightweight library.

 * The link:https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-openai/src/test/java/org/springframework/ai/openai/chat/api/tool/OpenAiApiToolFunctionCallIT.java[OpenAiApiToolFunctionCallIT.java] test shows how to use the low-level API to call tool functions.

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/clients/vertexai-gemini-chat.adoc

Lines changed: 7 additions & 9 deletions
@@ -1,7 +1,5 @@
 = VertexAI Gemini Chat

-
-
 The https://cloud.google.com/vertex-ai/docs/generative-ai/multimodal/overview[Vertex AI Gemini API] allows developers to build generative AI applications using the Gemini model.
 The Vertex AI Gemini API supports multimodal prompts as input and output text or code.
 A multimodal model is a model that is capable of processing information from multiple modalities, including images, videos, and text. For example, you can send the model a photo of a plate of cookies and ask it to give you a recipe for those cookies.
@@ -79,13 +77,13 @@ The prefix `spring.ai.vertex.ai.gemini.chat` is the property prefix that lets yo

 TIP: All properties prefixed with `spring.ai.vertex.ai.gemini.chat.options` can be overridden at runtime by adding a request specific <<chat-options>> to the `Prompt` call.

-=== Chat Options [[chat-options]]
+== Runtime options [[chat-options]]

 The https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-vertex-ai-gemini/src/main/java/org/springframework/ai/vertexai/gemini/VertexAiGeminiChatOptions.java[VertexAiGeminiChatOptions.java] provides model configurations, such as the temperature, the topK, etc.

 On start-up, the default options can be configured with the `VertexAiGeminiChatClient(api, options)` constructor or the `spring.ai.vertex.ai.chat.options.*` properties.

-At run-time you can override the default options by adding new, request specific, options to the `Prompt` call.
+At runtime you can override the default options by adding new, request specific, options to the `Prompt` call.
 For example to override the default temperature for a specific request:

 [source,java]
@@ -101,21 +99,21 @@ ChatResponse response = chatClient.call(

 TIP: In addition to the model specific `VertexAiChatPaLm2Options` you can use a portable https://github.com/spring-projects/spring-ai/blob/main/spring-ai-core/src/main/java/org/springframework/ai/chat/ChatOptions.java[ChatOptions] instance, created with the https://github.com/spring-projects/spring-ai/blob/main/spring-ai-core/src/main/java/org/springframework/ai/chat/ChatOptionsBuilder.java[ChatOptionsBuilder#builder()].

-=== Function Calling
+== Function Calling

 You can register custom Java functions with the VertexAiGeminiChatClient and have the Gemini Pro model intelligently choose to output a JSON object containing arguments to call one or many of the registered functions.
 This is a powerful technique to connect the LLM capabilities with external tools and APIs.
 Read more about xref:api/clients/functions/vertexai-gemini-chat-functions.adoc[Vertex AI Gemini Function Calling].

-=== Multimodal Example
+== Multimodal
 Multimodality refers to a model's ability to simultaneously understand and process information from various sources, including text, images, audio, and other data formats. This paradigm represents a significant advancement in AI models.

-Google's Gemini AI models support this capability by comprehending and integrating text, code, audio, images, and video. For more details, refer to the blog post [Introducing Gemini](https://blog.google/technology/ai/google-gemini-ai/#introducing-gemini).
+Google's Gemini AI models support this capability by comprehending and integrating text, code, audio, images, and video. For more details, refer to the blog post https://blog.google/technology/ai/google-gemini-ai/#introducing-gemini[Introducing Gemini].

 Spring AI's `Message` interface supports multimodal AI models by introducing the Media type.
 This type contains data and information about media attachments in messages, using Spring's `org.springframework.util.MimeType` and a `java.lang.Object` for the raw media data.

-Below is a simple code example extracted from [VertexAiGeminiChatClientIT.java](https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-vertex-ai-gemini/src/test/java/org/springframework/ai/vertexai/gemini/VertexAiGeminiChatClientIT.java), demonstrating the combination of user text with an image.
+Below is a simple code example extracted from https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-vertex-ai-gemini/src/test/java/org/springframework/ai/vertexai/gemini/VertexAiGeminiChatClientIT.java[VertexAiGeminiChatClientIT.java], demonstrating the combination of user text with an image.


 [source,java]
@@ -128,7 +126,7 @@ var userMessage = new UserMessage("Explain what do you see o this picture?",
 ChatResponse response = chatClient.call(new Prompt(List.of(userMessage)));
 ----

-=== Sample Controller (Auto-configuration)
+== Sample Controller

 https://start.spring.io/[Create] a new Spring Boot project and add the `spring-ai-vertex-ai-palm2-spring-boot-starter` to your pom (or gradle) dependencies.

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/embeddings/bedrock-titan-embedding.adoc

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 = Titan Embeddings

 Provides Bedrock Titan Embedding client.
-link:https://aws.amazon.com/bedrock/titan/[Amazon Titan] foundation models (FMs) provide customers with a breadth of high-performing image, multimodal, and text model choices, via a fully managed API.
+link:https://aws.amazon.com/bedrock/titan/[Amazon Titan] foundation models (FMs) provide customers with a breadth of high-performing image, multimodal embeddings, and text model choices, via a fully managed API.
 Amazon Titan models are created by AWS and pretrained on large datasets, making them powerful, general-purpose models built to support a variety of use cases, while also supporting the responsible use of AI.
 Use them as is or privately customize them with your own data.

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+[[Transcription]]
+= Transcription API
+
+Spring AI provides support for OpenAI's Transcription API.
+When additional providers for Transcription are implemented, a common `AudioTranscriptionClient` interface will be extracted.
