Commit db4f0b0

Add GPT-4o ITs and documentation updates
1 parent b3cfa2b commit db4f0b0

File tree

3 files changed: +22 -15 lines changed


models/spring-ai-openai/src/test/java/org/springframework/ai/openai/chat/OpenAiChatClientIT.java

Lines changed: 16 additions & 10 deletions

@@ -24,6 +24,8 @@

 import org.junit.jupiter.api.Test;
 import org.junit.jupiter.api.condition.EnabledIfEnvironmentVariable;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.ValueSource;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import reactor.core.publisher.Flux;
@@ -243,33 +245,37 @@ void streamFunctionCallTest() {
 		assertThat(content).containsAnyOf("15.0", "15");
 	}

-	@Test
-	void multiModalityEmbeddedImage() throws IOException {
+	@ParameterizedTest(name = "{0} : {displayName} ")
+	@ValueSource(strings = { "gpt-4-vision-preview", "gpt-4o" })
+	void multiModalityEmbeddedImage(String modelName) throws IOException {

 		byte[] imageData = new ClassPathResource("/test.png").getContentAsByteArray();

 		var userMessage = new UserMessage("Explain what do you see on this picture?",
 				List.of(new Media(MimeTypeUtils.IMAGE_PNG, imageData)));

-		ChatResponse response = chatClient.call(new Prompt(List.of(userMessage),
-				OpenAiChatOptions.builder().withModel(OpenAiApi.ChatModel.GPT_4_VISION_PREVIEW.getValue()).build()));
+		ChatResponse response = chatClient
+			.call(new Prompt(List.of(userMessage), OpenAiChatOptions.builder().withModel(modelName).build()));

 		logger.info(response.getResult().getOutput().getContent());
-		assertThat(response.getResult().getOutput().getContent()).contains("bananas", "apple", "bowl");
+		assertThat(response.getResult().getOutput().getContent()).contains("bananas", "apple");
+		assertThat(response.getResult().getOutput().getContent()).containsAnyOf("bowl", "basket");
 	}

-	@Test
-	void multiModalityImageUrl() throws IOException {
+	@ParameterizedTest(name = "{0} : {displayName} ")
+	@ValueSource(strings = { "gpt-4-vision-preview", "gpt-4o" })
+	void multiModalityImageUrl(String modelName) throws IOException {

 		var userMessage = new UserMessage("Explain what do you see on this picture?",
 				List.of(new Media(MimeTypeUtils.IMAGE_PNG,
 						"https://docs.spring.io/spring-ai/reference/1.0-SNAPSHOT/_images/multimodal.test.png")));

-		ChatResponse response = chatClient.call(new Prompt(List.of(userMessage),
-				OpenAiChatOptions.builder().withModel(OpenAiApi.ChatModel.GPT_4_VISION_PREVIEW.getValue()).build()));
+		ChatResponse response = chatClient
+			.call(new Prompt(List.of(userMessage), OpenAiChatOptions.builder().withModel(modelName).build()));

 		logger.info(response.getResult().getOutput().getContent());
-		assertThat(response.getResult().getOutput().getContent()).contains("bananas", "apple", "bowl");
+		assertThat(response.getResult().getOutput().getContent()).contains("bananas", "apple");
+		assertThat(response.getResult().getOutput().getContent()).containsAnyOf("bowl", "basket");
 	}

 	@Test
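For context on the mechanism these ITs now use: JUnit 5's `@ParameterizedTest` with `@ValueSource` invokes the test method once per supplied string, so each model name shows up as its own reported invocation. Below is a minimal, standalone sketch of that behavior; the class and method names are hypothetical, and it assumes `junit-jupiter-params` and AssertJ are on the test classpath.

[source,java]
----
import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

// Hypothetical class name, for illustration only.
class ModelParameterizationSketch {

	// Runs once per string in @ValueSource; "{0}" in the name pattern
	// is the current argument, i.e. the model name, as in the ITs above.
	@ParameterizedTest(name = "{0} : {displayName} ")
	@ValueSource(strings = { "gpt-4-vision-preview", "gpt-4o" })
	void runsOncePerModel(String modelName) {
		assertThat(modelName).isNotBlank();
	}
}
----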

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/openai-chat.adoc

Lines changed: 5 additions & 4 deletions

@@ -145,13 +145,14 @@ Read more about xref:api/chat/functions/openai-chat-functions.adoc[OpenAI Functi
 == Multimodal

 Multimodality refers to a model's ability to simultaneously understand and process information from various sources, including text, images, audio, and other data formats.
-Presently, the OpenAI `gpt-4-visual-preview` model offers multimodal support. Refer to the link:https://platform.openai.com/docs/guides/vision[Vision] guide for more information.
+Presently, the OpenAI `gpt-4-vision-preview` and `gpt-4o` models offer multimodal support.
+Refer to the link:https://platform.openai.com/docs/guides/vision[Vision] guide for more information.

 The OpenAI link:https://platform.openai.com/docs/api-reference/chat/create#chat-create-messages[User Message API] can incorporate a list of base64-encoded images or image urls with the message.
 Spring AI’s link:https://github.com/spring-projects/spring-ai/blob/main/spring-ai-core/src/main/java/org/springframework/ai/chat/messages/Message.java[Message] interface facilitates multimodal AI models by introducing the link:https://github.com/spring-projects/spring-ai/blob/main/spring-ai-core/src/main/java/org/springframework/ai/chat/messages/Media.java[Media] type.
 This type encompasses data and details regarding media attachments in messages, utilizing Spring’s `org.springframework.util.MimeType` and a `java.lang.Object` for the raw media data.

-Below is a code example excerpted from link:https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-openai/src/test/java/org/springframework/ai/openai/OpenAiChatClientIT.java[OpenAiChatClientIT.java], illustrating the fusion of user text with an image.
+Below is a code example excerpted from link:https://github.com/spring-projects/spring-ai/blob/b3cfa2b900ea785e055e4ff71086eeb52f6578a3/models/spring-ai-openai/src/test/java/org/springframework/ai/openai/chat/OpenAiChatClientIT.java[OpenAiChatClientIT.java], illustrating the fusion of user text with an image using the `GPT_4_VISION_PREVIEW` model.

 [source,java]
 ----
@@ -164,7 +165,7 @@ ChatResponse response = chatClient.call(new Prompt(List.of(userMessage),
 		OpenAiChatOptions.builder().withModel(OpenAiApi.ChatModel.GPT_4_VISION_PREVIEW.getValue()).build()));
 ----

-or the image URL equivalent:
+or the image URL equivalent using the `GPT_4_O` model:

 [source,java]
 ----
@@ -173,7 +174,7 @@ var userMessage = new UserMessage("Explain what do you see on this picture?",
 				"https://docs.spring.io/spring-ai/reference/1.0-SNAPSHOT/_images/multimodal.test.png")));

 ChatResponse response = chatClient.call(new Prompt(List.of(userMessage),
-		OpenAiChatOptions.builder().withModel(OpenAiApi.ChatModel.GPT_4_VISION_PREVIEW.getValue()).build()));
+		OpenAiChatOptions.builder().withModel(OpenAiApi.ChatModel.GPT_4_O.getValue()).build()));
 ----

 TIP: you can pass multiple images as well.
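Pulling the documented fragments together, the following is a self-contained sketch of the URL-based multimodal call against `gpt-4o`. It is illustrative only: the class and method names are hypothetical, the model name is passed as a plain string rather than through the `OpenAiApi.ChatModel` enum, and the imports assume the pre-1.0 snapshot package layout linked above.

[source,java]
----
import java.util.List;

import org.springframework.ai.chat.ChatResponse;
import org.springframework.ai.chat.messages.Media;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.openai.OpenAiChatClient;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.util.MimeTypeUtils;

// Hypothetical helper class, for illustration only.
class MultimodalCallSketch {

	String describeImage(OpenAiChatClient chatClient) {
		// Attach the image by URL; base64-encoded bytes work the same way
		// via new Media(MimeTypeUtils.IMAGE_PNG, imageBytes).
		var userMessage = new UserMessage("Explain what do you see on this picture?",
				List.of(new Media(MimeTypeUtils.IMAGE_PNG,
						"https://docs.spring.io/spring-ai/reference/1.0-SNAPSHOT/_images/multimodal.test.png")));

		// Select the multimodal model per call through the chat options.
		ChatResponse response = chatClient.call(new Prompt(List.of(userMessage),
				OpenAiChatOptions.builder().withModel("gpt-4o").build()));

		return response.getResult().getOutput().getContent();
	}
}
----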

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/multimodality.adoc

Lines changed: 1 addition & 1 deletion

@@ -57,7 +57,7 @@ and produce a response like:

 Latest version of Spring AI provides multimodal support for the following Chat Clients:

-* xref:api/chat/openai-chat.adoc#_multimodal[Open AI - (GPT-4-Vision model)]
+* xref:api/chat/openai-chat.adoc#_multimodal[Open AI - (GPT-4-Vision and GPT-4o models)]
 * xref:api/chat/openai-chat.adoc#_multimodal[Ollama - (LlaVa and Baklava models)]
 * xref:api/chat/vertexai-gemini-chat.adoc#_multimodal[Vertex AI Gemini - (gemini-pro-vision model)]
 * xref:api/chat/anthropic-chat.adoc#_multimodal[Anthropic Claude 3]
