From 9193a399ee68ddc46ba3b2b8ab9c6f30867079f7 Mon Sep 17 00:00:00 2001 From: Alexandros Pappas Date: Tue, 15 Apr 2025 14:36:06 +0200 Subject: [PATCH] Docs: Update OpenAI chat model definitions Sync the ChatModel enum with the latest OpenAI documentation: - Reorganize models into categories. - Add new models (o4-mini, o3, o1-pro, gpt-4.1 series, realtime previews). - Update descriptions, context windows, max tokens, and knowledge cutoffs. - Add links to official model documentation. - Remove deprecated/preview models. Signed-off-by: Alexandros Pappas --- .../ai/openai/api/OpenAiApi.java | 280 ++++++++++++------ 1 file changed, 197 insertions(+), 83 deletions(-) diff --git a/models/spring-ai-openai/src/main/java/org/springframework/ai/openai/api/OpenAiApi.java b/models/spring-ai-openai/src/main/java/org/springframework/ai/openai/api/OpenAiApi.java index 433f59d9697..3b1f5404688 100644 --- a/models/spring-ai-openai/src/main/java/org/springframework/ai/openai/api/OpenAiApi.java +++ b/models/spring-ai-openai/src/main/java/org/springframework/ai/openai/api/OpenAiApi.java @@ -288,101 +288,189 @@ public ResponseEntity> embeddings(EmbeddingRequest< * information about the model's context window, maximum output tokens, and knowledge * cutoff date. *

- * References: - *

+ * References: OpenAI Models + * Documentation */ public enum ChatModel implements ChatModelDescription { + // --- Reasoning Models --- + /** - * o1 is trained with reinforcement learning to perform complex reasoning. - * It thinks before it answers, producing a long internal chain of thought before - * responding to the user. + * o4-mini is the latest small o-series model. It's optimized for fast, + * effective reasoning with exceptionally efficient performance in coding and + * visual tasks. *

- * The latest o1 model supports both text and image inputs, and produces text - * outputs (including Structured Outputs). + * Context window: 200,000 tokens. Max output tokens: 100,000 tokens. Knowledge + * cutoff: June 1, 2024. *

- * The knowledge cutoff for o1 is October, 2023. + * Model ID: o4-mini *

+ * See: o4-mini */ - O1("o1"), + O4_MINI("o4-mini"), + /** - * o1-preview is trained with reinforcement learning to perform complex - * reasoning. It thinks before it answers, producing a long internal chain of - * thought before responding to the user. + * o3 is a well-rounded and powerful model across domains. It sets a new + * standard for math, science, coding, and visual reasoning tasks. It also excels + * at technical writing and instruction-following. Use it to think through + * multi-step problems that involve analysis across text, code, and images. *

- * The latest o1-preview model supports both text and image inputs, and produces - * text outputs (including Structured Outputs). + * Context window: 200,000 tokens. Max output tokens: 100,000 tokens. Knowledge + * cutoff: June 1, 2024. *

- * The knowledge cutoff for o1-preview is October, 2023. + * Model ID: o3 *

+ * See: o3 */ - O1_PREVIEW("o1-preview") + O3("o3"), + + /** + * o3-mini is a small reasoning model, providing high intelligence at cost + * and latency targets similar to o1-mini. o3-mini supports key developer + * features like Structured Outputs, function calling, and the Batch API. *


+ * Context window: 200,000 tokens. Max output tokens: 100,000 tokens. Knowledge + * cutoff: October 1, 2023. + *

+ * Model ID: o3-mini + *

+ * See: o3-mini + */ + O3_MINI("o3-mini"), + + /** + * The o1 series of models are trained with reinforcement learning to + * perform complex reasoning. o1 models think before they answer, producing a long + * internal chain of thought before responding to the user. + *

+ * Context window: 200,000 tokens. Max output tokens: 100,000 tokens. Knowledge + * cutoff: October 1, 2023. + *

+ * Model ID: o1 + *

+ * See: o1 + */ + O1("o1"), /** * o1-mini is a faster and more affordable reasoning model compared to o1. * o1-mini currently only supports text inputs and outputs. *

- * The knowledge cutoff for o1-mini is October, 2023. + * Context window: 128,000 tokens. Max output tokens: 65,536 tokens. Knowledge + * cutoff: October 1, 2023. *

+ * Model ID: o1-mini + *

+ * See: o1-mini */ O1_MINI("o1-mini"), + /** - * o3-mini is our most recent small reasoning model, providing high - * intelligence at the same cost and latency targets of o1-mini. o3-mini also - * supports key developer features, like Structured Outputs, function calling, - * Batch API, and more. Like other models in the o-series, it is designed to excel - * at science, math, and coding tasks. + * The o1-pro model, part of the o1 series trained with reinforcement + * learning for complex reasoning, uses more compute to think harder and provide + * consistently better answers. *

- * The knowledge cutoff for o3-mini models is October, 2023. + * Note: o1-pro is available only in the Responses API, to enable support for + * multi-turn model interactions and other advanced API features. *

+ * Context window: 200,000 tokens. Max output tokens: 100,000 tokens. Knowledge + * cutoff: October 1, 2023. + *

+ * Model ID: o1-pro *

+ * See: o1-pro */ - O3_MINI("o3-mini"), + O1_PRO("o1-pro"), + + // --- Flagship Models --- /** - * GPT-4o ("omni") is our versatile, high-intelligence flagship model. It - * accepts both text and image inputs and produces text outputs (including - * Structured Outputs). + * GPT-4.1 is the flagship model for complex tasks. It is well suited for + * problem solving across domains. + *

+ * Context window: 1,047,576 tokens. Max output tokens: 32,768 tokens. Knowledge + * cutoff: June 1, 2024. *

- * The knowledge cutoff for GPT-4o models is October, 2023. + * Model ID: gpt-4.1 *

+ * See: gpt-4.1 + */ + GPT_4_1("gpt-4.1"), + + /** + * GPT-4o (“o” for “omni”) is the versatile, high-intelligence flagship + * model. It accepts both text and image inputs, and produces text outputs + * (including Structured Outputs). It is considered the best model for most tasks, + * and the most capable model outside of the o-series models. + *

+ * Context window: 128,000 tokens. Max output tokens: 16,384 tokens. Knowledge + * cutoff: October 1, 2023. + *

+ * Model ID: gpt-4o + *

+ * See: gpt-4o */ GPT_4_O("gpt-4o"), + /** * The chatgpt-4o-latest model ID continuously points to the version of * GPT-4o used in ChatGPT. It is updated frequently when there are significant * changes to ChatGPT's GPT-4o model. *

* Context window: 128,000 tokens. Max output tokens: 16,384 tokens. Knowledge - * cutoff: October, 2023. + * cutoff: October 1, 2023. + *

+ * Model ID: chatgpt-4o-latest + *

+ * See: chatgpt-4o-latest */ CHATGPT_4_O_LATEST("chatgpt-4o-latest"), /** - * GPT-4o Audio is a preview release model that accepts audio inputs and - * outputs and can be used in the Chat Completions REST API. + * GPT-4o Audio Preview represents a preview release of models that accept + * audio inputs and outputs via the Chat Completions REST API. + *

+ * Context window: 128,000 tokens. Max output tokens: 16,384 tokens. Knowledge + * cutoff: October 1, 2023. *

- * The knowledge cutoff for GPT-4o Audio models is October, 2023. + * Model ID: gpt-4o-audio-preview *

+ * See: gpt-4o-audio-preview */ GPT_4_O_AUDIO_PREVIEW("gpt-4o-audio-preview"), + // --- Cost-Optimized Models --- + /** - * GPT-4o-mini Audio is a preview release model that accepts audio inputs - * and outputs and can be used in the Chat Completions REST API. + * GPT-4.1-mini provides a balance between intelligence, speed, and cost + * that makes it an attractive model for many use cases. + *

+ * Context window: 1,047,576 tokens. Max output tokens: 32,768 tokens. Knowledge + * cutoff: June 1, 2024. *

- * The knowledge cutoff for GPT-4o-mini Audio models is October, 2023. + * Model ID: gpt-4.1-mini *

+ * See: + * gpt-4.1-mini */ - GPT_4_O_MINI_AUDIO_PREVIEW("gpt-4o-mini-audio-preview"), + GPT_4_1_MINI("gpt-4.1-mini"), + + /** + * GPT-4.1-nano is the fastest, most cost-effective GPT-4.1 model. + *

+ * Context window: 1,047,576 tokens. Max output tokens: 32,768 tokens. Knowledge + * cutoff: June 1, 2024. + *

+ * Model ID: gpt-4.1-nano + *

+ * See: + * gpt-4.1-nano + */ + GPT_4_1_NANO("gpt-4.1-nano"), /** * GPT-4o-mini is a fast, affordable small model for focused tasks. It @@ -391,80 +479,106 @@ public enum ChatModel implements ChatModelDescription { * larger model like GPT-4o can be distilled to GPT-4o-mini to produce similar * results at lower cost and latency. *

- * The knowledge cutoff for GPT-4o-mini models is October, 2023. + * Context window: 128,000 tokens. Max output tokens: 16,384 tokens. Knowledge + * cutoff: October 1, 2023. + *

+ * Model ID: gpt-4o-mini *

+ * See: + * gpt-4o-mini */ GPT_4_O_MINI("gpt-4o-mini"), /** - * GPT-4 Turbo is a high-intelligence GPT model with vision capabilities, - * usable in Chat Completions. Vision requests can now use JSON mode and function - * calling. + * GPT-4o-mini Audio Preview is a preview release model that accepts audio + * inputs and outputs and can be used in the Chat Completions REST API. *

- * The knowledge cutoff for the latest GPT-4 Turbo version is December, 2023. + * Context window: 128,000 tokens. Max output tokens: 16,384 tokens. Knowledge + * cutoff: October 1, 2023. + *

+ * Model ID: gpt-4o-mini-audio-preview *

+ * See: gpt-4o-mini-audio-preview */ - GPT_4_TURBO("gpt-4-turbo") + GPT_4_O_MINI_AUDIO_PREVIEW("gpt-4o-mini-audio-preview"), + + // --- Realtime Models --- /** - * GPT-4-0125-preview is the latest GPT-4 model intended to reduce cases of - * “laziness” where the model doesn’t complete a task. + * The GPT-4o Realtime model is capable of responding to audio and text inputs + * in realtime over WebRTC or a WebSocket interface. *

+ * Context window: 128,000 tokens. Max output tokens: 4,096 tokens. Knowledge + * cutoff: October 1, 2023. + *

+ * Model ID: gpt-4o-realtime-preview *

- * Context window: 128,000 tokens. Max output tokens: 4,096 tokens. + * See: gpt-4o-realtime-preview */ - GPT_4_0125_PREVIEW("gpt-4-0125-preview") + GPT_4_O_REALTIME_PREVIEW("gpt-4o-realtime-preview"), /** - * Currently points to {@link #GPT_4_0125_PREVIEW}. + * The GPT-4o-mini Realtime model is capable of responding to audio and text + * inputs in realtime over WebRTC or a WebSocket interface. *

+ * Context window: 128,000 tokens. Max output tokens: 4,096 tokens. Knowledge + * cutoff: October 1, 2023. + *

+ * Model ID: gpt-4o-mini-realtime-preview *

- * Context window: 128,000 tokens. Max output tokens: 4,096 tokens. + * See: gpt-4o-mini-realtime-preview */ - GPT_4_1106_PREVIEW("gpt-4-1106-preview") + GPT_4_O_MINI_REALTIME_PREVIEW("gpt-4o-mini-realtime-preview"), + + // --- Older GPT Models --- /** - * GPT-4 Turbo Preview is a high-intelligence GPT model usable in Chat - * Completions. + * GPT-4 Turbo is the next generation of GPT-4, an older high-intelligence + * GPT model. It was designed to be a cheaper, better version of GPT-4. Today, a + * newer model such as GPT-4o is recommended instead. *

+ * Context window: 128,000 tokens. Max output tokens: 4,096 tokens. Knowledge + * cutoff: December 1, 2023. *

- * Currently points to {@link #GPT_4_0125_PREVIEW}. + * Model ID: gpt-4-turbo *

- * Context window: 128,000 tokens. Max output tokens: 4,096 tokens. + * See: + * gpt-4-turbo */ - GPT_4_TURBO_PREVIEW("gpt-4-turbo-preview"), + GPT_4_TURBO("gpt-4-turbo"), /** * GPT-4 is an older version of a high-intelligence GPT model, usable in - * Chat Completions. - *

- * Currently points to {@link #GPT_4_0613}. + * Chat Completions. It does not accept image (vision) inputs. *

- * Context window: 8,192 tokens. Max output tokens: 8,192 tokens. - */ - GPT_4("gpt-4"), - /** - * GPT-4 model snapshot. + * Context window: 8,192 tokens. Max output tokens: 8,192 tokens. Knowledge + * cutoff: December 1, 2023. *

- * Context window: 8,192 tokens. Max output tokens: 8,192 tokens. - */ - GPT_4_0613("gpt-4-0613"), - /** - * GPT-4 model snapshot. + * Model ID: gpt-4 *

- * Context window: 8,192 tokens. Max output tokens: 8,192 tokens. + * See: gpt-4 */ - GPT_4_0314("gpt-4-0314"), + GPT_4("gpt-4"), /** * GPT-3.5 Turbo models can understand and generate natural language or * code and have been optimized for chat using the Chat Completions API but work - * well for non-chat tasks as well. - *

- * As of July 2024, {@link #GPT_4_O_MINI} should be used in place of - * gpt-3.5-turbo, as it is cheaper, more capable, multimodal, and just as fast. - * gpt-3.5-turbo is still available for use in the API. + * well for non-chat tasks as well. They are generally lower cost but less + * capable than GPT-4 models. *

+ * As of July 2024, {@link #GPT_4_O_MINI} is recommended over gpt-3.5-turbo for + * most use cases. *

* Context window: 16,385 tokens. Max output tokens: 4,096 tokens. Knowledge * cutoff: September, 2021. + *

+ * Model ID: gpt-3.5-turbo + *

+ * See: gpt-3.5-turbo */ GPT_3_5_TURBO("gpt-3.5-turbo"),
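For reviewers, the enum-plus-string-ID pattern this patch edits can be sketched in isolation. This is a minimal, self-contained approximation (the class and interface names below are illustrative stand-ins; the real type is `OpenAiApi.ChatModel`, which implements Spring AI's `ChatModelDescription`): each constant wraps the wire-format model ID that goes in the request's `model` field, which is why the stray `\n` inside a constant's string literal would silently produce an invalid model ID at request time.

```java
// Self-contained sketch of the ChatModel pattern (illustrative names only;
// the real enum lives inside OpenAiApi and covers many more models).
public class ChatModelSketch {

    /** Stand-in for Spring AI's ChatModelDescription interface. */
    interface ChatModelDescription {
        String getName();
    }

    enum ChatModel implements ChatModelDescription {

        // Each constant carries the exact model ID sent in the "model" field.
        O4_MINI("o4-mini"),
        O3("o3"),
        GPT_4_1("gpt-4.1"),
        GPT_4_O("gpt-4o"),
        GPT_3_5_TURBO("gpt-3.5-turbo");

        private final String value;

        ChatModel(String value) {
            this.value = value;
        }

        public String getValue() {
            return this.value;
        }

        @Override
        public String getName() {
            return this.value;
        }
    }

    public static void main(String[] args) {
        // The wire-format ID, not the Java constant name, is what the API expects.
        ChatModel model = ChatModel.GPT_4_1;
        System.out.println(model.getValue()); // prints "gpt-4.1"
        System.out.println(model.name());     // prints "GPT_4_1"
    }
}
```

Because callers serialize `getValue()` verbatim, trailing whitespace or control characters in any constant's string (as in the `gpt-4o-mini-realtime-preview` literal fixed above) would only surface as a model-not-found error from the API, not at compile time.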