
Commit f91ccf0

ThomasVitale authored and tzolov committed
Ollama: Update APIs, Testcontainers, Documentation
Signed-off-by: Thomas Vitale <ThomasVitale@users.noreply.github.com>
1 parent 70c8c5a commit f91ccf0

File tree

13 files changed: +75 −84 lines changed

models/spring-ai-ollama/pom.xml

Lines changed: 6 additions & 0 deletions
@@ -73,5 +73,11 @@
         <artifactId>junit-jupiter</artifactId>
         <scope>test</scope>
     </dependency>
+
+    <dependency>
+        <groupId>org.testcontainers</groupId>
+        <artifactId>ollama</artifactId>
+        <scope>test</scope>
+    </dependency>
 </dependencies>
</project>

models/spring-ai-ollama/src/main/java/org/springframework/ai/ollama/api/OllamaApi.java

Lines changed: 5 additions & 2 deletions
@@ -139,6 +139,8 @@ public OllamaApi(String baseUrl, RestClient.Builder restClientBuilder) {
      * context will be returned. You may choose to use the raw parameter if you are
      * specifying a full templated prompt in your request to the API, and are managing
      * history yourself.
+     * @param images (optional) a list of base64-encoded images (for multimodal models such as llava).
+     * @param keepAlive (optional) controls how long the model will stay loaded into memory following the request (default: 5m).
      */
     @JsonInclude(Include.NON_NULL)
     public record GenerateRequest(

@@ -503,9 +505,9 @@ public ChatRequest build() {
      * @param evalCount number of tokens in the response.
      * @param evalDuration time spent generating the response.
      * @see <a href=
-     * "https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-a-chat-completion">Chat
+     * "https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion">Chat
      * Completion API</a>
-     * @see <a href="https://github.com/jmorganca/ollama/blob/main/api/types.go">Ollama
+     * @see <a href="https://github.com/ollama/ollama/blob/main/api/types.go">Ollama
      * Types</a>
      */
     @JsonInclude(Include.NON_NULL)

@@ -573,6 +575,7 @@ public Flux<ChatResponse> streamingChat(ChatRequest chatRequest) {
      *
      * @param model The name of model to generate embeddings from.
      * @param prompt The text to generate embeddings for.
+     * @param keepAlive Controls how long the model will stay loaded into memory following the request (default: 5m).
      * @param options Additional model parameters listed in the documentation for the
      * Modelfile such as temperature.
      */
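For context, the two new `GenerateRequest` fields map onto the `images` and `keep_alive` fields of Ollama's `/api/generate` endpoint. A minimal sketch against the raw HTTP API, using Spring's RestClient (which OllamaApi itself builds on, per the constructor in the hunk header above); the model name, image path, and endpoint are placeholder assumptions, not part of this commit:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Base64;
    import java.util.List;
    import java.util.Map;

    import org.springframework.http.MediaType;
    import org.springframework.web.client.RestClient;

    // Exercise the two newly documented fields against a local Ollama instance.
    String imageB64 = Base64.getEncoder()
            .encodeToString(Files.readAllBytes(Path.of("test.png"))); // placeholder image

    Map<String, Object> request = Map.of(
            "model", "llava",                   // a multimodal model, as the new Javadoc suggests
            "prompt", "What is in this picture?",
            "images", List.of(imageB64),        // list of base64-encoded images
            "keep_alive", "10m",                // keep the model loaded for 10 minutes
            "stream", false);

    String response = RestClient.create("http://localhost:11434")
            .post()
            .uri("/api/generate")
            .contentType(MediaType.APPLICATION_JSON)
            .body(request)
            .retrieve()
            .body(String.class);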

models/spring-ai-ollama/src/main/java/org/springframework/ai/ollama/api/OllamaModel.java

Lines changed: 10 additions & 0 deletions
@@ -28,6 +28,11 @@ public enum OllamaModel {
      */
     LLAMA2("llama2"),

+    /**
+     * Llama 3 is a collection of language models ranging from 8B to 70B parameters.
+     */
+    LLAMA3("llama3"),
+
     /**
      * The 7B parameters model
      */

@@ -43,6 +48,11 @@ public enum OllamaModel {
      */
     PHI("phi"),

+    /**
+     * The Phi-3 3.8B language model
+     */
+    PHI3("phi3"),
+
     /**
      * A fine-tuned Mistral model
      */

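The new constants can be used wherever a model name string is expected. A hedged sketch, assuming the `OllamaOptions.create()`/`withModel(String)` builders from this codebase and that `OllamaModel` exposes its wire name via `id()`:

    import org.springframework.ai.ollama.api.OllamaModel;
    import org.springframework.ai.ollama.api.OllamaOptions;

    // Select one of the newly added models by its identifier ("llama3", "phi3").
    OllamaOptions llama3Options = OllamaOptions.create().withModel(OllamaModel.LLAMA3.id());
    OllamaOptions phi3Options = OllamaOptions.create().withModel(OllamaModel.PHI3.id());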
models/spring-ai-ollama/src/main/java/org/springframework/ai/ollama/api/OllamaOptions.java

Lines changed: 14 additions & 47 deletions
@@ -35,10 +35,9 @@
  * @author Christian Tzolov
  * @since 0.8.0
  * @see <a href=
- * "https://github.com/jmorganca/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values">Ollama
+ * "https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values">Ollama
  * Valid Parameters and Values</a>
- * @see <a href="https://github.com/jmorganca/ollama/blob/main/api/types.go">Ollama
- * Types</a>
+ * @see <a href="https://github.com/ollama/ollama/blob/main/api/types.go">Ollama Types</a>
  */
 @JsonInclude(Include.NON_NULL)
 public class OllamaOptions implements ChatOptions, EmbeddingOptions {

@@ -47,6 +46,8 @@ public class OllamaOptions implements ChatOptions, EmbeddingOptions {

     private static final List<String> NON_SUPPORTED_FIELDS = List.of("model", "format", "keep_alive");

+    // Following fields are options which must be set when the model is loaded into memory.
+
     // @formatter:off
     /**
      * useNUMA Whether to use NUMA.

@@ -110,16 +111,6 @@
      */
     @JsonProperty("use_mlock") private Boolean useMLock;

-    /**
-     * ???
-     */
-    @JsonProperty("rope_frequency_base") private Float ropeFrequencyBase;
-
-    /**
-     * ???
-     */
-    @JsonProperty("rope_frequency_scale") private Float ropeFrequencyScale;
-
     /**
      * Sets the number of threads to use during computation. By default,
      * Ollama will detect this for optimal performance. It is recommended to set this

@@ -128,6 +119,8 @@
      */
     @JsonProperty("num_thread") private Integer numThread;

+    // Following fields are predict options used at runtime.
+
     /**
      * ???
      */

@@ -156,8 +149,8 @@
     /**
      * Works together with top-k. A higher value (e.g., 0.95) will lead to
      * more diverse text, while a lower value (e.g., 0.5) will generate more focused and
-     * conservative text. (Default: 0.9)
-     */
+     * conservative text. (Default: 0.9)
+     */
     @JsonProperty("top_p") private Float topP;

@@ -208,16 +201,15 @@
     @JsonProperty("mirostat") private Integer mirostat;

     /**
-     * Influences how quickly the algorithm responds to feedback from
-     * the generated text. A lower learning rate will result in slower adjustments, while
-     * a higher learning rate will make the algorithm more responsive. (Default: 0.1).
+     * Controls the balance between coherence and diversity of the output.
+     * A lower value will result in more focused and coherent text. (Default: 5.0)
      */
     @JsonProperty("mirostat_tau") private Float mirostatTau;

     /**
-     * Controls the balance between coherence and diversity of the
-     * output. A lower value will result in more focused and coherent text. (Default:
-     * 5.0).
+     * Influences how quickly the algorithm responds to feedback from the generated text.
+     * A lower learning rate will result in slower adjustments, while a higher learning rate
+     * will make the algorithm more responsive. (Default: 0.1)
      */
     @JsonProperty("mirostat_eta") private Float mirostatEta;

@@ -235,6 +227,7 @@ public class OllamaOptions implements ChatOptions, EmbeddingOptions {


     // Following fields are not part of the Ollama Options API but part of the Request.
+
     /**
      * NOTE: Synthetic field not part of the official Ollama API.
      * Used to allow overriding the model name with prompt options.

@@ -341,16 +334,6 @@ public OllamaOptions withUseMLock(Boolean useMLock) {
         return this;
     }

-    public OllamaOptions withRopeFrequencyBase(Float ropeFrequencyBase) {
-        this.ropeFrequencyBase = ropeFrequencyBase;
-        return this;
-    }
-
-    public OllamaOptions withRopeFrequencyScale(Float ropeFrequencyScale) {
-        this.ropeFrequencyScale = ropeFrequencyScale;
-        return this;
-    }
-
     public OllamaOptions withNumThread(Integer numThread) {
         this.numThread = numThread;
         return this;

@@ -553,22 +536,6 @@ public void setUseMLock(Boolean useMLock) {
         this.useMLock = useMLock;
     }

-    public Float getRopeFrequencyBase() {
-        return this.ropeFrequencyBase;
-    }
-
-    public void setRopeFrequencyBase(Float ropeFrequencyBase) {
-        this.ropeFrequencyBase = ropeFrequencyBase;
-    }
-
-    public Float getRopeFrequencyScale() {
-        return this.ropeFrequencyScale;
-    }
-
-    public void setRopeFrequencyScale(Float ropeFrequencyScale) {
-        this.ropeFrequencyScale = ropeFrequencyScale;
-    }
-
     public Integer getNumThread() {
         return this.numThread;
     }
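The swapped mirostat Javadoc above matches upstream Ollama: `mirostat_tau` balances coherence versus diversity, while `mirostat_eta` is the learning rate. A small sketch of the runtime ("predict") options, assuming the existing `withMirostat*` builder methods and using the Ollama defaults:

    // Values shown are the Ollama defaults for Mirostat sampling.
    OllamaOptions options = OllamaOptions.create()
            .withMirostat(1)         // 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0
            .withMirostatTau(5.0f)   // coherence vs. diversity of the output
            .withMirostatEta(0.1f);  // how quickly the algorithm reacts to feedback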

models/spring-ai-ollama/src/test/java/org/springframework/ai/ollama/OllamaChatClientIT.java

Lines changed: 2 additions & 2 deletions
@@ -29,7 +29,6 @@
 import org.springframework.ai.chat.metadata.Usage;
 import org.springframework.ai.chat.prompt.ChatOptionsBuilder;
 import org.springframework.ai.chat.messages.AssistantMessage;
-import org.testcontainers.containers.GenericContainer;
 import org.testcontainers.junit.jupiter.Container;
 import org.testcontainers.junit.jupiter.Testcontainers;

@@ -50,6 +49,7 @@
 import org.springframework.boot.test.context.SpringBootTest;
 import org.springframework.context.annotation.Bean;
 import org.springframework.core.convert.support.DefaultConversionService;
+import org.testcontainers.ollama.OllamaContainer;

 import static org.assertj.core.api.Assertions.assertThat;

@@ -63,7 +63,7 @@ class OllamaChatClientIT {
     private static final Log logger = LogFactory.getLog(OllamaChatClientIT.class);

     @Container
-    static GenericContainer<?> ollamaContainer = new GenericContainer<>("ollama/ollama:0.1.29").withExposedPorts(11434);
+    static OllamaContainer ollamaContainer = new OllamaContainer("ollama/ollama:0.1.32");

     static String baseUrl;
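The switch from `GenericContainer` to the dedicated Testcontainers Ollama module (added in the pom.xml change above) removes the hand-wired port mapping, since the module already knows the image's exposed port (11434). A sketch of the resulting scaffolding, assuming `OllamaContainer.getEndpoint()` returns the mapped `http://host:port` base URL:

    import org.testcontainers.junit.jupiter.Container;
    import org.testcontainers.junit.jupiter.Testcontainers;
    import org.testcontainers.ollama.OllamaContainer;

    @Testcontainers
    class OllamaContainerSketch {

        // No withExposedPorts(11434) needed: the module handles it.
        @Container
        static OllamaContainer ollamaContainer = new OllamaContainer("ollama/ollama:0.1.32");

        static String baseUrl;

        static void resolveBaseUrl() {
            baseUrl = ollamaContainer.getEndpoint(); // e.g. http://localhost:32768
        }
    }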
models/spring-ai-ollama/src/test/java/org/springframework/ai/ollama/OllamaChatClientMultimodalIT.java

Lines changed: 2 additions & 2 deletions
@@ -23,7 +23,6 @@
 import org.junit.jupiter.api.BeforeAll;
 import org.junit.jupiter.api.Disabled;
 import org.junit.jupiter.api.Test;
-import org.testcontainers.containers.GenericContainer;
 import org.testcontainers.junit.jupiter.Container;
 import org.testcontainers.junit.jupiter.Testcontainers;

@@ -39,6 +38,7 @@
 import org.springframework.context.annotation.Bean;
 import org.springframework.core.io.ClassPathResource;
 import org.springframework.util.MimeTypeUtils;
+import org.testcontainers.ollama.OllamaContainer;

 import static org.assertj.core.api.Assertions.assertThat;

@@ -52,7 +52,7 @@ class OllamaChatClientMultimodalIT {
     private static final Log logger = LogFactory.getLog(OllamaChatClientIT.class);

     @Container
-    static GenericContainer<?> ollamaContainer = new GenericContainer<>("ollama/ollama:0.1.29").withExposedPorts(11434);
+    static OllamaContainer ollamaContainer = new OllamaContainer("ollama/ollama:0.1.32");

     static String baseUrl;

models/spring-ai-ollama/src/test/java/org/springframework/ai/ollama/OllamaEmbeddingClientIT.java

Lines changed: 2 additions & 1 deletion
@@ -34,6 +34,7 @@
 import org.springframework.boot.SpringBootConfiguration;
 import org.springframework.boot.test.context.SpringBootTest;
 import org.springframework.context.annotation.Bean;
+import org.testcontainers.ollama.OllamaContainer;

 import static org.assertj.core.api.Assertions.assertThat;

@@ -45,7 +46,7 @@ class OllamaEmbeddingClientIT {
     private static final Log logger = LogFactory.getLog(OllamaApiIT.class);

     @Container
-    static GenericContainer<?> ollamaContainer = new GenericContainer<>("ollama/ollama:0.1.29").withExposedPorts(11434);
+    static OllamaContainer ollamaContainer = new OllamaContainer("ollama/ollama:0.1.32");

     static String baseUrl;

models/spring-ai-ollama/src/test/java/org/springframework/ai/ollama/api/OllamaApiIT.java

Lines changed: 2 additions & 2 deletions
@@ -24,9 +24,9 @@
 import org.junit.jupiter.api.BeforeAll;
 import org.junit.jupiter.api.Disabled;
 import org.junit.jupiter.api.Test;
-import org.testcontainers.containers.GenericContainer;
 import org.testcontainers.junit.jupiter.Container;
 import org.testcontainers.junit.jupiter.Testcontainers;
+import org.testcontainers.ollama.OllamaContainer;
 import reactor.core.publisher.Flux;

 import org.springframework.ai.ollama.api.OllamaApi.ChatRequest;

@@ -50,7 +50,7 @@ public class OllamaApiIT {
     private static final Log logger = LogFactory.getLog(OllamaApiIT.class);

     @Container
-    static GenericContainer<?> ollamaContainer = new GenericContainer<>("ollama/ollama:0.1.29").withExposedPorts(11434);
+    static OllamaContainer ollamaContainer = new OllamaContainer("ollama/ollama:0.1.32");

     static OllamaApi ollamaApi;
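Wiring the low-level client to the container then becomes a one-liner. A hedged sketch of the test setup, assuming a single-argument `OllamaApi(String baseUrl)` constructor alongside the two-argument one shown in the OllamaApi.java hunk above, and the same `getEndpoint()` behavior as in the other tests:

    // Typically done in a @BeforeAll method of the test class.
    static void setUp() {
        ollamaApi = new OllamaApi(ollamaContainer.getEndpoint());
    }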

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/ollama-chat.adoc

Lines changed: 11 additions & 15 deletions
@@ -6,9 +6,9 @@ Spring AI supports the Ollama text generation with `OllamaChatClient`.
 == Prerequisites

 You first need to run Ollama on your local machine.
-Refer to the official Ollama project link:https://github.com/jmorganca/ollama[README] to get started running models on your local machine.
+Refer to the official Ollama project link:https://github.com/ollama/ollama[README] to get started running models on your local machine.

-NOTE: installing `ollama run llama2` will download a 4GB docker image.
+NOTE: installing `ollama run llama3` will download a 4.7GB model artifact.

 === Add Repositories and BOM

@@ -53,7 +53,7 @@ The prefix `spring.ai.ollama` is the property prefix to configure the connection
 |====

 The prefix `spring.ai.ollama.chat.options` is the property prefix that configures the Ollama chat client.
-It includes the Ollama request (advanced) parameters such as the `model`, `keep-alive`, `format` and `template` as well as the Ollama model `options` properties.
+It includes the Ollama request (advanced) parameters such as the `model`, `keep-alive`, and `format` as well as the Ollama model `options` properties.

 Here are the advanced request parameters for the Ollama chat client:

@@ -62,12 +62,12 @@ Here are the advanced request parameters for the Ollama chat client:
 | Property | Description | Default

 | spring.ai.ollama.chat.enabled | Enable Ollama chat client. | true
-| spring.ai.ollama.chat.options.model | The name of the https://github.com/ollama/ollama?tab=readme-ov-file#model-library[supported models] to use. | mistral
+| spring.ai.ollama.chat.options.model | The name of the https://github.com/ollama/ollama?tab=readme-ov-file#model-library[supported model] to use. | mistral
 | spring.ai.ollama.chat.options.format | The format to return a response in. Currently the only accepted value is `json` | -
 | spring.ai.ollama.chat.options.keep_alive | Controls how long the model will stay loaded into memory following the request | 5m
 |====

-The `options` properties are based on the link:https://github.com/jmorganca/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values[Ollama Valid Parameters and Values] and link:https://github.com/jmorganca/ollama/blob/main/api/types.go[Ollama Types]. The default values are based on: link:https://github.com/ollama/ollama/blob/b538dc3858014f94b099730a592751a5454cab0a/api/types.go#L364[Ollama type defaults].
+The remaining `options` properties are based on the link:https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values[Ollama Valid Parameters and Values] and link:https://github.com/ollama/ollama/blob/main/api/types.go[Ollama Types]. The default values are based on the link:https://github.com/ollama/ollama/blob/b538dc3858014f94b099730a592751a5454cab0a/api/types.go#L364[Ollama type defaults].

 [cols="3,6,1"]
 |====

@@ -84,8 +84,6 @@ The `options` properties are based on the link:https://github.com/jmorganca/olla
 | spring.ai.ollama.chat.options.vocab-only | ??? | -
 | spring.ai.ollama.chat.options.use-mmap | ??? | true
 | spring.ai.ollama.chat.options.use-mlock | ??? | false
-| spring.ai.ollama.chat.options.rope-frequency-base | ??? | 10000.0
-| spring.ai.ollama.chat.options.rope-frequency-scale | ??? | 1.0
 | spring.ai.ollama.chat.options.num-thread | Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores). 0 = let the runtime decide | 0
 | spring.ai.ollama.chat.options.num-keep | ??? | 0
 | spring.ai.ollama.chat.options.seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. | -1

@@ -100,14 +98,12 @@
 | spring.ai.ollama.chat.options.presence-penalty | ??? | 0.0
 | spring.ai.ollama.chat.options.frequency-penalty | ??? | 0.0
 | spring.ai.ollama.chat.options.mirostat | Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | 0
-| spring.ai.ollama.chat.options.mirostat-tau | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. | 5.0
-| spring.ai.ollama.chat.options.mirostat-eta | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. | 0.1
+| spring.ai.ollama.chat.options.mirostat-tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. | 5.0
+| spring.ai.ollama.chat.options.mirostat-eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. | 0.1
 | spring.ai.ollama.chat.options.penalize-newline | ??? | true
 | spring.ai.ollama.chat.options.stop | Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile. | -
 |====

-NOTE: The list of options for chat is to be reviewed. This https://github.com/spring-projects/spring-ai/issues/230[issue] will track progress.
-
 TIP: All properties prefixed with `spring.ai.ollama.chat.options` can be overridden at runtime by adding a request specific <<chat-options>> to the `Prompt` call.

 == Runtime Options [[chat-options]]

@@ -270,13 +266,13 @@ The `OllamaOptions` provides the configuration information for all chat requests

 == Low-level OllamaApi Client [[low-level-api]]

-The link:https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-ollama/src/main/java/org/springframework/ai/ollama/api/OllamaApi.java[OllamaApi] provides is lightweight Java client for Ollama Chat API link:https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion[Ollama Chat Completion API].
+The link:https://github.com/spring-projects/spring-ai/blob/main/models/spring-ai-ollama/src/main/java/org/springframework/ai/ollama/api/OllamaApi.java[OllamaApi] provides a lightweight Java client for the link:https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion[Ollama Chat Completion API].

-Following class diagram illustrates the `OllamaApi` chat interfaces and building blocks:
+The following class diagram illustrates the `OllamaApi` chat interfaces and building blocks:

 image::ollama-chat-completion-api.jpg[OllamaApi Chat Completion API Diagram, 800, 600]

-Here is a simple snippet how to use the api programmatically:
+Here is a simple snippet showing how to use the API programmatically:

 [source,java]
 ----

@@ -288,7 +284,7 @@ var request = ChatRequest.builder("orca-mini")
     .withStream(false) // not streaming
     .withMessages(List.of(
         Message.builder(Role.SYSTEM)
-            .withContent("You are geography teacher. You are talking to a student.")
+            .withContent("You are a geography teacher. You are talking to a student.")
             .build(),
         Message.builder(Role.USER)
             .withContent("What is the capital of Bulgaria and what is the size? "

0 commit comments
