
Commit 45563d4

more streamlined APIs for typical use cases (#330)
* more streamlined APIs for typical use cases - inspired by https://developer.apple.com/documentation/foundationmodels
1 parent 0d09689 commit 45563d4

File tree

19 files changed: +1079 -21 lines changed


Libraries/MLXLLM/Documentation.docc/Documentation.md

Lines changed: 16 additions & 0 deletions
@@ -11,8 +11,24 @@ Example implementations of various Large Language Models (LLMs).

 - [MLXVLM](MLXVLM)
 - [StableDiffusion](StableDiffusion)

+## Quick Start
+
+See <doc:evaluation>.
+
+Using LLMs and VLMs is as easy as this:
+
+```swift
+let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
+let session = ChatSession(model)
+print(try await session.respond(to: "What are two things to see in San Francisco?"))
+print(try await session.respond(to: "How about a great place to eat?"))
+```
+
+More advanced APIs are available for those who need them; see <doc:using-model>.
+
 ## Topics

+- <doc:evaluation>
 - <doc:adding-model>
 - <doc:using-model>

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@

# Evaluation

The simplified LLM/VLM API allows you to load a model and evaluate prompts with only a few lines of code.

For example, this loads a model and asks a question and a follow-on question:

```swift
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(model)
print(try await session.respond(to: "What are two things to see in San Francisco?"))
print(try await session.respond(to: "How about a great place to eat?"))
```

The second question refers to information (the location) from the first
question -- this context is maintained inside the ``ChatSession`` object.

If you need a one-shot prompt/response, simply create a ``ChatSession``,
evaluate the prompt, and discard the session. Multiple ``ChatSession``
instances can also be used (at the cost of the memory held by each `KVCache`)
to handle multiple streams of context.

## Streaming Output

The previous example produced the entire response in one call. Often users
want to see the text as it is generated -- you can do this with a stream:

```swift
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(model)

for try await item in session.streamResponse(to: "Why is the sky blue?") {
    print(item, terminator: "")
}
print()
```

## VLMs (Vision Language Models)

This same API supports VLMs as well. Simply present the image or video
to the ``ChatSession``:

```swift
let model = try await loadModel(id: "mlx-community/Qwen2.5-VL-3B-Instruct-4bit")
let session = ChatSession(model)

let answer1 = try await session.respond(
    to: "What kind of creature is in the picture?",
    image: .url(URL(fileURLWithPath: "support/test.jpg"))
)
print(answer1)

// we can ask a follow-up question referring back to the previous image
let answer2 = try await session.respond(
    to: "What is behind the dog?"
)
print(answer2)
```

## Advanced Usage

``ChatSession`` has a number of parameters you can supply when creating it:

- **instructions**: optional instructions to the chat session, e.g. describing what type of responses to give
    - for example, you might instruct the language model to respond in rhyme or to talk like a famous character from a movie
    - or that the responses should be very brief
- **generateParameters**: parameters that control the generation of output, e.g. token limits and temperature
    - see ``GenerateParameters``
- **processing**: optional media processing instructions
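Taken together, a session configured with the parameters above might look like the following sketch. The `instructions:` and `generateParameters:` argument labels mirror the parameter names in this diff, but the exact `ChatSession` initializer signature and the `GenerateParameters` fields shown are assumptions, not code from this commit.

```swift
// A sketch only: argument labels follow the parameter list above, but the
// precise ChatSession initializer and GenerateParameters fields are assumed.
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(
    model,
    instructions: "You are terse. Answer in one or two sentences.",
    generateParameters: GenerateParameters(temperature: 0.7)
)
print(try await session.respond(to: "Why is the sky blue?"))
```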

Libraries/MLXLLM/Documentation.docc/using-model.md

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@

 Using a model is easy: load the weights, tokenize and evaluate.

+There is a high-level API described in <doc:evaluation>; this documentation
+describes the lower-level API for when you need more control.
+
 ## Loading a Model

 A model is typically loaded by using a `ModelFactory` and a `ModelConfiguration`:
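For contrast with the high-level `ChatSession` API, a lower-level load along the lines this page goes on to describe might look like the following sketch; `ModelConfiguration(id:)` and `loadContainer(configuration:)` reflect the factory API named above, but treat the exact calls as assumptions rather than part of this commit.

```swift
import MLXLLM
import MLXLMCommon

// A sketch of the lower-level path: explicit factory + configuration.
// The exact call shapes are assumptions based on the docs above.
let configuration = ModelConfiguration(id: "mlx-community/Qwen3-4B-4bit")
let container = try await LLMModelFactory.shared.loadContainer(
    configuration: configuration)
```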

Libraries/MLXLLM/LLMModelFactory.swift

Lines changed: 7 additions & 1 deletion
@@ -312,7 +312,7 @@ public class LLMModelFactory: ModelFactory {
     public func _load(
         hub: HubApi, configuration: ModelConfiguration,
         progressHandler: @Sendable @escaping (Progress) -> Void
-    ) async throws -> ModelContext {
+    ) async throws -> sending ModelContext {
         // download weights and config
         let modelDirectory = try await downloadModel(
             hub: hub, configuration: configuration, progressHandler: progressHandler)

@@ -361,3 +361,9 @@ public class LLMModelFactory: ModelFactory {
     }

 }
+
+public class TrampolineModelFactory: NSObject, ModelFactoryTrampoline {
+    public static func modelFactory() -> (any MLXLMCommon.ModelFactory)? {
+        LLMModelFactory.shared
+    }
+}
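Two things happen in this file: `sending` lets the freshly constructed `ModelContext` transfer out of the factory's concurrency domain under Swift 6 region-based isolation, and the trampoline class gives code that does not link MLXLLM directly a way to reach its shared factory. A hypothetical sketch of how such a trampoline might be resolved at runtime follows; the class-name string and lookup code are assumptions, not part of this commit.

```swift
import Foundation
import MLXLMCommon

// Hypothetical: resolve the factory by runtime class name so the caller
// never imports MLXLLM directly. The name string is an assumption.
func resolveLLMFactory() -> (any ModelFactory)? {
    guard
        let cls = NSClassFromString("MLXLLM.TrampolineModelFactory"),
        let trampoline = cls as? ModelFactoryTrampoline.Type
    else { return nil }
    return trampoline.modelFactory()
}
```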

Libraries/MLXLLM/README.md

Lines changed: 16 additions & 0 deletions
@@ -61,6 +61,22 @@ Currently supported model types are:

 See [llm-tool](../../Tools/llm-tool)

+# Quick Start
+
+Using LLMs and VLMs from MLXLMCommon is as easy as:
+
+```swift
+let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
+let session = ChatSession(model)
+print(try await session.respond(to: "What are two things to see in San Francisco?"))
+print(try await session.respond(to: "How about a great place to eat?"))
+```
+
+For more information see
+[Evaluation](https://swiftpackageindex.com/ml-explore/mlx-swift-examples/main/documentation/mlxlmcommon/evaluation),
+or see [Using Models](https://swiftpackageindex.com/ml-explore/mlx-swift-examples/main/documentation/mlxlmcommon/using-model)
+for the more advanced APIs.
+
 # Adding a Model

 If the model follows the typical LLM pattern:
