
Commit 45563d4

more streamlined APIs for typical use cases (#330)
* more streamlined APIs for typical use cases - inspired by https://developer.apple.com/documentation/foundationmodels
1 parent 0d09689 commit 45563d4

File tree

19 files changed: +1079 -21 lines changed


Libraries/MLXLLM/Documentation.docc/Documentation.md

Lines changed: 16 additions & 0 deletions
@@ -11,8 +11,24 @@ Example implementations of various Large Language Models (LLMs).

 - [MLXVLM](MLXVLM)
 - [StableDiffusion](StableDiffusion)

+## Quick Start
+
+See <doc:evaluation>.
+
+Using LLMs and VLMs is as easy as this:
+
+```swift
+let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
+let session = ChatSession(model)
+print(try await session.respond(to: "What are two things to see in San Francisco?"))
+print(try await session.respond(to: "How about a great place to eat?"))
+```
+
+More advanced APIs are available for those who need them; see <doc:using-model>.
+
 ## Topics

+- <doc:evaluation>
 - <doc:adding-model>
 - <doc:using-model>

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@

# Evaluation

The simplified LLM/VLM API allows you to load a model and evaluate prompts with only a few lines of code.

For example, this loads a model and asks a question and a follow-on question:

```swift
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(model)
print(try await session.respond(to: "What are two things to see in San Francisco?"))
print(try await session.respond(to: "How about a great place to eat?"))
```

The second question refers to information (the location) from the first
question -- this context is maintained inside the ``ChatSession`` object.

If you need a one-shot prompt/response, simply create a ``ChatSession``,
evaluate the prompt, and discard the session. Multiple ``ChatSession``
instances can also be used (at the cost of the memory held by each `KVCache`)
to handle multiple streams of context.

## Streaming Output

The previous example produced the entire response in one call. Often users
want to see the text as it is generated -- you can do this with a stream:

```swift
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(model)

for try await item in session.streamResponse(to: "Why is the sky blue?") {
    print(item, terminator: "")
}
print()
```

## VLMs (Vision Language Models)

This same API supports VLMs as well. Simply present the image or video
to the ``ChatSession``:

```swift
let model = try await loadModel(id: "mlx-community/Qwen2.5-VL-3B-Instruct-4bit")
let session = ChatSession(model)

let answer1 = try await session.respond(
    to: "What kind of creature is in the picture?",
    image: .url(URL(fileURLWithPath: "support/test.jpg"))
)
print(answer1)

// we can ask a follow-up question referring back to the previous image
let answer2 = try await session.respond(
    to: "What is behind the dog?"
)
print(answer2)
```

## Advanced Usage

``ChatSession`` has a number of parameters you can supply when creating it:

- **instructions**: optional instructions to the chat session, e.g. describing what type of responses to give
    - for example, you might instruct the language model to respond in rhyme or to talk like a famous character from a movie
    - or that the responses should be very brief
- **generateParameters**: parameters that control the generation of output, e.g. token limits and temperature
    - see ``GenerateParameters``
- **processing**: optional media processing instructions
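Taken together, a session configured with the parameters above might look like the following sketch. The `instructions:` and `generateParameters:` argument labels mirror the parameter names in this diff, but the exact `ChatSession` initializer signature and the `GenerateParameters` fields shown are assumptions, not code from this commit.

```swift
// A sketch only: argument labels follow the parameter list above, but the
// precise ChatSession initializer and GenerateParameters fields are assumed.
let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(
    model,
    instructions: "You are terse. Answer in one or two sentences.",
    generateParameters: GenerateParameters(temperature: 0.7)
)
print(try await session.respond(to: "Why is the sky blue?"))
```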

Libraries/MLXLLM/Documentation.docc/using-model.md

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@

 Using a model is easy: load the weights, tokenize and evaluate.

+There is a high-level API described in <doc:evaluation>; this documentation
+describes the lower-level API for when you need more control.
+
 ## Loading a Model

 A model is typically loaded by using a `ModelFactory` and a `ModelConfiguration`:
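For contrast with the high-level `ChatSession` API, a lower-level load along the lines this page goes on to describe might look like the following sketch; `ModelConfiguration(id:)` and `loadContainer(configuration:)` reflect the factory API named above, but treat the exact calls as assumptions rather than part of this commit.

```swift
import MLXLLM
import MLXLMCommon

// A sketch of the lower-level path: explicit factory + configuration.
// The exact call shapes are assumptions based on the docs above.
let configuration = ModelConfiguration(id: "mlx-community/Qwen3-4B-4bit")
let container = try await LLMModelFactory.shared.loadContainer(
    configuration: configuration)
```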

Libraries/MLXLLM/LLMModelFactory.swift

Lines changed: 7 additions & 1 deletion
@@ -312,7 +312,7 @@ public class LLMModelFactory: ModelFactory {
     public func _load(
         hub: HubApi, configuration: ModelConfiguration,
         progressHandler: @Sendable @escaping (Progress) -> Void
-    ) async throws -> ModelContext {
+    ) async throws -> sending ModelContext {
         // download weights and config
         let modelDirectory = try await downloadModel(
             hub: hub, configuration: configuration, progressHandler: progressHandler)

@@ -361,3 +361,9 @@ public class LLMModelFactory: ModelFactory {
     }

 }
+
+public class TrampolineModelFactory: NSObject, ModelFactoryTrampoline {
+    public static func modelFactory() -> (any MLXLMCommon.ModelFactory)? {
+        LLMModelFactory.shared
+    }
+}
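Two things happen in this file: `sending` lets the freshly constructed `ModelContext` transfer out of the factory's concurrency domain under Swift 6 region-based isolation, and the trampoline class gives code that does not link MLXLLM directly a way to reach its shared factory. A hypothetical sketch of how such a trampoline might be resolved at runtime follows; the class-name string and lookup code are assumptions, not part of this commit.

```swift
import Foundation
import MLXLMCommon

// Hypothetical: resolve the factory by runtime class name so the caller
// never imports MLXLLM directly. The name string is an assumption.
func resolveLLMFactory() -> (any ModelFactory)? {
    guard
        let cls = NSClassFromString("MLXLLM.TrampolineModelFactory"),
        let trampoline = cls as? ModelFactoryTrampoline.Type
    else { return nil }
    return trampoline.modelFactory()
}
```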

Libraries/MLXLLM/README.md

Lines changed: 16 additions & 0 deletions
@@ -61,6 +61,22 @@ Currently supported model types are:

 See [llm-tool](../../Tools/llm-tool)

+# Quick Start
+
+Using LLMs and VLMs from MLXLMCommon is as easy as:
+
+```swift
+let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
+let session = ChatSession(model)
+print(try await session.respond(to: "What are two things to see in San Francisco?"))
+print(try await session.respond(to: "How about a great place to eat?"))
+```
+
+For more information see
+[Evaluation](https://swiftpackageindex.com/ml-explore/mlx-swift-examples/main/documentation/mlxlmcommon/evaluation),
+or see [Using Models](https://swiftpackageindex.com/ml-explore/mlx-swift-examples/main/documentation/mlxlmcommon/using-model)
+for the more advanced APIs.
+
 # Adding a Model

 If the model follows the typical LLM pattern:
