
Commit d189ae3

Update README.md (#180)
* Update README.md: add documentation pointers
* Add some initial documentation for MLXLLM -- make sure the navigation works ok in swiftpackageindex
1 parent 4b67a79 commit d189ae3

File tree

10 files changed: +277 -8 lines changed
Lines changed: 33 additions & 0 deletions

@@ -0,0 +1,33 @@
# ``MLXLLM``

Example implementations of various Large Language Models (LLMs).

## Other MLX Libraries Packages

- [MLXEmbedders](MLXEmbedders)
- [MLXLLM](MLXLLM)
- [MLXLMCommon](MLXLMCommon)
- [MLXMNIST](MLXMNIST)
- [MLXVLM](MLXVLM)
- [StableDiffusion](StableDiffusion)

## Topics

- <doc:adding-model>
- <doc:using-model>

### Models

- ``CohereModel``
- ``GemmaModel``
- ``Gemma2Model``
- ``InternLM2Model``
- ``LlamaModel``
- ``OpenELMModel``
- ``PhiModel``
- ``Phi3Model``
- ``PhiMoEModel``
- ``Qwen2Model``
- ``Starcoder2Model``
Lines changed: 94 additions & 0 deletions

@@ -0,0 +1,94 @@
# Adding a Model

If the model follows the typical LLM pattern, you can add a new
model in a few steps. The model's repository on Hugging Face should
provide the usual files:

- `config.json`, `tokenizer.json`, and `tokenizer_config.json`
- `*.safetensors`

You can follow the pattern of the models in the [Models](Models) directory
and create a `.swift` file for your new model.

## Create a Configuration

Create a configuration struct to match the `config.json` (any parameters needed):

```swift
public struct YourModelConfiguration: Codable, Sendable {
    public let hiddenSize: Int

    // use this pattern for values that need defaults
    public let _layerNormEps: Float?
    public var layerNormEps: Float { _layerNormEps ?? 1e-6 }

    enum CodingKeys: String, CodingKey {
        case hiddenSize = "hidden_size"
        case _layerNormEps = "layer_norm_eps"
    }
}
```
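Since the struct is `Codable` with snake_case `CodingKeys`, you can sanity-check it against a real `config.json` with `JSONDecoder`. A minimal sketch, assuming a local copy of the file -- the factory normally performs this decode for you, and the function name and path here are illustrative:

```swift
import Foundation

// Sketch: decode YourModelConfiguration from a config.json on disk.
// LLMModelFactory normally performs this step when it loads a model.
func loadConfiguration(from url: URL) throws -> YourModelConfiguration {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode(YourModelConfiguration.self, from: data)
}

// Snake_case keys such as "hidden_size" map through the CodingKeys above, and
// layerNormEps falls back to 1e-6 when "layer_norm_eps" is absent from the file.
```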

## Create the Model Class

Create the model class. The top-level public class should have a
structure something like this:

```swift
public class YourModel: Module, LLMModel, KVCacheDimensionProvider, LoRAModel {

    public let kvHeads: [Int]

    @ModuleInfo var model: YourModelInner

    public func loraLinearLayers() -> LoRALinearLayers {
        // TODO: modify as needed
        model.layers.map { ($0.attention, ["q_proj", "v_proj"]) }
    }

    public init(_ args: YourModelConfiguration) {
        self.kvHeads = Array(repeating: args.kvHeads, count: args.hiddenLayers)
        self.model = YourModelInner(args)
    }

    public func callAsFunction(_ inputs: MLXArray, cache: [KVCache]?) -> MLXArray {
        // TODO: modify as needed
        let out = model(inputs, cache: cache)
        return model.embedTokens.asLinear(out)
    }
}
```

## Register the Model

In [LLMModelFactory.swift](LLMModelFactory.swift) register the model type itself.
This is independent of the model id -- the key should match the `model_type`
value in the model's `config.json`:

```swift
public class ModelTypeRegistry: @unchecked Sendable {
    ...
    private var creators: [String: @Sendable (URL) throws -> any LanguageModel] = [
        "yourModel": create(YourModelConfiguration.self, YourModel.init),
```

Add a constant for the model in the `ModelRegistry` (not strictly required, but useful
for callers to refer to it in code):

```swift
public class ModelRegistry: @unchecked Sendable {
    ...
    static public let yourModel_4bit = ModelConfiguration(
        id: "mlx-community/YourModel-4bit",
        defaultPrompt: "What is the gravity on Mars and the moon?"
    )
```

Finally, add it to the `all()` list -- this lets users find the model
configuration by id:

```swift
private static func all() -> [ModelConfiguration] {
    [
        codeLlama13b4bit,
        ...
        yourModel_4bit,
```
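Once registered, the new model loads like any other. A small sketch using the hypothetical `yourModel_4bit` constant from above and the `loadContainer` API described in <doc:using-model>:

```swift
import MLXLLM
import MLXLMCommon

// Sketch: load the newly registered model by its configuration constant.
// `yourModel_4bit` is the hypothetical entry added to ModelRegistry above.
func loadYourModel() async throws -> ModelContainer {
    try await LLMModelFactory.shared.loadContainer(
        configuration: ModelRegistry.yourModel_4bit)
}
```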

Lines changed: 117 additions & 0 deletions

@@ -0,0 +1,117 @@
# Using a Model

Using a model is easy: load the weights, tokenize, and evaluate.

## Loading a Model

A model is typically loaded by using a `ModelFactory` and a `ModelConfiguration`:

```swift
// e.g. LLMModelFactory.shared
let modelFactory: ModelFactory

// e.g. MLXLLM.ModelRegistry.llama3_8B_4bit
let modelConfiguration: ModelConfiguration

let container = try await modelFactory.loadContainer(configuration: modelConfiguration)
```

The `container` provides an isolation context (an `actor`) for running inference with the model.

Predefined `ModelConfiguration` instances are provided as static variables
on the `ModelRegistry` types, or they can be created directly:

```swift
let modelConfiguration = ModelConfiguration(id: "mlx-community/llama3_8B_4bit")
```

The flow inside the `ModelFactory` goes like this:

```swift
public class LLMModelFactory: ModelFactory {

    public func _load(
        hub: HubApi, configuration: ModelConfiguration,
        progressHandler: @Sendable @escaping (Progress) -> Void
    ) async throws -> ModelContext {
        // download the weights and config using HubApi
        // load the base configuration
        // using the typeRegistry create a model (random weights)
        // load the weights, apply quantization as needed, update the model
        // calls model.sanitize() for weight preparation
        // load the tokenizer
        // (vlm) load the processor configuration, create the processor
    }
}
```

Callers with specialized requirements can use these individual components to manually
load models, if needed.

## Evaluation Flow

- Load the Model
- UserInput
- LMInput
- generate()
- NaiveStreamingDetokenizer
- TokenIterator

## Evaluating a Model

Once a model is loaded you can evaluate a prompt or series of
messages. Minimally you need to prepare the user input:

```swift
let prompt = "Describe the image in English"
var input = UserInput(prompt: prompt, images: image.map { .url($0) })
input.processing.resize = .init(width: 256, height: 256)
```

This example shows adding some images and processing instructions -- if the
model accepts text only then these parts can be omitted. The inference
calls are the same.

Assuming you are using a `ModelContainer` (an actor that holds
a `ModelContext`, which is the bundled set of types that implement a
model), the first step is to convert the `UserInput` into
`LMInput` (language model input):

```swift
let generateParameters: GenerateParameters
let input: UserInput

let result = try await modelContainer.perform { [input] context in
    let input = try context.processor.prepare(input: input)
```

Given that `input`, we can call `generate()` to produce a stream
of tokens. In this example we use a `NaiveStreamingDetokenizer`
to assist in converting the stream of tokens into text and print it.
The stream is stopped after we hit a maximum number of tokens:

```swift
var detokenizer = NaiveStreamingDetokenizer(tokenizer: context.tokenizer)

return try MLXLMCommon.generate(
    input: input, parameters: generateParameters, context: context
) { tokens in

    if let last = tokens.last {
        detokenizer.append(token: last)
    }

    if let new = detokenizer.next() {
        print(new, terminator: "")
        fflush(stdout)
    }

    if tokens.count >= maxTokens {
        return .stop
    } else {
        return .more
    }
}
}
```

Libraries/MLXLLM/Models/Gemma2.swift

Lines changed: 2 additions & 2 deletions

@@ -155,7 +155,7 @@ private class TransformerBlock: Module {
 }

 // Uses Gemma2TransformerBlock, otherwise same as GemmaModelInner
-public class ModelInner: Module {
+private class ModelInner: Module {
     @ModuleInfo(key: "embed_tokens") var embedTokens: Embedding

     fileprivate let layers: [TransformerBlock]

@@ -197,7 +197,7 @@ public class Gemma2Model: Module, LLMModel, KVCacheDimensionProvider {
     public let vocabularySize: Int
     public let kvHeads: [Int]

-    let model: ModelInner
+    private let model: ModelInner
     let logitSoftCap: Float

     public init(_ args: Gemma2Configuration) {

Libraries/MLXLLM/Models/Llama.swift

Lines changed: 1 addition & 0 deletions

@@ -283,6 +283,7 @@ private class LlamaModelInner: Module {
     }
 }

+/// Model for Llama and Mistral model types.
 public class LlamaModel: Module, LLMModel, KVCacheDimensionProvider {

     public let vocabularySize: Int

Libraries/MLXLLM/Models/Phi3.swift

Lines changed: 2 additions & 2 deletions

@@ -151,7 +151,7 @@ private class TransformerBlock: Module {
     }
 }

-public class Phi3ModelInner: Module {
+private class Phi3ModelInner: Module {

     @ModuleInfo(key: "embed_tokens") var embedTokens: Embedding

@@ -189,7 +189,7 @@ public class Phi3Model: Module, LLMModel, KVCacheDimensionProvider {
     public let vocabularySize: Int
     public let kvHeads: [Int]

-    let model: Phi3ModelInner
+    private let model: Phi3ModelInner

     @ModuleInfo(key: "lm_head") var lmHead: Linear

Libraries/MLXLLM/Models/Qwen2.swift

Lines changed: 2 additions & 2 deletions

@@ -133,7 +133,7 @@ private class TransformerBlock: Module {
     }
 }

-public class Qwen2ModelInner: Module {
+private class Qwen2ModelInner: Module {
     @ModuleInfo(key: "embed_tokens") var embedTokens: Embedding

     fileprivate let layers: [TransformerBlock]

@@ -169,7 +169,7 @@ public class Qwen2Model: Module, LLMModel, KVCacheDimensionProvider {
     public let vocabularySize: Int
     public let kvHeads: [Int]

-    let model: Qwen2ModelInner
+    private let model: Qwen2ModelInner
     let configuration: Qwen2Configuration

     @ModuleInfo(key: "lm_head") var lmHead: Linear

Libraries/MLXLLM/Models/Starcoder2.swift

Lines changed: 2 additions & 2 deletions

@@ -116,7 +116,7 @@ private class TransformerBlock: Module {
     }
 }

-public class Starcoder2ModelInner: Module {
+private class Starcoder2ModelInner: Module {
     @ModuleInfo(key: "embed_tokens") var embedTokens: Embedding

     fileprivate let layers: [TransformerBlock]

@@ -153,7 +153,7 @@ public class Starcoder2Model: Module, LLMModel, KVCacheDimensionProvider {
     public let kvHeads: [Int]

     public let tieWordEmbeddings: Bool
-    let model: Starcoder2ModelInner
+    private let model: Starcoder2ModelInner

     @ModuleInfo(key: "lm_head") var lmHead: Linear
Lines changed: 13 additions & 0 deletions

@@ -0,0 +1,13 @@
# ``MLXLMCommon``

Common language model code.

## Other MLX Libraries Packages

- [MLXEmbedders](MLXEmbedders)
- [MLXLLM](MLXLLM)
- [MLXLMCommon](MLXLMCommon)
- [MLXMNIST](MLXMNIST)
- [MLXVLM](MLXVLM)
- [StableDiffusion](StableDiffusion)

README.md

Lines changed: 11 additions & 0 deletions

@@ -1,3 +1,14 @@
# Documentation

Developers can use these examples in their own programs -- just import the Swift package (see the sketch below)!

- [MLXLMCommon](https://swiftpackageindex.com/ml-explore/mlx-swift-examples/main/documentation/mlxlmcommon) -- common API for LLMs and VLMs
- [MLXLLM](https://swiftpackageindex.com/ml-explore/mlx-swift-examples/main/documentation/mlxllm) -- large language model example implementations
- [MLXVLM](https://swiftpackageindex.com/ml-explore/mlx-swift-examples/main/documentation/mlxvlm) -- vision language model example implementations
- [MLXEmbedders](https://swiftpackageindex.com/ml-explore/mlx-swift-examples/main/documentation/mlxembedders) -- example implementations of popular encoder / embedding models
- [StableDiffusion](https://swiftpackageindex.com/ml-explore/mlx-swift-examples/main/documentation/stablediffusion) -- SDXL Turbo and Stable Diffusion model example implementations
- [MLXMNIST](https://swiftpackageindex.com/ml-explore/mlx-swift-examples/main/documentation/mlxmnist) -- MNIST implementation for all your digit recognition needs
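A minimal `Package.swift` sketch for pulling these libraries into your own project -- assuming the repository URL below and that the product names match the library names above; adjust platforms, products, and versions as needed:

```swift
// swift-tools-version: 5.9
import PackageDescription

let package = Package(
    name: "MyApp",
    // assumption: pick platform versions that satisfy MLX's minimums
    platforms: [.macOS(.v14), .iOS(.v16)],
    dependencies: [
        // assumption: track the main branch of this repository
        .package(url: "https://github.com/ml-explore/mlx-swift-examples/", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                // e.g. the LLM example implementations and the common LM API
                .product(name: "MLXLLM", package: "mlx-swift-examples"),
                .product(name: "MLXLMCommon", package: "mlx-swift-examples"),
            ]
        )
    ]
)
```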

# MLX Swift Examples

Example [MLX Swift](https://github.com/ml-explore/mlx-swift) programs.
