-This is a collection of utilities to help adopt language models in Swift apps. It tries to follow the Python `transformers` API and abstractions whenever possible, but it also aims to provide an idiomatic Swift interface and does not assume prior familiarity with [`transformers`](https://github.com/huggingface/transformers) or [`tokenizers`](https://github.com/huggingface/tokenizers).
+This is a collection of utilities to help adopt language models in Swift apps.

+It tries to follow the Python `transformers` API and abstractions whenever possible, but it also aims to provide an idiomatic Swift interface and does not assume prior familiarity with [`transformers`](https://github.com/huggingface/transformers) or [`tokenizers`](https://github.com/huggingface/tokenizers).
Check out [our announcement post](https://huggingface.co/blog/swift-coreml-llm).

-- `Tokenizers`. Utilities to convert text to tokens and back. Follows the abstractions in [`tokenizers`](https://github.com/huggingface/tokenizers) and [`transformers.js`](https://github.com/xenova/transformers.js). Usage example:
+## Modules

+- `Tokenizers`: Utilities to convert text to tokens and back, with support for Chat Templates and Tools. Follows the abstractions in [`tokenizers`](https://github.com/huggingface/tokenizers). Usage example:

```swift
import Tokenizers
-
func testTokenizer() async throws {
-    let tokenizer = try await AutoTokenizer.from(pretrained: "pcuenq/Llama-2-7b-chat-coreml")
-    let inputIds = tokenizer("Today she took a train to the West")
+    let tokenizer = try await AutoTokenizer.from(pretrained: "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
+    let messages = [["role": "user", "content": "Describe the Swift programming language."]]
+    let encoded = try tokenizer.applyChatTemplate(messages: messages)
+    let decoded = tokenizer.decode(tokens: encoded)
}
```
-However, you don't usually need to tokenize the input text yourself - the [`Generation` code](https://github.com/huggingface/swift-transformers/blob/17d4bfae3598482fc7ecf1a621aa77ab586d379a/Sources/Generation/Generation.swift#L82) will take care of it.
-
-- `Hub`. Utilities to download configuration files from the Hub, used to instantiate tokenizers and learn about language model characteristics.
-
-- `Generation`. Algorithms for text generation. Currently supported ones are greedy search and top-k sampling.
-
-- `Models`. Language model abstraction over a Core ML package.
+You don't usually need to tokenize the input text yourself - the [`Generation` code](https://github.com/huggingface/swift-transformers/blob/17d4bfae3598482fc7ecf1a621aa77ab586d379a/Sources/Generation/Generation.swift#L82) will take care of it.

+- `Hub`: Utilities to download config files from the Hub, used to instantiate tokenizers and learn about language model characteristics.
+- `Generation`: Algorithms for text generation. Currently supported ones are greedy search and top-k sampling.
+- `Models`: Language model abstraction over a Core ML package.
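
Reviewer note on the `Generation` entry: for readers who have not met the two strategies it names, the following is a minimal standalone sketch of one decoding step for each. It is an illustration only and does not use or mirror the package's `Generation` API. The function names `greedyToken` and `topKSample`, the `uniform` parameter, and the plain `[Float]` logits input are all assumptions made for this example.

```swift
import Foundation

/// Greedy search step: return the index of the largest logit.
/// Hypothetical helper for illustration, not part of swift-transformers.
func greedyToken(logits: [Float]) -> Int? {
    guard !logits.isEmpty else { return nil }
    var best = 0
    for i in 1..<logits.count where logits[i] > logits[best] {
        best = i
    }
    return best
}

/// Top-k sampling step: keep the k largest logits, apply a softmax over
/// them, and draw one index from the resulting distribution.
/// `uniform` supplies a number in [0, 1); it is injectable so the function
/// can be exercised deterministically. Hypothetical helper, as above.
func topKSample(
    logits: [Float],
    k: Int,
    uniform: () -> Double = { Double.random(in: 0..<1) }
) -> Int? {
    guard !logits.isEmpty, k > 0 else { return nil }

    // Indices of the k highest-scoring tokens, best first.
    let candidates = Array(
        logits.indices.sorted { logits[$0] > logits[$1] }.prefix(k)
    )

    // Unnormalized softmax weights, shifted by the maximum for stability.
    let maxLogit = logits[candidates[0]]
    let weights = candidates.map { exp(Double(logits[$0] - maxLogit)) }
    let total = weights.reduce(0, +)

    // Pick a point in [0, total) and walk the cumulative weights.
    var threshold = uniform() * total
    for (candidate, weight) in zip(candidates, weights) {
        threshold -= weight
        if threshold < 0 { return candidate }
    }
    return candidates.last
}
```

With `k` set to 1, `topKSample` always returns the same index as `greedyToken`, which is one way to see greedy search as the limiting case of top-k sampling.
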
## Supported Models

@@ -45,19 +43,13 @@ This package has been tested with autoregressive language models such as:

Encoder-decoder models such as T5 and Flan are currently _not supported_. They are high up in our [priority list](#roadmap).

-## Other Tools
-
-- [`swift-chat`](https://github.com/huggingface/swift-chat), a simple app demonstrating how to use this package.
-- [`exporters`](https://github.com/huggingface/exporters), a Core ML conversion package for transformers models, based on Apple's [`coremltools`](https://github.com/apple/coremltools).
-- [`transformers-to-coreml`](https://huggingface.co/spaces/coreml-projects/transformers-to-coreml), a no-code Core ML conversion tool built on `exporters`.
-
-## SwiftPM
+## Usage via SwiftPM

To use `swift-transformers` with SwiftPM, you can add this to your `Package.swift`:
- [ ] Tokenizers: download from the Hub, port from [`tokenizers`](https://github.com/huggingface/tokenizers)
-  - [x] BPE family
-  - [x] Fix Falcon, broken while porting BPE
-  - [x] Improve tests, add edge cases, see https://github.com/xenova/transformers.js/blob/27920d84831e323275b38f0b5186644b7936e1a2/tests/generate_tests.py#L24
-  - [x] Include fallback `tokenizer_config.json` for known architectures whose models don't have a configuration in the Hub (GPT2)
-  - [ ] Port other tokenizer types: Unigram, WordPiece
-- [ ] [`exporters`](https://github.com/huggingface/exporters) – Core ML conversion tool.
-  - [x] Allow max sequence length to be specified.
-  - [ ] Allow discrete shapes
-  - [x] Return `logits` from converted Core ML model
-  - [x] Use `coremltools` @ `main` for latest fixes. In particular, [this merged PR](https://github.com/apple/coremltools/pull/1915) makes it easier to use recent versions of transformers.
-- [ ] Generation
-  - [ ] Nucleus sampling (we currently have greedy and top-k sampling)
-  - [ ] Use [new `top-k` implementation in `Accelerate`](https://developer.apple.com/documentation/accelerate/bnns#4164142).
-  - [ ] Support discrete shapes in the underlying Core ML model by selecting the smallest sequence length larger than the input.
- [ ] Test a code model (to stretch system prompt definition)
+## Other Tools
+
+- [`swift-chat`](https://github.com/huggingface/swift-chat), a simple app demonstrating how to use this package.
+- [`exporters`](https://github.com/huggingface/exporters), a Core ML conversion package for transformers models, based on Apple's [`coremltools`](https://github.com/apple/coremltools).
+- [`transformers-to-coreml`](https://huggingface.co/spaces/coreml-projects/transformers-to-coreml), a no-code Core ML conversion tool built on `exporters`.