Commit 03d86ac

Use fallback config if class not defined (#53)
Fixes distilgpt2 tokenization. Previously, we only used the fallback configuration if there was no `tokenizer_config.json` in the model repo. These files are now being added to some repos as part of removing dependencies on transformers' internals, as in this PR: huggingface/transformers#29112. However, only the keys removed from the hardcoded rules are being added, to minimize potential breaking changes. We now use the fallback config when `tokenizer_config.json` exists, no tokenizer class is specified in it, and we have a fallback config for this architecture.
1 parent bbbd7bf commit 03d86ac
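The merge performed by this commit can be sketched with plain dictionaries. The `uniquingKeysWith: { current, _ in current }` closure keeps the value from the receiver (the fallback config) on key conflicts, while keys present only in the hub's `tokenizer_config.json` are carried over. The key/value pairs below are hypothetical illustrations, not the actual contents of any repo's config:

```swift
// Plain [String: String] dictionaries stand in for the library's Config type.
// Hypothetical fallback config for a GPT-2-style architecture:
let fallback: [String: String] = [
    "tokenizer_class": "GPT2Tokenizer",
    "unk_token": "<|endoftext|>",
]
// Hypothetical hub tokenizer_config.json with no tokenizer_class:
let hub: [String: String] = [
    "unk_token": "<unk>",
    "model_max_length": "1024",
]
// On duplicate keys, `current` is the value from `fallback`, so the fallback wins.
let merged = fallback.merging(hub, uniquingKeysWith: { current, _ in current })
// merged["tokenizer_class"] == "GPT2Tokenizer"   (supplied by the fallback)
// merged["unk_token"] == "<|endoftext|>"         (fallback wins the conflict)
// merged["model_max_length"] == "1024"           (new key taken from the hub config)
```

Note the direction of the merge: the fallback is the receiver, so its values take precedence on conflicts, and the hub config only contributes keys the fallback lacks.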

File tree

1 file changed: +8 −0 lines


Sources/Hub/Hub.swift

Lines changed: 8 additions & 0 deletions
```diff
@@ -130,6 +130,14 @@ public class LanguageModelConfigurationFromHub {
         // Try to guess the class if it's not present and the modelType is
         if let _ = hubConfig.tokenizerClass?.stringValue { return hubConfig }
         guard let modelType = try await modelType else { return hubConfig }
+
+        // If the config exists but doesn't contain a tokenizerClass, use a fallback config if we have it
+        if let fallbackConfig = Self.fallbackTokenizerConfig(for: modelType) {
+            let configuration = fallbackConfig.dictionary.merging(hubConfig.dictionary, uniquingKeysWith: { current, _ in current })
+            return Config(configuration)
+        }
+
+
         // Guess by capitalizing
         var configuration = hubConfig.dictionary
         configuration["tokenizer_class"] = "\(modelType.capitalized)Tokenizer"
         return Config(configuration)
```
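When no fallback config exists for the architecture, the code above falls through to its last resort: guessing the class name by capitalizing the model type. A minimal sketch of that guess, with a hypothetical model type:

```swift
// Last-resort guess: capitalize the model type and append "Tokenizer",
// mirroring the "Guess by capitalizing" branch in the diff above.
let modelType = "llama"  // hypothetical model type from config.json
let guessedClass = "\(modelType.capitalized)Tokenizer"
// guessedClass == "LlamaTokenizer"
```

This heuristic only works when the capitalized model type matches the actual tokenizer class name, which is exactly why architectures like distilgpt2 need the fallback-config path added by this commit.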
