Description
Summary
When converting the `ibm-granite/granite-3.3-8b-instruct` model using the `llama` model type, the process completes successfully, but the resulting model is unusable. The native CLI chat produces garbled, nonsensical text, and attempting to load the compiled WASM in WebLLM consistently fails with a `TypeError`, pointing to a fundamental incompatibility. This suggests that while the Granite architecture is "Llama-like," it has subtle but significant differences that the Llama recipe cannot handle correctly.
Model Used
- Hugging Face Repository: [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct) (or similar source)
Steps to Reproduce
- Convert Weights: Convert the model using the `llama` model type. (The compile step that produced the WASM used in the WebLLM test further below is sketched right after this list.)

      # (Paths are illustrative)
      mlc_llm convert_weight ./path/to/source/granite-3.3-8b-instruct/ \
        --model-type llama \
        --quantization q4f16_1 \
        -o ./dist/models/dist/granite-3.3-8b-instruct-q4f16_1/
- Test with Native CLI Chat: Run the converted model on the command line.

      mlc_llm chat ./dist/models/dist/granite-3.3-8b-instruct-q4f16_1/ --device cpu --overrides context_window_size=4096
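Not shown above: the WebGPU WASM referenced in the WebLLM test below was produced with the usual `gen_config` / `compile` flow. The sketch below uses illustrative paths and a guessed conversation template, so treat it as an approximation of that step rather than the exact invocation:

    mlc_llm gen_config ./path/to/source/granite-3.3-8b-instruct/ \
      --quantization q4f16_1 \
      --conv-template llama-3 \
      -o ./dist/models/dist/granite-3.3-8b-instruct-q4f16_1/
    mlc_llm compile ./dist/models/dist/granite-3.3-8b-instruct-q4f16_1/mlc-chat-config.json \
      --device webgpu \
      -o ./dist/models/dist/granite-3.3-8b-instruct-q4f16_1/granite-3.3-8b-instruct-webgpu.wasm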
Observed Behavior
The CLI chat loads the model successfully but produces garbled, nonsensical output upon receiving a prompt.
Example Output:
>>> ahoy matey! sing me the shanty of your people!
- (ing dis S will { as a self is's21・ and - ( the- (的B-side a_ a (- is and (':y/ `` â\ie1 -[j.- s . King-e\iz. ( (. is-u) them is (5- c (
} :,;: (io,
$$-{ and\ was- C-a - was,Ai.-info
^{정
WebLLM Test Case (Minimal Reproducible Example)
To isolate the issue, I created a minimal HTML file to load the compiled WASM. This test also fails, proving the issue is not with the example apps.
test.html:
    <!DOCTYPE html>
    <html lang="en">
      <head>
        <meta charset="UTF-8">
        <title>WebLLM Granite WASM Test</title>
      </head>
      <body>
        <h1>Testing Granite WASM model</h1>
        <p>Open the browser's Developer Console (Ctrl+Shift+I) to see the output.</p>
        <script type="module">
          import { CreateMLCEngine } from "https://esm.run/@mlc-ai/web-llm@0.2.79";

          async function runTest() {
            console.log("Starting test...");
            const appConfig = {
              model_list: [
                {
                  "model": "path/to/your/dist/granite-3.3-8b-instruct-q4f16_1/",
                  "model_id": "Granite-v1-8b-instruct",
                  "model_lib": "path/to/your/dist/granite-3.3-8b-instruct-q4f16_1/granite-3.3-8b-instruct-webgpu.wasm",
                  "required_features": ["shader-f16"],
                  "tokenizer_files": [
                    "tokenizer.json", "vocab.json", "merges.txt", "added_tokens.json"
                  ]
                }
              ]
            };

            console.log("Initializing engine with custom config...");
            const engine = await CreateMLCEngine(
              "Granite-v1-8b-instruct",
              { appConfig: appConfig }
            );

            console.log("Engine initialized. Sending a test prompt...");
            const reply = await engine.chat.completions.create({
              messages: [{ role: "user", content: "Hello! Who are you?" }],
            });
            console.log("Reply:", reply.choices[0].message);
          }

          runTest().catch(err => {
            console.error("WebLLM test failed:", err);
          });
        </script>
      </body>
    </html>
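To run this repro, serve test.html and the dist directory from a local HTTP server (browser fetches of the model artifacts won't work from file:// URLs); any static server does, for example:

    python -m http.server 8000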
Observed WebLLM Error:
This test consistently fails with `TypeError: Failed to construct 'URL': Invalid URL`, indicating an issue deep inside the library when processing the model's configuration or files.
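For what it's worth, the failure happens while the library turns the appConfig entries into URLs; my understanding (which may be wrong) is that `model` and `model_lib` are expected to be absolute URLs rather than bare relative paths. A minimal sketch of building them from the page's own location, assuming the dist directory is served next to test.html (paths still illustrative):

    // Sketch only: construct absolute URLs for the appConfig entries
    // relative to the page that loads them.
    const modelDir = new URL("./dist/granite-3.3-8b-instruct-q4f16_1/", window.location.href).href;
    const appConfig = {
      model_list: [
        {
          model: modelDir,
          model_id: "Granite-v1-8b-instruct",
          model_lib: new URL("granite-3.3-8b-instruct-webgpu.wasm", modelDir).href,
          required_features: ["shader-f16"],
        },
      ],
    };

Whether or not the relative paths contribute to this particular error, the garbled native output points at the conversion itself rather than the loading path.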
Diagnosis
The combination of garbled native output and a fundamental `TypeError` in WebLLM strongly suggests that the Llama recipe is not a suitable proxy for the Granite architecture. This popular model family from IBM would likely need its own dedicated recipe in MLC-LLM to function correctly. I have tried this on Linux, Windows, and WSL with the same results.
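To make the "subtle but significant differences" concrete: if I'm reading the upstream config correctly, the Hugging Face config.json for this model (architecture GraniteForCausalLM) carries extra scaling parameters that a plain Llama recipe has no notion of, along the lines of:

    // Granite-specific scaling fields (names from the upstream GraniteForCausalLM
    // config; my reading of their roles, which may be imprecise):
    const graniteOnlyFields = [
      "embedding_multiplier",  // scales the token embeddings
      "attention_multiplier",  // replaces the usual 1/sqrt(head_dim) attention scale
      "residual_multiplier",   // scales each layer's contribution to the residual stream
      "logits_scaling",        // divides the final output logits
    ];

Ignoring these during conversion would plausibly produce exactly the kind of garbled output shown above.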
Thank you for your work on this incredible project!