Garbled output from CLI and TypeError in WebLLM for converted granite-3.3-8b-instruct model #720

@okeribok

Description

Summary

When converting the ibm-granite/granite-3.3-8b-instruct model using the llama model type, the process completes successfully, but the resulting model is unusable. The native CLI chat produces garbled, nonsensical text, and attempting to load the compiled WASM in WebLLM consistently fails with a TypeError, pointing to a fundamental incompatibility. This suggests that while the Granite architecture is "Llama-like," it has subtle but significant differences that the Llama recipe cannot handle correctly.

Model Used

ibm-granite/granite-3.3-8b-instruct

Steps to Reproduce

  1. Convert Weights: Convert the model using the llama model type.

    # (Paths are illustrative)
    mlc_llm convert_weight ./path/to/source/granite-3.3-8b-instruct/ \
      --model-type llama \
      --quantization q4f16_1 \
      -o ./dist/models/dist/granite-3.3-8b-instruct-q4f16_1/
  2. Test with Native CLI Chat: Run the converted model on the command line.

    mlc_llm chat ./dist/models/dist/granite-3.3-8b-instruct-q4f16_1/ --device cpu --overrides context_window_size=4096
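
Note: step 2 (and the WebLLM test below) also requires a chat config and a compiled WebGPU WASM in addition to the converted weights. The exact commands are not shown above; the following is a sketch of the standard MLC-LLM gen_config/compile flow (paths are illustrative, and the --conv-template value is an assumption since no Granite template exists):

    # (Sketch only; paths illustrative, conv-template is a guess)
    mlc_llm gen_config ./path/to/source/granite-3.3-8b-instruct/ \
      --quantization q4f16_1 \
      --conv-template llama-2 \
      -o ./dist/models/dist/granite-3.3-8b-instruct-q4f16_1/
    mlc_llm compile ./dist/models/dist/granite-3.3-8b-instruct-q4f16_1/mlc-chat-config.json \
      --device webgpu \
      -o ./dist/models/dist/granite-3.3-8b-instruct-q4f16_1/granite-3.3-8b-instruct-webgpu.wasm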

Observed Behavior

The CLI chat loads the model successfully but produces garbled, nonsensical output upon receiving a prompt.

Example Output:

>>> ahoy matey! sing me the shanty of your people!
- (ing dis S will { as a self is's21・ and - ( the-  (的B-side a_ a (- is and (':y/ ``  â\ie1 -[j.- s . King-e\iz. ( (. is-u) them is (5- c (
 }  :,;: (io,
$$-{ and\ was- C-a - was,Ai.-info
 ^{정

WebLLM Test Case (Minimal Reproducible Example)

To isolate the issue, I created a minimal HTML file that loads the compiled WASM directly. This test also fails, showing that the problem is not specific to the example apps.

test.html code:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>WebLLM Granite WASM Test</title>
</head>
<body>
  <h1>Testing Granite WASM model</h1>
  <p>Open the browser's Developer Console (Ctrl+Shift+I) to see the output.</p>
  <script type="module">
    import { CreateMLCEngine } from "https://esm.run/@mlc-ai/web-llm@0.2.79";

    async function runTest() {
      console.log("Starting test...");
      const appConfig = {
        model_list: [
          {
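            // NOTE: placeholder paths below; when the page is served, these
            // must resolve to URLs the browser can actually fetch.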
            "model": "path/to/your/dist/granite-3.3-8b-instruct-q4f16_1/",
            "model_id": "Granite-v1-8b-instruct",
            "model_lib": "path/to/your/dist/granite-3.3-8b-instruct-q4f16_1/granite-3.3-8b-instruct-webgpu.wasm",
            "required_features": ["shader-f16"],
            "tokenizer_files": [
              "tokenizer.json", "vocab.json", "merges.txt", "added_tokens.json"
            ]
          }
        ]
      };

      console.log("Initializing engine with custom config...");
      const engine = await CreateMLCEngine(
        "Granite-v1-8b-instruct",
        { appConfig: appConfig }
      );

      console.log("Engine initialized. Sending a test prompt...");
      const reply = await engine.chat.completions.create({
        messages: [{ role: "user", content: "Hello! Who are you?" }],
      });
      console.log("Reply:", reply.choices[0].message);
    }

    runTest().catch(err => {
      console.error("WebLLM test failed:", err);
    });
  </script>
</body>
</html>
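
To reproduce, note that ES module imports and artifact fetches do not work from file://, so the page has to be served over HTTP. Any static server run from the directory containing test.html and the model artifacts should do, for example:

    python -m http.server 8000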

Observed WebLLM Error:
This test consistently fails with TypeError: Failed to construct 'URL': Invalid URL, raised from deep inside the library while it processes the model's configuration or files.
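
For what it's worth, bare relative strings like the placeholders above are themselves invalid inputs to the URL constructor, so one sanity check (assuming WebLLM resolves model and model_lib with new URL(), which I have not verified) is to build absolute URLs against the page's base before putting them in the config. The paths below are hypothetical:

    // new URL("path/to/model/") with no base throws:
    // TypeError: Failed to construct 'URL': Invalid URL
    const modelUrl = new URL(
      "dist/granite-3.3-8b-instruct-q4f16_1/",  // hypothetical path
      document.baseURI
    ).href;
    const wasmUrl = new URL(
      "dist/granite-3.3-8b-instruct-q4f16_1/granite-3.3-8b-instruct-webgpu.wasm",
      document.baseURI
    ).href;
    console.log(modelUrl, wasmUrl); // absolute URLs for appConfig.model_list

If the error persists even with absolute URLs, the problem is more likely in the compiled model lib itself.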

Diagnosis

The combination of garbled native output and the WebLLM TypeError strongly suggests that the Llama recipe is not a suitable proxy for the Granite architecture. This popular model family from IBM likely needs its own dedicated recipe in MLC-LLM to function correctly. I have reproduced this on Linux, Windows, and WSL with identical results.

Thank you for your work on this incredible project!
