
Releases: ngxson/wllama

2.3.2

06 Jun 21:18
367be2f

News

Important

🚀 This release marks a special event:

Firefox now officially uses wllama as one of the inference engines in its Link Preview feature!

The Link Preview feature is currently available on Beta and Nightly builds. You can find the upstream code here.

Read more in this blog: https://blog.mozilla.org/en/mozilla/ai/ai-tech/ai-link-previews-firefox/


What's Changed

  • v2.3.2 (sync with upstream llama.cpp) by @ngxson in #179

Full Changelog: 2.3.1...2.3.2

2.3.1

18 Apr 08:24
e4bd5e7

What's Changed

  • sync with upstream llama.cpp source code by @ngxson in #171

Full Changelog: 2.3.0...2.3.1

2.3.0

13 Mar 14:35

What's Changed

You can now use the stream: true option to get an AsyncIterator:

const messages: WllamaChatMessage[] = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hi!' },
  { role: 'assistant', content: 'Hello!' },
  { role: 'user', content: 'How are you?' },
];
const stream = await wllama.createChatCompletion(messages, {
  nPredict: 10,
  sampling: {
    temp: 0.0,
  },
  stream: true, // ADD THIS
});

for await (const chunk of stream) {
  console.log(chunk.currentText);
}

Additionally, you can use an AbortSignal to stop a generation mid-way, much like how it's used in the fetch API. Here is an example:

const abortController = new AbortController();
const stream = await wllama.createChatCompletion(messages, {
  abortSignal: abortController.signal, // ADD THIS
  stream: true,
});

// call abortController.abort(); to abort it
// note: this can also be called during prompt processing
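To illustrate the pattern behind this, here is a minimal self-contained sketch (not wllama's internals): an async generator that checks an AbortSignal between chunks, which is the same mechanism that lets a streamed completion stop cleanly when `abortController.abort()` is called. The token values and chunk count are made up for the example.

```typescript
// Hypothetical token stream that respects an AbortSignal between chunks.
async function* generateTokens(tokens: string[], signal: AbortSignal) {
  for (const token of tokens) {
    if (signal.aborted) return; // stop cleanly once abort() has been called
    yield token;
  }
}

async function main(): Promise<string[]> {
  const abortController = new AbortController();
  const received: string[] = [];
  for await (const tok of generateTokens(['Hel', 'lo', '!'], abortController.signal)) {
    received.push(tok);
    if (received.length === 2) abortController.abort(); // stop mid-stream
  }
  return received; // the third chunk is never yielded
}
```

The generator only notices the abort at its next `yield` boundary, which mirrors the note above: aborting can take effect between processing steps rather than instantaneously.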

Gemma 3 support: With the up-to-date llama.cpp source code, you can now use Gemma 3 models!


  • build single-file mjs + minified version by @ngxson in #161
  • bump to latest upstream llama.cpp source code by @ngxson in #162
  • add support for async generator by @ngxson in #163
  • add "stream" option for AsyncIterator by @ngxson in #164
  • add test for abortSignal by @ngxson in #165
  • bump to latest upstream llama.cpp source code by @ngxson in #166

Full Changelog: 2.2.1...2.3.0

2.2.1

01 Mar 20:06

What's Changed

Full Changelog: 2.2.0...2.2.1

2.2.0

08 Feb 23:21
d72123c

v2.2.0 - x2 speed for Qx_K and Qx_0 quantization

A BIG release has dropped! The biggest changes include:

  • x2 speed for Qx_K and Qx_0 quantization 🚀 ref this PR: ggml-org/llama.cpp#11453 (it's not yet merged upstream, so I included it inside wllama as a patch) - IQx quants will still be slow, but work on them is already planned
  • Switched to a binary protocol for the connection between JS <==> WASM. The json.hpp dependency is now gone, and calling wllama.tokenize() on a long text is now faster than ever! 🎉
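To give a feel for why a binary protocol beats JSON here, below is a minimal sketch of a length-prefixed binary message. This is a hypothetical layout for illustration, not wllama's actual glue.cpp wire format: a 4-byte little-endian length followed by a UTF-8 payload, the kind of buffer that can cross a JS/WASM boundary without any string serialization step.

```typescript
// Encode a string as [u32 LE length][UTF-8 payload] in one Uint8Array.
function encodeMessage(text: string): Uint8Array {
  const payload = new TextEncoder().encode(text);
  const buf = new Uint8Array(4 + payload.length);
  new DataView(buf.buffer).setUint32(0, payload.length, true); // LE length prefix
  buf.set(payload, 4);
  return buf;
}

// Decode the same layout back into a string.
function decodeMessage(buf: Uint8Array): string {
  const len = new DataView(buf.buffer, buf.byteOffset).getUint32(0, true);
  return new TextDecoder().decode(buf.subarray(4, 4 + len));
}
```

The win over JSON is that the payload bytes are copied once into a typed array the WASM side can read directly, instead of being stringified, escaped, and re-parsed on each call.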

Debut at FOSDEM 2025

Last week, I gave a 15-minute talk at FOSDEM 2025 which, for the first time, introduced wllama to the real world!

Watch the talk here: https://fosdem.org/2025/schedule/event/fosdem-2025-5154-wllama-bringing-llama-cpp-to-the-web/


What's Changed

  • add benchmark function, used internally by @ngxson in #151
  • switch to binary protocol between JS and WASM world (glue.cpp) by @ngxson in #154
  • Remove json.hpp dependency by @ngxson in #155
  • temporary apply that viral x2 speedup PR by @ngxson in #156
  • Fix a bug with kv_remove, release v2.2.0 by @ngxson in #157

Full Changelog: 2.1.3...2.2.0

2.1.4

30 Jan 16:34
e05af9e

Nothing new, but I messed up the version number, so I had to push a new release to fix it.

2.1.3

22 Jan 14:39
e05af9e

What's Changed

  • Sync with upstream source code, add demo for DeepSeek-R1 by @ngxson in #150

Try it via the demo app: https://huggingface.co/spaces/ngxson/wllama


Full Changelog: 2.1.2...2.1.3

2.1.2

12 Jan 14:11
30adc2a

What's Changed

  • sync with upstream llama.cpp source code by @ngxson in #147

Full Changelog: 2.1.1...2.1.2

2.1.1

23 Dec 14:56
86138c8

What's Changed

  • sync to latest upstream source code by @ngxson in #145

Full Changelog: 2.1.0...2.1.1

2.1.0

06 Dec 17:49
ee31d9f

What's Changed

  • added createChatCompletion --> #140

Example:

const messages: WllamaChatMessage[] = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hi!' },
  { role: 'assistant', content: 'Hello!' },
  { role: 'user', content: 'How are you?' },
];
const completion = await wllama.createChatCompletion(messages, {
  nPredict: 10,
});