Releases: ngxson/wllama
2.3.2
News
Important
🚀 This release marks a special event:
Firefox now officially uses wllama as one of the inference engines in its Link Preview feature!
The Link Preview feature is currently available in the Beta and Nightly builds. You can find the upstream code here.
Read more in this blog: https://blog.mozilla.org/en/mozilla/ai/ai-tech/ai-link-previews-firefox/

What's Changed
Full Changelog: 2.3.1...2.3.2
2.3.1
2.3.0
What's Changed
You can now use the `stream: true` option to get an `AsyncIterator`:
```ts
const messages: WllamaChatMessage[] = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hi!' },
  { role: 'assistant', content: 'Hello!' },
  { role: 'user', content: 'How are you?' },
];
const stream = await wllama.createChatCompletion(messages, {
  nPredict: 10,
  sampling: {
    temp: 0.0,
  },
  stream: true, // ADD THIS
});
for await (const chunk of stream) {
  console.log(chunk.currentText);
}
```
Additionally, you can also use an `AbortSignal` to stop a generation mid-way, much like how it's used in the `fetch` API. Here is an example:
```ts
const abortController = new AbortController();
const stream = await wllama.createChatCompletion(messages, {
  abortSignal: abortController.signal, // ADD THIS
  stream: true,
});
// call abortController.abort(); to abort it
// note: this can also be called during prompt processing
```
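For completeness, here is a minimal sketch of wiring the abort to a timeout and consuming the stream. The `try/catch` is there because how the abort surfaces to the iterator (ending early vs. rejecting) is an assumption on my part, not documented behavior:

```ts
// Hypothetical: abort automatically after 5 seconds.
const timeout = setTimeout(() => abortController.abort(), 5000);

try {
  for await (const chunk of stream) {
    console.log(chunk.currentText);
  }
} catch (err) {
  // Assumption: an aborted generation may surface as a rejected iteration.
  console.log('generation aborted or failed', err);
} finally {
  clearTimeout(timeout);
}
```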
Gemma 3 support: With the up-to-date llama.cpp source code, you can now use Gemma 3 models!
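As a quick illustration, here is a hedged sketch of loading a Gemma 3 GGUF in the browser. The `CONFIG_PATHS` keys depend on your wllama version and bundler setup (check the project README), and the model URL is a placeholder, not an official recommendation:

```ts
import { Wllama } from '@wllama/wllama';

// Placeholder WASM paths; adjust to where your bundler serves the wllama binaries.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/esm/multi-thread/wllama.wasm',
};

const wllama = new Wllama(CONFIG_PATHS);

// Placeholder URL: point this at any Gemma 3 GGUF you have access to.
await wllama.loadModelFromUrl(
  'https://example.com/models/gemma-3-1b-it-Q4_K_M.gguf'
);

const answer = await wllama.createChatCompletion(
  [{ role: 'user', content: 'Hello, Gemma!' }],
  { nPredict: 64 }
);
console.log(answer);
```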
- build single-file mjs + minified version by @ngxson in #161
- bump to latest upstream llama.cpp source code by @ngxson in #162
- add support for async generator by @ngxson in #163
- add "stream" option for AsyncIterator by @ngxson in #164
- add test for abortSignal by @ngxson in #165
- bump to latest upstream llama.cpp source code by @ngxson in #166
Full Changelog: 2.2.1...2.3.0
2.2.1
2.2.0
v2.2.0 - x2 speed for Qx_K and Qx_0 quantization
A BIG release has dropped! The biggest changes include:
- x2 speed for Qx_K and Qx_0 quantization 🚀 see this PR: ggml-org/llama.cpp#11453 (it's not merged upstream yet, but I included it inside wllama as a patch) - IQx quants will still be slow, but work on them is already planned
- Switched to a binary protocol for the JS <==> WASM connection. The `json.hpp` dependency is now gone! Calling `wllama.tokenize()` on a long text is now faster than ever! 🎉 (see the quick sketch below)
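To put the tokenizer speed-up in context, here is a minimal sketch of the call in question; the return shape (a list of token IDs) reflects my reading of the API and should be checked against the current typings:

```ts
// Assumes a model has already been loaded (e.g. via loadModelFromUrl).
const longText = 'Lorem ipsum dolor sit amet, '.repeat(1000);

// tokenize() now round-trips through the binary protocol instead of JSON,
// which is what makes long inputs noticeably faster in 2.2.0.
const tokens = await wllama.tokenize(longText);
console.log(`token count: ${tokens.length}`);
```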
Debut at FOSDEM 2025
Last week, I gave a 15-minute talk at FOSDEM 2025 which, for the first time, introduced wllama to the real world!
Watch the talk here: https://fosdem.org/2025/schedule/event/fosdem-2025-5154-wllama-bringing-llama-cpp-to-the-web/
What's Changed
- add benchmark function, used internally by @ngxson in #151
- switch to binary protocol between JS and WASM world (glue.cpp) by @ngxson in #154
- Remove json.hpp dependency by @ngxson in #155
- temporary apply that viral x2 speedup PR by @ngxson in #156
- Fix a bug with kv_remove, release v2.2.0 by @ngxson in #157
Full Changelog: 2.1.3...2.2.0
2.1.4
2.1.3
What's Changed
Try it via the demo app: https://huggingface.co/spaces/ngxson/wllama

Full Changelog: 2.1.2...2.1.3
2.1.2
2.1.1
2.1.0
What's Changed
- added `createChatCompletion` --> #140
Example:
```ts
const messages: WllamaChatMessage[] = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hi!' },
  { role: 'assistant', content: 'Hello!' },
  { role: 'user', content: 'How are you?' },
];
const completion = await wllama.createChatCompletion(messages, {
  nPredict: 10,
});
```
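A possible follow-up, assuming the promise resolves to the assistant's reply text (my reading of the API, worth double-checking against the typings), is to feed the reply back in for the next turn:

```ts
// Assumption: createChatCompletion resolves to the assistant's reply text.
messages.push({ role: 'assistant', content: completion });
messages.push({ role: 'user', content: 'Tell me more.' });
const followUp = await wllama.createChatCompletion(messages, { nPredict: 10 });
console.log(followUp);
```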