Releases · shalinib-ibm/llama.cpp

14 Jul 09:25

0d92267

b5891 Latest


llama : add jinja template for rwkv-world (#14665)

* llama : add jinja template for rwkv-world

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-07-14T09:25:05Z

llama-b5891-bin-macos-arm64.zip
sha256:04fff17f2d8951881fa20b3868cab856e700e3638a0ba05b6eb286af37b6c2de
10.6 MB 2025-07-14T09:25:17Z

llama-b5891-bin-macos-x64.zip
sha256:668722504a31e1d166e2d626fe9a6fbf6180c9761fcaa37550de52c542e32c16
26.4 MB 2025-07-14T09:25:18Z

llama-b5891-bin-ubuntu-vulkan-x64.zip
sha256:f798485154619d8beb35d5a656a71df0faf730e44ae2e13c77a7b3ff341856b9
20.8 MB 2025-07-14T09:25:20Z

llama-b5891-bin-ubuntu-x64.zip
sha256:9bc0c27e78a2c7d752565206e04f2b7bfdf46f6d1803f889d13b4ee48a048527
12.4 MB 2025-07-14T09:25:21Z

llama-b5891-bin-win-cpu-arm64.zip
sha256:f755b9096c9711366ac2a43a912bb47cd83194d41f3d1422eb2d59e8d2f7babd
10.8 MB 2025-07-14T09:25:22Z

llama-b5891-bin-win-cpu-x64.zip
sha256:aef936945d08c88f88c71fbf3b9cb41ff8cf71a172290a94e31c9108b25403a4
13.6 MB 2025-07-14T09:25:23Z

llama-b5891-bin-win-cuda-12.4-x64.zip
sha256:48a9b39ca25e112f1b845f06eaef4f715e5dfbefd262b4685329404db11b11bc
129 MB 2025-07-14T09:25:24Z

llama-b5891-bin-win-hip-radeon-x64.zip
sha256:419a3425b998788a81d6a127623c47139ac5201ae15418680fb1931a9e70686c
298 MB 2025-07-14T09:25:28Z

llama-b5891-bin-win-opencl-adreno-arm64.zip
sha256:775ee2509a64268537592b3616342430d2b554c6d6feb346761590aebfe548c4
11.2 MB 2025-07-14T09:25:38Z

Source code (zip)
2025-07-13T23:43:43Z

Source code (tar.gz)
2025-07-13T23:43:43Z
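
Each binary asset in this release is listed with a `sha256:` digest, so a downloaded archive can be checked against it before use. The following is a minimal sketch in Python using only the standard library. The helper names `sha256_of_file` and `matches_published` are illustrative and are not part of llama.cpp or of GitHub's tooling.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published):
    """Compare a file's digest with a published 'sha256:<hex>' string."""
    expected = published.strip().lower()
    if expected.startswith("sha256:"):
        expected = expected[len("sha256:"):]
    return sha256_of_file(path) == expected


# Example call, assuming the archive was downloaded to the working directory:
# matches_published(
#     "llama-b5891-bin-ubuntu-x64.zip",
#     "sha256:9bc0c27e78a2c7d752565206e04f2b7bfdf46f6d1803f889d13b4ee48a048527",
# )
```

A `False` result suggests either a corrupted download or a file that is not the published asset.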

20 May 06:31

github-actions

b5429

e298d2f


kv-cache : add SWA support (#13194)

* kv-cache : prepare for SWA

ggml-ci

* kv-cache : initial iSWA implementation

ggml-ci

* kv-cache : rework error recovery logic

ggml-ci

* models : fix Phi-3 SWA parameters

ggml-ci

* model : adjust Granite to rope factor changes

ggml-ci

* server : check if context can do shifts

ggml-ci

* iswa : for now, always enable shifts (experiment)

ggml-ci

* kv-cache : simplify SWA logic

ggml-ci

* kv-cache : apply defrag when we fail to find slots for the batch

ggml-ci

* llama : update docs about llama_decode

ggml-ci

* kv-cache : update warning logs when no space for the batch is available

ggml-ci

* llama : add llama_kv_self_seq_pos_min()

* kv-cache : keep track of partial SWA computes and print warnings

* server : disallow use cases involving partial SWA context

ggml-ci

* llama : add param to control SWA cache size

ggml-ci

* minor : clean-up

ggml-ci
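
For readers unfamiliar with the term, SWA in these commits refers to sliding-window attention, where each token attends only to a bounded number of recent positions. The sketch below illustrates the general idea in plain Python. It is not llama.cpp's KV-cache code: the function name `sliding_window_mask` and the exact window convention (whether the current token counts toward the window) are assumptions made for illustration.

```python
def sliding_window_mask(n_tokens, window):
    """Build a causal sliding-window mask.

    mask[i][j] is True when query position i may attend to key position j.
    Assumed convention: a token sees itself plus the (window - 1) tokens before it.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


if __name__ == "__main__":
    # Print the mask for 5 tokens with a window of 3 ('x' = visible, '.' = masked).
    for row in sliding_window_mask(5, 3):
        print("".join("x" if visible else "." for visible in row))
```

Because positions older than the window are never attended to again, a cache for such layers only needs to retain roughly the last `window` entries per sequence, which is the kind of saving an SWA-aware cache can exploit.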

Assets 20

16 May 06:57

github-actions

b5401

bc098c3


minja: sync (qwen3) (#13573)

* minja: sync https://github.com/google/minja/commit/f06140fa52fd140fe38e531ec373d8dc9c86aa06

- https://github.com/google/minja/pull/67 (@grf53)
- https://github.com/google/minja/pull/66 (@taha-yassine)
- https://github.com/google/minja/pull/63 (@grf53)
- https://github.com/google/minja/pull/58

---------

Co-authored-by: ochafik <ochafik@google.com>

Assets 20

29 Apr 10:36

github-actions

b5218

00e3e5a


mtmd : add qwen2vl and qwen2.5vl (#13141)

* llava : add clip_n_output_tokens, deprecate clip_n_patches

* mtmd : add qwen2vl and qwen2.5vl

* decode_embd_batch::set_position_...

* working version

* deprecate llama-qwen2vl-cli

* correct order W, H of clip_embd_nbytes_by_img

* edit existing line in hot topics

Assets 26

29 Apr 09:12

github-actions

b5216

b6ce743


llama-graph : fix text position for mrope (#13159)

* llama-graph : fix text position for mrope

* fix typo

* explicitly set 4th dim in the loop

Assets 26

28 Apr 09:41

github-actions

b5204

e5d6c25


llama-chat : fix typo GML --> GLM (#13143)

Assets 26

28 Apr 06:15

github-actions

b5201

85f36e5


arg : fix unused variable (#13142)

Assets 26

28 Apr 05:04

github-actions

b5200

c0a97b7


llama-bench : Add `--override-tensors` arg (#12922)

* Add --override-tensors option to llama-bench

* Correct llama-bench --override-tensors to --override-tensor

* llama-bench: Update --override-tensors parsing to match --tensor-split, appear in test matrix.

* Make new llama-bench util functions static to fix Ubuntu CI

* llama-bench: Correct -ot corner cases (No -ot calls, leading and trailing empty -ot spans, etc.)

Assets 26
