Releases · AD2605/llama.cpp

09 Jul 16:51

26a48ad

b5854 Latest

ggml : prevent integer overflow in gguf tensor size calculation (#14595)

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-07-09T16:51:14Z

llama-b5854-bin-macos-arm64.zip
sha256:5fe8088e34e231ee3c55be51775b849d10f405f3ae6ce173e27716b500a57f05
10.6 MB 2025-07-09T16:51:24Z

llama-b5854-bin-macos-x64.zip
sha256:96c7da24eacde7a88b9856cb4ffc045b38bed0f5e4fa91529d8d83f5855780d7
26.4 MB 2025-07-09T16:51:24Z

llama-b5854-bin-ubuntu-vulkan-x64.zip
sha256:f8daa373159265f01fbe5d1ea680a25fcdab9bb5b66bf72170af78043c3bda5b
20.2 MB 2025-07-09T16:51:26Z

llama-b5854-bin-ubuntu-x64.zip
sha256:af474501cd723b98d8bee0b130b1b100b7e7eb219b55e005460c1619ff94c486
12.4 MB 2025-07-09T16:51:27Z

llama-b5854-bin-win-cpu-arm64.zip
sha256:65549a6aec082b40d9ae3899ab1ab19ab2980ea1fe6b643a034a9db149c41791
10.8 MB 2025-07-09T16:51:28Z

llama-b5854-bin-win-cpu-x64.zip
sha256:25471123816cc64963ccd214dcad13726ad4ecf87dda6404bc890cf0b6440a9d
13.6 MB 2025-07-09T16:51:29Z

llama-b5854-bin-win-cuda-12.4-x64.zip
sha256:fccf0e8ca04cd3c8ac0bd9d0ae50017dd793065122fb5cf7c618e6265c992d14
129 MB 2025-07-09T16:51:30Z

llama-b5854-bin-win-hip-radeon-x64.zip
sha256:4ccee807c4a17015e9ed57f4ec870139ecb487c87e8a2c9058df0652cc6db287
298 MB 2025-07-09T16:51:34Z

llama-b5854-bin-win-opencl-adreno-arm64.zip
sha256:da4d7af14b768b3475bdffe65368e45b26f6997e8538d73a95901db05cc9a099
11.2 MB 2025-07-09T16:51:42Z

Source code (zip)
2025-07-09T12:33:53Z

Source code (tar.gz)
2025-07-09T12:33:53Z
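
Each binary asset above is published with a sha256 digest, so a downloaded archive can be checked before it is unpacked. The following is a minimal sketch of that check using only the Python standard library. The function names and the chunk size are illustrative assumptions, not part of the release tooling.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published):
    """Compare a file against a digest written as 'sha256:<hex>' or bare hex.

    Hypothetical helper: the 'sha256:' prefix handling mirrors how digests
    are displayed in the asset list, nothing more.
    """
    expected = published.strip().lower()
    if expected.startswith("sha256:"):
        expected = expected[len("sha256:"):]
    return hmac.compare_digest(sha256_of_file(path), expected)
```

For example, after downloading llama-b5854-bin-ubuntu-x64.zip, calling matches_published_digest with that file's path and the sha256 string listed next to it should return True for an intact download.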

01 Jul 10:21

github-actions

b5795

343b6e9

CANN: update aclnnGroupedMatmulV2 to aclnnGroupedMatmulV3 (#14411)

* [CANN]update to aclnnGroupedMatmulV2

Signed-off-by: noemotiovon <757486878@qq.com>

* Support MUL_MAT_ID on 310p

Signed-off-by: noemotiovon <757486878@qq.com>

* fix editorconfig

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>

Assets 15

30 Jun 22:29

github-actions

b5787

0a5a3b5

Add Conv2d for CPU (#14388)

* Conv2D: Add CPU version

* Half decent

* Tiled approach for F32

* remove file

* Fix tests

* Support F16 operations

* add assert about size

* Review: further formatting fixes, add assert and use CPU version of fp32->fp16

Assets 15

25 Jun 12:14

github-actions

b5753

73e53dc

opencl: ref count `ggml_backend_opencl_context` and refactor profilin…

Assets 15

20 Jun 10:04

github-actions

b5716

d27b3ca

ggml : fix repack work size for mul_mat_id (#14292)

ggml-ci

Assets 15

17 Jun 17:01

github-actions

b5688

860a9e4

ggml-cpu : remove the weak alias trick (#14221)

Assets 15

09 Jun 12:37

github-actions

b5611

dc0623f

webui: fix sidebar being covered by main content (#14082)

* webui: fix sidebar being covered by main content

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* webui: update index.html.gz

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

Assets 15

28 May 09:40

github-actions

b5518

26b79b6

convert : fix tensor naming conflict for llama 4 vision (#13836)

* convert : fix tensor naming conflict for llama 4 vision

* add comment

Assets 18

27 May 10:29

github-actions

b5503

f9cd683

sampling : make sure samplers return at least 1 token (#13822)

* sampling : min-p should always return at least one token

ggml-ci

* sampling : same for typical sampling

* tests : sampling tests use min_keep == 0

ggml-ci
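
The guarantee described in these commit notes (a sampler returns at least one token even when min_keep is 0) can be illustrated with a small sketch. This is not the llama.cpp implementation. It assumes the common definition of min-p filtering, where a token survives if its probability is at least min_p times the highest probability, and the function name and signature are hypothetical.

```python
def min_p_filter(probs, min_p, min_keep=0):
    """Return indices of tokens kept by a min-p style filter, most probable first.

    Assumed rule: keep token i when probs[i] >= min_p * max(probs).
    At least one token is always returned, even if min_keep is 0 and
    the threshold rejects every token (for example when min_p > 1).
    """
    if not probs:
        raise ValueError("probs must not be empty")
    # Sort token indices by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    threshold = min_p * probs[order[0]]
    # Tokens passing the threshold form a prefix of the sorted order.
    passing = sum(1 for i in order if probs[i] >= threshold)
    keep_n = min(len(order), max(passing, min_keep, 1))
    return order[:keep_n]
```

With probabilities [0.5, 0.3, 0.2] and min_p of 0.5, the threshold is 0.25 and the first two tokens are kept. With min_p of 2.0 no token passes the threshold, and the fallback still returns the single most probable token.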

Assets 18

23 May 16:36

github-actions

b5467

8a2afb7

llama : allow custom list of swa_layers (#13726)

Assets 18
