Releases: tinglou/llama.cpp

b4779

26 Feb 03:37
d7cfe1f
docs: add docs/function-calling.md to lighten server/README.md's plig…

b4776

25 Feb 12:34
c132239
add OP sigmoid (#12056)

Co-authored-by: Judd <foldl@boxvest.com>
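
The op this release adds is the elementwise logistic sigmoid. As a plain sketch of what the kernel computes (not the ggml implementation itself, which operates on tensors):

```python
import math

def sigmoid(xs):
    """Elementwise logistic sigmoid: 1 / (1 + e^-x)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]

print(sigmoid([0.0]))  # prints [0.5], since 1 / (1 + e^0) = 0.5
```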

b4769

25 Feb 01:56
34a846b
opencl: fix for small models (#11950)

* opencl: fix small shape gemv, remove unused extensions

* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size

* opencl: fix for token length < 4

* opencl: use wave size of 64 for all Adreno GPUs

---------

Co-authored-by: Shawn Gu <quic_shawngu@quicinc.com>
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>

b4764

24 Feb 03:32
7ad0779
run: allow to customize prompt by env var LLAMA_PROMPT_PREFIX (#12041)

Signed-off-by: Florent Benoit <fbenoit@redhat.com>
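
llama-run itself is C++, but the env-var override pattern this change adds can be sketched as follows (the default prefix shown here is illustrative, not the one llama-run uses):

```python
import os

def get_prompt_prefix(default="> "):
    """Return the prompt prefix, letting LLAMA_PROMPT_PREFIX override
    the built-in default when the variable is set."""
    return os.environ.get("LLAMA_PROMPT_PREFIX", default)
```

Usage: `LLAMA_PROMPT_PREFIX="llama> " llama-run …` would then customize the interactive prompt without a command-line flag.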

b4735

22 Feb 11:32
02203f7
Apply suggestions from code review

b4734

21 Feb 08:14
Merge branch 'master' of github.com:tinglou/llama.cpp

b4732

17 Feb 09:38
2eea03d
vulkan: implement several ops relevant for ggml_opt (#11769)

* vulkan: support memset_tensor

* vulkan: support GGML_OP_SUM

* vulkan: implement GGML_OP_ARGMAX

* vulkan: implement GGML_OP_SUB

* vulkan: implement GGML_OP_COUNT_EQUAL

* vulkan: implement GGML_OP_OPT_STEP_ADAMW

* vulkan: fix check_results RWKV_WKV6 crash and memory leaks

* vulkan: implement GGML_OP_REPEAT_BACK

* tests: remove invalid test-backend-ops REPEAT_BACK tests

* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
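
Of the ops above, GGML_OP_OPT_STEP_ADAMW is the one with real arithmetic behind it: a fused AdamW optimizer step. A scalar sketch of the standard AdamW update (hyperparameter names are the usual Adam ones; the actual ggml kernel signature may differ):

```python
import math

def adamw_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter at step t >= 1.

    Returns the updated (param, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment EMA
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment EMA
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    # Decoupled weight decay: applied to the parameter, not the gradient.
    param -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v
```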

b4705

13 Feb 08:16
27e8a23
sampling: add Top-nσ sampler (#11223)

* initial sampling changes:

* completed top nsigma sampler implementation

* apply parameter to only llama-cli

* updated readme

* added tests and fixed nsigma impl

* cleaned up pr

* format

* format

* format

* removed commented tests

* cleanup pr and remove explicit floats

* added top-k sampler to improve performance

* changed sigma to float

* fixed string format to float

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update common/sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* added llama_sampler_init

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
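
The idea behind Top-nσ sampling is to keep only tokens whose logits lie within n standard deviations of the maximum logit, masking the rest before softmax. A simplified sketch of that filter (the llama.cpp implementation differs in detail, e.g. its interaction with top-k):

```python
import math

def top_n_sigma(logits, n=1.0):
    """Mask to -inf every logit more than n standard deviations
    below the maximum logit; keep the rest unchanged."""
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    cutoff = max(logits) - n * sigma
    return [x if x >= cutoff else float("-inf") for x in logits]
```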

b4677

10 Feb 03:32
19d3c82
There's a better way of clearing lines (#11756)

Use the ANSI escape code for clearing a line.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
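
The escape sequence in question is ECMA-48 "Erase in Line" (`ESC [ 2 K`), which clears the whole current line; combined with a carriage return it lets a progress line be overwritten in place rather than padded with spaces. A minimal illustration:

```python
import sys

# "\x1b[2K" clears the entire current line; "\r" returns the cursor
# to column 0 so the next write starts at the beginning of the line.
CLEAR_LINE = "\x1b[2K\r"

def overwrite_line(text):
    """Clear the current terminal line and write `text` in its place."""
    sys.stdout.write(CLEAR_LINE + text)
    sys.stdout.flush()
```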

b4539

24 Jan 03:46
564804b
tests: fix some mul_mat test gaps (#11375)

Now that we have batched mat-vec mul Vulkan shaders for up to n==8,
these tests weren't actually exercising the mat-mat mul path. Test
n==9 as well. Also, change to use all_types.
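
The gap being closed: a mat-mat multiply whose second operand has n ≤ 8 columns was dispatched to the batched mat-vec shaders, so only n == 9 and above exercises the true mat-mat path. Both paths must agree, since a mat-mat product is just n independent mat-vec products over the columns of B. A sketch of that equivalence:

```python
def matvec(A, x):
    """A (m x k) times vector x (length k) -> vector of length m."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matmat(A, B):
    """A (m x k) times B (k x n), computed column-by-column as n
    mat-vec products -- the batched path specialized for small n."""
    cols = list(zip(*B))                      # columns of B
    out_cols = [matvec(A, list(c)) for c in cols]
    return [list(r) for r in zip(*out_cols)]  # reassemble as rows
```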