Releases · tinglou/llama.cpp
b4779
docs: add docs/function-calling.md to lighten server/README.md's plig…
b4776
add OP sigmoid (#12056)
Co-authored-by: Judd <foldl@boxvest.com>
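For reference, the new op computes the elementwise sigmoid y = 1 / (1 + e^(-x)). Below is a minimal sketch of that math over a contiguous float buffer; the function name is illustrative and this is not the actual ggml kernel added here.

```cpp
#include <cmath>
#include <cstddef>

// Illustrative elementwise sigmoid over a contiguous float buffer:
// dst[i] = 1 / (1 + exp(-src[i])). Not the real ggml implementation.
static void sigmoid_f32(const float * src, float * dst, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = 1.0f / (1.0f + std::exp(-src[i]));
    }
}
```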
b4769
opencl: fix for small models (#11950)
* opencl: fix small shape gemv, remove unused extensions
* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size
* opencl: fix for token length < 4
* opencl: use wave size of 64 for all Adreno GPUs
Co-authored-by: Shawn Gu <quic_shawngu@quicinc.com>
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
b4764
run: allow customizing the prompt via the env var LLAMA_PROMPT_PREFIX (#12041)
Signed-off-by: Florent Benoit <fbenoit@redhat.com>
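As a rough sketch of how such an environment override usually works (the variable name comes from the release note; the helper and default prompt are assumptions, not the actual llama-run code):

```cpp
#include <cstdlib>
#include <string>

// Read LLAMA_PROMPT_PREFIX if set, otherwise fall back to a built-in default.
// Function name and default value are illustrative only.
static std::string get_prompt_prefix() {
    const char * env = std::getenv("LLAMA_PROMPT_PREFIX");
    return env ? std::string(env) : std::string("> ");
}
```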
b4735
Apply suggestions from code review
b4734
Merge branch 'master' of github.com:tinglou/llama.cpp
b4732
vulkan: implement several ops relevant for ggml_opt (#11769)
* vulkan: support memset_tensor
* vulkan: support GGML_OP_SUM
* vulkan: implement GGML_OP_ARGMAX
* vulkan: implement GGML_OP_SUB
* vulkan: implement GGML_OP_COUNT_EQUAL
* vulkan: implement GGML_OP_OPT_STEP_ADAMW
* vulkan: fix check_results RWKV_WKV6 crash and memory leaks
* vulkan: implement GGML_OP_REPEAT_BACK
* tests: remove invalid test-backend-ops REPEAT_BACK tests
* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
b4705
sampling: add Top-nσ sampler (#11223)
* initial sampling changes
* completed top nsigma sampler implementation
* apply parameter to only llama-cli
* updated readme
* added tests and fixed nsigma impl
* cleaned up pr
* format
* format
* format
* removed commented tests
* cleanup pr and remove explicit floats
* added top-k sampler to improve performance
* changed sigma to float
* fixed string format to float
* Update src/llama-sampling.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update common/sampling.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-sampling.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-sampling.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-sampling.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-sampling.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* added llama_sampler_init
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
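To make the idea concrete: top-nσ keeps only tokens whose logit lies within n standard deviations of the maximum logit. The sketch below is an assumed, simplified illustration of that filter, not the sampler added in this release.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Simplified top-nσ filter sketch: compute the max logit and the standard
// deviation of all logits, then mask out (set to -inf) every token whose
// logit falls below max - n * sigma. Illustrative only.
static void top_nsigma_filter(std::vector<float> & logits, float n) {
    const float max_logit = *std::max_element(logits.begin(), logits.end());

    float mean = 0.0f;
    for (float l : logits) mean += l;
    mean /= logits.size();

    float var = 0.0f;
    for (float l : logits) var += (l - mean) * (l - mean);
    const float sigma = std::sqrt(var / logits.size());

    const float threshold = max_logit - n * sigma;
    for (float & l : logits) {
        if (l < threshold) l = -INFINITY;
    }
}
```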
b4677
There's a better way of clearing lines (#11756)
Use the ANSI escape code for clearing a line.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
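For context, the sequence in question is the ANSI "erase in line" control. A minimal illustrative snippet (not the exact code from this change):

```cpp
#include <cstdio>

// "\033[2K" erases the entire current line; "\r" moves the cursor back to
// column 0 so the next write overwrites the old content.
static void clear_current_line() {
    std::fputs("\033[2K\r", stdout);
    std::fflush(stdout);
}
```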
b4539
tests: fix some mul_mat test gaps (#11375)
Now that we have batched mat-vec mul Vulkan shaders for up to n==8, these tests weren't actually exercising the mat-mat mul path. Test n==9 as well. Also, change to use all_types.
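To make the gap concrete, here is a hypothetical dispatch sketch (assumed names and structure, not the actual Vulkan backend code): with batched mat-vec shaders covering n up to 8, only n == 9 or larger reaches the mat-mat path, which is why the tests now include n == 9.

```cpp
// Hypothetical kernel selection, illustrating why n == 9 matters:
// anything at or below the batched mat-vec limit never hits the
// mat-mat shader, so tests capped at n == 8 left that path uncovered.
static const char * pick_mul_mat_kernel(int n_cols_b) {
    constexpr int MAX_BATCHED_MAT_VEC = 8; // assumed limit from the release note
    return n_cols_b <= MAX_BATCHED_MAT_VEC ? "batched mat-vec shader"
                                           : "mat-mat shader";
}
```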