PS D:\llama.cpp\release\llama-b5468-bin-win-cuda-12.4-x64> ./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5080, compute capability 12.0, VMM: yes
load_backend: loaded CUDA backend from D:\llama.cpp\release\llama-b5468-bin-win-cuda-12.4-x64\ggml-cuda.dll
load_backend: loaded RPC backend from D:\llama.cpp\release\llama-b5468-bin-win-cuda-12.4-x64\ggml-rpc.dll
load_backend: loaded CPU backend from D:\llama.cpp\release\llama-b5468-bin-win-cuda-12.4-x64\ggml-cpu-alderlake.dll
version: 5468 (d13d0f6)
built with clang version 18.1.8 for x86_64-pc-windows-msvc
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
Problem description & steps to reproduce
Starting with commit b5434, setting the -np (--parallel) parameter to a value greater than 1 in llama.cpp causes the model to generate repetitive output, such as endlessly repeating characters like '=' or '3', after a certain number of tokens have been decoded.
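A minimal reproduction sketch follows; the model path, context size, host, and port are illustrative placeholders, not values from the original report:

./llama-server -m D:\models\model.gguf -c 8192 -np 4 --host 127.0.0.1 --port 8080

Per the description above, the same command with -np 1 is not expected to show the degeneration; only values greater than 1 trigger it on builds from b5434 onward.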