### Context

Using the latest llama.cpp server at commit 17e98d4c96a583d420f12046bc92102381dbd28e.

The server is started with a llama70b-F16-like model:

```shell
server \
    --model model-f16.gguf \
    --ctx-size 32768 \
    --n-predict 4096 \
    --parallel 32 \
    --n-gpu-layers 81 \
    --batch-size 4096 \
    --ubatch-size 256 \
    --metrics \
    --mg 1 \
    --log-format text \
    --defrag-thold 0.1
```

When sending 32 concurrent requests, the server crashes with:

```
GGML_ASSERT: /llama.cpp/ggml.c:16521: i != GGML_HASHTABLE_FULL
```

The backend is CUDA, on 2x A100, compute capability 80.

EDIT: The issue is related to KV cache defragmentation; quick fix: disable defragmentation.
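For reference, a minimal sketch of the quick fix, assuming `--defrag-thold` follows the documented convention that a negative value disables KV cache defragmentation (omitting the flag entirely should be equivalent, since defragmentation is off by default):

```shell
# same invocation with KV cache defragmentation disabled
# (negative --defrag-thold disables it; dropping the flag should be equivalent)
server \
    --model model-f16.gguf \
    --ctx-size 32768 \
    --n-predict 4096 \
    --parallel 32 \
    --n-gpu-layers 81 \
    --batch-size 4096 \
    --ubatch-size 256 \
    --metrics \
    --mg 1 \
    --log-format text \
    --defrag-thold -1
```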
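For background on the assertion itself: `GGML_HASHTABLE_FULL` is the sentinel ggml returns when its fixed-size hash set for graph tensors has no free slot left, and the assert fires on that return value; plausibly the defragmentation step builds a graph with more tensors than the hash set was sized for. Below is a simplified, illustrative sketch of the linear-probing pattern involved (names and sizes are made up here, not the exact ggml code):

```c
#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE      4
#define HASHTABLE_FULL  SIZE_MAX   /* sentinel in the spirit of GGML_HASHTABLE_FULL */

/* Simplified linear-probing insert: scan from the hashed slot; if every
 * slot is taken by a different key, there is nowhere to put the new one
 * and the sentinel is returned -- the caller then asserts, which is the
 * failure mode reported above. */
static size_t hash_insert(const void *keys[], size_t size, const void *key) {
    size_t start = (size_t)((uintptr_t)key % size);   /* trivial pointer hash */
    for (size_t n = 0; n < size; n++) {
        size_t i = (start + n) % size;
        if (keys[i] == NULL || keys[i] == key) {
            keys[i] = key;                            /* free slot, or already present */
            return i;
        }
    }
    return HASHTABLE_FULL;                            /* table exhausted */
}

int main(void) {
    const void *keys[TABLE_SIZE] = {0};
    int objs[TABLE_SIZE + 1];                         /* one more object than slots */
    for (int j = 0; j <= TABLE_SIZE; j++) {
        size_t i = hash_insert(keys, TABLE_SIZE, &objs[j]);
        printf("insert %d -> %s\n", j, i == HASHTABLE_FULL ? "FULL" : "ok");
    }
    return 0;
}
```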