Batch generation has landed (https://github.com/ggerganov/llama.cpp/pull/3228). This should make our test suite ~10x faster on GGUF models.