Replies: 1 comment
-
@slaren Using build `b5404`, I am encountering the same issue with:

```shell
[user@system]$ export LLAMA_ARG_HF_REPO=nomic-ai/nomic-embed-text-v2-moe-GGUF:Q4_K_M \
LLAMA_ARG_EMBEDDINGS=1 \
LLAMA_ARG_ENDPOINT_METRICS=1 \
LLAMA_ARG_NO_WEBUI=1 \
LLAMA_ARG_HOST=0.0.0.0 \
LLAMA_ARG_N_PARALLEL=4 \
LLAMA_ARG_ALIAS=embeddings-multilingual \
LLAMA_ARG_PORT=80 \
LLAMA_ARG_CACHE_TYPE_K=f16 \
LLAMA_ARG_FLASH_ATTN=0 \
LLAMA_ARG_CTX_SIZE=2048 \
LLAMA_ARG_BATCH=448 \
LLAMA_ARG_BATCH=512 \
LLAMA_ARG_THREADS=1 \
LLAMA_ARG_N_PREDICT=-1 \
LLAMA_ARG_N_GPU_LAYERS=0 \
LLAMA_ARG_NUMA=distribute \
LLAMA_ARG_MLOCK=0 \
LLAMA_ARG_ENDPOINT_SLOTS=1 \
LLAMA_ARG_NO_CONTEXT_SHIFT=0 \
LLAMA_ARG_UBATCH=512
[user@system]$ llama-server --seed 0 --temp 0.0
```

Full logs
Note: it is not deterministic, but it seems to happen more frequently when enough slots are used. If you want to reproduce it, you should reduce |
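For reference, once a server like the one above is running, the embeddings endpoint can be exercised with a request along these lines. This is a sketch, not part of the original report: it assumes the OpenAI-compatible `/v1/embeddings` route that `llama-server` exposes when embeddings are enabled, and the port/alias from the configuration above.

```shell
# Hypothetical request against the server configured above (port 80,
# alias "embeddings-multilingual"); adjust host/port to your setup.
curl -s http://localhost:80/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "embeddings-multilingual", "input": "hello world"}'
```

Issuing several such requests concurrently (to occupy multiple slots) is presumably what triggers the non-deterministic failure described above.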
-
Not sure if I should submit an issue about this: I'm getting `GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == CLS") failed` and `GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == MEAN") failed` after introducing pooling.
I'm using the Java wrapper, though: https://github.com/kherud/java-llama.cpp
The only reason I think the issue is in llama.cpp itself is that an Ollama user reported the same assertion here: ollama/ollama#4545
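For context, the assertion both comments hit requires that every sequence id in a batch be smaller than the batch's token count, presumably because pooled (CLS/MEAN) outputs are written into a buffer indexed by `seq_id` and sized by `n_tokens`. A rough stand-alone model of that check, not llama.cpp's actual code:

```python
def pooling_batch_ok(seq_ids, n_tokens):
    """Model of the GGML_ASSERT: every seq_id in the batch
    must be strictly less than n_tokens."""
    return all(seq_id < n_tokens for seq_id in seq_ids)

# Four slots, one token each: ids 0..3, n_tokens = 4 -> check passes.
print(pooling_batch_ok([0, 1, 2, 3], 4))  # True

# A high-numbered slot submitting a short batch: seq_id 5 with only
# 2 tokens in the batch -> the assert would fire.
print(pooling_batch_ok([5], 2))           # False
```

If this model is right, it would be consistent with the earlier observation that the failure shows up more often when many slots are in use, since higher slot numbers mean higher sequence ids relative to small batches.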