Success calculating embeddings with Continue from continue.dev? #12879
bitbottrap asked this question in Q&A · Unanswered · Replies: 1 comment, 1 reply
-
See #6722 (comment). Short answer is:
-
I'm trying to use llama.cpp to calculate embeddings locally with an embedding model.
I'm not having much success: llama.cpp keeps crashing. With a small codebase, indexing completes but the @codebase queries then run into problems; larger codebases usually fail while indexing. Two prominent errors:
llama-graph.cpp:171: GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == MEAN") failed
and:
My Continue embeddings provider configuration is:
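For illustration, a Continue config.yaml entry that points the embeddings role at a local llama-server through its OpenAI-compatible API might look roughly like the sketch below. The model name, host, and port here are assumptions chosen to match the server command that follows, not a confirmed working setup:

```yaml
# Hypothetical Continue config.yaml snippet (not the exact config from this post).
# Model alias, host, and port are assumed to match the llama-server command below.
models:
  - name: gte-qwen1.5B
    provider: openai
    model: gte-qwen1.5B
    apiBase: http://localhost:8081/v1
    roles:
      - embed
```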
And my llama-server command line is:
LLAMA_LOG_VERBOSITY=1 CUDA_VISIBLE_DEVICES=0 llama.cpp/build/bin/llama-server -a gte-qwen1.5B --host 0.0.0.0 --port 8081 --no-warmup --pooling mean --threads 4 -np 128 -b 8192 -ub 1024 -ngl 99 -c 262144 --flash-attn --embedding -m gte-Qwen2-1.5B-instruct.gguf
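One way to narrow down whether the crashes come from llama.cpp itself or from how Continue batches its requests is to hit the server's OpenAI-compatible embeddings endpoint directly. A minimal sketch, assuming the server above is reachable on localhost:8081 and was started with --embedding:

```bash
# Smoke test against llama-server's OpenAI-compatible embeddings route.
# Assumes the command above: model alias gte-qwen1.5B, port 8081 on localhost.
curl -s http://localhost:8081/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "gte-qwen1.5B", "input": ["int main() { return 0; }", "a second chunk of text"]}'
```

If this request alone reproduces the GGML_ASSERT crash, the issue is on the llama-server side rather than in Continue's indexing logic.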
I'm sure Continue hasn't been thoroughly tested in this configuration, but the crashes are definitely bugging me.
Anyone have success?