I am using the embedding example; the execution parameters are as follows:
embedding.exe -ngl 200000 -m I:\JYGAIBIN\MetaLlamaModel\Llama2-13b-chat\ggml-model-f32_q4_1.gguf --log-disable -p "Hello World!"
These are the first three embedding values when the embedding runs on the CPU:
-4.67528416e-08
-1.07059577e-06
1.76811977e-06
These are the first three embedding values when the embedding runs on the GPU (-ngl 200000):
5.86615059e-08
-1.02221782e-06
1.78800110e-06
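As a quick way to quantify the gap between the two runs, here is a small C++ sketch (not from the original post; the values are copied from the output quoted above). The absolute differences come out on the order of 1e-7 or smaller:

```cpp
// Compares the first three embedding values reported above for the
// CPU run and the GPU run, printing the absolute difference per dimension.
#include <cstdio>
#include <cmath>

int main() {
    const float cpu[3] = {-4.67528416e-08f, -1.07059577e-06f, 1.76811977e-06f};
    const float gpu[3] = { 5.86615059e-08f, -1.02221782e-06f, 1.78800110e-06f};

    for (int i = 0; i < 3; ++i) {
        float abs_diff = std::fabs(cpu[i] - gpu[i]);
        printf("dim %d: cpu=% .8e  gpu=% .8e  |diff|=%.2e\n",
               i, cpu[i], gpu[i], abs_diff);
    }
    return 0;
}
```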
Why do identical "Hello World!" inputs produce different results? Does llama.cpp currently support embedding correctly on both the GPU and the CPU?
Also, is there documentation for the underlying API functions, or any usage precautions? Beyond what is on GitHub, is there an interface documentation website? Thank you.
Possible Answer
I would think that, for the same input, the GPU and the CPU should produce the same embedding values.
Which of these two results is correct? Do GPUs currently support computing embeddings and fine-tuning?
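One plausible explanation (my assumption, not confirmed anywhere in this thread): the CPU and GPU backends accumulate floating-point sums in different orders, and float addition is not associative, so small numerical differences are expected even when both backends are working correctly. A minimal C++ sketch of the effect:

```cpp
// Demonstrates that float addition is order-dependent: summing the same
// numbers forward vs. backward can give slightly different results. This
// is the same class of discrepancy seen between CPU and GPU reductions,
// which generally accumulate partial sums in different orders.
#include <cstdio>
#include <cstddef>
#include <vector>

int main() {
    std::vector<float> v(100000);
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = 1.0f / float(i + 1);   // values of widely varying magnitude

    float forward = 0.0f, backward = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) forward += v[i];
    for (std::size_t i = v.size(); i-- > 0; )  backward += v[i];

    printf("forward  = %.9g\n", forward);
    printf("backward = %.9g\n", backward);
    printf("diff     = %.3g\n", forward - backward);  // non-zero in general
    return 0;
}
```

If this is what is happening, neither result is uniquely "correct": both are valid roundings of the same exact sum, just accumulated in different orders.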