With temperature 0 (greedy sampling), the output of a regular run does not match the output of an identical run with the --prompt-cache option enabled. The same command is run twice below with a fixed seed; the only difference is that the second run finds and loads the prompt-cache file written by the first, yet the generations diverge from the very first sampled token.
➜ llama.cpp git:(master) ✗ cmake -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo -G Ninja && cmake --build build && ./build/bin/main -m ~/models/llama-2-7b.ggmlv3.q4_0.bin -e -p 'Who are you' --temp 0 --repeat_last_n 0 --prompt-cache prompt-cache --seed 282
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /p/i/llama.cpp/build
ninja: no work to do.
main: build = 937 (86aeb27)
main: seed = 282
llama.cpp: loading model from /home/i/models/llama-2-7b.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: mem required = 3615.73 MB (+ 256.00 MB per state)
llama_new_context_with_model: kv self size = 256.00 MB
llama_new_context_with_model: compute buffer total size = 71.84 MB
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: attempting to load saved session from 'prompt-cache'
main: session file does not exist, will create
sampling: repeat_last_n = 0, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
Who are you? What is your name?
Unterscheidung der Buchstaben.
### 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1
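For context: the --prompt-cache option in the main example is built on llama.cpp's session API. A minimal sketch of the round trip, assuming the llama_load_session_file / llama_save_session_file / llama_eval signatures present in this build (the first_run / second_run wrappers are hypothetical, not code from the repo):

```c
#include "llama.h"

// First run: no session file yet, so the prompt is evaluated from scratch
// and the resulting KV-cache state is written out under the cache path.
static void first_run(struct llama_context * ctx,
                      const llama_token * prompt, size_t n_prompt) {
    llama_eval(ctx, prompt, (int) n_prompt, /*n_past=*/0, /*n_threads=*/8);
    llama_save_session_file(ctx, "prompt-cache", prompt, n_prompt);
}

// Second run: the session file exists and matches the prompt exactly, so
// the saved KV-cache state is restored instead of re-evaluating the prompt.
// For the outputs to match, the restored state must be equivalent to the
// state a fresh evaluation would have produced.
static size_t second_run(struct llama_context * ctx,
                         llama_token * tokens, size_t capacity) {
    size_t n_loaded = 0;
    if (!llama_load_session_file(ctx, "prompt-cache", tokens, capacity, &n_loaded)) {
        return 0; // no usable session; caller falls back to a fresh evaluation
    }
    return n_loaded;
}
```

The first run above took the save path ("session file does not exist, will create"). The second run, below, takes the load path.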
➜ llama.cpp git:(master) ✗ cmake -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo -G Ninja && cmake --build build && ./build/bin/main -m ~/models/llama-2-7b.ggmlv3.q4_0.bin -e -p 'Who are you' --temp 0 --repeat_last_n 0 --prompt-cache prompt-cache --seed 282
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /p/i/llama.cpp/build
ninja: no work to do.
main: build = 937 (86aeb27)
main: seed = 282
llama.cpp: loading model from /home/i/models/llama-2-7b.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: mem required = 3615.73 MB (+ 256.00 MB per state)
llama_new_context_with_model: kv self size = 256.00 MB
llama_new_context_with_model: compute buffer total size = 71.84 MB
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: attempting to load saved session from 'prompt-cache'
main: loaded a session with prompt size of 4 tokens
main: session file has exact match for prompt!
sampling: repeat_last_n = 0, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
Who are you hopefully?
nobody knows.
Who are you?
nobody knows.
Who are you?
nobody knows.
Who are you?
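For what it's worth, with --temp 0 the sampling chain in main collapses to a plain argmax over the logits, so --seed 282 should have no effect on token choice. A sketch of what greedy selection amounts to (llama_get_logits and llama_n_vocab are the real API calls in this build; pick_greedy is a hypothetical helper):

```c
#include "llama.h"

// With temp == 0 the next token is simply the argmax over the vocabulary
// logits: identical logits must yield identical continuations, seed or not.
static llama_token pick_greedy(struct llama_context * ctx) {
    const float * logits  = llama_get_logits(ctx);
    const int     n_vocab = llama_n_vocab(ctx);

    llama_token best = 0;
    for (llama_token id = 1; id < n_vocab; ++id) {
        if (logits[id] > logits[best]) {
            best = id;
        }
    }
    return best;
}
```

Since greedy selection is fully determined by the logits, the two runs diverging at the very first generated token ("?" vs. " hopefully") suggests that the logits produced after restoring the cached KV state differ from the logits produced after a fresh evaluation of the same prompt.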