-
-
Notifications
You must be signed in to change notification settings - Fork 8.8k
Description
🐛 Describe the bug
When using different generation configurations, such as top_k=1 or temperature=0 (while keeping other settings unchanged), why do the generated results change? They should both correspond to a deterministic greedy decoding.
vllm 0.4.3
Supplement:
The main issue encountered here is that the results generated by setting the temperature coefficient to 0 or topk to 1 are different. I understand that due to operator optimization and the lack of conventional arithmetic properties in floating-point numbers, matrix operations have a certain randomness. However, the sampling process occurs after the hidden_state is generated, at which point no calculations are involved. Therefore, the sampling results of the two sampling parameters should be the same.