[sabia-3] What are possible reasons for non-reproducibility? #119
Replies: 3 comments 1 reply
-
Hi @deniseiras! I tried running 50 generations with temperature=0.0, and I couldn't reproduce the variation in output. Are you experiencing variations with fewer samples than that? However, it's important to note that there is inherent non-determinism when dealing with GPUs and batched computations, so we can't guarantee 100% reproducibility. Nonetheless, the differences when using temperature=0.0 are expected to be small.

```python
import maritalk
from tqdm import trange

model = maritalk.MariTalk(
    key="112214802853319356013_eadffed51934d199",
    model="sabia-3"
)

outputs = []
for _ in trange(50):
    response = model.generate(
        "Escreva um texto sobre a importância da água para a vida humana.",
        temperature=0.0,
        max_tokens=150,
    )
    outputs.append(response["answer"])

unique_outputs = set(outputs)
print(len(unique_outputs))  # 1
print(unique_outputs)  # {'A água é um recurso natural essencial para a vida humana e para a manutenção do equilíbrio dos ecossistemas no planeta Terra. Ela é o principal componente dos organismos vivos, representando cerca de 60% do peso corporal de um adulto, e é fundamental para uma série de processos biológicos, como a regulação da temperatura corporal, o transporte de nutrientes e a eliminação de resíduos.\n\nAlém de ser vital para o consumo humano, a água é indispensável para a produção de alimentos, tanto na irrigação das lavouras quanto na criação de animais. A agricultura é, de fato, o'}
```
-
Hi Denis! We were able to reproduce the issue you described. The problem is that, during decoding, two tokens might have very similar probabilities, and due to numerical imprecision, one token might be selected over the other. That is, the results for two identical prompts might differ, even when using temperature=0.0.

We believe the issue you are seeing here also happens with other APIs, such as OpenAI's: https://community.openai.com/t/observing-discrepancy-in-completions-with-temperature-0/73380/3

Unfortunately, we don't know of a solution to this problem at the moment, but please let us know if it is an impediment to using our API. If so, we will discuss it further internally to see how we can mitigate it.
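To make the mechanism concrete, here is a minimal NumPy sketch (not MariTalk's actual decoding code) of how two near-tied logits plus a round-off-sized perturbation can flip a greedy (temperature=0.0) choice, after which the two generations diverge:

```python
import numpy as np

# Two candidate next tokens with almost identical logits, as can happen in practice.
logits_run_a = np.array([12.3456789, 12.3456788])

# A perturbation on the order of floating-point round-off (e.g. from a different
# batch layout or summation order on the GPU) can change which token scores higher.
logits_run_b = logits_run_a + np.array([-1e-7, 1e-7])

print(np.argmax(logits_run_a))  # 0
print(np.argmax(logits_run_b))  # 1 -> greedy decoding picks a different token,
                                #      so the rest of the generation diverges
```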
-
Hi Rodrigo. Yes, it also occurs with OpenAI, but less frequently. I understand the problems related to precision and I know it's a hard issue to deal with. I wonder whether this kind of issue could prevent generative AI from being used in tools that need to always produce the same results. Maybe you could implement a solution that rounds the output to a lower precision, but I think this could lead to worse results. It could be a parameter exposed to the user. Just wondering about some possible solutions...

Thanks a lot for your time.

Cheers,
Denis

Another problem I did not mention is that the output format I suggested in the prompt is sometimes not followed correctly.
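As a rough illustration of the rounding idea (a hypothetical mitigation sketched here, not a feature of the MariTalk API; `deterministic_pick` is an invented helper): quantizing the logits before the argmax collapses near-ties into exact ties, which can then be broken deterministically, at the cost of discarding some precision:

```python
import numpy as np

def deterministic_pick(logits, decimals=3):
    """Round logits to a fixed precision and break exact ties by token index.

    Tokens whose scores differ only below the kept precision are treated as
    equal, and the tie is broken the same way on every run.
    """
    rounded = np.round(np.asarray(logits, dtype=np.float64), decimals=decimals)
    return int(np.argmax(rounded))  # np.argmax returns the first (lowest-index) maximum

# The two near-tied runs from the earlier sketch now agree.
print(deterministic_pick([12.3456789, 12.3456788]))  # 0
print(deterministic_pick([12.3456788, 12.3456789]))  # 0
```

As noted above, coarser rounding could change which token wins in legitimate cases, so it would probably only make sense as an opt-in parameter.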
-
I ran the same prompt two times, starting a new session for each execution.
I used the code here to call the API: https://github.com/deniseiras/PORTFOLIO_py_maritaca_api

Even in the first call of the get_completion method (reusing the model) I get different results.

I am using the Sabia-3 API with:
Temperature: 0
Prompt tokens: 2912
Response tokens: 1631

PS: I would prefer not to share my prompt, since it belongs to a private project.
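A minimal way to check this in isolation (a sketch assuming the client usage shown earlier in the thread; the key, prompt, and max_tokens placeholders are yours to fill in) is to create a fresh client twice, generate once per session with temperature=0.0, and compare the answers:

```python
import maritalk

KEY = "..."     # your API key
PROMPT = "..."  # your ~2900-token prompt

def one_shot(prompt):
    # A fresh client per call, to mimic starting a new session each time.
    model = maritalk.MariTalk(key=KEY, model="sabia-3")
    response = model.generate(prompt, temperature=0.0, max_tokens=2000)
    return response["answer"]

a = one_shot(PROMPT)
b = one_shot(PROMPT)
print(a == b)  # False would indicate the non-determinism discussed above

# If they differ, locate the first diverging character to see where the outputs split.
if a != b:
    i = next((idx for idx, (x, y) in enumerate(zip(a, b)) if x != y), min(len(a), len(b)))
    print(f"Outputs diverge at character {i}:")
    print(repr(a[max(0, i - 40):i + 10]))
    print(repr(b[max(0, i - 40):i + 10]))
```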