Replies: 1 comment
Just a heads up for anyone looking into this issue: |
Hi there,
I'm currently experiencing hallucination issues similar to what others have reported. I'm looking for a way to generate subtitles for a one-hour video. I have tried a few configurations, but each has its limits.
I based my settings on these threads:
Issue #896,
Pull Request #291
Whisper Thread
Context of the video:
At the start of the video, there is music playing for about 10 minutes, followed by a speech.
Custom Settings:
Beam size: 5 (-bs 5)
Entropy threshold: 2.4 (-et 2.4)
Maximum context: 64 (-mc 64)
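For anyone who wants to reproduce this, here is a minimal sketch of how the settings above could be assembled into a command line. The binary name `./main`, the model path, and the audio file name are placeholders I made up; only the three settings come from my actual run.

```python
import shlex


def build_command(binary, model, audio,
                  beam_size=5, entropy_thold=2.4, max_context=64):
    """Return the argument list for one transcription run.

    The defaults mirror the custom settings listed above.
    """
    return [
        binary,
        "-m", model,                # model file (placeholder path)
        "-f", audio,                # input audio (placeholder name)
        "-bs", str(beam_size),      # beam size
        "-et", str(entropy_thold),  # entropy threshold
        "-mc", str(max_context),    # maximum context
    ]


# Hypothetical paths, adjust to your own checkout and files.
cmd = build_command("./main", "models/ggml-base.en.bin", "input.wav")
print(shlex.join(cmd))
```

The list form can be passed straight to `subprocess.run(cmd)` without any shell quoting problems.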
With this configuration, the hallucination is limited: it now "only" takes about 2 minutes for the model to recover once the speech starts. Before these adjustments, I got about 60 minutes of the word "[Music]".
After approximately 64 spoken words the context changes and the model works fine again, but that still leaves around 2 minutes of hallucination at the start of the speech. Is there a way to implement a time threshold (in seconds) so that a new context is established after 10-15 seconds? Or to reset the context if the temperature stays at a high level for x seconds?
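To make the second idea concrete, here is a rough sketch of the policy I have in mind. It is not an existing option or API: the `Segment` type, the function name, and both thresholds are hypothetical, and it assumes the decoder can report which temperature it ended up using for each segment.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float        # segment start time in seconds
    end: float          # segment end time in seconds
    temperature: float  # temperature the decoder ended up using


def context_reset_points(segments, temp_limit=0.4, max_bad_seconds=10.0):
    """Return the times (in seconds) at which the context would be reset.

    A reset is triggered once the temperature has stayed above
    temp_limit for at least max_bad_seconds without interruption.
    """
    resets = []
    bad_since = None  # start time of the current high-temperature run
    for seg in segments:
        if seg.temperature > temp_limit:
            if bad_since is None:
                bad_since = seg.start
            if seg.end - bad_since >= max_bad_seconds:
                resets.append(seg.end)
                bad_since = None  # start counting again after a reset
        else:
            bad_since = None  # a clean segment ends the run
    return resets


segments = [
    Segment(0.0, 5.0, 0.0),
    Segment(5.0, 10.0, 0.6),
    Segment(10.0, 16.0, 0.8),
    Segment(16.0, 20.0, 0.0),
]
resets = context_reset_points(segments)
```

In this example the temperature is high from 5 s to 16 s, so a single reset would happen at the end of the third segment.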
Also, could someone explain the following options? Understanding them might help reduce hallucinations.
--word-thold N [0.01 ] word timestamp probability threshold
--entropy-thold N [2.40 ] entropy threshold for decoder fail
--logprob-thold N [-1.00 ] log probability threshold for decoder fail
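My current understanding of the two "decoder fail" thresholds, written as a small sketch so someone can correct me: the entropy check looks at how varied the most recent tokens are (a loop of "[Music]" has almost no variety, so very low entropy), and the log probability check looks at the average token log probability. The window of 32 tokens, the use of the natural logarithm, and the function names are my assumptions, not something I have confirmed in the code. As far as I can tell, --word-thold only affects word-level timestamps and not this check.

```python
import math
from collections import Counter


def window_entropy(tokens, window=32):
    """Shannon entropy (natural log) of the last `window` tokens."""
    recent = tokens[-window:]
    n = len(recent)
    counts = Counter(recent)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def decoder_failed(tokens, avg_logprob,
                   entropy_thold=2.4, logprob_thold=-1.0):
    """Assumed rule: fail if the output is too repetitive or too unlikely."""
    too_repetitive = window_entropy(tokens) < entropy_thold
    too_unlikely = avg_logprob < logprob_thold
    return too_repetitive or too_unlikely


looping = ["[Music]"] * 40   # the same token over and over
varied = list(range(32))     # 32 distinct tokens

print(decoder_failed(looping, avg_logprob=-0.2))  # repetitive output
print(decoder_failed(varied, avg_logprob=-0.2))   # varied, confident output
print(decoder_failed(varied, avg_logprob=-1.5))   # varied but unlikely output
```

If this reading is right, raising the entropy threshold makes the repetition check stricter, and raising the log probability threshold (closer to 0) rejects low-confidence output sooner.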
Thank you!