[Bug Report] Real-Time Streaming Diarization (diarize=true) Consistently Fails for Mono Channel Audio #1333
-
Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
-
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.
-
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
-
Thank you for reaching out with such a thorough breakdown of the issue you are experiencing with the speaker diarization feature. I appreciate the detailed reproduction steps and troubleshooting efforts you've undertaken. At this time, your request did not show any errors in the logs, which indicates that the feature is likely not performing well on your audio. Unfortunately, this is a known issue with diarization and one we have been intending to improve for some time, though I don't have an ETA for when we will release a better version. In the meantime, this doc (https://developers.deepgram.com/docs/multichannel-vs-diarization) might give you some ideas for improving the results. Using multichannel together with diarization can help tremendously; it just requires a multichannel audio approach.
-
Hello Deepgram Team,
We are writing to report a persistent issue with the speaker diarization feature in your real-time streaming STT service. Despite enabling the diarize parameter, the service consistently fails to distinguish between different speakers in a single mono channel audio stream.
We are an enterprise customer, and this functionality is crucial for our AI Voice Agent product. We have been unable to get it to work over the last three weeks of testing. (We had not used this feature before; we first tried it three weeks ago.)
Problem Description
When using the real-time streaming service with the diarize=true parameter on a mono audio stream containing two distinct, non-overlapping speakers, we expect the transcription to return different speaker labels (e.g., Speaker 0, Speaker 1).
However, the actual result is that all transcribed words are consistently tagged with speaker: 0, regardless of which person is speaking. The service never identifies a second speaker.
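The check described above can be sketched roughly as follows. This is a minimal illustration, not code from our product: it assumes each streaming result message is JSON with word objects under `channel.alternatives[0].words`, each carrying a `speaker` field. The helper name `speaker_counts` and the sample message are hypothetical.

```python
import json
from collections import Counter


def speaker_counts(message: str) -> Counter:
    """Count words per speaker label in one streaming result message.

    Assumes (not confirmed in this thread) that words live under
    channel.alternatives[0].words and each has a "speaker" field.
    """
    payload = json.loads(message)
    alternatives = payload.get("channel", {}).get("alternatives", [])
    words = alternatives[0].get("words", []) if alternatives else []
    return Counter(w["speaker"] for w in words if "speaker" in w)


# Hypothetical message shaped like the behavior reported here:
# every word is tagged speaker 0, even across a speaker change.
sample = json.dumps({
    "channel": {
        "alternatives": [{
            "transcript": "hello there hi",
            "words": [
                {"word": "hello", "speaker": 0},
                {"word": "there", "speaker": 0},
                {"word": "hi", "speaker": 0},
            ],
        }]
    }
})

counts = speaker_counts(sample)
distinct_speakers = sorted(counts)
print(distinct_speakers)
```

Accumulating these counters over a whole session and seeing only the key `0` is what "the service never identifies a second speaker" means in practice.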
Our Environment and Use Case
Steps to Reproduce
We are confident this issue can be reproduced easily by your team:
Troubleshooting We Have Performed
We have thoroughly investigated this on our end to rule out a client-side issue. Our observations are:
Regarding a request_id
Because this is an issue with the real-time streaming service observed over countless sessions and not a single, atomic API call, we do not have a specific request_id to provide. The bug lies in the continuous processing logic of the diarization feature itself. We believe the detailed reproduction steps above will be sufficient for your team to investigate and confirm the behavior internally.
We would appreciate it if you could look into this issue. Please let us know if you need any more information from our side.
Thank you for your time and support.
Related past issues from other users:
https://github.com/orgs/deepgram/discussions/1298 #1298
https://github.com/orgs/deepgram/discussions/1144 #1144
https://github.com/orgs/deepgram/discussions/1068 #1068