[Bug Report] Real-Time Streaming Diarization (diarize=true) Consistently Fails for Mono Channel Audio #1333

yigitburakakkas · 2025-07-13T13:10:34Z

yigitburakakkas
Jul 13, 2025

Hello Deepgram Team,

We are writing to report a persistent issue with the speaker diarization feature in your real-time streaming STT service. Despite enabling the diarize parameter, the service consistently fails to distinguish between different speakers in a single mono channel audio stream.

As an enterprise customer, we consider this functionality crucial for our AI Voice Agent product, and we have been unable to get it to work over the last three weeks of testing. (We had not used this feature before; we first tried it three weeks ago.)

Problem Description

When using the real-time streaming service with the diarize=true parameter on a mono audio stream containing two distinct, non-overlapping speakers, we expect the transcription to return different speaker labels (e.g., Speaker 0, Speaker 1).

However, the actual result is that all transcribed words are consistently tagged with speaker: 0, regardless of which person is speaking. The service never identifies a second speaker.
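The symptom above can be checked mechanically. Below is a minimal sketch, assuming each transcript message is a JSON object whose words sit under channel -> alternatives -> words and carry an integer speaker field; that nesting is an assumption based on the "speaker property within the words array" described in this report, and the helper names and sample messages are hypothetical.

```python
from typing import Any, Dict, Iterable, Set


def speakers_in_message(message: Dict[str, Any]) -> Set[int]:
    """Return the distinct speaker labels found in one transcript message."""
    labels: Set[int] = set()
    # Assumed message shape: {"channel": {"alternatives": [{"words": [...]}]}}
    alternatives = message.get("channel", {}).get("alternatives", [])
    for alternative in alternatives:
        for word in alternative.get("words", []):
            if "speaker" in word:
                labels.add(word["speaker"])
    return labels


def speakers_in_session(messages: Iterable[Dict[str, Any]]) -> Set[int]:
    """Union of the speaker labels seen across every message of a session."""
    labels: Set[int] = set()
    for message in messages:
        labels |= speakers_in_message(message)
    return labels


# Hypothetical messages illustrating the reported behaviour: two different
# people speak, but every word is tagged with speaker 0.
stuck_session = [
    {"channel": {"alternatives": [{"words": [
        {"word": "hello", "speaker": 0},
        {"word": "there", "speaker": 0},
    ]}]}},
    {"channel": {"alternatives": [{"words": [
        {"word": "hi", "speaker": 0},
    ]}]}},
]

print(sorted(speakers_in_session(stuck_session)))  # prints [0]
```

A session in which diarization worked as expected would yield more than one label here, for example {0, 1}.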

Our Environment and Use Case

Service: Real-Time Streaming STT
Models Tested: nova-3 and nova-2
Client: Python implementation using a direct socket-based approach.
Audio Format: Mono channel, 16000 Hz sample rate, PCM encoding.
Specific Use Case: Our users interact with a voice agent. Sometimes, a second person near the user might also speak. This results in an audio stream with two distinct voices (e.g., one male, one female) speaking at different times within the same mono channel.

Steps to Reproduce

We are confident this issue can be reproduced easily by your team:

Obtain Test Audio: Select any mono, 16kHz audio file that contains two clearly different speakers talking at separate, non-overlapping times.
Establish Streaming Connection: Use a socket-based client (like the examples in your documentation) to connect to the real-time streaming endpoint.
Set Parameters: In the connection URL or initial message, configure the service to use a model like nova-2 or nova-3 and, most importantly, set diarize=true.
Stream Audio: Send the audio data to the service chunk-by-chunk to mimic a live microphone feed.
Observe Results: Monitor the JSON responses. You will observe that the speaker property within the words array is always 0, even when the second person is speaking.
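For reference, the connection URL from the "Set Parameters" step can be sketched as follows. This is an illustration only: the endpoint wss://api.deepgram.com/v1/listen and the encoding, sample_rate, and channels query parameters reflect our understanding of the public streaming documentation rather than anything confirmed in this thread, and encoding=linear16 assumes the PCM audio is 16-bit, which the report does not state. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed streaming endpoint; verify against the current documentation.
BASE_URL = "wss://api.deepgram.com/v1/listen"


def build_streaming_url(model: str = "nova-3", sample_rate: int = 16000) -> str:
    """Build a streaming URL that requests diarization on mono PCM audio."""
    params = {
        "model": model,              # nova-2 and nova-3 were both tested
        "diarize": "true",           # the parameter under investigation
        "encoding": "linear16",      # assumption: 16-bit little-endian PCM
        "sample_rate": sample_rate,  # 16000 Hz, as stated in the report
        "channels": 1,               # single mono channel
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_streaming_url())
```

The resulting URL would be passed to whichever WebSocket client is in use, together with the API key in the authorization header.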

Troubleshooting We Have Performed

We have thoroughly investigated this on our end to rule out a client-side issue. Our observations are:

Persistent & Continuous: This is not a temporary bug or a brief service degradation. We have been testing this consistently for the past three weeks with the same result every time.
Chunk Size Independent: We initially suspected our small chunk size (30ms) might be the issue. To rule this out, we experimented with a wide range of chunk sizes, from 30 milliseconds up to 10 seconds. The issue persisted across all tested chunk sizes, even unrealistically large ones.
Not Audio-Specific: This problem is not isolated to a single audio file. We have tested with numerous different audio samples containing multiple speakers, and the result is always the same.
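To make the chunk sizes above concrete, here is a small sketch of the arithmetic, assuming 16-bit (2 bytes per sample) PCM; the bit depth is an assumption, since the report only says "PCM encoding". The helper names are hypothetical.

```python
from typing import Iterator


def chunk_size_bytes(duration_ms: int, sample_rate: int = 16000,
                     bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Number of bytes of raw PCM covering duration_ms of audio."""
    return sample_rate * bytes_per_sample * channels * duration_ms // 1000


def iter_chunks(pcm: bytes, duration_ms: int) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM, each at most duration_ms long."""
    size = chunk_size_bytes(duration_ms)
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


print(chunk_size_bytes(30))      # 30 ms chunk  -> 960 bytes
print(chunk_size_bytes(10_000))  # 10 s chunk   -> 320000 bytes

# One second of silence split into 30 ms chunks; the last chunk is shorter.
one_second = bytes(chunk_size_bytes(1000))
chunks = list(iter_chunks(one_second, 30))
print(len(chunks), len(chunks[-1]))  # prints 34 320
```

Under these assumptions the tested range runs from 960-byte chunks up to 320000-byte chunks, a difference of more than two orders of magnitude.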

Regarding a request_id

Because this is an issue with the real-time streaming service observed over countless sessions and not a single, atomic API call, we do not have a specific request_id to provide. The bug lies in the continuous processing logic of the diarization feature itself. We believe the detailed reproduction steps above will be sufficient for your team to investigate and confirm the behavior internally.
We would appreciate it if you could look into this issue. Please let us know if you need any more information from our side.
Thank you for your time and support.

Related past issues from other users:
https://github.com/orgs/deepgram/discussions/1298 #1298
https://github.com/orgs/deepgram/discussions/1144 #1144
https://github.com/orgs/deepgram/discussions/1068 #1068

2025-07-13T13:10:36Z

deepgram-community[bot]
bot Jul 13, 2025

Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
Consider joining our Discord community for more opportunity to engage with your fellow Deepgram users. You can earn points which can be redeemed for cool stuff by being active in our communities!

0 replies

2025-07-13T13:10:56Z

deepgram-community[bot]
bot Jul 13, 2025

Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.

0 replies

yigitburakakkas · 2025-07-13T13:10:58Z

deepgram-community[bot]
bot Jul 13, 2025

It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?

A request ID that triggered your error or issue.

1 reply

yigitburakakkas Jul 13, 2025
Author

9089e272-8000-43b9-942e-9e4916bbd40c
db3e1e3a-8c85-4ba6-afb3-e4807b0a1315
f3ab10d1-9d0c-440d-b299-5ff7924275b2

These are 3 different session samples; each of them includes 2 speakers talking at different times during the session. Thank you for your quick responses.

yigitburakakkas · 2025-07-14T23:29:41Z

deepgram-community[bot]
bot Jul 14, 2025

Thank you for reaching out with such a thorough breakdown of the issue you are experiencing with the speaker diarization feature. I appreciate the detailed reproduction steps and troubleshooting efforts you've undertaken.

At this time, your request did not show any errors in the logs, indicating that the feature is likely not performing well on your audio.

Unfortunately, this is a known issue with diarization, and one we've been intending to improve for some time. I don't have an ETA for when we will release a better version of diarization.

In the meantime, this doc might give you some ideas for improving the results: https://developers.deepgram.com/docs/multichannel-vs-diarization

Using multichannel together with diarization can help tremendously; it just requires a multichannel audio approach.

This message was sent by John Vajda from Deepgram, via our community automation.

1 reply

yigitburakakkas Jul 20, 2025
Author

Hi John,

Thank you for the quick and transparent response. We appreciate you looking into our request IDs and confirming that this is a known issue with the current diarization model's performance.

Regarding the suggestion to use multichannel audio, I wanted to clarify our use case further. The audio we process is inherently single-channel, originating from our end-users' devices. The problem we're trying to solve is isolating our primary user's voice from background speech or device echo, all of which are captured within that single mono channel. Therefore, generating a multichannel stream isn't a feasible workaround for us.

Given this, a high-performing native diarization feature is the most direct solution to improve our service's accuracy. While we understand there is no current ETA for an improved model, we would be very grateful if you could please keep this discussion ticket open as a tracker for this specific issue.

When your team makes any progress or releases an update to the diarization model, could you please post a notification here? As our initial message outlined, this is a critical feature, and allowing this ticket to serve as a central point for updates would be immensely helpful, not just for our team but likely for other customers who are waiting for this functionality to work as expected.

Thanks again for your help and transparency.

Best regards,
