Recognising the same speaker across multiple audio clips? #5699
Unanswered
PhantomSpike asked this question in Q&A
Hello @PhantomSpike, you can use speaker verification models for this. Speaker verification matches a person's identity from their voice characteristics alone, across different recordings and regardless of language. NVIDIA NeMo provides several pretrained speaker verification models (SpeakerNet, TitaNet-L, ECAPA-TDNN); I have examples using all three in my repository. I hope this helps.
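For example, you can compare two clips with TitaNet-L along these lines. This is only a minimal sketch: the class and method names follow NeMo's speaker-recognition API but may differ slightly between NeMo versions, and the file paths are placeholders.

```python
# Minimal sketch: load a pretrained NeMo speaker verification model and
# compare two clips. Method names follow NeMo's speaker-recognition API
# but may vary slightly across versions; audio paths are placeholders.
import nemo.collections.asr as nemo_asr

# Pretrained checkpoint from NGC; alternatives include
# "speakerverification_speakernet" and "ecapa_tdnn".
model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained("titanet_large")

# Decision based on the similarity of the two utterance embeddings.
same_speaker = model.verify_speakers("clip_a.wav", "clip_b.wav")
print("Same speaker:", same_speaker)

# Or extract the embeddings yourself for downstream matching/clustering.
emb_a = model.get_embedding("clip_a.wav")
emb_b = model.get_embedding("clip_b.wav")
```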
Hi everyone,
Thank you to the developers for this wonderful software, really love it! <3
Quick Q:
Is it possible to recognize the same speaker across multiple sound clips/recordings?
If so, should I concatenate all the recordings together and feed them in as one file, or can I give them separately to the NeMo diarization pipeline?
If not, what would you recommend? I was thinking I could run the pipeline on each clip separately and then, once I have the embeddings for each speaker, do some sort of clustering to match speakers across recordings, since embeddings of the same person should presumably sit close together in the embedding space even across different recordings.
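Roughly, I imagine something like the sketch below. The embeddings here are random placeholders standing in for whatever per-speaker embeddings the pipeline produces, and the distance threshold is just a number I would need to tune.

```python
# Rough sketch of the cross-recording clustering idea. The embeddings are
# random placeholders for per-speaker embeddings produced by a diarization
# or speaker verification pipeline; the distance threshold would need tuning.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# e.g. 6 (recording, local-speaker) embeddings of dimension 192
embeddings = rng.normal(size=(6, 192))

# No fixed number of global speakers, so cut at a distance threshold instead.
clusterer = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=0.6,  # placeholder, tune on held-out data
    metric="cosine",         # called "affinity" in older scikit-learn versions
    linkage="average",
)
labels = clusterer.fit_predict(embeddings)
# Entries sharing a label would be treated as the same person across recordings.
print(labels)
```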
Just to give a bit more context: my particular application is meeting calls and telephone calls, and we would not have an enrollment database of speakers. I just want to be able to say that the same person was present in different recordings and when they spoke (diarization).
Any advice would be greatly appreciated! Thank you!