How to interpret pyannote results #1454
Closed
PaulSZH95 started this conversation in Development
Replies: 1 comment 1 reply
-
https://herve.niderb.fr/posts/2022-10-23-One-speaker-segmentation-model-to-rule-them-all.html
-
Hi, I am using pyannote for voice activity detection. All the code I use is the same as in the voice_activity_detection notebook provided in the tutorials.
My question:
I observe that the inference class gives a probability for roughly every 17 ms of the audio.
However, inference.py in the pyannote repo sets the default duration of each chunk to 2 seconds, so self.step = 0.1 * duration gives 0.2 seconds.
May I know how a sliding window of length 2 s with a 0.2 s step can offer a prediction for each 17 ms frame of the audio?
I have taken a look at the inference.py script and couldn't quite figure out the missing link.
Many thanks for any help.
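To make the numbers in the question concrete, here is a minimal sketch of one way a 2 s window with a 0.2 s step could still produce one score per 17 ms frame. It assumes that the model returns a whole sequence of frame scores for each chunk (about 117 frames per 2 s chunk) and that scores from overlapping chunks are averaged. The names `sliding_window_scores` and `dummy_model` are hypothetical, and this is not the code from inference.py.

```python
def sliding_window_scores(audio_ms, score_chunk, chunk_ms=2000, step_ms=200, frame_ms=17):
    """Average per-frame scores coming from overlapping chunks.

    score_chunk(start_ms, num_frames) is a hypothetical stand-in for the model:
    it must return `num_frames` scores for the chunk starting at `start_ms`.
    """
    frames_per_chunk = chunk_ms // frame_ms   # frames the model emits per chunk
    total_frames = audio_ms // frame_ms       # frames in the whole file
    sums = [0.0] * total_frames
    counts = [0] * total_frames

    start_ms = 0
    while start_ms + chunk_ms <= audio_ms:
        chunk_scores = score_chunk(start_ms, frames_per_chunk)
        offset = round(start_ms / frame_ms)   # index of the chunk's first frame
        for i, score in enumerate(chunk_scores):
            if offset + i < total_frames:
                sums[offset + i] += score
                counts[offset + i] += 1
        start_ms += step_ms                   # the chunk moves by 0.2 s, not by one frame

    # Frames that no chunk covers are reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]


def dummy_model(start_ms, num_frames):
    # Assumption for illustration only: every frame is scored as speech.
    return [1.0] * num_frames


scores = sliding_window_scores(5000, dummy_model)
print(len(scores), "frame scores for 5 s of audio")
```

In this sketch the step only controls how far consecutive chunks are shifted; the 17 ms resolution comes from the number of frames produced inside each chunk.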