Model Explainability #1445

PaulSZH95 · 2023-08-04T01:41:04Z


In the VAD tutorial provided in the repo, 2 s chunks were used for training. Could I check whether the output resolution is 16 ms?

I am trying to visualize the effect of the SincNet layer, so I am attempting to extract the SincNet ParamSincFB layer. However, the VAD inference pipeline cannot be used as-is for this purpose, so I am trying to replicate what the Pipeline does, using torchaudio.load instead. Should I feed the model 16 ms or 2 s of audio?
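To make the 2 s option concrete, here is a minimal sketch of how a loaded waveform could be cut into fixed-duration chunks before being passed to a model. It is not the pipeline's actual implementation: the helper name `chunk_bounds`, the 16 kHz sample rate, and the non-overlapping step are assumptions made for illustration only.

```python
def chunk_bounds(num_samples, sample_rate=16000, chunk_s=2.0, step_s=2.0):
    """Return (start, end) sample indices of fixed-duration chunks.

    Hypothetical helper: sample_rate and step_s are assumed values,
    not taken from the tutorial. A trailing partial chunk is dropped.
    """
    chunk = int(round(chunk_s * sample_rate))
    step = int(round(step_s * sample_rate))
    bounds = []
    start = 0
    while start + chunk <= num_samples:
        bounds.append((start, start + chunk))
        start += step
    return bounds


# 5 s of audio at the assumed 16 kHz gives two full 2 s chunks;
# the last 1 s does not fill a chunk and is left out.
bounds = chunk_bounds(num_samples=5 * 16000)
print(bounds)
```

Each `(start, end)` pair can then be used to slice the waveform tensor along its time axis, e.g. `waveform[:, start:end]`, assuming the `(channels, samples)` layout that `torchaudio.load` returns.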

PaulSZH95 · 2023-08-04T01:56:14Z


Hi @hbredin, apologies for the tag, as this question is more about understanding the product than about a technical problem. In the supplementary information, the model summary is as follows, and it suggests that the model ingests 2 s chunks directly. If so, at what stage does the millisecond-level segmentation start?

Many thanks.
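For reference, the two numbers in this question can be related with simple arithmetic, under the reading that a 2 s chunk goes in and one score per frame comes out. This is only a sketch: the 16 ms frame step is the value being asked about, not a confirmed property of the model, the 16 kHz sample rate is assumed, and edge effects from the receptive field are ignored.

```python
SAMPLE_RATE = 16000        # assumed sample rate in Hz
CHUNK_SAMPLES = 2 * SAMPLE_RATE            # one 2 s chunk = 32000 samples
FRAME_STEP_SAMPLES = 16 * SAMPLE_RATE // 1000  # hypothesised 16 ms step = 256 samples


def frames_per_chunk(chunk_samples=CHUNK_SAMPLES, step=FRAME_STEP_SAMPLES):
    """Number of output frames if the model emits one frame per step."""
    return chunk_samples // step


def frame_start_ms(index, step=FRAME_STEP_SAMPLES, sample_rate=SAMPLE_RATE):
    """Start time in milliseconds of output frame `index` within a chunk."""
    return index * step * 1000 // sample_rate


print(frames_per_chunk())   # frames in one 2 s chunk under these assumptions
print(frame_start_ms(10))   # start time of the 11th frame, in ms
```

Under these assumptions the chunk duration is the input length and the millisecond figure is the spacing between output frames, so the two values would not be alternatives to each other.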
