Hi there! I'll start by saying that I love this application and appreciate the sweat and tears that've gone into its development.
One idea to consider: several of the speech-to-text models accept additional parameters with the speech payload -- could there be an "Advanced" area or similar in the API key page where users can set these parameters?
For example, the ElevenLabs Speech-to-Text (“Scribe v1”) endpoint now exposes three parameters that can dramatically improve multi-speaker transcripts:
diarize – boolean, default false. When enabled, the service tags each word with a speaker_id, letting us attribute dialogue to individual speakers. Could be configured as an on/off toggle.
num_speakers – integer, 1–32. The maximum number of speakers in the uploaded file; helps the model predict who speaks when. Could be configured as a numeric input.
diarization_threshold – float, 0.1–0.4. Higher values merge similar voices (fewer speakers); lower values split them (more speakers). This is honored only when diarize=true and num_speakers is unset; if omitted, ElevenLabs falls back to about 0.22. Could be configured as a decimal input.
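To make the suggestion concrete, here's a minimal sketch of how the app might validate and assemble these optional fields before attaching them to a transcription request. `build_scribe_params` is a hypothetical helper name, and the ranges are taken from the parameter descriptions above; this is an illustration, not the app's actual code.

```python
def build_scribe_params(diarize=False, num_speakers=None, diarization_threshold=None):
    """Assemble optional diarization fields for an ElevenLabs Scribe v1
    transcription request, enforcing the documented value ranges."""
    params = {"model_id": "scribe_v1"}
    if diarize:
        params["diarize"] = True
        if num_speakers is not None:
            if not 1 <= num_speakers <= 32:
                raise ValueError("num_speakers must be between 1 and 32")
            params["num_speakers"] = num_speakers
        elif diarization_threshold is not None:
            # Only honored when diarize is true and num_speakers is unset.
            if not 0.1 <= diarization_threshold <= 0.4:
                raise ValueError("diarization_threshold must be between 0.1 and 0.4")
            params["diarization_threshold"] = diarization_threshold
    return params
```

The point of a helper like this is that an "Advanced" settings pane only needs to hand user input to one place, where out-of-range values can be rejected before the request is ever sent.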
I think a careful review of the various model providers could unearth a few useful parameters.
A potential downside to this change is the need to continually maintain a working set of parameters for each provider. But that's not much more work than you'll already need to invest to keep compatibility with each provider's API syntax.
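One way to keep that maintenance burden contained might be a small data-driven registry describing each provider's advanced parameters, so the settings UI and validation are generated from one table rather than hard-coded per provider. The structure and names below are purely illustrative assumptions, not a proposal for the app's real internals.

```python
# Hypothetical registry of per-provider advanced parameters. Each entry
# describes a control type and, where relevant, the allowed value range.
ADVANCED_PARAMS = {
    "elevenlabs": {
        "diarize": {"type": "bool", "default": False},
        "num_speakers": {"type": "int", "min": 1, "max": 32},
        "diarization_threshold": {"type": "float", "min": 0.1, "max": 0.4},
    },
}

def validate_param(provider, name, value):
    """Check a user-supplied value against the registry entry for it."""
    spec = ADVANCED_PARAMS[provider][name]
    if spec["type"] == "bool":
        return isinstance(value, bool)
    # Numeric parameters: must be a number inside the documented range.
    if not isinstance(value, (int, float)) or isinstance(value, bool):
        return False
    return spec["min"] <= value <= spec["max"]
```

Adding a new provider (or a new parameter) then becomes a one-line registry change instead of a code change scattered across the app.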
Reference: ElevenLabs API Reference – Speech-to-Text