
Can I enable the "-dl" option to detect multiple languages in one audio clip? #3018

Open
minounou opened this issue Apr 8, 2025 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

minounou commented Apr 8, 2025

Hi,

I can run whisper-cli to detect the language of a clip, for example:
whisper-cli -f ./jfk.wav -dl --model ./ggml-small.bin

It works well and prints the auto-detected language:

...
whisper_full_with_state: auto-detected language: en (p = 0.958143)
...

My question: can I use the same "-dl" option on a WAV file that contains multiple languages? In that case, will whisper-cli report two languages (in the same format as above, but with one line per language)? Thanks!

@ggerganov
Member

whisper-cli does not have this specific function available, but it is something that can be very easily implemented:

// Use mel data at offset_ms to try and auto-detect the spoken language
// Make sure to call whisper_pcm_to_mel() or whisper_set_mel() first
// Returns the top language id or negative on failure
// If not null, fills the lang_probs array with the probabilities of all languages
// The array must be whisper_lang_max_id() + 1 in size
// ref: https://github.com/openai/whisper/blob/main/whisper/decoding.py#L18-L69
WHISPER_API int whisper_lang_auto_detect(
        struct whisper_context * ctx,
                           int   offset_ms,
                           int   n_threads,
                         float * lang_probs);

PRs welcome.
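
One possible way to build on this function for a clip with several languages is to call it at several offsets and keep the distinct top languages. The sketch below covers only the bookkeeping step and makes assumptions that are not from this thread: `window_ids[i]` is taken to be the return value of `whisper_lang_auto_detect()` for the i-th offset (negative on failure), and the helper name `distinct_languages` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of whisper.cpp.
 * window_ids[i] is assumed to hold the top language id detected for the
 * i-th window of the clip, or a negative value if detection failed there.
 * Stores each distinct id, in order of first appearance, into out_ids
 * (capacity max_out) and returns how many ids were stored. */
static int distinct_languages(const int * window_ids, int n_windows,
                              int * out_ids, int max_out) {
    assert(window_ids != NULL && out_ids != NULL && max_out > 0);

    int n_out = 0;
    for (int w = 0; w < n_windows; ++w) {
        const int id = window_ids[w];
        if (id < 0) {
            continue; /* skip windows where detection failed */
        }

        /* linear search is enough: the number of languages is small */
        int seen = 0;
        for (int i = 0; i < n_out; ++i) {
            if (out_ids[i] == id) {
                seen = 1;
                break;
            }
        }

        if (!seen && n_out < max_out) {
            out_ids[n_out++] = id;
        }
    }
    return n_out;
}
```

Each stored id could then be turned into a language code with `whisper_lang_str(id)`. The window length and step are left open here; they would need tuning against real mixed-language audio.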

@ggerganov ggerganov added enhancement New feature or request good first issue Good for newcomers labels Apr 9, 2025
Author

minounou commented Apr 9, 2025

Thanks for the info! I am trying to create a PR. Is my understanding below correct? Thanks!
Here:

WHISPER_LOG_INFO("%s: auto-detected language: %s (p = %f)\n", __func__, params.language, probs[whisper_lang_id(params.language)]);

the PR would print more probabilities: for example, if a group of probabilities is much larger than the rest, it would print every language in that group. Is this correct?

        WHISPER_LOG_INFO("%s: auto-detected language: %s (p = %f)\n", __func__, languages[#1], probs[#1]);
        WHISPER_LOG_INFO("%s: auto-detected language: %s (p = %f)\n", __func__, languages[#2], probs[#2]);
         ...
        WHISPER_LOG_INFO("%s: auto-detected language: %s (p = %f)\n", __func__, languages[#k], probs[#k]);
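
The "group of much bigger probs" idea could be sketched as a relative cutoff against the highest probability. This is only one possible selection rule, not something confirmed in this thread: the helper name `select_likely_languages` and the `rel_threshold` parameter are hypothetical, and `probs` is assumed to be the array filled by `whisper_lang_auto_detect()`, with `n_langs` equal to `whisper_lang_max_id() + 1`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of whisper.cpp.
 * Selects the language ids whose probability is at least
 * rel_threshold * (highest probability in probs).
 * Ids are written to out_ids from most to least probable; at most max_out
 * ids are kept. Returns the number of ids written. */
static int select_likely_languages(const float * probs, int n_langs,
                                   float rel_threshold,
                                   int * out_ids, int max_out) {
    assert(probs != NULL && out_ids != NULL && n_langs > 0 && max_out > 0);

    /* find the highest probability to derive the cutoff */
    float p_max = probs[0];
    for (int i = 1; i < n_langs; ++i) {
        if (probs[i] > p_max) {
            p_max = probs[i];
        }
    }
    const float cutoff = rel_threshold * p_max;

    int n_out = 0;
    for (int id = 0; id < n_langs; ++id) {
        if (probs[id] < cutoff) {
            continue; /* below the cutoff */
        }
        if (n_out == max_out && probs[id] <= probs[out_ids[n_out - 1]]) {
            continue; /* output is full and this id is not better than the last */
        }

        /* insertion sort step: keep out_ids in descending probability order */
        int pos = (n_out < max_out) ? n_out++ : max_out - 1;
        while (pos > 0 && probs[out_ids[pos - 1]] < probs[id]) {
            out_ids[pos] = out_ids[pos - 1];
            --pos;
        }
        out_ids[pos] = id;
    }
    return n_out;
}
```

Each selected id could then be logged with `whisper_lang_str(id)` and `probs[id]` in the same format as the existing message. Whether several high probabilities in one window indicate mixed-language speech rather than an ambiguous detection is not established in this thread.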
