Use miniaudio for direct decoding flac, mp3, ogg and wav #2759
It's quite unexpected that there has been no reaction at all. Doesn't anyone want this?
If this can be integrated into the shared library without any additional features, I really want it, because using ffmpeg is a problem due to its size.
Yes, it can. All OS-specific sound I/O functions are disabled.
This looks amazing. Any chance of getting this to work on macOS?
I can't see any problems.
Does master need to be merged into this? It looks to be many commits behind. I'd love to try building off this branch.
Please try now.
Thanks! This worked great! I was able to directly decode a …
@ggerganov, any chance of getting this reviewed?
(FYI, I ran this branch on an M1 MacBook Pro running the current version of macOS.)
Good job, and sorry for the delay. We can use this in the examples, but for now I don't plan to add audio decoding directly into whisper.cpp. It seems better to leave this to the user.
Let's fix the CI and merge. Later on, it would be great if we could remove the SDL dependency completely, but I am not sure how difficult that would be.
I hope I've properly addressed all remarks.
I think https://github.com/mackron/miniaudio/blob/master/examples/simple_capture.c can be used as a template for audio recording.
4d32ac7 to c5f65e1
miniaudio 0.11.22 was released, so I've updated this PR.
I get why libopus isn't included; I read the threads. But trying to include it in the whisper.cpp source code and custom-compile miniaudio with Opus support is still way over my head.
Cool.
FYI @amanda
This commit adds a conversion from stereo to mono in the `read_audio_data` function of `common-whisper.cpp`.

The motivation for this change is that, prior to commit 7d3da68 ("examples : use miniaudio for direct decoding flac, mp3, ogg and wav (ggml-org#2759)"), there was a step that read stereo int16 data into `pcm16` (448512 samples), then converted it to mono (224256 samples), and also converted it to stereo in `pcmf32s`. The middle step seems to have been missed when the code was rewritten to use miniaudio, which caused issues when transcribing stereo audio files.

For example, using the audio sample in the linked issue, the current output is:

```console
[00:00:00.000 --> 00:00:03.000] (speaker 1) Sous-titres réalisés para la communauté d'Amara.org
```

With the change in this commit, the output is:

```
[00:00:00.000 --> 00:00:01.500] (speaker 1) *sonnerie de téléphone*
[00:00:01.500 --> 00:00:07.000] (speaker 1) Salut jeune homme !
[00:00:07.000 --> 00:00:08.500] (speaker 0) C'est vrai que je te dérange ?
[00:00:08.500 --> 00:00:10.500] (speaker 1) Ah pas du tout, pas du tout, pas du tout !
[00:00:10.500 --> 00:00:12.500] (speaker 1) J'étais en train de...
[00:00:12.500 --> 00:00:14.500] (speaker 1) de préparer un courrier
```

Resolves: ggml-org#3092
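The stereo-to-mono step described in the commit message can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual `read_audio_data` code: it assumes interleaved stereo int16 input (`L0, R0, L1, R1, ...`), and the `stereo_split` struct and `downmix_stereo_int16` function are invented names. The mono buffer holds the per-frame average of both channels, while the two per-channel buffers play the role of `pcmf32s`. The scaling (divide by 32768 per channel, by 65536 for the two-channel sum) is an assumption chosen to keep samples in [-1, 1).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type (not part of whisper.cpp).
struct stereo_split {
    std::vector<float> mono;                // per-frame average of left and right
    std::vector<std::vector<float>> stereo; // stereo[0] = left, stereo[1] = right
};

// Hypothetical helper: downmix interleaved stereo int16 samples to mono and
// also de-interleave them into two per-channel float buffers.
stereo_split downmix_stereo_int16(const std::vector<int16_t> & pcm16) {
    assert(pcm16.size() % 2 == 0 && "expected interleaved stereo samples");

    // One frame = one left + one right sample, so 448512 samples -> 224256 frames.
    const size_t n_frames = pcm16.size() / 2;

    stereo_split out;
    out.mono.resize(n_frames);
    out.stereo.assign(2, std::vector<float>(n_frames));

    for (size_t i = 0; i < n_frames; ++i) {
        const int l = pcm16[2*i + 0];
        const int r = pcm16[2*i + 1];
        // The sum of two int16 values fits in an int; dividing by 65536
        // both averages the channels and scales the result into [-1, 1).
        out.mono[i]      = float(l + r) / 65536.0f;
        out.stereo[0][i] = float(l) / 32768.0f;
        out.stereo[1][i] = float(r) / 32768.0f;
    }
    return out;
}
```

Skipping the averaging step and treating interleaved stereo as a mono stream would feed the model twice as many samples as expected, effectively playing the audio at half speed, which is consistent with the garbled transcript shown in the commit message.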
`miniaudio.h` is taken from mackron/miniaudio@6d5efde, and `stb_vorbis.c` from nothings/stb@5c20573.

I think miniaudio can be used for many other purposes in the future.

Note: tested on Linux only.
Thank you for the awesome project!