Use miniaudio for direct decoding flac, mp3, ogg and wav #2759

data-man · 2025-01-23T17:42:56Z

miniaudio.h is taken from mackron/miniaudio@6d5efde, and stb_vorbis.c from nothings/stb@5c20573.
I think miniaudio could be used for many other purposes in the future.
Note: tested on Linux only.

Thank you for the awesome project!

data-man · 2025-01-31T23:02:02Z

It's very unexpected that there's no reaction at all. Doesn't anyone really want it?

azkadev · 2025-02-05T08:04:36Z

> It's very unexpected that there's no reaction at all. Doesn't anyone really want it?

If it can be integrated into the shared library without any additional features, I really want this. Using ffmpeg is a problem for me because of its size.

data-man · 2025-02-06T06:52:06Z

> If it can be integrated into the shared library without any additional features

Yes, it can. All OS-specific sound I/O functions are disabled,
but they could be used in the future instead of SDL.

satmandu · 2025-02-06T14:51:19Z

This looks amazing. Any chance of getting this to work on macOS?

data-man · 2025-02-06T15:20:49Z

> This looks amazing. Any chance of getting this to work on macOS?

I don't see any problems.
https://github.com/mackron/miniaudio?tab=readme-ov-file#supported-platforms

Windows
macOS, iOS
Linux
FreeBSD / OpenBSD / NetBSD
Android
Raspberry Pi
Emscripten / HTML5

satmandu · 2025-02-06T15:26:29Z

Does master need to be merged into this? It looks many commits behind. I'd love to try building off of this branch.

data-man · 2025-02-06T15:48:45Z

> Does master need to be merged into this? It looks many commits behind. I'd love to try building off of this branch.

Please try now.

satmandu · 2025-02-06T16:31:29Z

Thanks! This worked great! I was able to directly decode a .flac file of an old family interview without any additional conversion!

@ggerganov Any chance of getting this reviewed?

satmandu · 2025-02-06T16:53:29Z

(FYI, I ran this branch on an M1 MBP running the current version of macOS.)

ggerganov

Good job and sorry for the delay. We can use this in the examples, but for now I don't plan to add audio decoding directly into whisper.cpp. It seems better to leave this to the user.

Let's resolve the CI and merge. Later on, if we can completely remove the SDL dependency, it would be great. But I am not sure how difficult it would be.

data-man · 2025-02-06T20:39:22Z

I hope I've properly addressed all the remarks.

data-man · 2025-02-06T20:41:27Z

> Later on, if we can completely remove the SDL dependency, it would be great. But I am not sure how difficult it would be.

I think https://github.com/mackron/miniaudio/blob/master/examples/simple_capture.c can be taken as a template for audio recording.

data-man · 2025-02-26T02:47:07Z

miniaudio 0.11.22 was released, so I've updated this PR.

mrfragger · 2025-04-09T07:27:04Z

I get why libopus isn't included; I read the threads. But it is still way over my head to include it in the source code for whisper.cpp and custom-compile miniaudio with Opus support.
Is there any way to include Opus by default? It's the only audio codec I use for audiobooks.

> v0.11.22 - 2025-02-24
>
> In the extras folder, the miniaudio_libvorbis.h and miniaudio_libopus.h files have been deprecated. They have been replaced with versions in the extras/decoders folder. They are now split into a separate .c and .h files. The old files still exist for compatibility, but you need to transition over to the new versions. The transition should be trivial. Compile the .c files like a normal source file, and include the .h file like a normal header.

WilliamTambellini · 2025-05-07T20:44:49Z

cool

WilliamTambellini · 2025-05-07T20:45:08Z

FYI @amanda

This commit adds a conversion from stereo to mono in the `read_audio_data` function of `common-whisper.cpp`.

The motivation for this change is that, prior to commit 7d3da68 ("examples : use miniaudio for direct decoding flac, mp3, ogg and wav (ggml-org#2759)"), there was a step that read stereo int16 data -> pcm16 (448512 samples), then converted it to mono (224256 samples), and then also converted to stereo in `pcmf32s`. The middle step seems to have been missed when rewriting the code to use miniaudio, which caused issues when transcribing stereo audio files.

For example, currently using the audio sample in the linked issue the output is:

```console
[00:00:00.000 --> 00:00:03.000]  (speaker 1) Sous-titres réalisés para la communauté d'Amara.org
```

And with the change in this commit the output is:

```
[00:00:00.000 --> 00:00:01.500]  (speaker 1) *sonnerie de téléphone*
[00:00:01.500 --> 00:00:07.000]  (speaker 1) Salut jeune homme !
[00:00:07.000 --> 00:00:08.500]  (speaker 0) C'est vrai que je te dérange ?
[00:00:08.500 --> 00:00:10.500]  (speaker 1) Ah pas du tout, pas du tout, pas du tout !
[00:00:10.500 --> 00:00:12.500]  (speaker 1) J'étais en train de...
[00:00:12.500 --> 00:00:14.500]  (speaker 1) de préparer un courrier
```

Resolves: ggml-org#3092
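
The stereo-to-mono step described in this commit message can be illustrated with a minimal, self-contained sketch. This is not the actual `read_audio_data` code: the helper names `stereo_to_mono_f32` and `split_stereo_f32` are hypothetical, and the sketch assumes interleaved int16 input (`L0, R0, L1, R1, ...`) and a plain average of the two channels as the downmix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the whisper.cpp implementation): average each
// interleaved stereo frale pair [L, R] into one mono float sample in [-1, 1).
std::vector<float> stereo_to_mono_f32(const std::vector<int16_t> & interleaved) {
    assert(interleaved.size() % 2 == 0); // expects whole stereo frames

    const size_t n_frames = interleaved.size() / 2;
    std::vector<float> mono(n_frames);
    for (size_t i = 0; i < n_frames; ++i) {
        const float l = interleaved[2*i]     / 32768.0f;
        const float r = interleaved[2*i + 1] / 32768.0f;
        mono[i] = 0.5f * (l + r);
    }
    return mono;
}

// Hypothetical helper: keep the two channels separate, as a per-channel
// buffer (similar in spirit to `pcmf32s`) would need for speaker labelling.
std::vector<std::vector<float>> split_stereo_f32(const std::vector<int16_t> & interleaved) {
    assert(interleaved.size() % 2 == 0); // expects whole stereo frames

    const size_t n_frames = interleaved.size() / 2;
    std::vector<std::vector<float>> channels(2, std::vector<float>(n_frames));
    for (size_t i = 0; i < n_frames; ++i) {
        channels[0][i] = interleaved[2*i]     / 32768.0f; // left
        channels[1][i] = interleaved[2*i + 1] / 32768.0f; // right
    }
    return channels;
}
```

With this layout, a buffer of 448512 interleaved stereo samples yields 224256 mono frames, which matches the sample counts quoted in the commit message. Skipping the downmix and passing the interleaved buffer on as if it were mono would double the apparent length and mix the channels sample by sample, which is consistent with the garbled transcription shown above.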


data-man force-pushed the miniaudio branch from c8a2caa to 935e8fd Compare January 30, 2025 22:05

ggerganov approved these changes Feb 6, 2025

data-man force-pushed the miniaudio branch from 6d6836d to 1cc41dd Compare February 6, 2025 20:37

foldl reviewed Feb 7, 2025

data-man force-pushed the miniaudio branch 2 times, most recently from 4d32ac7 to c5f65e1 Compare February 7, 2025 07:48

data-man added 2 commits February 26, 2025 07:21

Use miniaudio for direct decoding flac, mp3, ogg and wav

4508972

Merge branch 'master' of https://github.com/ggerganov/whisper.cpp int…

93b24bb

…o miniaudio

data-man force-pushed the miniaudio branch from c5f65e1 to 93b24bb Compare February 26, 2025 02:42

ggerganov merged commit 7d3da68 into ggml-org:master Feb 27, 2025

ggerganov mentioned this pull request Feb 27, 2025

common : try to fix build #2845

Merged

data-man deleted the miniaudio branch February 27, 2025 07:23

KitaitiMakoto mentioned this pull request Feb 27, 2025

ruby : follow audio library change #2851

Merged

data-man mentioned this pull request Feb 28, 2025

whisper-server: failed to open audio data from fname buffer (Invalid argument) #2847

Closed

joelvaneenwyk pushed a commit to joelvaneenwyk/whisper-cpp that referenced this pull request Mar 3, 2025

examples : use miniaudio for direct decoding flac, mp3, ogg and wav (g…

7a14947

…gml-org#2759)

ggerganov mentioned this pull request May 20, 2025

mtmd : add ultravox audio input ggml-org/llama.cpp#13623

Merged

danbev mentioned this pull request Jun 18, 2025

examples : add stereo to mono conversion in read_audio_data #3266

Merged
