Explain --format mp3 #49
How does the MP3 format trick work, given that it's not a rename trick or an external tool? I'm not an expert by any means, though; as far as I understand:
Update: The question is answered, thanks.
Replies: 1 comment
Hey @atefgithub, great question! I totally understand why you'd ask, as sf.write itself doesn't natively encode formats like MP3. But yes, the script does produce actual, properly encoded MP3 files, not just renamed ones! Here's how it works:
# Parse format option
# (excerpt: the branches that precede this elif are not shown)
for i, arg in enumerate(sys.argv):
    elif arg == '--format' and i + 1 < len(sys.argv):
        format = sys.argv[i + 1].lower()
        if format not in ['wav', 'mp3']:
            print("Error: Format must be either 'wav' or 'mp3'")
            sys.exit(1)

❯ uv run ./kokoro-tts test.txt --format mp3 test.mp3 --speed 1 --lang en-us --voice af_nicole
Processing: Chapter 1
Completed Chapter 1: 1/1 chunks processed
Saving complete audio file...
Created test.mp3
❯ ffprobe test.mp3
ffprobe version n7.1.1 Copyright (c) 2007-2025 the FFmpeg developers
built with gcc 14.2.1 (GCC) 20250207
configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-amf --enable-avisynth --enable-cuda-llvm --enable-lto --enable-fontconfig --enable-frei0r --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libdav1d --enable-libdrm --enable-libdvdnav --enable-libdvdread --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgsm --enable-libharfbuzz --enable-libiec61883 --enable-libjack --enable-libjxl --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libplacebo --enable-libpulse --enable-librav1e --enable-librsvg --enable-librubberband --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpl --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-nvdec --enable-nvenc --enable-opencl --enable-opengl --enable-shared --enable-vapoursynth --enable-version3 --enable-vulkan
libavutil 59. 39.100 / 59. 39.100
libavcodec 61. 19.101 / 61. 19.101
libavformat 61. 7.100 / 61. 7.100
libavdevice 61. 3.100 / 61. 3.100
libavfilter 10. 4.100 / 10. 4.100
libswscale 8. 3.100 / 8. 3.100
libswresample 5. 3.100 / 5. 3.100
libpostproc 58. 3.100 / 58. 3.100
Input #0, mp3, from 'test.mp3':
Duration: 00:01:26.09, start: 0.046042, bitrate: 56 kb/s
Stream #0:0: Audio: mp3 (mp3float), 24000 Hz, mono, fltp, 56 kb/s
Metadata:
encoder : LAME3.100
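
If ffprobe isn't at hand, a rough way to tell a real MP3 from a renamed WAV is to look at the first few bytes of the file. The sketch below is not part of kokoro-tts; sniff_audio_container and sniff_file are made-up names for illustration, and the check is only a heuristic covering the common cases (a RIFF/WAVE header for WAV, an ID3v2 tag or an MPEG Layer III frame header for MP3).

```python
def sniff_audio_container(header: bytes) -> str:
    """Guess the audio container from the first bytes of a file (hypothetical helper)."""
    # WAV files start with "RIFF", a 4-byte chunk size, then "WAVE".
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # Many MP3 files begin with an ID3v2 metadata tag.
    if header[:3] == b"ID3":
        return "mp3"
    # Otherwise look for an MPEG audio frame header: 11 sync bits set,
    # and the two layer bits equal to 01, which means Layer III.
    if (
        len(header) >= 2
        and header[0] == 0xFF
        and (header[1] & 0xE0) == 0xE0
        and (header[1] & 0x06) == 0x02
    ):
        return "mp3"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to run the header check."""
    with open(path, "rb") as f:
        return sniff_audio_container(f.read(12))
```

Calling sniff_file("test.mp3") on a WAV file that was merely renamed should return "wav", because the RIFF header is still there regardless of the extension.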
The line …
For this to work, the main requirement is having …
So, no external conversion step is needed in the script itself; it leverages the capabilities of the underlying libraries!
Hope that clears things up! Let me know if you are still confused.
The soundfile library, imported as sf, is actually a wrapper around the powerful C library called libsndfile.
libsndfile is smart. While it handles many formats itself, it can also delegate encoding/decoding for certain formats (like MP3) to other libraries installed on your system.
When the script calls sf.write(output_file, ...) and output_file ends with .mp3, libsndfile sees…
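
To try the mechanism described above outside of kokoro-tts, here is a minimal sketch. It assumes a reasonably recent soundfile paired with a libsndfile build that includes MP3 support (as far as I know, that arrived in libsndfile 1.1.0). The names mp3_supported and write_tone are illustrative only and do not come from the script.

```python
import math
import os
import tempfile


def mp3_supported() -> bool:
    """Return True if soundfile imports and its libsndfile build lists MP3."""
    try:
        # The import itself can fail if the libsndfile shared library is missing.
        import soundfile as sf
    except (ImportError, OSError):
        return False
    return "MP3" in sf.available_formats()


def write_tone(path: str, seconds: float = 0.5, rate: int = 24000) -> None:
    """Write a short mono 440 Hz tone; the format is inferred from the extension."""
    import soundfile as sf

    n = int(seconds * rate)
    samples = [0.3 * math.sin(2 * math.pi * 440 * t / rate) for t in range(n)]
    # No explicit format argument: a path ending in .mp3 selects the MP3 encoder.
    sf.write(path, samples, rate)


if __name__ == "__main__":
    if mp3_supported():
        out = os.path.join(tempfile.mkdtemp(), "tone.mp3")
        write_tone(out)
        print("wrote", out, os.path.getsize(out), "bytes")
    else:
        print("This soundfile/libsndfile build does not list MP3 support")
```

If mp3_supported() returns False, the usual cause is an older libsndfile, in which case writing to a .mp3 path would be expected to fail rather than silently produce a WAV.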