[REQ] Delossifier function #356

MarcoRavich · 2024-12-30T10:09:48Z

MarcoRavich
Dec 30, 2024

Hi there,
after the recent (and very cool) addition of the super resolution feature, it would be great to have a tool able to "delossify" compressed audio as far as possible.

@kroll-software has recently updated their Audio Delossifier, and a bunch of pre-trained models (for MP3 delossification) are provided too.

Last but not least, I've just created a simple script to run inference on Colab: kroll-software/AudioDelossifier#5 (comment)

Hope that inspires !

RyanMetcalfeInt8 · 2025-01-01T13:36:40Z

RyanMetcalfeInt8
Jan 1, 2025
Maintainer

Thanks for the idea. I also see some MP3 restoration models listed here: https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/docs/pretrained_models.md#single-stem-models

As always, it would be very helpful to get some help from the community to evaluate & validate quality of the projects & available pre-trained models.

0 replies

MarcoRavich · 2025-01-02T07:40:33Z

MarcoRavich
Jan 2, 2025
Author

Thanks for the hint and happy new year, @RyanMetcalfeInt8 !

Just added Apollo by @JusperLee under AUDIO \ AI-based \ Restorers:

Apollo consistently outperforms existing SR-GAN models across various bit rates and music genres, particularly excelling in complex scenarios involving mixtures of multiple instruments and vocals. Apollo significantly improves music restoration quality while maintaining computational efficiency.

Anyway, I honestly don't understand exactly how it works, since it relies on a music separation neural network, although the authors explain:

During data preprocessing, we drew inspiration from music separation techniques and implemented the following steps:

Source Activity Detection (SAD):
We used a Source Activity Detector (SAD) to remove silent regions from the audio tracks, retaining only the significant portions for training.

Data Augmentation:
We performed real-time data augmentation by mixing tracks from different songs. For each mix, we randomly selected between 1 and 8 stems from the 11 available tracks, extracting 3-second clips from each selected stem. These clips were scaled in energy by a random factor within the range of [-10, 10] dB relative to their original levels. The selected clips were then summed together to create simulated mixed music.

Simulating Dynamic Bitrate Compression:
We simulated various bitrate scenarios by applying MP3 codecs with bitrates of [24000, 32000, 48000, 64000, 96000, 128000].

Rescaling:
To ensure consistency across all samples, we rescaled both the target and the encoded audio based on their maximum absolute values.

Saving as HDF5:
After preprocessing, all data (including the source stems, mixed tracks, and compressed audio) was saved in HDF5 format, making it easy to load for training and evaluation purposes.
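The augmentation and rescaling steps quoted above can be sketched roughly as follows. This is only an illustration, not Apollo's actual code: the function names, the list-of-arrays input layout, and the choice to divide each signal by its own peak are assumptions, and the MP3 encoding step is left out because it needs an external codec.

```python
import numpy as np

def mix_random_stems(stems, sample_rate=44100, clip_seconds=3.0, rng=None):
    """Mix 1 to 8 randomly chosen stems into one simulated track.

    `stems` is a list of 1-D float32 arrays, each at least `clip_seconds` long
    (hypothetical input layout, not taken from the Apollo code base).
    """
    rng = rng or np.random.default_rng()
    clip_len = int(clip_seconds * sample_rate)
    n_pick = int(rng.integers(1, min(8, len(stems)) + 1))
    chosen = rng.choice(len(stems), size=n_pick, replace=False)

    mix = np.zeros(clip_len, dtype=np.float32)
    for idx in chosen:
        stem = stems[idx]
        # Random 3-second excerpt from the selected stem.
        start = int(rng.integers(0, len(stem) - clip_len + 1))
        clip = stem[start:start + clip_len]
        # Random level change in [-10, 10] dB, applied as an amplitude gain.
        gain_db = rng.uniform(-10.0, 10.0)
        mix += clip * np.float32(10.0 ** (gain_db / 20.0))
    return mix

def rescale_pair(target, encoded, eps=1e-8):
    """Peak-normalize target and encoded audio (assumption: each by its own peak)."""
    target = target / (np.max(np.abs(target)) + eps)
    encoded = encoded / (np.max(np.abs(encoded)) + eps)
    return target, encoded
```

In a real pipeline the mixed clip would be encoded to MP3 at one of the listed bitrates and decoded again before `rescale_pair` is called on the (clean, decoded) pair.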

Hope that active nnet-based "audio restoration" projects' devs (such as @asjad895, @AakashRevankar, @matthewmcq, @shaws34, @bkraad47 and @kroll-software of course) will join this discussion too.

If it helps, I'll try to engage Hydrogenaudio's experts again for a qualitative evaluation of the restorers' outputs.

1 reply

JusperLee Feb 5, 2025

Thank you for your attention. Our publicly available pre-trained model for this method was trained using only open-source data. If you train with more data, you should be able to get better generalization performance.

Brodvd · 2025-01-02T11:17:40Z

Brodvd
Jan 2, 2025

If you want to try it quickly: https://huggingface.co/spaces/patriotyk/Apollo
(there's also the "apollo-universal" checkpoint, which is more or less like Audio SR but not the same)

0 replies

MarcoRavich · 2025-01-02T15:46:26Z

MarcoRavich
Jan 2, 2025
Author

@yoongi43 - the author of Exploiting Time-Frequency Conformers for Music Audio Enhancement - has just been told about this discussion.

0 replies

MarcoRavich · 2025-02-05T09:37:18Z

MarcoRavich
Feb 5, 2025
Author

Just discovered this interesting Audio Super Resolution implementation by @jakeoneijk:
FlashSR: One-step Versatile Audio Super-resolution via Diffusion Distillation

Abstract from the related paper:

Versatile audio super-resolution (SR) is the challenging task of restoring high-frequency components from low-resolution audio with sampling rates between 4kHz and 32kHz in various domains such as music, speech, and sound effects. Previous diffusion-based SR methods suffer from slow inference due to the need for a large number of sampling steps. In this paper, we introduce FlashSR, a single-step diffusion model for versatile audio super-resolution aimed at producing 48kHz audio. FlashSR achieves fast inference by utilizing diffusion distillation with three objectives: distillation loss, adversarial loss, and distribution-matching distillation loss. We further enhance performance by proposing the SR Vocoder, which is specifically designed for SR models operating on mel-spectrograms. FlashSR demonstrates competitive performance with the current state-of-the-art model in both objective and subjective evaluations while being approximately 22 times faster.

Hope that inspires !

2 replies

Brodvd Feb 5, 2025

Well, listening to the examples, FlashSR seems better than Audio SR in terms of quality.

RyanMetcalfeInt8 Feb 6, 2025
Maintainer

Very interesting!

kroll-software · 2025-02-06T00:39:42Z

kroll-software
Feb 6, 2025

I think you are all doing a great job pushing this idea further.

0 replies

Neovolter · 2025-02-12T19:07:40Z

Neovolter
Feb 12, 2025

Has anyone objectively compared the quality of these models (AudioSR, Apollo, FlashSR and maybe others)? I like both Apollo and FlashSR in terms of quality.

1 reply

MarcoRavich Apr 23, 2025
Author

A Hydrogenaudio listening test is needed for that, IMHO.

MarcoRavich · 2025-04-23T08:47:17Z

MarcoRavich
Apr 23, 2025
Author

Bump.

It seems that @JusperLee's Apollo is getting good results (and needs optimization for Intel *PUs)...

0 replies

matthewmcq · 2025-04-26T23:33:49Z

matthewmcq
Apr 26, 2025

Hi all!

I apologize for not contributing much to the thread until now, but I just wrapped up my thesis, which I feel has some fairly profound implications for audio restoration. Unfortunately, I did not get the chance to directly integrate any of these algorithms into a DL pipeline, but I feel y'all are the type of people who would know what to do with it better than me. TLDR: I developed an optimized algorithm to convert a DFT spectrum into the set of constituent "true" continuous components with more or less arbitrarily good accuracy. In my thesis I used it to "clean" an STFT and for a pretty novel resampling approach.

Code is available here. If you're at all curious to read the actual thesis to get a better idea of the theoretical foundations or to see the results visualized, the pdf is in the folder titled "thesis."

Have a great weekend!

0 replies

MarcoRavich · 2025-06-11T08:56:34Z

MarcoRavich
Jun 11, 2025
Author

Bump.

Just discovered @rajasekarnp1's / @31jay's Neural Audio Upscaler project: the models aren't provided (although the Project Summary claims "Voice, Music, Ambient and General") and I don't understand exactly which inference platform it relies on, but the approach seems interesting.

0 replies

MarcoRavich · 2025-06-12T11:14:48Z

MarcoRavich
Jun 12, 2025
Author

OK, with the help of GH-Copilot I've finally "ported" @kroll-software's latest AudioDelossifier (and, of course, converted the provided models) to run locally using OpenVINO.

https://github.com/MarcoRavich/OV-AudioDelossifier#readme

It's still pretty raw, of course, but it seems to run inference without errors.

I'm waiting for your feedback and suggestions to improve it further.

7 replies

Brodvd Jun 12, 2025

Strange, I downloaded your fork and installed the requirements.txt, now with your latest commit I get this:

Available devices for inference:
1: CPU
2: GPU
3: AUTO (Automatic Device Selection)
Choose the device to run inference on (number):
1
Enter the path to the models directory (default: ./models/OpenVINO):
models/OpenVINO
Enter the path to the audio file you want to infer on:
audio/64k.mp3
Detected lossy input bitrate: None kbps
Is this correct? Enter different kbps value or press Enter to confirm:
64
Using model: models/OpenVINO\WaveletModel-64k-2048.weights.xml
Using device: CPU
Processing audio/64k.mp3 ...
Traceback (most recent call last):
  File "C:\Users\User\Downloads\OV-AudioDelossifier-master\OV-AudioDelossifier-master\Deloss.py", line 189, in <module>
    main()
  File "C:\Users\User\Downloads\OV-AudioDelossifier-master\OV-AudioDelossifier-master\Deloss.py", line 170, in main
    inputs = preprocess_audio(audio_file)
  File "C:\Users\User\Downloads\OV-AudioDelossifier-master\OV-AudioDelossifier-master\Deloss.py", line 119, in preprocess_audio
    data, _ = af.read_audiofile(file)
ValueError: too many values to unpack (expected 2)

Can you check that the one you use locally is the same as the fork?

MarcoRavich Jun 12, 2025
Author

Remember that \ matters...

Brodvd Jun 13, 2025

Ok but it's the same thing, I corrected the error like this:

def read_audiofile(filename, target_sr=44100, channels=2):
    # ...
    data, samplerate = sf.read(filename, dtype='float32', always_2d=True)
    # ...
    return data, samplerate

and deleted dtype=np.float32

af.write_audiofile_wav(output_audio, out_fname) #dtype=np.float32

so:

Available devices for inference:
1: CPU
2: GPU
3: AUTO (Automatic Device Selection)
Choose the device to run inference on (number):
1
Enter the path to the models directory (default: ./models/OpenVINO):
models\OpenVINO
Enter the path to the audio file you want to infer on:
audio\128k.mp3
Detected lossy input bitrate: None kbps
Is this correct? Enter different kbps value or press Enter to confirm:
128
Using model: models\OpenVINO\WaveletModel-128k-2048.weights.xml
Using device: CPU
Processing audio\128k.mp3 ...
Inference complete. Output saved as audio\128k_deloss.wav (32bit float PCM WAV)
Inference time: 92.09 seconds

but:

MarcoRavich Jun 13, 2025
Author

32bit fp audio output - using dtype=np.float32 - works for me:

Anyway, as mentioned above, you have to generate a null mix (in Audacity, select one track and apply Effect -> Special -> Invert, then select both tracks and apply Tracks -> Mix -> Mix and Render to New Track) to hear the differences between the lossy source and the delossified result:
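The same null mix can also be computed outside Audacity. The sketch below is a minimal illustration on plain NumPy arrays; `null_mix` and `rms_db` are hypothetical helper names, and it assumes both signals are already decoded, share the same sample rate, and are sample-aligned (MP3 decoders may add a delay that has to be compensated first).

```python
import numpy as np

def null_mix(source, processed):
    """Difference between the processed and source signals.

    Equivalent to inverting one track and mixing it with the other.
    Both inputs are trimmed to the shorter length.
    """
    n = min(len(source), len(processed))
    return processed[:n] - source[:n]

def rms_db(signal, eps=1e-12):
    """RMS level in dB relative to full scale (1.0)."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(signal))) + eps)
```

A residual that is silent means the delossifier changed nothing; whatever remains is exactly what the model added or removed.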

Brodvd Jun 13, 2025

Okok, sorry for my misunderstanding :)

kroll-software · 2025-06-13T13:51:55Z

kroll-software
Jun 13, 2025

I really appreciate all your efforts and progress.

Some ideas for improvements:

Better training data could lead to better results
A 'sliding window' during inference could improve the algorithm.
see audio_predict.py, from line 36, # process data in chunks
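One common way to implement a sliding window at inference time is overlap-add with a cross-fading window. The sketch below is an assumption about what such a scheme could look like; it does not reproduce audio_predict.py. `process_chunk` is a hypothetical callable standing in for the model call, and the default window size of 2048 is only a guess based on the model file names.

```python
import numpy as np

def sliding_window_apply(signal, process_chunk, window_size=2048, hop=1024):
    """Run `process_chunk` on overlapping windows and cross-fade the results.

    `process_chunk` takes and returns a 1-D array of length `window_size`.
    """
    window = np.hanning(window_size + 1)[:-1]  # periodic Hann window
    pad = window_size                          # so the edges get full coverage
    padded = np.pad(signal, (pad, pad + window_size))
    out = np.zeros(len(padded), dtype=np.float64)
    norm = np.zeros(len(padded), dtype=np.float64)

    for start in range(0, len(padded) - window_size + 1, hop):
        chunk = padded[start:start + window_size]
        out[start:start + window_size] += window * process_chunk(chunk)
        norm[start:start + window_size] += window

    norm[norm < 1e-8] = 1.0                    # avoid dividing by zero at the far edges
    return (out / norm)[pad:pad + len(signal)]
```

Because each output sample is a weighted average of several overlapping predictions, discontinuities at chunk boundaries are smoothed out, at the cost of roughly `window_size / hop` times more model calls.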

Detlef

0 replies

MarcoRavich · 2025-06-26T14:38:16Z

MarcoRavich
Jun 26, 2025
Author

Bump.

Thanks to @jarredou's Apollo-Colab-Inference repo I've discovered the Universal model for any lossy files by Lew, shared by @deton24.

I failed to convert it to ONNX format, but I'm trying to run inference with it as is.

Any suggestions, @RyanMetcalfeInt8?

28 replies

RyanMetcalfeInt8 Jul 15, 2025
Maintainer

A License column - for better "at a glance" info for users - in the new Model Manager could be useful then, according to @jarredou's words...
...another request for @RyanMetcalfeInt8 : is it possible to add something like "own" option to load non-listed models too ?

Could be nice but just being credited aside the model like on screenshot is already great, maybe coupled to a link to model trainer's repo. No need to clutter the UI just because of me and my bad experience with Steinberg 😃

Yeah, I can share more details as it gets further baked, but imagine that the Model Info button in that screenshot opens up a model card (probably will just change it to Model Card actually :) ), and in that is various info like license, link to the source repo (model trainer's repo), and some information about what the model is intended to be used for.

jarredou Jul 15, 2025

perfect ! :)

jarredou Jul 15, 2025

Another thing about my drumsep model: as it was trained only with the kick/snare/toms/hh/cymbals stems, it would be great to add a "residuals/other" stem to the results, obtained by phase-inverting all separated stems against the input, so you get a resulting audio file with everything that was left aside by the model. It can be useful when the input has some percussion that doesn't fit the model's stem types, or to let users process these residuals further. It also implies that the separated stems + residuals, when mixed together, give a perfect reconstruction of the input, which can be really important in some use cases.

MarcoRavich Jul 16, 2025
Author

We're a bit OT (sorry), but thanks for the info!

I know AudioSR trained on 2kHz-16kHz lumped together in the training data (paper link). In theory, most of the work of the delossifier is just increasing the bitrate and coming up with the missing frequencies above the lossy Nyquist limit. Like, it's not doing anything fundamentally different between lossy formats from a spectral (STFT/wavelet) perspective, which are the pathways that all these models use.

I've also read the Control Flow Outline in your upscalemp3 repo (note: can you upload your models to Hugging Face so this plugin can use them too?), which is very interesting!

I know that almost all lossy codecs are based on essentially the same operating principles, but I wonder how results would be affected if the trainer were fed lossy audio produced by obsolete encoders (which were not as optimized as current ones).

Point being that maybe instead of a codec, a curriculum approach would be better. (I.e. train on 16kHz->48 then 12.5, 6, etc.).

I'm not that technically experienced on the subject, so I honestly can't figure out how a "curriculum approach" could be implemented here but, observing how (some) other neural-network-based "audio reconstruction" projects work, I've noticed that specific reference signals are fed into the trainer to obtain better ESR results (e.g. @sdatkinson's nam)...
...I also wonder if it's possible (and desirable) to "bias" the training using null tests as feedback.
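As a very rough idea of what a curriculum could look like in code: training starts with the easiest degradation and moves to harder ones after a fixed number of epochs. Everything below is an assumption for illustration only (the stage values are read as input sample rates in Hz from the suggestion quoted above, and `epochs_per_stage`, `input_rate_for_epoch` and `band_limit` are hypothetical names); it is not taken from any of the projects mentioned in this thread.

```python
import numpy as np

# Simulated input sample rates in Hz, easiest first (16 kHz, then 12.5 kHz, then 6 kHz).
CURRICULUM_RATES_HZ = [16000, 12500, 6000]

def input_rate_for_epoch(epoch, epochs_per_stage=10, stages=CURRICULUM_RATES_HZ):
    """Pick the simulated input sample rate for a given training epoch."""
    stage = min(epoch // epochs_per_stage, len(stages) - 1)
    return stages[stage]

def band_limit(signal, sample_rate, input_rate):
    """Crude brick-wall low-pass at input_rate / 2, done in the FFT domain.

    This only simulates the bandwidth of a lower-rate input; a real pipeline
    would use a proper resampler or a lossy encoder.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > input_rate / 2.0] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))
```

Each training batch would then pair the full-band target with `band_limit(target, 48000, input_rate_for_epoch(epoch))` as the degraded input.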

I'm not sure what the tradeoff bt loss generalizability vs specific types works, but I do think a one size fits all model might just make more sense from a practicality/efficiency standpoint.

I'm a bit torn on this: while I understand that a generic model is certainly more practical, I'm fairly certain that anyone who needs a delossifier is more interested in the fidelity of the results (a term I'm not using by accident).

Thanks again for your work and kindness !

RyanMetcalfeInt8 Jul 27, 2025
Maintainer

Another thing about my drumsep model: as it was trained only with the kick/snare/toms/hh/cymbals stems, it would be great to add a "residuals/other" stem to the results, obtained by phase-inverting all separated stems against the input, so you get a resulting audio file with everything that was left aside by the model. It can be useful when the input has some percussion that doesn't fit the model's stem types, or to let users process these residuals further. It also implies that the separated stems + residuals, when mixed together, give a perfect reconstruction of the input, which can be really important in some use cases.

Sorry for delayed response! Sure thing, I can add that -- so essentially this, right? residual = original_audio - (sum of all drum stems)
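A minimal sketch of that formula with NumPy, assuming all stems are arrays with the same shape and alignment as the input (`residual_stem` is a hypothetical helper name, not part of the plugin):

```python
import numpy as np

def residual_stem(original_audio, stems):
    """Return original_audio minus the sum of all separated stems."""
    total = np.sum(np.stack(stems, axis=0), axis=0)
    return original_audio - total
```

By construction, summing the separated stems and this residual gives back the original input, up to floating-point rounding.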

Neovolter · 2025-06-26T15:09:40Z

Neovolter
Jun 26, 2025

Sorry for the maybe stupid questions, but is it possible to use this on an NVIDIA GPU (on a CPU it takes so long), and is there any better delossifier in terms of quality?

2 replies

RyanMetcalfeInt8 Jun 26, 2025
Maintainer

You're asking about our AI plugins in general? If so, then unfortunately not: OpenVINO doesn't have a backend that supports NVIDIA (CUDA).

MarcoRavich Jun 26, 2025
Author

Sorry for the maybe stupid questions, but is it possible to use this on an NVIDIA GPU (on a CPU it takes so long)

Ehm, did you notice we're in an @intel repository?
Check the source repos instead...

and is there any better delossifier in terms of quality ?

Well Apollo maybe, but not for OpenVINO at the moment.

Anyway, if you own an NVIDIA GPU, why don't you train models yourself (and share please) ?!?

deton24 · 2025-07-08T01:02:08Z

deton24
Jul 8, 2025

I'm sorry, but no. I'll write back if they come up.

On Mon, 7 Jul 2025, 17:21, Ryan Metcalfe wrote:

Hi @deton24 -- Any chance you heard back from Lew? I've been occasionally checking your GDoc (which is amazing btw), but didn't see any info there. I guess chances are, you didn't hear anything -- but I figured I'd check :) I just really want to release this model with our plugins, as it's pretty amazing..

0 replies


Replies: 15 comments · 41 replies
