Please add a super resolution model in Audacity #278

Brodvd · 2024-08-30T11:27:39Z

Brodvd
Aug 30, 2024

Hello @RyanMetcalfeInt8, could you try adding a plugin for audio super resolution using OpenVINO? There are various models on GitHub and demos on Hugging Face, unfortunately with some bugs. @MarcoRavich's collection, in its Upscalers section, seems to be the best resource. Thank you! https://github.com/FORARTfe/HyMPS/blob/main/Audio/AI-Enhancing.md#---

RyanMetcalfeInt8 · 2024-08-30T12:31:19Z

RyanMetcalfeInt8
Aug 30, 2024
Maintainer

Hi @Brodvd,

Super resolution has been on my mind for a while as something that I definitely want to add when I get some spare cycles to dedicate to it.

It would be very helpful if you (and others) could give some feedback about which of the projects listed here work the best (from a quality perspective). I've looked at AudioSR before, but I'm not very familiar with the other projects. Anyway, this kind of research is always very time consuming for me.. as I don't want to spend the time porting some models into this framework if they don't appear to work well for general use.

Thanks!

0 replies

Brodvd · 2024-08-30T13:21:20Z

Brodvd
Aug 30, 2024
Author

OK, I also had problems with AudioSR. However, if I find something I'll tell you here. Good job!

0 replies

RyanMetcalfeInt8 · 2024-08-30T13:28:29Z

RyanMetcalfeInt8
Aug 30, 2024
Maintainer

> OK, I also had problems with AudioSR

Just curious.... problems getting it to run? Or problems as in, you weren't impressed by the quality of the result?

1 reply

Brodvd Aug 30, 2024
Author

I had problems installing AudioSR from this link: https://github.com/haoheliu/versatile_audio_super_resolution. Did you manage to install it?

Brodvd · 2024-08-30T17:18:41Z

Brodvd
Aug 30, 2024
Author

However, I agree about the quality of the results: the examples in the repository are not very good.

0 replies

MarcoRavich · 2024-08-30T19:45:57Z

MarcoRavich
Aug 30, 2024

Well, even though the most "popular" one is @haoheliu's AudioSR (it has also been adopted by @IAHispano in their Audio Upscaler, for example), the most interesting approach, IMHO, seems to be Audio Delossifier by @kroll-software, since it is trainable (even if it has a slightly different scope).

9 replies

Brodvd Sep 3, 2024
Author

@MarcoRavich sorry, but I tried to install and run Audio Delossifier. I ran audio_predict but I don't understand the output, since the debug output doesn't show any error. Did I do something wrong?

MarcoRavich Sep 3, 2024

> @MarcoRavich sorry, but I tried to install and run Audio Delossifier. I ran audio_predict but I don't understand the output, since the debug output doesn't show any error. Did I do something wrong?

I'm a total Python noob so I don't know (help from @kroll-software needed).
Anyway, I'm playing a bit with ChatGPT and generated this (not yet functional) notebook based on the Audio Delossifier code:

# Install necessary dependencies (taken from requirements.txt)
!pip install numpy==1.20.3 --upgrade
!pip install tensorflow==2.9.1 tensorflow-io==0.26.0 --upgrade  --force-reinstall
!pip install scipy==1.7.1 SoundFile==0.10.3 pydub==0.25.1 librosa==0.9.1 scikit-learn==0.24.2 --upgrade

# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras
import scipy.io.wavfile as wav
import librosa
import os
import shutil
from google.colab import files
from IPython.display import Audio
import requests
from bs4 import BeautifulSoup

# Helper Functions (audio_functions.py)
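def normalize_audio(data):
    # Scale to [-1, 1]; leave silent (all-zero) input untouched to avoid dividing by zero
    data = data.astype(np.float32)
    peak = np.max(np.abs(data))
    return data / peak if peak > 0 else data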

def load_and_normalize_audio(file_path):
    sr, data = wav.read(file_path)
    return sr, normalize_audio(data)

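def save_audio(file_path, sr, data):
    # Data is expected in [-1, 1]; scale to the 16-bit PCM range before casting,
    # otherwise the cast to int16 truncates almost every sample to 0
    wav.write(file_path, sr, (np.clip(data, -1.0, 1.0) * 32767).astype(np.int16))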

# Prediction Functions (audio_predict.py)
def predict_audio(model, input_file, output_file):
    sr, data = load_and_normalize_audio(input_file)
    data = np.expand_dims(data, axis=0)  # Add batch dimension
    data = np.expand_dims(data, axis=-1)  # Add channel dimension
    prediction = model.predict(data)
    prediction = np.squeeze(prediction)  # Remove batch and channel dimensions
    save_audio(output_file, sr, prediction * np.max(np.abs(data)))

# Training Functions (audio_train.py)
def train_model(train_data, train_labels, val_data, val_labels, epochs=50, batch_size=32):
    model = keras.Sequential([
        keras.layers.Conv1D(64, kernel_size=3, activation='relu', input_shape=(None, 1)),
        keras.layers.MaxPooling1D(pool_size=2),
        keras.layers.Conv1D(128, kernel_size=3, activation='relu'),
        keras.layers.MaxPooling1D(pool_size=2),
        keras.layers.Conv1D(256, kernel_size=3, activation='relu'),
        keras.layers.MaxPooling1D(pool_size=2),
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation='relu'),
        keras.layers.Dense(1)
    ])
    
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    model.fit(train_data, train_labels, epochs=epochs, batch_size=batch_size, validation_data=(val_data, val_labels))
    return model

def prepare_training_data(files):
    train_data, train_labels = [], []
    for file in files:
        sr, data = load_and_normalize_audio(file)
        train_data.append(np.expand_dims(data, axis=-1))
        train_labels.append(data)  # Dummy labels for the sake of example
    return np.array(train_data), np.array(train_labels)

# Start Menu
def start_menu():
    print("Choose an option:")
    print("1. Perform inference")
    print("2. Train a new model")
    choice = input("Enter 1 or 2: ")
    return choice

# Model Scraping
def get_model_filenames():
    url = "https://github.com/kroll-software/AudioDelossifier/tree/master/models"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    models = set()
    for link in soup.find_all('a', href=True):
        if link['href'].endswith('.h5'):
            model_name = link['href'].split('/')[-1]
            models.add(model_name)
    return list(models)

# Load Pre-trained Model
def load_model_from_path(model_path):
    try:
        model = keras.models.load_model(model_path)
        print(f"Loaded model: {model_path}")
        return model
    except Exception as e:
        raise Exception(f"Error loading model: {str(e)}")

# Main Code
choice = start_menu()

if choice == "1":
    # Inference Mode
    models = get_model_filenames()

    if not models:
        raise Exception("No models found at the specified URL.")

    print("Available models:")
    for i, model in enumerate(models, 1):
        print(f"{i}. {model}")
    
    print(f"{len(models) + 1}. Upload your own model")
    selected_option = int(input(f"Select the model number (1-{len(models) + 1}): "))
    
    if selected_option == len(models) + 1:
        uploaded_model = files.upload()
        MODEL_PATH = list(uploaded_model.keys())[0]
    else:
        selected_model_name = models[selected_option - 1]
        MODEL_URL = f"https://github.com/kroll-software/AudioDelossifier/raw/master/models/{selected_model_name}"
        MODEL_PATH = selected_model_name
        
        def download_model(url, path):
            response = requests.get(url, stream=True)
            if response.status_code == 200:
                with open(path, 'wb') as file:
                    file.write(response.content)
                print(f"Downloaded model: {path}")
            else:
                raise Exception(f"Failed to download the model. Status code: {response.status_code}")
        
        download_model(MODEL_URL, MODEL_PATH)

    model = load_model_from_path(MODEL_PATH)
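    uploaded_audio = files.upload()  # Colab saves uploads into the current working directory
    os.makedirs('audio/out', exist_ok=True)  # create the output folder before writing to it
    for filename in uploaded_audio.keys():
        input_file = filename
        output_file = os.path.join('audio/out', 'processed_' + filename)
        predict_audio(model, input_file, output_file)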
    
    output_files = os.listdir('audio/out')
    print(f"Processed files: {output_files}")
    
    if output_files:
        display(Audio(filename=os.path.join('audio/out', output_files[0]), autoplay=True))
    else:
        print("No output files found.")
    
elif choice == "2":
    # Training Mode
    uploaded_dataset = files.upload()  # Colab saves uploads into the current working directory
    train_files = list(uploaded_dataset.keys())
    train_data, train_labels = prepare_training_data(train_files)
    
    val_split = int(0.8 * len(train_data))
    train_data, val_data = train_data[:val_split], train_data[val_split:]
    train_labels, val_labels = train_labels[:val_split], train_labels[val_split:]
    
    model = train_model(train_data, train_labels, val_data, val_labels)
    model_save_path = "trained_model.h5"
    model.save(model_save_path)
    print(f"Model saved at {model_save_path}")
else:
    print("Invalid choice. Please restart the notebook and select a valid option.")

A skilled dev is needed to "fix/update" the code...

...maybe @bkraad47 (fat_llama author) or @ivandustin (pyaudioenhance author) can help.

Brodvd Sep 3, 2024
Author

Ok, thanks!

bkraad47 Sep 5, 2024

@MarcoRavich there's a multi-threaded approach that yields slightly better results than CPU only: https://github.com/bkraad47/fat_llama_fftw

MarcoRavich Sep 5, 2024

> @MarcoRavich there's a multi-threaded approach that yields slightly better results than CPU only: https://github.com/bkraad47/fat_llama_fftw

Very interesting, thanks, but (if I've understood correctly) it must be supported in TensorFlow itself... am I wrong?

I'll investigate further. Meanwhile, I strongly suggest you check out @chomeyama's Dual-CycleGAN, which implements some cool optimizations too.

RyanMetcalfeInt8 · 2024-09-03T13:48:50Z

RyanMetcalfeInt8
Sep 3, 2024
Maintainer

Hey guys, I see in the AudioSR README (https://github.com/haoheliu/versatile_audio_super_resolution?tab=readme-ov-file#readme), there is a 'Demo & Cloud API' link. Seems like it's possible to use it via this Web UI front end... (although I didn't try it).

Perhaps this is easier to try it out on your side?

If you guys can help prove that this feature works well, it would be super helpful! As I said before, this part is super time consuming for me.. as I don't want to make the mistake of spending a lot of time porting some algorithm / model that only works well for a few specific samples (useless in the general case).

Edit: Sorry, I missed the 'Add a payment method to run this model' link on that replicate page. Nevermind about trying to run it there.

5 replies

MarcoRavich Sep 3, 2024

> Hey guys, I see in the AudioSR README (https://github.com/haoheliu/versatile_audio_super_resolution?tab=readme-ov-file#readme), there is a 'Demo & Cloud API' link. Seems like it's possible to use it via this Web UI front end... (although I didn't try it).

Well, if you wanna test AudioSR, it's better to go with Audio Upscaler (which derives from it) instead.

> If you guys can help prove that this feature works well, it would be super helpful! As I said before, this part is super time consuming for me.. as I don't want to make the mistake of spending a lot of time porting some algorithm / model that only works well for a few specific samples (useless in the general case).

Totally agree. That's why I strongly believe that a trainable one is the best choice for everyone: the final output quality will be determined by users themselves through their own personalized training (for example, I don't strictly need to "upscale" but rather to de-clip/de-distort my cameras' audio, which I could get anyway by training the neural network for my personal needs).

This could also stimulate a virtuous competition/exchange (and, why not, a market) of models, like in the so-called "NAM revolution"...

RyanMetcalfeInt8 Sep 3, 2024
Maintainer

> > Hey guys, I see in the AudioSR README (https://github.com/haoheliu/versatile_audio_super_resolution?tab=readme-ov-file#readme), there is a 'Demo & Cloud API' link. Seems like it's possible to use it via this Web UI front end... (although I didn't try it).
>
> Well, if you wanna test AudioSR, it's better to go with Audio Upscaler (which derives from it) instead.

Interesting. Just curious, is it clear to you what makes Audio Upscaler the better one to go with? I see in the repo README that it's 'based on the work of' AudioSR, but it's not exactly clear what it means... did they tweak AudioSR models / pipelines to improve quality?

Brodvd Sep 3, 2024
Author

@MarcoRavich I also agree with this, especially on a revised, trainable model that is accessible to all. However, training is something only professional users will do; those who just use Audacity still need a pre-trained AI model to use directly in Audacity. It may be the same model as the one to be trained.

MarcoRavich Sep 3, 2024

> @MarcoRavich I also agree with this, especially on a revised, trainable model that is accessible to all. However, training is something only professional users will do; those who just use Audacity still need a pre-trained AI model to use directly in Audacity. It may be the same model as the one to be trained.

You can use the included "predefined" model then.
But if you, like me, need something more fine-tuned to your needs, you can make your own.

And you can share your models with others too (check ToneHunt to get an idea): that's the "revolution".

MarcoRavich Sep 3, 2024

> Interesting. Just curious, is it clear to you what makes Audio Upscaler the better one to go with? I see in the repo README that it's 'based on the work of' AudioSR, but it's not exactly clear what it means... did they tweak AudioSR models / pipelines to improve quality?

Well, honestly I don't know, but since Applio is an RVC tool, it is probably tuned for voice.

Anyway @aitronz or @blaisewf can certainly explain better than me.

MarcoRavich · 2024-09-04T18:58:17Z

MarcoRavich
Sep 4, 2024

Bump.

It seems that @JacobLinCool set up an AudioSR demo @ HF, and @nateraw has another @ Replicate too.

Hope they can help.

5 replies

Brodvd Sep 4, 2024
Author

I have already tried the first one (on Hugging Face). It works and the output is quite good, but I think it is the same as the original project.

JacobLinCool Sep 4, 2024

Hi, the credit should go to https://github.com/haoheliu/versatile_audio_super_resolution. I only made a Hugging Face Space for it.
Something I have to mention: the model is very memory-consuming and somewhat slow.
I just upgraded the space to ZeroGPU. Hopefully, this will help you guys run it faster and do more tests. 😉

RyanMetcalfeInt8 Sep 4, 2024
Maintainer

> Hi, the credit should go to https://github.com/haoheliu/versatile_audio_super_resolution. I only made a Hugging Face Space for it. Something I have to mention: the model is very memory-consuming and somewhat slow. I just upgraded the space to ZeroGPU. Hopefully, this will help you guys run it faster and do more tests. 😉

Awesome, thank you @JacobLinCool !

MarcoRavich Sep 5, 2024

> Hi, the credit should go to https://github.com/haoheliu/versatile_audio_super_resolution. I only made a Hugging Face Space for it. Something I have to mention: the model is very memory-consuming and somewhat slow. I just upgraded the space to ZeroGPU. Hopefully, this will help you guys run it faster and do more tests. 😉

First of all, thanks for the reply.
Second, thanks for your HF Space that allows anyone to test AudioSR.

Just a question: do you plan to add more upscaling models? The GUI already seems set up for this...
...if so, check my resources collection @ HyMPS.

Last one: can you "port" a trainable one?

Thanks again for what you do.

nateraw Sep 9, 2024

Hey there, chiming in that the model posted to replicate under my username is also just a wrapper around https://github.com/haoheliu/versatile_audio_super_resolution 😄 all credits to the authors there for the models themselves :).

Code is open source in that repo: https://github.com/haoheliu/versatile_audio_super_resolution/blob/main/predict.py.

Brodvd · 2024-09-09T09:34:25Z

Brodvd
Sep 9, 2024
Author

@nateraw absolutely. @MarcoRavich I did a simple test of AudioSR at this link: https://huggingface.co/spaces/JacobLinCool/audio-super-resolution. This is what it did:

As described in the screenshot, it is a track with violins, with percussion added in the second part. AudioSR did a good job on the violins but didn't handle the prominence of the percussion. This may be a borderline case, but (as far as my knowledge of super resolution goes) could it be a good idea to add a CNN that extracts detailed track information and then passes it to AudioSR's latent diffusion stage?

4 replies

MarcoRavich Sep 9, 2024

Well, to my understanding AudioSR isn't meant to recover "damaged" audio, but just to upscale...

...anyway, as said, I believe that the best choice would be to have a trainable one.

Brodvd Sep 9, 2024
Author

Of course

RyanMetcalfeInt8 Sep 9, 2024
Maintainer

@MarcoRavich -- Just to point out -- typically this project relies on pre-trained models. We take care of converting the model from its native format (like PyTorch) to OpenVINO, but we typically don't retrain or tune the model.

MarcoRavich Sep 10, 2024

> @MarcoRavich -- Just to point out -- typically this project relies on pre-trained models. We take care of converting the model from its native format (like PyTorch) to OpenVINO, but we typically don't retrain or tune the model.

Of course I got it, @RyanMetcalfeInt8.

Anyway, you have to consider that the most popular open source "AI" applications owe their success mostly to the ability (mind you, not the obligation) to train their neural networks, and to the consequent possibility of exchanging custom profiles between users.
I don't think it's that difficult to provide a plugin set up to use a "default profile" but at the same time capable of loading different ones.

In conclusion, as already written above, my "vote" goes to the AudioSR upscaler (since it is the most widely used), but I strongly encourage you to consider favoring applications capable of learning/loading models in the future, perhaps starting from NAM, as suggested in the other discussion.

Thanks again.

Brodvd · 2024-09-21T20:12:17Z

Brodvd
Sep 21, 2024
Author

Hey @MarcoRavich, do you know of other GitHub projects, not already in your archive, that use deep learning (Wave-U-Net, GAN...) to re-create the high frequencies of an audio track?

2 replies

MarcoRavich Sep 22, 2024

> Hey @MarcoRavich, do you know of other GitHub projects, not already in your archive, that use deep learning (Wave-U-Net, GAN...) to re-create the high frequencies of an audio track?

The HyMPS project's collection is constantly updated, so everything I find is listed there.
I've recently reorganized those resources into a dedicated Enhancing page and added some more. Check it out.

Brodvd Sep 22, 2024
Author

Thanks :)

Brodvd · 2024-09-22T12:04:05Z

Brodvd
Sep 22, 2024
Author

However, @RyanMetcalfeInt8, if you want some advice: if you need to add an audio super resolution plugin to the OpenVINO suite as soon as possible, you can temporarily include AudioSR.

6 replies

MarcoRavich Sep 24, 2024

Once it's released it would be interesting to subject it to the "null test"...

I wanna invite @Brodvd to join the "battlefield" @ HA for a constructive discussion - with audio experts - on the subject.

Brodvd Sep 24, 2024
Author

@MarcoRavich I saw this discussion some time ago. What have you concluded so far?

MarcoRavich Sep 24, 2024

> @MarcoRavich I saw this discussion some time ago. What have you concluded so far?

Well, that the null-test is needed to scientifically evaluate the upscaling quality...

RyanMetcalfeInt8 Sep 24, 2024
Maintainer

I'd be interested in understanding more about what the 'null-test' is.. but we don't need to necessarily wait until Audio SR is released (as part of these plugins). As the goal of this port is to be a faithful port of AudioSR, it can technically be evaluated using the HF space: AudioSR @ HF.

MarcoRavich Sep 25, 2024

> I'd be interested in understanding more about what the 'null-test' is..

The NULL-test is a very simple and accurate method to "extract" all differences between a treated audio signal and the original one, by playing - or rendering - them together but inverting the phase of one of the two by 180°.

There are of course dedicated (free) software solutions for in-depth NULL analysis such as DeltaWave Audio Null Comparator, but many modern audio plugins integrate a simple "hear differences" function nowadays...

...so it would be cool to have something like @franciscorafart's dsp-audio-null-test as OpenVINO's internal function too.
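
As a rough illustration of the null test described above, here is a minimal sketch in plain Python. It assumes both signals are already sample-aligned, equal-length lists of floats in [-1, 1]. The function names `null_test` and `rms_db` and the two test tones are made up for this example; they are not taken from DeltaWave or dsp-audio-null-test.

```python
import math

def null_test(original, processed):
    """Sum `original` with a polarity-inverted copy of `processed`.

    Whatever does not cancel out is the difference between the two signals.
    """
    if len(original) != len(processed):
        raise ValueError("signals must be sample-aligned and the same length")
    # Inverting polarity (180 degrees) and summing is equivalent to subtracting.
    return [o + (-p) for o, p in zip(original, processed)]

def rms_db(signal, eps=1e-12):
    """RMS level in dB relative to full scale (1.0); eps avoids log(0)."""
    mean_square = sum(s * s for s in signal) / len(signal)
    return 10.0 * math.log10(mean_square + eps)

# Hypothetical test signals: a 1 kHz tone, and the same tone with a quiet
# 10 kHz component added (standing in for content added by an upscaler).
sr = 48000
original = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(sr)]
processed = [o + 0.01 * math.sin(2 * math.pi * 10000 * n / sr)
             for n, o in enumerate(original)]

residual = null_test(original, processed)
print(f"residual level: {rms_db(residual):.1f} dBFS")
```

A residual close to silence means the processing changed almost nothing. For an upscaler, the residual should ideally contain only the newly generated high-frequency content. Real tools also have to handle time alignment and gain matching, which this sketch ignores.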

> but we don't need to necessarily wait until Audio SR is released (as part of these plugins). As the goal of this port is to be a faithful port of AudioSR, it can technically be evaluated using the HF space: AudioSR @ HF.

Unfortunately it doesn't run inference (on <60 sec audio) for me...

Anyway, even if AudioSR is certainly the way to go for OpenVINO, @JacobLinCool's implementation doesn't allow arbitrary target sample rates (due to HF limitations, I believe), even if, in "classical" mode, integer factors like x2, x4, x8 are always recommended.

It would be interesting to understand how @haoheliu's neural network was trained, since the AudioSR paper claims that it "can upsample any input audio signal within the bandwidth range of 2kHz to 16kHz to a high-resolution audio signal at 24kHz bandwidth with a sampling rate of 48kHz" (so not that "super" resolution IMHO).
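
For reference, the figures quoted from the paper are related by the Nyquist limit: the highest frequency a sampled signal can carry is half its sampling rate. A tiny sketch of that arithmetic (`bandwidth_hz` is a hypothetical helper name, and the two input sampling rates are simply the ones implied by the quoted 2 kHz and 16 kHz bandwidths):

```python
def bandwidth_hz(sample_rate_hz):
    """Highest representable frequency (Nyquist limit) for a sampling rate."""
    return sample_rate_hz / 2

# 4 kHz and 32 kHz correspond to the quoted 2 kHz to 16 kHz input bandwidth;
# 48 kHz is the output sampling rate mentioned in the quote.
for sr in (4000, 32000, 48000):
    print(f"{sr} Hz sampling -> {bandwidth_hz(sr):.0f} Hz bandwidth")
```

So the 48 kHz output tops out at 24 kHz of bandwidth, which is the ceiling the quote refers to.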

RyanMetcalfeInt8 · 2024-12-19T13:43:51Z

RyanMetcalfeInt8
Dec 19, 2024
Maintainer

Hi all, just a heads up that we've integrated a super resolution feature into the main branch of this project. Next release (coming next few days) will include it.

0 replies

RyanMetcalfeInt8 · 2024-12-19T13:44:23Z

RyanMetcalfeInt8
Dec 19, 2024
Maintainer

https://github.com/intel/openvino-plugins-ai-audacity/blob/main/doc/feature_doc/super_resolution/README.md

0 replies
