Would you please share some training details for stabilityai/stable-audio-open-small?

I'm trying to replicate the diffusion training process using benjamin-paine/freesound-laion-640k and benjamin-paine/free-music-archive-large.

Currently I use the same small network defined by stabilityai/stable-audio-open-small, which has 16 layers and a hidden size of 1024. I have trained for 3 epochs so far and training is still running, but the demo reconstructions are not very good.
The diffusion MSE loss fluctuates around 0.85.

<img width="817" alt="Image" src="https://github.com/user-attachments/assets/8598829b-7b1a-4d7a-a756-60c58aa8993d" />
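
For context, this is a minimal sketch of how I understand the `rectified_flow` MSE objective, so the 0.85 figure can be compared against a trivial baseline. It is an assumption about the objective, not the confirmed stable-audio-tools implementation: `rectified_flow_mse` and `predict` are hypothetical names, and the random arrays stand in for latents that I assume have roughly unit variance.

```python
import numpy as np

def rectified_flow_mse(x0, noise, t, predict):
    """MSE between a predicted velocity and the rectified-flow target.

    Assumed formulation: x_t = (1 - t) * x0 + t * noise, target = noise - x0.
    `predict` is a hypothetical stand-in for the DiT forward pass.
    """
    t = t.reshape(-1, 1, 1)               # broadcast one timestep per batch item
    x_t = (1.0 - t) * x0 + t * noise      # straight-line interpolation
    target = noise - x0                   # constant velocity along that line
    pred = predict(x_t, t)
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 64, 256))    # fake latents: batch, io_channels, length
noise = rng.standard_normal(x0.shape)
t = rng.uniform(0.0, 1.0, size=8)

# Baseline: a model that always outputs zeros.
zero_loss = rectified_flow_mse(x0, noise, t, lambda x_t, t: np.zeros_like(x_t))
# Sanity check: a model that outputs the exact target.
oracle_loss = rectified_flow_mse(x0, noise, t, lambda x_t, t: noise - x0)
print(zero_loss, oracle_loss)
```

Under these assumptions the all-zeros baseline sits close to 2 (the sum of the two unit variances), so a loss near 0.85 would mean the model is well below that baseline. The real latents may have a different scale, so this comparison is only indicative.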

The mel spectrogram of the audio generated with a CFG scale of 7 looks like this:

<img width="448" alt="Image" src="https://github.com/user-attachments/assets/0eb0e9c4-278a-48c6-919e-f4810743a2e7" />

Although the model seems to have learned to generate low-frequency content, the high-frequency content is noisy.

Would you please share some details about the training data, the training time, and the loss trends? I'm not sure whether I should stop training and add more data.

I copied almost all of the training details from stabilityai/stable-audio-open-small:

```json
"diffusion": {
            "cross_attention_cond_ids": ["prompt", "seconds_total"],
            "global_cond_ids": ["seconds_total"],
            "diffusion_objective": "rectified_flow",
            "distribution_shift_options": {
                "min_length": 256,
                "max_length": 4096
            },
            "type": "dit",
            "config": {
                "io_channels": 64,
                "embed_dim": 1024,
                "depth": 16,
                "num_heads": 8,
                "cond_token_dim": 768,
                "global_cond_dim": 768,
                "transformer_type": "continuous_transformer",
                "attn_kwargs": {
                    "qk_norm": "ln"
                }
            }
        },
        "io_channels": 64
    },
    "training": {
        "use_ema": true,
        "log_loss_info": false,
        "pre_encoded": false,
        "timestep_sampler": "trunc_logit_normal",
        "optimizer_configs": {
            "diffusion": {
                "optimizer": {
                    "type": "AdamW",
                    "config": {
                        "lr": 2e-4,
                        "betas": [0.9, 0.95],
                        "eps": 1e-8,
                        "weight_decay": 0.01,
                        "foreach": true
                    }
                },
                "scheduler": {
                    "type": "InverseLR",
                    "config": {
                        "inv_gamma": 1000000,
                        "power": 0.5,
                        "warmup": 0.995
                    }
                }
            }
        },
        "demo": {
            "demo_every": 2000,
            "demo_steps": 50,
            "num_demos": 8,
            "demo_cond": [
                {"prompt": "Amen break 174 BPM", "seconds_total": 6},
                {"prompt": "People talking in a crowded cafe", "seconds_total": 10},
                {"prompt": "A short, beautiful piano riff in C minor", "seconds_total": 6},
                {"prompt": "Tight Snare Drum", "seconds_total": 1},
                {"prompt": "A dog barking next to a waterfall", "seconds_total": 6},
                {"prompt": "Glitchy bass design, I used Serum for this", "seconds_total": 4},
                {"prompt": "Synth pluck arp with reverb and delay, 128 BPM", "seconds_total": 6},
                {"prompt": "Birds singing in the forest", "seconds_total": 10}
            ],
            "demo_cfg_scales": [1, 4, 7]
        }
    }
```
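
To make the scheduler settings above easier to reason about, here is a small sketch of the learning rate they would produce. It assumes `InverseLR` follows the k-diffusion-style closed form (an exponential warmup factor multiplied by an inverse-power decay); `inverse_lr` is a hypothetical helper written for illustration, not a function from the library.

```python
def inverse_lr(step, base_lr=2e-4, inv_gamma=1_000_000, power=0.5,
               warmup=0.995, min_lr=0.0):
    """Learning rate at `step`, assuming the k-diffusion-style InverseLR formula."""
    warmup_factor = 1.0 - warmup ** (step + 1)       # ramps from ~0 towards 1
    decay = (1.0 + step / inv_gamma) ** -power       # slow inverse-sqrt decay
    return warmup_factor * max(min_lr, base_lr * decay)

# Print the schedule at a few steps to see warmup length and decay speed.
for step in (0, 100, 1_000, 10_000, 100_000, 1_000_000):
    print(f"step {step:>9}: lr = {inverse_lr(step):.3e}")
```

If this formula is the right one, the warmup is essentially finished after about a thousand steps, and with `inv_gamma` at 1000000 the learning rate stays close to `2e-4` for the first few hundred thousand steps, so the schedule alone is unlikely to explain a loss plateau early in training.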
