EditGen: Harnessing Cross Attention Control for Instruction-Based auto-regressive Audio Editing

arXiv Code License Python 3.9+

Accompanying code for the paper EditGen: Harnessing Cross Attention Control for Instruction-Based auto-regressive Audio Editing

In this study, we investigate leveraging cross-attention control for efficient audio editing within auto-regressive models. Inspired by image editing methodologies, we develop a Prompt-to-Prompt-like approach that guides edits through cross- and self-attention mechanisms. Integrating a diffusion-based strategy, influenced by Auffusion, we extend the model's functionality to support refinement edits, establishing a baseline for prompt-guided audio editing. Additionally, we introduce an alternative approach by incorporating MUSICGEN, a pre-trained frozen auto-regressive model, and propose three editing mechanisms based on Replacement, Reweighting, and Refinement of the attention scores. We employ commonly used music-specific evaluation metrics and a human study to gauge time-varying controllability, adherence to global text cues, and overall audio realism. The automatic and human evaluations indicate that the proposed combination of prompt-to-prompt guidance with auto-regressive generation models significantly outperforms the diffusion-based baseline in terms of melody, dynamics, and tempo of the generated audio.
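
To make the three editing mechanisms more concrete, here is a minimal, illustrative sketch of what Replacement, Reweighting, and Refinement could look like when applied to a single cross-attention map, in the spirit of Prompt-to-Prompt. It is not the code from this repository: the function names, the list-of-lists representation, and the `alignment` argument are all assumptions made for illustration. A real model would apply such edits to tensors inside its transformer layers.

```python
from typing import Dict, List

# Hypothetical representation: rows are audio positions (queries),
# columns are prompt tokens (keys).
Attention = List[List[float]]


def replace(source: Attention, target: Attention) -> Attention:
    """Replacement: reuse the source prompt's attention map for the edited prompt."""
    same_shape = len(source) == len(target) and all(
        len(s) == len(t) for s, t in zip(source, target)
    )
    if not same_shape:
        raise ValueError("replacement assumes prompts with the same token count")
    return [row[:] for row in source]


def reweight(attn: Attention, token: int, scale: float) -> Attention:
    """Reweighting: scale the attention given to one prompt token."""
    return [[w * scale if j == token else w for j, w in enumerate(row)] for row in attn]


def refine(source: Attention, target: Attention, alignment: Dict[int, int]) -> Attention:
    """Refinement: copy source attention for tokens shared by both prompts.

    `alignment` maps a target-token index to its source-token index. Target
    tokens absent from it are treated as new and keep their own attention.
    """
    return [
        [s_row[alignment[j]] if j in alignment else w for j, w in enumerate(t_row)]
        for s_row, t_row in zip(source, target)
    ]


# Example: the edited prompt inserts one new token at position 1.
source = [[0.7, 0.3], [0.4, 0.6]]
target = [[0.5, 0.2, 0.3], [0.1, 0.3, 0.6]]
refined = refine(source, target, {0: 0, 2: 1})
```

Note that this sketch does not renormalize rows after an edit, so a reweighted or refined row may no longer sum to one. Whether and where to renormalize is a design choice that depends on the model.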

📜 Citation

@misc{EditGen,
  author = {Vassilis Sioros},
  title = {EditGen: Harnessing Cross Attention Control for Instruction-Based auto-regressive Audio Editing},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/billsioros/EditGen}}
}

Note

The files under src/auffusion were taken directly from the Auffusion project for the purpose of comparing the two models and fall under the Apache 2.0 license.
