ACE-Step Training Fork

Complete Beginner Guide for Dataset Processing & LoRA Training

This Fork | Original Repo | Original Project | Hugging Face Model | Discord Support

ACE-Step Training Dashboard

This fork specializes in:

  • Automated Dataset Processing with Faster-Whisper
  • Clean LoRA Training CLIs with real-time dashboards
  • Complete Beginner Workflows from raw audio to trained model
  • Step-by-step automation - no manual file editing required

๐Ÿ† What Makes This Fork Special

This is a specialized fork of the original ACE-Step project, focused entirely on making training and dataset preparation accessible to beginners.

Complete Automation

  • Automated dataset processing from raw audio files
  • Faster-Whisper integration for automatic transcription
  • Auto-generated LoRA configs based on your dataset
  • One-command training with beautiful dashboards

Beginner-Friendly

  • Step-by-step guides for every process
  • Clean CLI interfaces with progress tracking
  • Helpful tips and troubleshooting
  • No manual configuration required

๐Ÿ—๏ธ Professional Training Tools

  • ๐Ÿ“Š Real-time training dashboard with live metrics
  • ๐Ÿ’พ Resume capability for interrupted training
  • ๐ŸŽ›๏ธ Resource optimization for different hardware setups
  • ๐Ÿ“ˆ Built-in validation and progress tracking

๐Ÿ™ Credits & Original Work

Original ACE-Step Project: ace-step/ACE-Step
Original Authors: Junmin Gong, Wenxiao Zhao, Sen Wang, Shengyuan Xu, Jing Guo
Organizations: ACE Studio and StepFun

This fork builds upon their excellent foundation model work, adding specialized tooling for training workflows. All credit for the core ACE-Step model and research goes to the original team.

Original Organizations

Table of Contents

  1. Quick Start Guide
  2. Installation
  3. Dataset Processing
  4. Training Your Model
  5. Using Your Trained Model
  6. Detailed Guides
  7. Community & Support

Quick Start Guide

Complete Workflow: Raw Audio → Trained Model

This fork provides a 3-step automated workflow from raw audio files to a trained ACE-Step model:

# Step 1: Process your audio files into training dataset
python -m dataset_cli_tool

# Step 2: Train your LoRA model with real-time dashboard
python train_cli_advanced.py --dataset_path ./prepared_dataset --lora_config_path ./lora_config.json

# Step 3: Use your trained model for music generation
acestep --port 7865

That's it! The tools handle everything else automatically.

What Each Step Does

๐Ÿ“ Step 1: Dataset Processing

  • Scans your audio files (MP3, WAV, FLAC, etc.)
  • Converts to consistent format
  • Uses Faster-Whisper to generate transcriptions automatically
  • Creates training-ready dataset structure
  • Generates optimized LoRA configuration

๐Ÿ‹๏ธ Step 2: LoRA Training

  • Beautiful real-time dashboard with live metrics
  • Automatic checkpointing and resume capability
  • GPU optimization and resource management
  • Progress tracking with ETA calculations

Step 3: Music Generation

  • Load your trained LoRA adapter
  • Generate music in your custom style
  • Web interface for easy interaction
  • Export and share your creations

Installation

One-Command Setup

# Clone the repository
git clone https://github.com/WebChatAppAi/ACE-Step.git
cd ACE-Step

# Create environment and install
conda create -n ace_step python=3.10 -y
conda activate ace_step
pip install -e .

System Requirements

Minimum Requirements:

  • Python 3.10+
  • 8GB RAM
  • CUDA-compatible GPU (GTX 1660+ or RTX series)
  • 20GB free disk space

Recommended Setup:

  • Python 3.10
  • 16GB+ RAM
  • RTX 3090/4090 or A100
  • 50GB+ SSD storage
  • Fast internet for model downloads

Training Requirements:

  • CUDA 11.8+ or 12.1+
  • 12GB+ VRAM for LoRA training
  • FFmpeg (for dataset processing)
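
To quickly confirm that your PyTorch install can actually see a CUDA-capable GPU (a simple sanity check, not part of this fork's tooling):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"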

Additional Dependencies

For dataset processing, install:

pip install faster-whisper>=1.0.0 rich>=13.0.0 loguru librosa soundfile

Windows users need FFmpeg:
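
One common route (an example, not from the original instructions) is a package manager, or a static build downloaded from https://ffmpeg.org/download.html with its bin folder added to your PATH:

# example using Chocolatey; any install method works as long as ffmpeg is on PATH
choco install ffmpeg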


Dataset Processing

Automated Dataset Tool

Our dataset CLI tool handles everything from raw audio to training-ready dataset:

# Interactive dataset processing
python -m dataset_cli_tool

What it does:

  1. Scans for audio files (MP3, WAV, FLAC, M4A, etc.)
  2. Converts to consistent format
  3. Generates transcriptions with Faster-Whisper
  4. Validates dataset structure
  5. Creates optimized LoRA configuration

๐Ÿ“ Manual Dataset Processing

If you prefer step-by-step control:

# Step 1: Scan audio files
python -m dataset_cli_tool scan --path /your/audio/folder

# Step 2: Convert audio format
python -m dataset_cli_tool convert --input /audio --output /dataset --format mp3

# Step 3: Generate transcriptions
python -m dataset_cli_tool transcribe --input /dataset --model distil-large-v3

# Step 4: Generate LoRA config
python -m dataset_cli_tool generate-lora --dataset /dataset

Faster-Whisper Models

Model             Speed     Quality     VRAM   Best For
distil-large-v3   Fast      Excellent   6GB    Recommended
large-v3          Slower    Excellent   10GB   Best quality
base              Fastest   Good        1GB    Low VRAM
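
For reference, the transcription step is built on the faster-whisper library. A minimal sketch of the underlying call (the file name and options here are illustrative; the CLI tool handles this for you):

from faster_whisper import WhisperModel

model = WhisperModel("distil-large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("song.mp3")
print(info.language, " ".join(segment.text.strip() for segment in segments))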

Complete dataset processing guide


Training Your Model

Simple Training (Basic CLI)

python train_cli.py --dataset_path ./prepared_dataset --lora_config_path ./lora_config.json

Clean, organized output with progress tracking:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                    Training Configuration                   ┃
┃ Dataset Path:     ./prepared_dataset                        ┃
┃ LoRA Config:      ./lora_config.json                        ┃
┃ Learning Rate:    1.00e-04                                  ┃
┃ Max Steps:        2,000,000                                 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Step: 1000/2000000 (0.1%) | Loss: 0.4532 | LR: 1.00e-04

Advanced Training (Dashboard CLI)

python train_cli_advanced.py --dataset_path ./prepared_dataset --lora_config_path ./lora_config.json

Beautiful real-time dashboard:

╔════════════════════════════════════════════════════════════╗
║                 ACE-Step Training Dashboard                 ║
╚════════════════════════════════════════════════════════════╝

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃         Progress              ┃      Metrics             ┃
┃ Step:     1,234/2,000,000     ┃ Total Loss:    0.4532    ┃
┃ Progress: 6.2%                ┃ Denoising:     0.3421    ┃
┃ Speed:    1.45 steps/s        ┃ Learning Rate: 1.00e-04  ┃
┃ ETA:      12h 34m             ┃ VRAM Usage:    8.2GB     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Quick Training Examples

# High-end GPU (RTX 4090, A100)
python train_cli.py --dataset_path ./dataset --lora_config_path ./config.json \
                    --batch_size 4 --precision 16

# Mid-range GPU (RTX 3080, 3090)
python train_cli.py --dataset_path ./dataset --lora_config_path ./config.json \
                    --batch_size 2 --precision 16

# Lower VRAM (GTX 1660, RTX 3060)
python train_cli.py --dataset_path ./dataset --lora_config_path ./config.json \
                    --batch_size 1 --accumulate_grad_batches 4
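
The last example trades speed for memory: with gradient accumulation, gradients from several mini-batches are summed before each optimizer update, so --batch_size 1 with --accumulate_grad_batches 4 gives an effective batch size of 1 × 4 = 4 while only a single sample's activations occupy VRAM at any time.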

Complete training guide and examples


Using Your Trained Model

Launch ACE-Step with Your LoRA

# Basic usage
acestep --port 7865

# With optimizations
acestep --torch_compile true --cpu_offload true --overlapped_decode true --port 7865

Load Your Custom Model

In the web interface:

  1. Navigate to the Settings tab
  2. Find LoRA Settings
  3. Upload your trained .safetensors file
  4. Set LoRA scale (usually 0.7-1.0; see the note below this list)
  5. Generate music with your custom style!
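
A rough intuition for the LoRA scale in step 4: a LoRA adapter adds a low-rank update to the base weights, approximately W' = W + scale · ΔW, so lower values keep generations closer to the base model and higher values push them further toward the style learned from your dataset.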

Performance Tips

# Memory optimization (8GB VRAM)
acestep --cpu_offload true --overlapped_decode true

# Speed optimization (High-end GPU)
acestep --torch_compile true --bf16 true

# Windows users (need triton)
pip install triton-windows

Detailed Guides

This fork provides comprehensive documentation for every aspect:

Core Guides

Quick References

  • Troubleshooting - Common issues and solutions
  • Performance Tuning - Hardware optimization
  • LoRA Configuration - Custom training settings
  • Model Integration - Using trained models

Advanced Topics

  • Multi-GPU training setup
  • Dataset format specifications
  • Custom model architectures
  • Evaluation and validation

๐Ÿค Community & Support

๐Ÿ’ฌ Get Help

  • Discord: ACE-Step Community
  • GitHub Issues: Report bugs and request features
  • Discussions: Share your trained models and results

Contributing

This fork welcomes contributions to:

  • Improve CLI interfaces
  • Add dataset processing features
  • Enhance training workflows
  • Fix bugs and optimize performance

Share Your Work

We'd love to see what you create:

  • Share your trained models
  • Post your generated music
  • Help other beginners learn

Original Research

ACE-Step Framework

Baseline Quality

Diverse Styles & Genres

  • Supports all mainstream music styles with various description formats, including short tags, descriptive text, or use-case scenarios
  • Capable of generating music across different genres with appropriate instrumentation and style

Multiple Languages

  • Supports 19 languages, with the top 10 best-performing languages including:
    • English, Chinese, Russian, Spanish, Japanese, German, French, Portuguese, Italian, Korean
  • Due to data imbalance, less common languages may underperform

Instrumental Styles

  • Supports various instrumental music generation across different genres and styles
  • Capable of producing realistic instrumental tracks with appropriate timbre and expression for each instrument
  • Can generate complex arrangements with multiple instruments while maintaining musical coherence

Vocal Techniques

  • Capable of rendering various vocal styles and techniques with good quality
  • Supports different vocal expressions, including various singing techniques and styles

Controllability

Variations Generation

  • Implemented using training-free, inference-time optimization techniques
  • The flow-matching model generates initial noise, then trigFlow's noise formula is used to add additional Gaussian noise
  • An adjustable mixing ratio between the original initial noise and the new Gaussian noise controls the degree of variation (see the sketch below)
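
As a rough illustration of that mixing ratio (a minimal sketch only; the function and variable names are hypothetical, and the actual implementation follows trigFlow's noise formula, which may differ):

import torch

def mix_initial_noise(original_noise: torch.Tensor, variance: float) -> torch.Tensor:
    # Blend the stored initial noise with a fresh Gaussian draw.
    # variance=0.0 reproduces the original generation; variance=1.0 is an entirely new draw.
    # Square-root weights keep the mixture unit-variance.
    fresh_noise = torch.randn_like(original_noise)
    return (1.0 - variance) ** 0.5 * original_noise + variance ** 0.5 * fresh_noise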

Repainting

  • Implemented by adding noise to the target audio input and applying mask constraints during the ODE process
  • When input conditions change from the original generation, only specific aspects are modified while the rest is preserved
  • Can be combined with Variations Generation to create localized variations in style, lyrics, or vocals

Lyric Editing

  • Innovatively applies flow-edit technology to enable localized lyric modifications while preserving melody, vocals, and accompaniment
  • Works with both generated content and uploaded audio, greatly enhancing creative possibilities
  • Current limitation: only small segments of lyrics can be modified at once to avoid distortion, but multiple edits can be applied sequentially

Applications

Lyric2Vocal (LoRA)

  • Based on a LoRA fine-tuned on pure vocal data, allowing direct generation of vocal samples from lyrics
  • Offers numerous practical applications such as vocal demos, guide tracks, songwriting assistance, and vocal arrangement experimentation
  • Provides a quick way to test how lyrics might sound when sung, helping songwriters iterate faster

Text2Samples (LoRA)

  • Similar to Lyric2Vocal, but fine-tuned on pure instrumental and sample data
  • Capable of generating conceptual music production samples from text descriptions
  • Useful for quickly creating instrument loops, sound effects, and musical elements for production

Coming Soon

RapMachine

  • Fine-tuned on pure rap data to create an AI system specialized in rap generation
  • Expected capabilities include AI rap battles and narrative expression through rap
  • Rap has exceptional storytelling and expressive capabilities, offering extraordinary application potential

StemGen

  • A ControlNet-LoRA trained on multi-track data to generate individual instrument stems
  • Takes a reference track and a specified instrument (or instrument reference audio) as input
  • Outputs an instrument stem that complements the reference track, such as creating a piano accompaniment for a flute melody or adding jazz drums to a lead guitar

Singing2Accompaniment

  • The reverse process of StemGen, generating a mixed master track from a single vocal track
  • Takes a vocal track and a specified style as input to produce a complete accompaniment for the vocals
  • Creates full instrumental backing that complements the input vocals, making it easy to add professional-sounding accompaniment to any vocal recording

Roadmap

  • Release training code
  • Release LoRA training code
  • Release RapMachine LoRA
  • Release evaluation performance and technical report
  • Train and release ACE-Step V1.5
  • Release ControlNet training code
  • Release Singing2Accompaniment ControlNet

Hardware Performance

We have evaluated ACE-Step across different hardware setups, yielding the following throughput results:

Device            RTF (27 steps)   Time to render 1 min audio (27 steps)   RTF (60 steps)   Time to render 1 min audio (60 steps)
NVIDIA RTX 4090   34.48×           1.74 s                                  15.63×           3.84 s
NVIDIA A100       27.27×           2.20 s                                  12.27×           4.89 s
NVIDIA RTX 3090   12.76×           4.70 s                                  6.48×            9.26 s
MacBook M2 Max    2.27×            26.43 s                                 1.03×            58.25 s

We use RTF (Real-Time Factor) to measure the performance of ACE-Step; higher values indicate faster generation. For example, an RTF of 27.27× means that generating 1 minute of music takes about 2.2 seconds (60 / 27.27).

Installation

1. Clone the Repository

First, clone the ACE-Step repository to your local machine and navigate into the project directory:

git clone https://github.com/WebChatAppAi/ACE-Step.git
cd ACE-Step

2. Prerequisites

Ensure you have the following installed:

  • Python: Version 3.10 or later is recommended. You can download it from python.org.
  • Conda or venv: For creating a virtual environment (Conda is recommended).

3. Set Up a Virtual Environment

It is highly recommended to use a virtual environment to manage project dependencies and avoid conflicts. Choose one of the following methods:

Option A: Using Conda

  1. Create the environment named ace_step with Python 3.10:

    conda create -n ace_step python=3.10 -y
  2. Activate the environment:

    conda activate ace_step

Option B: Using venv

  1. Navigate to the cloned ACE-Step directory.

  2. Create the virtual environment (commonly named venv):

    python -m venv venv 
  3. Activate the environment:

    • On Windows (cmd.exe):
      venv\Scripts\activate.bat
    • On Windows (PowerShell):
      .\venv\Scripts\Activate.ps1 
      (If you encounter execution policy errors, you might need to run Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process first)
    • On Linux / macOS (bash/zsh):
      source venv/bin/activate

4. Install Dependencies

Once your virtual environment is activated:

a. (Windows only) If you are on Windows and plan to use an NVIDIA GPU, install PyTorch with CUDA support first:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

(Adjust cu126 if you have a different CUDA version. For other PyTorch installation options, refer to the official PyTorch website).
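
If you are unsure which CUDA build to pick, the NVIDIA driver reports the highest CUDA runtime it supports:

# prints the driver version and the highest supported CUDA version
nvidia-smi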

b. Install ACE-Step and its core dependencies:

pip install -e .

The ACE-Step application is now installed. The GUI works on Windows, macOS, and Linux. For instructions on how to run it, please see the Usage section.


Usage

Demo Interface

Model Download (Recommended First Step)

Download models to a custom location with full control:

# Download full model (7.2 GB)
python modeldownloader.py --output_dir ./models

# Download quantized model (2.5 GB, faster)
python modeldownloader.py --output_dir ./models --quantized

# Download to custom path
python modeldownloader.py --output_dir /path/to/my/models

The model downloader ensures proper directory structure and verifies all components.

๐Ÿ” Basic Usage

# Use downloaded models
acestep --checkpoint_path ./models --port 7865

# Auto-download (if no models specified)
acestep --port 7865

Advanced Usage

acestep --checkpoint_path ./models --port 7865 --device_id 0 --share true --bf16 true

Model Loading Priority:

  • If --checkpoint_path is set and models exist at the path, load from checkpoint_path.
  • If --checkpoint_path is set but models do not exist at the path, auto download models to checkpoint_path.
  • If --checkpoint_path is not set, auto download models to the default path ~/.cache/ace-step/checkpoints.

Note: Use python modeldownloader.py for reliable downloads with progress tracking.

If you are using macOS, please use --bf16 false to avoid errors.

๐Ÿ” API Usage

If you intend to integrate ACE-Step as a library into your own Python projects, you can install the latest version directly from GitHub using the following pip command.

Direct Installation via pip:

  1. Ensure Git is installed: This method requires Git to be installed on your system and accessible in your system's PATH.
  2. Execute the installation command:
    pip install git+https://github.com/WebChatAppAi/ACE-Step.git
    It's recommended to use this command within a virtual environment to avoid conflicts with other packages.

Command Line Arguments

  • --checkpoint_path: Path to the model checkpoint (default: downloads automatically)
  • --server_name: IP address or hostname for the Gradio server to bind to (default: '127.0.0.1'). Use '0.0.0.0' to make it accessible from other devices on the network.
  • --port: Port to run the Gradio server on (default: 7865)
  • --device_id: GPU device ID to use (default: 0)
  • --share: Enable Gradio sharing link (default: False)
  • --bf16: Use bfloat16 precision for faster inference (default: True)
  • --torch_compile: Use torch.compile() to optimize the model, speeding up inference (default: False).
    • Windows users need to install triton:
      pip install triton-windows
      
  • --cpu_offload: Offload model weights to CPU to save GPU memory (default: False)
  • --overlapped_decode: Use overlapped decoding to speed up inference (default: False)

User Interface Guide

The ACE-Step interface provides several tabs for different music generation and editing tasks:

๐Ÿ“ Text2Music Tab

  1. ๐Ÿ“‹ Input Fields:

    • ๐Ÿท๏ธ Tags: Enter descriptive tags, genres, or scene descriptions separated by commas
    • ๐Ÿ“œ Lyrics: Enter lyrics with structure tags like [verse], [chorus], and [bridge]
    • โฑ๏ธ Audio Duration: Set the desired duration of the generated audio (-1 for random)
  2. โš™๏ธ Settings:

    • ๐Ÿ”ง Basic Settings: Adjust inference steps, guidance scale, and seeds
    • ๐Ÿ”ฌ Advanced Settings: Fine-tune scheduler type, CFG type, ERG settings, and more
  3. ๐Ÿš€ Generation: Click "Generate" to create music based on your inputs
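
For instance, a lyrics input using these structure tags might look like the following (a hypothetical illustration, not taken from the project's documentation):

[verse]
Walking down an empty street tonight
Chasing echoes of a fading light

[chorus]
Hold on, hold on, we're almost home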

Retake Tab

  • Regenerate music with slight variations using different seeds
  • Adjust variance to control how much the retake differs from the original

Repainting Tab

  • Selectively regenerate specific sections of the music
  • Specify start and end times for the section to repaint
  • Choose the source audio (text2music output, last repaint, or upload)

Edit Tab

  • Modify existing music by changing tags or lyrics
  • Choose between "only_lyrics" mode (preserves melody) or "remix" mode (changes melody)
  • Adjust edit parameters to control how much of the original is preserved

Extend Tab

  • Add music to the beginning or end of an existing piece
  • Specify left and right extension lengths
  • Choose the source audio to extend

Examples

The examples/input_params directory contains sample input parameters that can be used as references for generating music.


License & Disclaimer

This project is licensed under Apache License 2.0

ACE-Step enables original music generation across diverse genres, with applications in creative production, education, and entertainment. While designed to support positive and artistic use cases, users should be mindful of ethical considerations and intellectual property rights in the application of this technology.

Citation

If you find this project useful for your research, please consider citing the original work:

@misc{gong2025acestep,
    title={ACE-Step: A Step Towards Music Generation Foundation Model},
    author={Junmin Gong and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
    howpublished={\url{https://github.com/ace-step/ACE-Step}},
    year={2025},
    note={GitHub repository}
}
