
About ACE-Step 1.5 in ComfyUI

ACE-Step 1.5 is a major update to the open-source music generation model, now natively supported in ComfyUI. It brings commercial-grade quality to your local machine with a novel hybrid architecture where a Language Model acts as an omni-capable planner, transforming simple user queries into comprehensive song blueprints.

ACE-Step 1.5 model highlights:
  • Commercial-grade quality: Achieves quality beyond most commercial music models, scoring 4.72 on musical coherence
  • Blazing fast generation: Generate a full 4-minute song in ~1 second on RTX 5090, or under 10 seconds on RTX 3090 with ComfyUI
  • 50+ language support: Strong support for English, Chinese, Japanese, Korean, Spanish, German, French, Portuguese, Italian, and Russian
  • LoRA fine-tuning: Supports lightweight personalization through LoRA training in ComfyUI
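
To put the speed figures above in perspective, they can be converted into a real-time factor (seconds of audio produced per second of generation). The short sketch below only does that arithmetic. The helper name `realtime_factor` is made up for illustration, and the timings are the approximate ones quoted in the list, so actual results will vary with hardware and settings.

```python
def realtime_factor(audio_seconds, generation_seconds):
    """Seconds of audio produced per second of generation time."""
    return audio_seconds / generation_seconds


song_seconds = 4 * 60  # a full 4-minute song

# ~1 second on an RTX 5090 (figure quoted above)
print(realtime_factor(song_seconds, 1))   # 240.0

# 10 seconds is the upper bound quoted for an RTX 3090,
# so this is the minimum factor implied by that claim
print(realtime_factor(song_seconds, 10))  # 24.0
```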
Make sure your ComfyUI is up to date. The workflows in this guide are available in the Workflow Templates. If you can’t find them there, your ComfyUI may be outdated. (Updates to the Desktop version are released with some delay.)

If nodes are missing when you load a workflow, possible reasons are:
  1. You are not using the latest ComfyUI version (Nightly version)
  2. Some nodes failed to import at startup

Option 1: All-in-One (AIO) Model File

The AIO version packages all models into a single checkpoint file, making it easier to download and manage.

AIO Workflow

AIO Model Download

AIO Model Storage Location
📂 ComfyUI/
├── 📂 models/
│   └── 📂 checkpoints/
│       └── ace_step_1.5_turbo_aio.safetensors

Option 2: Split Model Files

The split version allows you to download individual model components separately.

Split Workflow

Split Model Downloads

Split Models Storage Location
📂 ComfyUI/
├── 📂 models/
│   ├── 📂 diffusion_models/
│   │   └── acestep_v1.5_turbo.safetensors
│   ├── 📂 text_encoders/
│   │   ├── qwen_0.6b_ace15.safetensors
│   │   └── qwen_1.7b_ace15.safetensors
│   └── 📂 vae/
│       └── ace_1.5_vae.safetensors
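
If a workflow reports missing models, it can help to confirm that the files sit where the trees above expect them. The following is a minimal sketch and not part of ComfyUI. The helper names `missing_files` and `report` are hypothetical, the relative paths are copied from the storage-location trees in this guide, and the `"ComfyUI"` root folder is an assumption you should replace with your own install path.

```python
from pathlib import Path

# Relative paths copied from the storage-location trees in this guide.
AIO_FILES = ["models/checkpoints/ace_step_1.5_turbo_aio.safetensors"]
SPLIT_FILES = [
    "models/diffusion_models/acestep_v1.5_turbo.safetensors",
    "models/text_encoders/qwen_0.6b_ace15.safetensors",
    "models/text_encoders/qwen_1.7b_ace15.safetensors",
    "models/vae/ace_1.5_vae.safetensors",
]


def missing_files(comfy_root, relative_paths):
    """Return the entries of relative_paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


def report(comfy_root):
    """Map each option to its missing files (an empty list means complete)."""
    return {
        "aio": missing_files(comfy_root, AIO_FILES),
        "split": missing_files(comfy_root, SPLIT_FILES),
    }


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; replace it with your own path.
    for option, missing in report("ComfyUI").items():
        status = "complete" if not missing else "missing: " + ", ".join(missing)
        print(f"{option}: {status}")
```

Only one of the two options needs to be complete, so a non-empty list for the option you are not using is expected.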

ACE-Step 1.5 Key Features in ComfyUI

Chain-of-Thought Planning

The ACE-Step 1.5 model synthesizes metadata, lyrics, and captions via Chain-of-Thought reasoning to guide the diffusion process, resulting in more coherent long-form compositions.

Hybrid LM + DiT Architecture

ACE-Step 1.5 combines a Language Model that plans the song structure with a Diffusion Transformer (DiT) that handles audio synthesis, all running natively in ComfyUI.
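
The two-stage data flow described above can be pictured with a toy sketch. It is purely conceptual: no model is loaded, and every name in it (`SongBlueprint`, `plan`, `synthesize`, the `duration_seconds` key, the toy sample rate) is hypothetical rather than part of ACE-Step 1.5 or ComfyUI. It only illustrates that the planner turns a short query into a blueprint of caption, lyrics, and metadata, and that the synthesis stage consumes that blueprint.

```python
from dataclasses import dataclass, field

TOY_SAMPLE_RATE = 10  # samples per second; deliberately tiny for the sketch


@dataclass
class SongBlueprint:
    """Hypothetical container for what the planner hands to the DiT."""
    caption: str
    lyrics: str
    metadata: dict = field(default_factory=dict)


def plan(query: str) -> SongBlueprint:
    """Stand-in for the Language Model planner (no real model is called)."""
    return SongBlueprint(
        caption=f"Expanded description of: {query}",
        lyrics="placeholder lyrics",
        metadata={"duration_seconds": 240},  # assumed field, for illustration
    )


def synthesize(blueprint: SongBlueprint) -> list:
    """Stand-in for the DiT: returns silence of the planned length."""
    n_samples = blueprint.metadata["duration_seconds"] * TOY_SAMPLE_RATE
    return [0.0] * n_samples


# Query -> blueprint -> audio, mirroring the planner/synthesizer split.
blueprint = plan("an upbeat synth-pop song")
audio = synthesize(blueprint)
print(blueprint.caption)
print(len(audio))
```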

LoRA Fine-Tuning in ComfyUI

With just a few songs, you can train a LoRA that captures a specific style. Because you run ACE-Step 1.5 locally in ComfyUI, you own the LoRA and don’t have to worry about data leakage.

Coming Soon to ComfyUI

These features are available in ACE-Step 1.5 but not yet supported in ComfyUI:
  • Cover: Give the model any song as input along with a new prompt and lyrics, and it will reimagine the track in a completely different style
  • Repaint: Select a segment, regenerate just that section, and the model stitches it back in while keeping everything else untouched