Hello everybody,
I wonder if somebody here may have an idea about a little problem.
I have an AMD Radeon RX 6800 XT with 16 GB of VRAM and I'm trying to set up image generation using SD 3.5. I am using this model:
https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/sd3.5_large_fp8_scaled.safetensors
This is the model configuration:
LocalAI reports that it is out of VRAM:
However, when I generate an image using the same model files in ComfyUI, everything works just fine (and quite fast). It DOES complain about VRAM but only during VAE decoding:
```
got prompt
Using scaled fp8: fp8 matrix mult: False, scale input: True
model weight dtype torch.float16, manual cast: None
model_type FLOW
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
Using scaled fp8: fp8 matrix mult: False, scale input: False
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load SD3ClipModel_
loaded completely 15122.8 6228.190093994141 True
/root/ComfyUI/comfy/ops.py:80: UserWarning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:331.)
return torch.nn.functional.linear(input, weight, bias)
Requested to load SD3
loaded completely 11615.620722656251 7683.561706542969 True
100%|██████████| 20/20 [01:30<00:00, 4.51s/it]
Requested to load AutoencodingEngine
loaded completely 3787.8148437500004 319.7467155456543 True
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Prompt executed in 166.86 seconds
```
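As far as I understand, the tiled fallback in that last warning is roughly what diffusers calls VAE tiling. Here is a minimal sketch of what I mean (the repo name, prompt and pipeline class are just placeholders, not my actual fp8 checkpoint or LocalAI's real backend code):

```python
# Rough sketch only: decode the VAE in tiles so the full-resolution
# activations never have to sit in VRAM all at once.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",  # assumed repo, not the fp8 file above
    torch_dtype=torch.float16,
)
pipe.to("cuda")  # the RX 6800 XT shows up as a "cuda" device under ROCm

# Equivalent of ComfyUI's "retrying with tiled VAE decoding": trade a bit of
# speed for a much smaller peak memory footprint during decoding.
pipe.vae.enable_tiling()

image = pipe("a red fox in the snow", num_inference_steps=20).images[0]
image.save("out.png")
```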
Now I'm wondering:
Maybe the model only narrowly fits into VRAM and the LocalAI stack has a little more overhead?
Maybe the model does not fit into VRAM, but ComfyUI manages to offload some of it to regular RAM, which LocalAI doesn't? (A sketch of what I mean is below.)
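In case it is useful, this is roughly how I would test both guesses with plain diffusers outside of LocalAI. Again only a sketch with an assumed model repo; it relies on torch.cuda.mem_get_info() and on enable_model_cpu_offload() (which needs accelerate installed):

```python
# Sketch for checking the two guesses above (assumed repo, not my fp8 file).
import torch
from diffusers import StableDiffusion3Pipeline

free, total = torch.cuda.mem_get_info()
print(f"free VRAM before load: {free / 2**30:.1f} / {total / 2**30:.1f} GiB")

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.float16,
)

# Guess 2: keep the text encoders, transformer and VAE in system RAM and move
# each one onto the GPU only while it is running -- similar in spirit to the
# "offload device: cpu" lines in the ComfyUI log above.
pipe.enable_model_cpu_offload()

image = pipe("test prompt", num_inference_steps=20).images[0]

free_after, _ = torch.cuda.mem_get_info()
print(f"free VRAM after run: {free_after / 2**30:.1f} GiB")
```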
Can somebody point me in the right direction?
Thank you very much in advance!