This repository was archived by the owner on Aug 4, 2025. It is now read-only.
enable_sequential_cpu_offload HuggingFace Diffusers error with sd2 example on T4 GPU #2
Open
Description
Hi, I was following this example: https://modelserving.com/blog/creating-stable-diffusion-20-service-with-bentoml-and-diffusers
I also tried a git clone of the example repo: https://github.com/bentoml/diffusers-examples/tree/main/sd2
Either way, the result is a simple service.py file like this:
import torch
from diffusers import StableDiffusionPipeline

import bentoml
from bentoml.io import Image, JSON, Multipart

bento_model = bentoml.diffusers.get("sd2:latest")
stable_diffusion_runner = bento_model.to_runner()

svc = bentoml.Service("stable_diffusion_v2", runners=[stable_diffusion_runner])

@svc.api(input=JSON(), output=Image())
def txt2img(input_data):
    images, _ = stable_diffusion_runner.run(**input_data)
    return images[0]
After running bentoml serve service:svc --production, I get the following error (it also happens with another custom model that I tried). It seems to be related to enable_sequential_cpu_offload in HuggingFace Diffusers.
[ERROR] [runner:sd2:1] Traceback (most recent call last):
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/starlette/routing.py", line 671, in lifespan
    async with self.lifespan_context(app):
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/starlette/routing.py", line 566, in __aenter__
    await self._router.startup()
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/starlette/routing.py", line 650, in startup
    handler()
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 303, in init_local
    raise e
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 293, in init_local
    self._set_handle(LocalRunnerRef)
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 139, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 24, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params) # type: ignore
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/frameworks/diffusers.py", line 443, in __init__
    self.pipeline: diffusers.DiffusionPipeline = load_model(
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/frameworks/diffusers.py", line 182, in load_model
    pipeline = pipeline.to(device_id)
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 639, in to
    raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.
For context, this runs on a GCS VM instance with a T4 GPU. Could that be the issue?
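
To make the conflict in the error message concrete, here is a minimal sketch using a toy stand-in class. It is not the diffusers or BentoML implementation: ToyPipeline, its sequential_offload flag, and try_move are hypothetical names made up for illustration. It only mimics what the message describes, namely that once enable_sequential_cpu_offload has been called, an explicit move to a GPU device is rejected while a move to 'cpu' is still accepted.

```python
class ToyPipeline:
    """Toy stand-in for a diffusion pipeline (hypothetical, not diffusers code)."""

    def __init__(self):
        self.device = "cpu"
        self.sequential_offload = False

    def enable_sequential_cpu_offload(self):
        # Record that offloading is now responsible for device placement.
        self.sequential_offload = True

    def to(self, device):
        # Mimic the guard described by the error message: with offloading
        # active, an explicit move to anything other than "cpu" is refused.
        if self.sequential_offload and device != "cpu":
            raise ValueError(
                "sequential offloading is active; cannot move pipeline to " + device
            )
        self.device = device
        return self


def try_move(pipeline, device):
    """Return True if the move succeeded, False if the guard rejected it."""
    try:
        pipeline.to(device)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    plain = ToyPipeline()
    print(try_move(plain, "cuda"))      # no offloading enabled

    offloaded = ToyPipeline()
    offloaded.enable_sequential_cpu_offload()
    print(try_move(offloaded, "cuda"))  # offloading enabled, then moved to GPU
    print(try_move(offloaded, "cpu"))   # offloading enabled, moved to CPU
```

In the traceback above, the move is the pipeline.to(device_id) call inside bentoml's load_model, so under this reading the order of the two calls would matter more than the specific GPU model; that is an interpretation of the message, not something confirmed here.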