This repository was archived by the owner on Aug 4, 2025. It is now read-only.

enable_sequential_cpu_offload HuggingFace Diffusers error with sd2 example on T4 GPU #2

@BEpresent

Description

Hi, I was following this example: https://modelserving.com/blog/creating-stable-diffusion-20-service-with-bentoml-and-diffusers

or, equivalently, a git clone of this example repo: https://github.com/bentoml/diffusers-examples/tree/main/sd2

Either way, this results in a simple service.py file like this:

import bentoml
from bentoml.io import Image, JSON

# Load the previously saved model from the BentoML model store
# and wrap it in a runner.
bento_model = bentoml.diffusers.get("sd2:latest")
stable_diffusion_runner = bento_model.to_runner()

svc = bentoml.Service("stable_diffusion_v2", runners=[stable_diffusion_runner])

@svc.api(input=JSON(), output=Image())
def txt2img(input_data):
    # Forward the JSON payload (prompt, etc.) to the diffusers pipeline.
    images, _ = stable_diffusion_runner.run(**input_data)
    return images[0]
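
For completeness, sd2:latest refers to a model saved to the BentoML model store beforehand. Per the linked blog post, the example does this with a one-off import script along the following lines (a sketch; the exact HuggingFace model id is an assumption):

import bentoml

# Download the pipeline from the HuggingFace Hub and save it
# under the name "sd2" in the local BentoML model store.
bentoml.diffusers.import_model(
    "sd2",
    "stabilityai/stable-diffusion-2",
)

Running this once is what makes bentoml.diffusers.get("sd2:latest") in service.py resolve.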

After bentoml serve service:svc --production I get the following error (it also happens with another custom model I tried). It seems to be related to enable_sequential_cpu_offload from HuggingFace Diffusers.

[ERROR] [runner:sd2:1] Traceback (most recent call last):
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/starlette/routing.py", line 671, in lifespan
    async with self.lifespan_context(app):
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/starlette/routing.py", line 566, in __aenter__
    await self._router.startup()
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/starlette/routing.py", line 650, in startup
    handler()
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 303, in init_local
    raise e
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 293, in init_local
    self._set_handle(LocalRunnerRef)
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 139, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 24, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/frameworks/diffusers.py", line 443, in __init__
    self.pipeline: diffusers.DiffusionPipeline = load_model(
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/frameworks/diffusers.py", line 182, in load_model
    pipeline = pipeline.to(device_id)
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 639, in to
    raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.
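
For reference, the same ValueError can be reproduced with plain diffusers, outside of BentoML: once enable_sequential_cpu_offload has registered its accelerate hooks, any later .to("cuda") is rejected. A minimal sketch (assuming a CUDA device and the stabilityai/stable-diffusion-2 weights):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
)

# Keeps every submodule on CPU and streams weights to the GPU on demand;
# after this call the pipeline itself must stay on CPU.
pipe.enable_sequential_cpu_offload()

# bentoml's load_model() effectively does this (pipeline.to(device_id)),
# which raises the ValueError shown above.
pipe.to("cuda")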

As general info, this runs on a GCP VM instance with a T4 GPU. Could this be the issue?
