How to support deploying two small models on a single GPU card for inference? #4426
RanchiZhao announced in Q&A

How can I deploy two small models on a single GPU card for inference, for example using the Ray framework? I always hit OOM, even when I set gpu_memory_utilization to 0.3.
Replies: 1 comment
-
Any solution for this?