Replies: 2 comments 2 replies
-
By default, if you compiled with GPU support, some calculations are offloaded to the GPU during inference. To get the real speedups, though, you need to offload model layers onto the GPU. This means you can choose how many layers run on the CPU and how many run on the GPU.
That is what I'm talking about: you can specify how many layers you want to offload to the GPU, e.g. offload all layers, offload only some, etc.
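The split described above can be sketched as a tiny helper. This is only an illustration of the concept, not real inference code: the `n_gpu_layers` name is an assumption here, mirroring the common `-ngl`/`--n-gpu-layers` style of option, and the function just reports how many layers would land on each device.

```python
def split_layers(total_layers: int, n_gpu_layers: int) -> dict:
    """Decide how many transformer layers run on the GPU vs the CPU.

    Semantics mirror the usual layer-offload option:
      0                -> everything stays on the CPU
      >= total_layers  -> the whole model is offloaded to the GPU
    """
    # Clamp the requested count into the valid range [0, total_layers].
    on_gpu = min(max(n_gpu_layers, 0), total_layers)
    return {"gpu": on_gpu, "cpu": total_layers - on_gpu}


# For a hypothetical 32-layer model:
print(split_layers(32, 0))    # all layers on CPU
print(split_layers(32, 20))   # partial offload: 20 on GPU, 12 on CPU
print(split_layers(32, 100))  # request above the layer count -> all on GPU
```

Offloading more layers uses more VRAM, so partial offload is the usual compromise when the full model does not fit on the GPU.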
-
Thank you. I got it. It works.
-
I'm doing something wrong. When I run the model, the GPU is barely used (only 600 MB). But RAM isn't used much either. The CPU is under load, and it seems there is a lot of I/O traffic on the SSD.
I see this in the log:
Can someone explain?