-
Win11, cuBLAS, latest commit.
Replies: 2 comments 7 replies
-
Try --no-mmap
-
Tensors offloaded to VRAM are normally unmapped, but this can indeed be improved. When you unmap a file, the operating system removes the mapping from your process's virtual memory space. However, the data that was loaded into memory might still remain in the system's page cache.
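To illustrate the unmap behavior, here is a minimal Python sketch. It is not llama.cpp's actual code; the helper names `read_via_mmap` and `demo` and the temporary file are hypothetical, used only for illustration. It maps a file, reads a few bytes, and then closes the mapping. Closing removes the mapping from the process, but whether the pages stay in the OS page cache afterwards is up to the operating system and is not visible from this code.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str, nbytes: int) -> bytes:
    """Map a file read-only, copy out the first nbytes, then unmap it."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # Reading from the mapping makes the OS page the data in.
            data = mm[:nbytes]
        finally:
            # Removes the mapping from this process's virtual address space.
            # The OS may still keep the file's pages in its page cache.
            mm.close()
    return data


def demo() -> bytes:
    """Hypothetical stand-in for a model file: a small temporary file."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 4096)
        path = tmp.name
    try:
        return read_via_mmap(path, 16)
    finally:
        os.remove(path)
```

Because the cached pages belong to the system rather than the process, tools that report "RAM in use" may still count them after the unmap, even though the OS can reclaim them when other programs need memory.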
@slaren @phymbert
I conducted testing with another model that fully fit into RAM.
You were right, offloading to the GPU does indeed reduce RAM usage, although not as effectively as I had hoped.
Apparently, the model I wanted to launch did not fit, even considering the offloading to the GPU.
I apologize for wasting your time unnecessarily.