E.g., include these flags if the model fails to start: `--max-model-len 16384 --gpu-memory-utilization 0.95`
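A minimal sketch of a full invocation, assuming a vLLM server; the model name is a placeholder:

```
# Cap the context length and GPU memory budget so the engine fits in VRAM.
vllm serve <model-name> --max-model-len 16384 --gpu-memory-utilization 0.95
```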
- If you want to run larger models on GPUs with less VRAM, there are several techniques you can use to optimize GPU memory utilization:
- You can cap how much GPU memory the engine allocates with the `--gpu-memory-utilization` flag, which tells the model to use a specified fraction of the available GPU memory.
```
# This command sets the model to use 95% of the available GPU memory.
vllm serve <model-name> --gpu-memory-utilization 0.95
```
- Using mixed precision (FP16) instead of full precision (FP32) reduces the amount of memory required to store model weights, which can significantly lower VRAM usage.
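As an illustrative sketch, assuming a vLLM server (the model name is again a placeholder), half precision can be requested via the `--dtype` flag:

```
# Load the model weights in FP16 (half precision) rather than FP32,
# roughly halving the VRAM needed to store the weights.
vllm serve <model-name> --dtype float16
```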