
llm_load_tensors: VRAM used: 7337 MB? I have an RTX 4060 laptop GPU with 8188 MiB and the system is using only 77 MB, so how can ~7400 MB be an OOM? #3190

Closed · Answered by staviq
hiqsociety asked this question in Q&A


If you want to "free" that 77 MB, you have to stop the desktop environment service and work from a physical console, because the "desktop" is rendered through the GPU too, and it takes away from its memory.
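
For example, on a systemd-based distro (an assumption; the display-manager unit name varies) you could switch to a text console and stop the display manager before running llama.cpp, as a rough sketch:

```sh
# Switch to a text console first (e.g. Ctrl+Alt+F3), then stop the display
# manager so the desktop releases its VRAM. The unit name is distro-specific:
# gdm, gdm3, sddm, lightdm, ...
sudo systemctl stop gdm

# ... run llama.cpp here ...

# Bring the desktop back afterwards.
sudo systemctl start gdm
```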

I believe the "VRAM used" value only shows the memory used by the model weights themselves; additional memory is needed for a couple more things, most notably the KV cache and compute buffers.
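
As a rough illustration of where the rest goes (assuming a LLaMA-7B-class model with 32 layers and an embedding width of 4096, which this thread doesn't actually specify): an f16 KV cache for a 4096-token context takes 2 (K and V) × 32 layers × 4096 tokens × 4096 embedding × 2 bytes ≈ 2 GiB, so weights that load at 7337 MB can still overflow an 8 GiB card once the cache and compute buffers are allocated.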

You seem to be very close to fitting the whole thing on the GPU. Check if -b 1 helps (it will make prompt processing slower; you can fiddle with this value to see how high you can go before OOM). You can also try reducing the context size, and if none of that helps, use a smaller quant of the model.
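
A minimal sketch of what that tuning might look like (the model path and the values are placeholders; the flags themselves exist in llama.cpp's main, but the right numbers depend on your model and hardware):

```sh
# Hypothetical invocation: start conservative, then raise -b and -c until OOM.
# -ngl 99 : offload all layers to the GPU
# -c 2048 : smaller context => smaller KV cache
# -b 1    : minimal batch size (slower prompt processing)
./main -m ./models/your-model.Q4_K_M.gguf -ngl 99 -c 2048 -b 1 -p "Hello"
```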

Answer selected by hiqsociety