Why does llama.cpp use so much VRAM (and RAM)? #9784

Answered by slaren
MislavJuric asked this question in Q&A
Look at the messages printed while loading the model; llama.cpp will tell you the size of (almost) every backend buffer it allocates. The CUDA runtime also needs some memory that may not be accounted for elsewhere.
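As a rough illustration of that advice, the load-time log can be scanned for the reported buffer sizes and totalled per backend. The sample log lines and the `buffer size = ... MiB` pattern below are assumptions modelled on typical llama.cpp output; the exact wording varies between versions, so adjust the regex to match your build's log:

```python
import re

# Hypothetical sample of llama.cpp load-time log lines (assumed format;
# the real wording differs between llama.cpp versions).
SAMPLE_LOG = """\
llm_load_tensors:      CUDA0 buffer size =  4095.05 MiB
llm_load_tensors:  CUDA_Host buffer size =   281.81 MiB
llama_kv_cache_init:      CUDA0 KV buffer size =   256.00 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   164.01 MiB
"""

# Match "<backend> [KV |compute ]buffer size = <number> MiB"
BUFFER_RE = re.compile(r"(\S+)\s+(?:KV |compute )?buffer size\s*=\s*([0-9.]+) MiB")

def buffer_totals(log: str) -> dict[str, float]:
    """Sum the reported buffer sizes (in MiB) per backend."""
    totals: dict[str, float] = {}
    for line in log.splitlines():
        m = BUFFER_RE.search(line)
        if m:
            backend, mib = m.group(1), float(m.group(2))
            totals[backend] = totals.get(backend, 0.0) + mib
    return totals

if __name__ == "__main__":
    for backend, mib in buffer_totals(SAMPLE_LOG).items():
        print(f"{backend}: {mib:.2f} MiB")
```

Note that, as the answer says, the per-backend sum is a lower bound: the CUDA runtime itself reserves additional memory that never appears in these log lines.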

Replies: 4 comments 16 replies

Answer selected by MislavJuric
Category
Q&A
Labels
None yet
6 participants