-
I'm using an Apple M2 Mac and I was astonished to find that almost zero memory appeared to be used. I went looking for where the model is stored and found that my system is caching 8.4 GB of files in system RAM. Macs use what Apple calls "unified memory", so the data never seems to move: the GPU appears to access the cached file data directly, because the Mac GPU has access to all of the system RAM. But if you are running Linux on an Intel CPU with an Nvidia card on a PCIe bus, the data has to move through the llama.cpp process and then across the PCIe bus into the card's VRAM.

The first question is how many layers you are telling llama.cpp to load into VRAM. This is controlled by a runtime parameter (`-ngl` / `--n-gpu-layers`), so you can specify how much VRAM is used, from zero up to offloading everything that will fit. So the answer is that you get to choose how much VRAM is used, and the rest of the model runs on the CPU out of system RAM.

I am now wondering whether quantized models have to be unpacked. Does a 5-bit parameter get unpacked into an 8-bit byte? Does the answer depend on the exact GPU or CPU being used?
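On the unpacking question: as far as I understand, in llama.cpp (ggml) the quantized weights stay packed in RAM/VRAM, and the CPU or GPU kernels dequantize each small block on the fly while computing, so a 5-bit weight only becomes a wider value transiently in registers. The exact block layouts and kernels differ per quantization type and backend, but the general idea looks roughly like this sketch (a toy 5-bit block format for illustration, not the actual ggml Q5 layout):

```python
import numpy as np

# Toy packed 5-bit block: 32 weights stored as
#   - 16 bytes holding the low 4 bits (two weights per byte)
#   - 4 bytes holding the 32 high (5th) bits
#   - one float scale for the whole block
# This illustrates block-wise quantization in general; it is NOT the
# exact ggml Q5 layout.

def unpack_5bit_block(low_nibbles: bytes, high_bits: bytes, scale: float) -> np.ndarray:
    """Expand 32 packed 5-bit values into float32 weights."""
    lows = np.frombuffer(low_nibbles, dtype=np.uint8)   # 16 bytes
    lo = np.empty(32, dtype=np.uint8)
    lo[0::2] = lows & 0x0F                              # low nibble of each byte
    lo[1::2] = lows >> 4                                # high nibble of each byte

    hi_word = int.from_bytes(high_bits, "little")       # 32 packed bits
    hi = np.array([(hi_word >> i) & 1 for i in range(32)], dtype=np.uint8)

    q = (hi << 4) | lo                                  # 5-bit integers in [0, 31]
    return scale * (q.astype(np.float32) - 16.0)        # re-centre and rescale

# One block of 32 weights (all-zero packed data just to show the call)
weights = unpack_5bit_block(bytes(16), bytes(4), scale=0.01)
print(weights)   # 32 float32 values
```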
-
Is it possible to deterministically predict the memory requirements (specifically interested in VRAM on Nvidia) that a model will consume?
I'm assuming it's something like: (n_parameters × bytes per parameter) + batch-size overhead.
How do I also take quantization into account?
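Roughly, yes. The weights take about n_parameters × bits_per_weight / 8 bytes, so quantization just changes the bits-per-weight term (e.g. 16 for fp16, around 4-5 for the Q4/Q5 formats). On top of that comes the KV cache, which grows with context length, plus some backend and scratch-buffer overhead. A back-of-the-envelope calculator, with illustrative (assumed, not measured) numbers:

```python
def estimate_memory_gb(
    n_params_b: float,        # parameters in billions, e.g. 7 for a 7B model
    bits_per_weight: float,   # e.g. 16 for fp16, roughly 4-5 for Q4/Q5 quants
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    kv_bytes: int = 2,        # fp16 KV cache entries
    overhead_gb: float = 0.5, # rough allowance for activations / scratch buffers
) -> float:
    """Very rough estimate; real usage varies with backend, batch size, etc."""
    weights_gb = n_params_b * 1e9 * bits_per_weight / 8 / 1e9
    # KV cache: K and V tensors per layer, per token
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes / 1e9
    return weights_gb + kv_gb + overhead_gb

# Example: a hypothetical 7B model (32 layers, 8 KV heads, head_dim 128)
# at ~4.5 bits per weight with a 4096-token context:
print(round(estimate_memory_gb(7, 4.5, 32, 8, 128, 4096), 2), "GB")  # ~4.97 GB
```

If you only offload part of the model with `-ngl`, only those layers' share of the weights (and, depending on settings, of the KV cache) ends up in VRAM; the rest stays in system RAM.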