- What I think is happening is that this is one of the side effects of the inefficient attention implementation on master. It should be fixed with FlashAttention; if not, I will take another look.
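To illustrate the direction of that fix (a sketch only, not the actual implementation on master or in the FlashAttention work): a FlashAttention-style kernel walks the KV cache in tiles with an online softmax, so only a fixed-size score block is ever materialized, however long the context grows. A minimal numpy version:

```python
import numpy as np

def tiled_attention(q, k, v, tile=256):
    """FlashAttention-style tiling with an online softmax: only a
    [n_q, tile] block of scores is live at any time, so scratch memory
    stays constant no matter how long the KV cache grows."""
    n_q, d = q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full(n_q, -np.inf)            # running row-wise max of scores
    denom = np.zeros(n_q)                # running softmax denominator
    acc = np.zeros((n_q, d))             # running weighted sum over V
    for start in range(0, k.shape[0], tile):
        kb = k[start:start + tile]
        vb = v[start:start + tile]
        s = (q @ kb.T) * scale           # [n_q, tile] -- the only score block
        m_new = np.maximum(m, s.max(axis=1))
        alpha = np.exp(m - m_new)        # rescale earlier partial results
        p = np.exp(s - m_new[:, None])
        denom = denom * alpha + p.sum(axis=1)
        acc = acc * alpha[:, None] + p @ vb
        m = m_new
    return acc / denom[:, None]

# Sanity check against the naive path that builds the full score matrix:
rng = np.random.default_rng(0)
q = rng.standard_normal((32, 64))
k = rng.standard_normal((4096, 64))
v = rng.standard_normal((4096, 64))
s = (q @ k.T) / np.sqrt(64)
p = np.exp(s - s.max(axis=1, keepdims=True))
ref = (p / p.sum(axis=1, keepdims=True)) @ v
assert np.allclose(tiled_attention(q, k, v), ref)
```

The running max and denominator (`m`, `denom`) are what let the softmax be computed incrementally, without ever holding the full `[n_q, n_kv]` score matrix that the naive path allocates.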
- I notice that the larger the batch size, the more memory it takes to run consecutive batches. This confuses me: shouldn't earlier batches have no impact on later ones? That is, the memory from earlier batches should be freed, so usage shouldn't grow beyond the first batch. I OOM on the third batch when I set the batch size to 2048 tokens. Is this related to, or caused by, the fact that Flash Attention is not yet supported?
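For intuition on why that can happen, here is a back-of-envelope sketch. It assumes the naive attention path materializes a full `[n_batch, n_kv]` score matrix per head and that the KV cache grows by `n_batch` after each batch; the head count and score precision below are made-up illustrative values, not measured ones:

```python
# Back-of-envelope sketch (assumed naive implementation, made-up model
# numbers): each batch materializes a [n_batch, n_past + n_batch] score
# matrix per head, and n_past grows by n_batch after every batch --
# so the scratch buffer grows with each consecutive batch.
n_batch = 2048           # tokens per batch ("bs" above)
n_head = 32              # assumption: a typical 7B-class model
bytes_per_score = 4      # assumption: fp32 scores

for i in range(1, 4):
    n_past = (i - 1) * n_batch                 # tokens already in the KV cache
    entries = n_batch * (n_past + n_batch)     # score-matrix entries per head
    mib = entries * n_head * bytes_per_score / 2**20
    print(f"batch {i}: scores {n_batch} x {n_past + n_batch} per head "
          f"=> ~{mib:.0f} MiB across {n_head} heads")
# batch 1 => ~512 MiB, batch 2 => ~1024 MiB, batch 3 => ~1536 MiB
```

Under those assumptions the scratch buffer roughly triples by the third batch, which would line up with an OOM appearing only then.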