-
@myselfffo Try reducing the number of layers loaded into your VRAM even further. Additionally, you can change the model you are using, cf. https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF#provided-files (if you are using Mistral): pick a model with a smaller memory footprint.
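To make "reduce the number of layers" concrete: a GGUF model is offloaded to the GPU layer by layer, so a rough back-of-the-envelope estimate of how many layers fit is (usable VRAM) / (model size / layer count). The sketch below is an illustration only, not part of the project: the 0.7 GiB overhead allowance is an assumption (context buffers and the CUDA runtime also use VRAM), the file sizes are taken from the "provided files" table on the Hugging Face page linked above, and Mistral 7B has 32 transformer layers.

```python
def layers_that_fit(model_size_gib: float, n_layers: int,
                    vram_gib: float, overhead_gib: float = 0.7) -> int:
    """Rough estimate of how many model layers fit in VRAM.

    Assumes VRAM use scales linearly with offloaded layers and
    reserves `overhead_gib` for context buffers / CUDA runtime.
    """
    per_layer = model_size_gib / n_layers          # GiB per layer
    usable = max(vram_gib - overhead_gib, 0.0)     # VRAM left for weights
    return min(n_layers, int(usable / per_layer))

# GTX 1050 Ti: 4 GiB VRAM; Mistral-7B-Instruct-v0.1 has 32 layers.
# Q4_K_M quant is ~4.37 GB -> only a partial offload fits:
print(layers_that_fit(4.37, 32, 4.0))  # -> 24

# Q2_K quant is ~3.08 GB -> the whole model fits:
print(layers_that_fit(3.08, 32, 4.0))  # -> 32
```

So on a 4 GiB card, a smaller quant either lets you offload every layer or leaves more headroom for the context, which is exactly why a lower-footprint file from that table helps.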
-
Hi,
I have my installation running on a Debian 12 PC with a GPU. It's very slow, and I constantly run out of memory even though I have lowered the settings:
```python
@inject
def __init__(self) -> None:
    match settings.llm.mode:
        case "local":
            from llama_index.llms import LlamaCPP
```
My GPU is an NVIDIA GeForce GTX 1050 Ti (768 CUDA cores, 4096 MB total memory).

I have never tried running it on the CPU. My CPU and RAM are:

Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (currently clocked at 4400 MHz)
MemTotal: 32805176 KiB (about 32 GB)
Would it be faster to run on the CPU? If so, how can I reinstall with CPU-only support without losing the GPU option?
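For what it's worth, llama-cpp-python builds CPU-only by default; GPU offload is only compiled in when you pass CUDA flags through `CMAKE_ARGS` at install time. A sketch of both installs is below. This is environment setup, not project-specific code, and the exact CMake flag name depends on your llama-cpp-python version, so check its installation docs before running.

```shell
# CPU-only build: reinstall without any CUDA flags (the default).
pip install --force-reinstall --no-cache-dir llama-cpp-python

# CUDA build, to restore GPU offload later. The flag name has changed
# across llama-cpp-python releases:
#   older releases:  -DLLAMA_CUBLAS=on
#   newer releases:  -DGGML_CUDA=on
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```

Switching back and forth is just a matter of reinstalling the package with or without the flag, so trying the CPU build does not "ruin" anything permanently.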