GPU config help #1058
thebrahman started this conversation in General
Replies: 1 comment 2 replies
-
That YAML file would not work; it needs to be formatted as described in https://localai.io/howtos/easy-model-import-downloaded/
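For reference, a minimal model config in that format looks roughly like the sketch below. This is only a sketch based on the LocalAI GPU docs, not the file from this thread; the name and the gpu_layers value are assumptions, and gpu_layers is the field that tells the llama backend how many layers to offload to the GPU:

name: llama-2-7b-chat          # assumed model name, pick anything
parameters:
  model: llama-2-7b-chat.ggmlv3.q4_K_M.bin   # file in the /models folder
context_size: 1024
threads: 2
f16: true                      # enable 16-bit where supported
gpu_layers: 35                 # assumed value; number of layers to offload to the GPU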
-
I am struggling to get models to run on my 4090. My OS is Windows and I am running Docker. It recognises my GPU, but doesn't offload any layers.
I followed this guide to set up:
https://localai.io/howtos/easy-setup-docker-gpu/
I have this YAML file in the models folder:
Terminal output:
2023-09-15 00:05:22 localai-api-1 | 2:05PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:llama-2-7b-chat.ggmlv3.q4_K_M.bin ContextSize:512 Seed:0 NBatch:512 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:2 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/llama-2-7b-chat.ggmlv3.q4_K_M.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 Tokenizer: LoraBase: LoraAdapter: NoMulMatQ:false AudioPath:}
2023-09-15 00:05:22 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr ggml_init_cublas: found 1 CUDA devices:
2023-09-15 00:05:22 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9
03): stderr llama_model_load_internal: ggml ctx size = 3891.33 MB
2023-09-15 00:05:24 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr WARNING: failed to allocate 3891.33 MB of pinned memory: out of memory
2023-09-15 00:05:24 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr llama_model_load_internal: using CUDA for GPU acceleration
2023-09-15 00:05:24 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr llama_model_load_internal: mem required = 4193.33 MB (+ 512.00 MB per state)
2023-09-15 00:05:24 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr llama_model_load_internal: offloading 0 repeating layers to GPU
2023-09-15 00:05:24 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr llama_model_load_internal: offloaded 0/35 layers to GPU
2023-09-15 00:05:24 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr llama_model_load_internal: total VRAM used: 288 MB
2023-09-15 00:05:29 localai-api-1 | 2:05PM DBG GRPC(llama-2-7b-chat.ggmlv3.q4_K_M.bin-127.0.0.1:43703): stderr llama_new_context_with_model: kv self size = 512.00 MB
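For context, the GPU-enabled Docker Compose service from that setup guide passes the GPU through to the container roughly like the sketch below. This is a sketch rather than the exact file used here; the image tag, port, and volume paths are assumptions:

version: '3.6'
services:
  api:
    image: quay.io/go-skynet/local-ai:master-cublas-cuda12   # assumed CUDA-enabled image tag
    ports:
      - 8080:8080
    environment:
      - DEBUG=true           # enables the DBG log lines shown above
    volumes:
      - ./models:/models     # folder holding the .bin file and its YAML config
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]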