Deploying gpu versions meets "error while loading shared libraries: libcuda.so.1" #1798

qingfenghcy · 2024-03-05T14:15:12Z

qingfenghcy
Mar 5, 2024

I have my own k8s cluster, in which the A100 server is managed, the specific environment information is as follows:

CPU intel Xeon 6330
GPU：A100-40GB
k8s：1.21

With reference to xxx,

helm install local-ai go-skynet/local-ai -f values.yaml

I deployed the localai-3.2.0 version by helm, downloaded the local-ai:v2.9.0-cublas-cudall image and the ggml-gpt4all-j.bin model file in advance. Added F16:true,GPU_LAYERS:20,gpus:all environment variables to localai workload environment variables. The container can be started normally, but cannot provide external services normally. The request and container log are as follows:

Request and return:

# curl http://10.247.104.12/v1/models
{"object":"list","data":[{"id":"ggml-gpt4all-j.bin","object":"model"},{"id":"tinyllama-1.1b-chat-v0.3.Q8_0.gguf","object":"model"}]}[root@dtcsl-59vpkg14 ~]#
[root@dtcsl-59vpkg14 ~]# curl --location --request POST 'http://10.247.104.12/v1/chat/completions' --header 'Content-Type: application/json' --data '{
>     "model": "ggml-gpt4all-j.bin",
>     "messages": [
>         {
>             "role": "user",
>             "content": "How are you?"
>         }
>     ],
>     "temperature": 0.9
> }'
{"error":{"code":500,"message":"rpc error: code = Unknown desc = unimplemented","type":""}}[

The container log is as follows

@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
model name	: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local wbnoinvd dtherm arat pln pts avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid md_clear pconfig flush_l1d arch_capabilities
CPU:    AVX    found OK
CPU:    AVX2   found OK
CPU:    AVX512 found OK
@@@@@
�[90m7:57AM�[0m �[33mDBG�[0m no galleries to load
�[90m7:57AM�[0m �[32mINF�[0m Starting LocalAI using 16 threads, with models path: /models
�[90m7:57AM�[0m �[32mINF�[0m LocalAI version: v2.9.0 (ff88c390bb51d9567572815a63c575eb2e3dd062)
�[90m7:57AM�[0m �[32mINF�[0m Preloading models from /models
�[90m7:57AM�[0m �[33mDBG�[0m Extracting backend assets files to /tmp/localai/backend_data
�[90m7:57AM�[0m �[33mDBG�[0m No uploadedFiles file found at /tmp/localai/upload/uploadedFiles.json

 ┌───────────────────────────────────────────────────┐ 
 │                   Fiber v2.50.0                   │ 
 │               http://127.0.0.1:8080               │ 
 │       (bound on host 0.0.0.0 and port 8080)       │ 
 │                                                   │ 
 │ Handlers ........... 105  Processes ........... 1 │ 
 │ Prefork ....... Disabled  PID ................ 15 │ 
 └───────────────────────────────────────────────────┘ 

[10.224.205.85]:15349 200 - GET /v1/models
�[90m8:01AM�[0m �[33mDBG�[0m Request received: 
�[90m8:01AM�[0m �[33mDBG�[0m Configuration read: &{PredictionOptions:{Model:ggml-gpt4all-j.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name: F16:true Threads:16 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:512 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}
�[90m8:01AM�[0m �[33mDBG�[0m Parameters: &{PredictionOptions:{Model:ggml-gpt4all-j.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name: F16:true Threads:16 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName: ParallelCalls:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:512 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: MMProj: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}
�[90m8:01AM�[0m �[33mDBG�[0m Prompt (before templating): How are you?
�[90m8:01AM�[0m �[33mDBG�[0m Prompt (after templating): How are you?
�[90m8:01AM�[0m �[32mINF�[0m Trying to load the model 'ggml-gpt4all-j.bin' with all the available backends: llama-cpp, llama-ggml, llama, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper, /build/backend/python/transformers/run.sh, /build/backend/python/mamba/run.sh, /build/backend/python/transformers-musicgen/run.sh, /build/backend/python/exllama/run.sh, /build/backend/python/exllama2/run.sh, /build/backend/python/coqui/run.sh, /build/backend/python/petals/run.sh, /build/backend/python/autogptq/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/vllm/run.sh, /build/backend/python/sentencetransformers/run.sh, /build/backend/python/diffusers/run.sh, /build/backend/python/vall-e-x/run.sh, /build/backend/python/bark/run.sh
�[90m8:01AM�[0m �[32mINF�[0m [llama-cpp] Attempting to load
�[90m8:01AM�[0m �[32mINF�[0m Loading model 'ggml-gpt4all-j.bin' with backend llama-cpp
�[90m8:01AM�[0m �[33mDBG�[0m Loading model in memory from file: /models/ggml-gpt4all-j.bin
�[90m8:01AM�[0m �[33mDBG�[0m Loading Model ggml-gpt4all-j.bin with gRPC (file: /models/ggml-gpt4all-j.bin) (backend: llama-cpp): {backendString:llama-cpp model:ggml-gpt4all-j.bin threads:16 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0005ac600 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh petals:/build/backend/python/petals/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
�[90m8:01AM�[0m �[33mDBG�[0m Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp
�[90m8:01AM�[0m �[33mDBG�[0m GRPC Service for ggml-gpt4all-j.bin will be running at: '127.0.0.1:42637'
�[90m8:01AM�[0m �[33mDBG�[0m GRPC Service state dir: /tmp/go-processmanager59434148
�[90m8:01AM�[0m �[33mDBG�[0m GRPC Service Started
�[90m8:01AM�[0m �[33mDBG�[0m GRPC(ggml-gpt4all-j.bin-127.0.0.1:42637): stderr /tmp/localai/backend_data/backend-assets/grpc/llama-cpp: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory

I installed the localai cpu version through helm and used the cpu image to successfully provide services, but the gpu version did not. I have consulted official documentation and github issues and have been unable to find a solution.Thanks!!!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Deploying gpu versions meets "error while loading shared libraries: libcuda.so.1" #1798

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Uh oh!

Deploying gpu versions meets "error while loading shared libraries: libcuda.so.1" #1798

Uh oh!

Uh oh!

qingfenghcy Mar 5, 2024

Replies: 0 comments

qingfenghcy
Mar 5, 2024