Hi everyone,

first of all: thank you for this great piece of software :)
I am trying to create embeddings with GGUF models, such as Phi or Mistral. However, all my attempts to serve them via LocalAI and the llama backend fail. I am using the localai/localai:v2.5.1-cublas-cuda12 Docker image.

Here is the HTTP error response:

This is the debug output from the LocalAI Docker image:

This is the configuration file for the model (also available at https://raw.githubusercontent.com/fdewes/model_gallery/main/phi_embeddings.yaml):
name: "phi-embeddings"
license: "Apache 2.0"
urls:
- https://huggingface.co/TheBloke/phi-2-GGUF
description: |
Phi model that can be used for embeddings
config_file: |
parameters:
model: phi-2.Q8_0.gguf
backend: llama
embeddings: true
files:
- filename: "phi-2.Q8_0.gguf"
sha256: "26a44c5a2bc22f33a1271cdf1accb689028141a6cb12e97671740a9803d23c63"
uri: "https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q8_0.gguf"
Am I doing anything wrong, or is this a bug? With the same models I can create embeddings locally via the llama-cpp-python bindings without problems (a sketch of that working call is below). Any help solving this problem would be greatly appreciated.
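For comparison, the working local llama-cpp-python call looks roughly like this (the model path is illustrative, but it is the same GGUF file referenced in the config above):

from llama_cpp import Llama

# Load the same GGUF file with embedding support enabled.
llm = Llama(model_path="./models/phi-2.Q8_0.gguf", embedding=True)

result = llm.create_embedding("This is a test sentence.")
vector = result["data"][0]["embedding"]
print(len(vector))  # dimensionality of the embedding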