Infill: /infill API not working on llama.cpp server #9861
Unanswered
deusmachinea asked this question in Q&A
Replies: 2 comments
-
You need to add the |
-
Thank you, sir. The code_before snippet is:
|
-
I downloaded the llama.cpp b3909 release (https://github.com/ggerganov/llama.cpp/releases/download/b3909/llama-b3909-bin-macos-arm64.zip), unzipped it, and am running llama-server like this.
MODEL is Qwen2.5-Coder-7B-Instruct-Q4_K_L.gguf.
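Based on the description above, the launch probably looks roughly like the sketch below. This is an assumption, not the poster's exact command: the `-m` and `--port` flags come from llama.cpp's server documentation, and the binary path and port are guesses for the extracted macOS build.

```shell
# Assumed launch command for the extracted b3909 macOS build; adjust paths/port.
MODEL=Qwen2.5-Coder-7B-Instruct-Q4_K_L.gguf
./llama-server -m "$MODEL" --port 8080
```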
The /completions API works correctly, but the /infill request returns errors.
This is how I am making the request:
code_before is just a code snippet that is only 100 tokens long.
The error I am getting is:
Please let us know how to make the /infill request work.
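For context, the /infill endpoint takes a different request shape than /completions: the code before the cursor goes in `input_prefix` and the code after it in `input_suffix` (field names taken from llama.cpp's server README; check the README for your build, and note the endpoint also requires a model whose GGUF includes FIM tokens). A minimal sketch of building such a request, with hypothetical example snippets:

```python
import json

def build_infill_payload(prefix, suffix, n_predict=64):
    # The llama.cpp server's /infill endpoint expects the code before the
    # cursor in "input_prefix" and the code after it in "input_suffix"
    # (field names per the server README; verify against your build).
    return {
        "input_prefix": prefix,
        "input_suffix": suffix,
        "n_predict": n_predict,
    }

# Hypothetical code_before / code_after snippets for illustration only.
payload = build_infill_payload(
    "def add(a, b):\n    ",
    "\n\nprint(add(1, 2))",
)
print(json.dumps(payload))

# To send it (assuming the server listens on localhost:8080):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8080/infill",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

If the server rejects the request, comparing the error message against the required fields listed in the server README for the exact release in use is the quickest way to spot a missing field.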