Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
To help users get started with llama-server more easily, I'd like to be able to do something like this:
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf
curl http://localhost:8080/completion -d '{
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
Right now, the setup is more complicated, and I'm wondering whether it can be simplified:
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 2048
curl --request POST \
--url http://localhost:8080/completion \
--header "Content-Type: application/json" \
--data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
Motivation
It'd be great if we could make getting started with llama-server easier and more welcoming for users!
Possible Implementation
(1) Make the -c parameter optional? Maybe I'm misunderstanding, but I thought the context size is a property of the model, so it shouldn't need to be set explicitly.
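As a sketch of (1): the trained context length is stored in the GGUF metadata, so the server could read it and default -c to that. Something like the following, using the gguf Python package from this repo (the field-access details are my reading of gguf-py and may not match every version):

# Read the model's trained context length from GGUF metadata so that
# -c could default to it instead of being required.
from gguf import GGUFReader

def trained_context_length(path: str) -> int | None:
    reader = GGUFReader(path)
    arch_field = reader.fields.get("general.architecture")
    if arch_field is None:
        return None
    # GGUF stores strings as byte arrays; decode the raw value part.
    arch = bytes(arch_field.parts[arch_field.data[0]]).decode("utf-8")
    ctx_field = reader.fields.get(f"{arch}.context_length")
    if ctx_field is None:
        return None
    return int(ctx_field.parts[ctx_field.data[0]][0])

print(trained_context_length("Phi-3-mini-4k-instruct-q4.gguf"))  # e.g. 4096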
(2) Probably harder, but make --hf-file optional and use the largest file that fits in your machine's RAM?
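As a sketch of (2), the server could list the repo's .gguf files and pick the largest one under an available-memory budget. This uses huggingface_hub and psutil; the pick_gguf helper and the fits-in-RAM policy are hypothetical, and a real version would also need headroom for the KV cache:

# Pick a default --hf-file: the largest .gguf in the repo that fits in
# currently available RAM (a deliberately naive policy for illustration).
import psutil
from huggingface_hub import HfApi

def pick_gguf(repo_id: str) -> str | None:
    info = HfApi().model_info(repo_id, files_metadata=True)
    ggufs = [s for s in info.siblings
             if s.rfilename.endswith(".gguf") and s.size is not None]
    budget = psutil.virtual_memory().available
    fitting = [s for s in ggufs if s.size < budget]
    return max(fitting, key=lambda s: s.size).rfilename if fitting else None

print(pick_gguf("microsoft/Phi-3-mini-4k-instruct-gguf"))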
(3) Allow the endpoint to take messages in the standard OpenAI format? For example:
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
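If the endpoint spoke this format, any OpenAI-style client could point at the local server directly. A sketch with the openai Python package (the base URL and model name here are hypothetical placeholders; whether the server accepts this is exactly what's being requested):

# What a client call in the standard OpenAI chat format would look like
# against a local llama-server, assuming the endpoint accepted it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="no-key-needed")
resp = client.chat.completions.create(
    model="phi-3-mini-4k-instruct",  # hypothetical placeholder
    messages=[{"role": "user", "content": "why is the sky blue?"}],
)
print(resp.choices[0].message.content)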