Replies: 1 comment
-
I solved the problem: there was a difference in the model name when running locally vs. on Hugging Face.
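For anyone hitting the same 404: an OpenAI-compatible server rejects chat-completion requests whose `model` field doesn't exactly match the name it was started with, so it helps to read the served id from `/v1/models` and reuse it verbatim. A minimal sketch using only the standard library (the helper names, base URL, and prompt are illustrative, not from this thread):

```python
import json
from urllib.request import Request, urlopen


def served_model_id(models_response: dict) -> str:
    """Return the first model id from an OpenAI-style /v1/models payload."""
    return models_response["data"][0]["id"]


def chat_payload(model_id: str, prompt: str) -> dict:
    """Build a /v1/chat/completions body using the exact served model name."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }


def ask(base_url: str, prompt: str) -> str:
    """Query the server for its model name, then send a chat request with it."""
    # First ask the server what it calls the model, to avoid any name mismatch.
    with urlopen(f"{base_url}/v1/models") as resp:
        model_id = served_model_id(json.load(resp))
    req = Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(chat_payload(model_id, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With the server from the question running, `ask("http://localhost:18888", "Hello")` would send the request under whatever id `/v1/models` reports, sidestepping the local-vs-Hugging-Face naming mismatch.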
-
I'm trying to start an OpenAI-compatible server with the command:
sudo docker run --runtime nvidia --gpus all -v /root/.cache/huggingface -p 18888:18888 vllm/vllm-openai --model TheBloke/openchat-3.5-0106-AWQ --host 0.0.0.0 --enforce-eager --port 18888
But when I try to make a request, I get the error:
"POST /v1/chat/completions HTTP/1.1" 404 Not Found
However, running
wget localhost:18888/v1/models
works, and I get: "GET /v1/models HTTP/1.1" 200 OK
If I run
/usr/bin/python3 -m ochat.serving.openai_api_server --model TheBloke/openchat-3.5-0106-AWQ --host 0.0.0.0
the request works. I wonder whether I'm making a mistake in how I'm using Docker, or whether the OpenAI endpoint isn't being set up correctly (or some non-compatible API is being exposed instead)?