I am on a cluster where I want to use vLLM to serve a model. My issue is that I want to set a cache directory where the model weights get downloaded when hosting with `vllm.entrypoints.openai.api_server`, but I don't see a CLI argument that supports this.
For context, I want something similar to `--huggingface-hub-cache` when using `text-generation-launcher` on HF's TGI, as sketched below.
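Roughly this kind of invocation (the model ID and cache path are just placeholders for my setup):

```bash
# TGI exposes the weight-cache location as an explicit launcher flag.
text-generation-launcher \
    --model-id meta-llama/Llama-2-7b-hf \
    --huggingface-hub-cache /shared/cache/huggingface/hub
```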
I saw mixed comments in vLLM's issues about vLLM not respecting the default `HF_HOME` set in the environment.
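To make that concrete, this is the environment-variable approach I mean (the paths and model are placeholders; whether vLLM actually honors these variables is exactly what I'm unsure about):

```bash
# Point the Hugging Face cache at shared cluster storage (placeholder path).
export HF_HOME=/shared/cache/huggingface
export HUGGINGFACE_HUB_CACHE="$HF_HOME/hub"

# Launch the OpenAI-compatible server; I'd expect the downloaded
# weights to land in the cache directory set above.
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```

Any pointers?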