I am on a cluster where I want to use vLLM to serve a model. My issue is that I want to set a cache directory where the model weights get downloaded when hosting with `vllm.entrypoints.openai.api_server`, but I don't see a CLI argument that supports this.
For context, I want something similar to `--huggingface-hub-cache` when using `text-generation-launcher` on HF's TGI, as sketched below.
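Roughly this kind of invocation (the model ID and cache path are just placeholders for my setup):

```bash
# TGI exposes the weight-cache location as an explicit launcher flag.
text-generation-launcher \
    --model-id meta-llama/Llama-2-7b-hf \
    --huggingface-hub-cache /shared/cache/huggingface/hub
```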
I saw mixed comments in vLLM's issues about vLLM not respecting the default `HF_HOME` set in the environment.
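To make that concrete, this is the environment-variable approach I mean (the paths and model are placeholders; whether vLLM actually honors these variables is exactly what I'm unsure about):

```bash
# Point the Hugging Face cache at shared cluster storage (placeholder path).
export HF_HOME=/shared/cache/huggingface
export HUGGINGFACE_HUB_CACHE="$HF_HOME/hub"

# Launch the OpenAI-compatible server; I'd expect the downloaded
# weights to land in the cache directory set above.
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```

Any pointers?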