How to use an auth token to use Llama 2 in vLLM? And a question about presence_penalty #717
Closed
orellavie1212
announced in
Q&A
Replies: 3 comments
-
I found out in #539 that frequency_penalty is repetition_penalty; not sure if that is true.
-
For anyone who needs a solution:
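The original snippet did not survive the page export; what follows is a minimal sketch of the usual fix, assuming the goal is to authenticate huggingface_hub (which vLLM uses to download weights) with a Hugging Face access token:

import os

# Option 1: export the token before starting vLLM; huggingface_hub picks it up.
os.environ["HUGGING_FACE_HUB_TOKEN"] = "hf_..."  # placeholder: your HF access token

# Option 2: log in programmatically (equivalent to `huggingface-cli login`).
from huggingface_hub import login
login(token="hf_...")  # placeholder token

from vllm import LLM

# Once authenticated, vLLM can fetch the gated repo.
llm = LLM(model="meta-llama/Llama-2-13b-chat-hf", tensor_parallel_size=4)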
-
Also found some docs on integrating it with vLLM: https://docs.mistral.ai/deployment/self-deployment/vllm/
-
Nowhere in the docs nor in the repo https://github.com/vllm-project/vllm/tree/main could I find anything about using a Hugging Face token to download a gated model like Llama 2 (e.g. 13b-chat-hf). In SageMaker I get the error:
Repo model meta-llama/Llama-2-13b-chat-hf is gated. You must be authenticated to access it.
[INFO ] PyProcess - Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-2-13b-hf/resolve/main/config.json.
What should I do?
from vllm import LLM, SamplingParams

model_name = "meta-llama/Llama-2-13b-chat-hf"
# top_k must be an integer number of candidate tokens (0.4 is invalid); 40 is a plausible stand-in.
sampling_params = SamplingParams(temperature=0.1, top_p=0.75, top_k=40, presence_penalty=1.17)
llm = LLM(model=model_name, tensor_parallel_size=4)
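For completeness, this is how generation would typically be invoked with these objects (the prompt here is made up):

outputs = llm.generate(["Hello, how are you?"], sampling_params)  # hypothetical prompt
print(outputs[0].outputs[0].text)  # first completion for the first prompt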
Btw, is presence_penalty the same as the "repetition_penalty" known from other models, or is it the frequency_penalty one in SamplingParams?
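For reference, a sketch of the usual definitions, assuming OpenAI-style semantics for presence_penalty and frequency_penalty and the Hugging Face Transformers meaning of repetition_penalty (not copied from vLLM's source):

def apply_penalties(logit, count, presence_penalty, frequency_penalty, repetition_penalty):
    # presence_penalty: flat, additive penalty if the token has appeared at all
    if count > 0:
        logit -= presence_penalty
    # frequency_penalty: additive penalty scaled by how often the token appeared
    logit -= frequency_penalty * count
    # repetition_penalty (HF-style): multiplicative, so its effect depends on the logit's sign
    if count > 0:
        logit = logit / repetition_penalty if logit > 0 else logit * repetition_penalty
    return logit

Under these definitions, presence_penalty is the closest analogue of repetition_penalty (both penalize a repeated token once, regardless of count), while frequency_penalty grows with each repetition.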