Where can I download the LLaMA model weights? #4576
-
I am trying to get LLaMA running and I am stuck at this step: https://github.com/ggerganov/llama.cpp#prepare-data--run. I'm not sure exactly what this command is. What is the difference between running llama.cpp with the BPE tokenizer model weights and the LLaMA model weights? Do I run both commands? I have searched around the web but I can't seem to find the actual model weights. I'm also not sure whether I should just move all the files into the models folder once I download the weights, and whether that would let the program work once I run the rest of the commands in the "Prepare data & run" section.
Replies: 3 comments · 2 replies
-
I cloned the llama.cpp source with git, built it with make, and downloaded GGUF files of the models. When I use the exact prompt syntax the model was trained with, it works. A good source for GGUF files: https://huggingface.co/TheBloke. If you use a graphics card, you may have to enable something to make it work. The line "65B 30B 13B 7B vocab.json" is not a command you have to execute. I keep my models in two folders and use them this way (CPU only):

```sh
./main -t 6 -m ~/Downloads/models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -c 8192 --temp 0.7 --repeat_penalty 1.1 --log-disable -n -1 -p "<s>[INST] Write a short text about UPX. [/INST]"
```

Keep an eye on usable RAM and RAM consumption, and adjust for your needs.
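If you do want to use a graphics card, the gist is to rebuild with GPU support and offload layers to it. A minimal sketch, assuming an NVIDIA GPU and a llama.cpp build from around the time of this discussion (the make flag is era-specific and the layer count is a placeholder to tune):

```sh
# Rebuild with cuBLAS support (flag name as used by llama.cpp builds of this era)
make clean && make LLAMA_CUBLAS=1

# -ngl / --n-gpu-layers offloads that many layers to the GPU;
# 35 is a placeholder -- raise or lower it to fit your VRAM
./main -t 6 -m ~/Downloads/models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf \
  -c 8192 --temp 0.7 --repeat_penalty 1.1 -n -1 -ngl 35 \
  -p "<s>[INST] Write a short text about UPX. [/INST]"
```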
-
@jzry These original instructions are for the first release of LLaMA, which was distributed under strict research conditions only; Meta has to approve your request if you plan to obtain the original weights. The models are in PyTorch format (not Hugging Face's). As for the line you mentioned: the two of them belong together as one.
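For reference, the conversion flow those instructions describe looks roughly like this. A sketch based on the llama.cpp README of that era (script names and flags have changed since, so treat them as illustrative):

```sh
# 1. Convert the original PyTorch checkpoint to an f16 GGUF file
python3 convert.py models/7B/

# 2. Quantize the f16 file down to 4 bits to reduce RAM usage
./quantize models/7B/ggml-model-f16.gguf models/7B/ggml-model-Q4_K_M.gguf Q4_K_M
```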
-
This worked for me:

```python
from huggingface_hub import hf_hub_download

REPO_ID = "TheBloke/LLaMA-7b-GGUF"
FILENAME = "llama-7b.Q3_K_S.gguf"

# Downloads the file into the local Hugging Face cache and returns its path
model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)
print(model_path)
```
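hf_hub_download returns the local path of the downloaded file inside the Hugging Face cache, so you can point llama.cpp straight at it. A minimal sketch, with a placeholder path:

```sh
# Pass the path printed by the snippet above to llama.cpp (placeholder path shown)
./main -m /path/to/llama-7b.Q3_K_S.gguf -p "Building a website can be done in 10 simple steps:" -n 128
```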