FAQ
Question 1: Why was the vocabulary not expanded for Llama-3, as in the first- and second-generation projects?
Answer: There are several reasons: 1) the Llama-3 vocabulary has already been expanded to 128K tokens, a significant increase over previous generations; 2) encoding-efficiency tests on Chinese Wikipedia show that the Llama-3 vocabulary is about as efficient as our Chinese LLaMA-2 vocabulary (around 95%); 3) our work on Chinese Mixtral indicates that vocabulary expansion is not a necessary condition for language transfer in large models.
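Below is a minimal sketch (not part of this project) of how tokenizer encoding efficiency on Chinese text can be compared, measuring average characters per token with Hugging Face transformers. The model IDs are illustrative and may require Hub authentication; a real test would run over the full Chinese Wikipedia rather than a single sentence.

```python
from transformers import AutoTokenizer

# Illustrative sample text; a real evaluation would use a large Chinese corpus.
SAMPLE = "人工智能是计算机科学的一个分支，它试图理解智能的本质。"

def chars_per_token(tokenizer_name: str, text: str) -> float:
    """Average number of characters covered by one token (higher = more efficient)."""
    tok = AutoTokenizer.from_pretrained(tokenizer_name)
    ids = tok.encode(text, add_special_tokens=False)
    return len(text) / len(ids)

# Model IDs are illustrative; substitute any tokenizers you have access to.
for name in ["meta-llama/Meta-Llama-3-8B", "hfl/chinese-llama-2-7b"]:
    print(name, round(chars_per_token(name, SAMPLE), 3))
```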
Question 2: Will a 70B version be released?
Answer: Due to resource constraints, we cannot guarantee this at the moment. We will consider training one if a suitable opportunity arises.
Question 3: Why is the instruction model no longer called Alpaca?
Answer: This is due to the limitations that the Llama-3 license places on the distribution of resources.
Question 4: Can the models in this repository be used commercially?
Answer: Yes, but please read the commercial licensing requirements of the original Llama-3 carefully beforehand. Developers are responsible for the compliance of the models they use and should seek legal advice if necessary. This project is not responsible for any results or consequential losses arising from the use of these models.
Question 5: Why was the model trained with LoRA instead of full-parameter training?
Answer: This decision is based on several factors: 1) training cost and efficiency; 2) Llama has now evolved through three generations and its Chinese capability has gradually improved, so LoRA-based incremental training can quickly strengthen its Chinese understanding and generation; 3) we benchmarked several community Chinese Llama-3 models and found that fully fine-tuned models are not superior to those trained with PEFT methods. Using LoRA is therefore the result of balancing these factors.
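As a rough illustration of the PEFT approach, the sketch below attaches LoRA adapters to a base model with the Hugging Face peft library. The rank, scaling factor, and target modules are illustrative assumptions, not the project's actual training configuration.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model (illustrative ID; gated access may be required).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

lora_cfg = LoraConfig(
    r=64,                 # adapter rank (illustrative, not the project's setting)
    lora_alpha=128,       # scaling factor (illustrative)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of weights is trainable
```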
Question 6: Why does Llama-3-Chinese perform poorly in conversation?
Answer: Llama-3-Chinese is a base model, not a chat model; it is intended mainly for secondary fine-tuning and similar downstream uses. For interactive scenarios such as dialogue and question answering, please use the Instruct version.
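The sketch below shows one way to query an instruction-tuned checkpoint interactively with transformers, using the tokenizer's chat template. The model ID and prompt are illustrative; substitute the Instruct checkpoint you actually downloaded.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hfl/llama-3-chinese-8b-instruct"  # illustrative; use your local Instruct checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "请介绍一下中国的四大发明。"}]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```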
Question 7: Why does the instruction model reply that it is ChatGPT?
Answer: We did not add any identity data when training the instruction model, so the model's self-description depends mainly on the SFT training data. Because that data includes a large amount of data scraped from ChatGPT, the model tends to imitate ChatGPT's behavior and may therefore claim to be ChatGPT. Users need not pay too much attention to this. If needed, they can: 1) write a system prompt that gives the model its identity information; 2) construct their own identity instruction data and further fine-tune our model.
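The sketch below illustrates both workarounds. The system prompt text and the Alpaca-style JSON record are illustrative placeholders, not data shipped with this project, and the field names are an assumption rather than a required format.

```python
import json

# 1) Give the model an identity at inference time via a system message
#    (pass this message list to the chat template as in the example above).
messages = [
    {"role": "system", "content": "你是一个由XX团队基于Llama-3训练的中文智能助手。"},
    {"role": "user", "content": "你是谁？"},
]

# 2) Or construct identity instruction data for further fine-tuning
#    (an illustrative Alpaca-style record).
identity_sample = {
    "instruction": "你是谁？",
    "input": "",
    "output": "我是一个基于Llama-3训练的中文智能助手。",
}
print(json.dumps(identity_sample, ensure_ascii=False, indent=2))
```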