FAQ
Question 1: Why was the vocabulary not expanded for Llama-3, as in the first- and second-generation projects?
Answer: There are several reasons: 1) the Llama-3 vocabulary has already been expanded to 128K tokens, a significant increase over previous generations; 2) encoding-efficiency tests on Chinese Wikipedia show that the Llama-3 vocabulary is about as efficient as our Chinese LLaMA-2 vocabulary (around 95%); 3) our work on Chinese Mixtral indicates that vocabulary expansion is not a necessary condition for language transfer in large models.
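Below is a minimal sketch (not part of this project) of how tokenizer encoding efficiency on Chinese text can be compared, measuring average characters per token with Hugging Face transformers. The model IDs are illustrative and may require Hub authentication; a real test would run over the full Chinese Wikipedia rather than a single sentence.

```python
from transformers import AutoTokenizer

# Illustrative sample text; a real evaluation would use a large Chinese corpus.
SAMPLE = "人工智能是计算机科学的一个分支，它试图理解智能的本质。"

def chars_per_token(tokenizer_name: str, text: str) -> float:
    """Average number of characters covered by one token (higher = more efficient)."""
    tok = AutoTokenizer.from_pretrained(tokenizer_name)
    ids = tok.encode(text, add_special_tokens=False)
    return len(text) / len(ids)

# Model IDs are illustrative; substitute any tokenizers you have access to.
for name in ["meta-llama/Meta-Llama-3-8B", "hfl/chinese-llama-2-7b"]:
    print(name, round(chars_per_token(name, SAMPLE), 3))
```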
Question 2: Will a 70B version be released?
Answer: Due to resource constraints, we cannot guarantee this at the moment. We will consider training one if a suitable opportunity arises.
Question 3: Why is the instruction model no longer called Alpaca?
Answer: This is due to the limitations that the Llama-3 license places on the distribution of resources.
Question 4: Can the models in this repository be used commercially?
Answer: Yes, but please read the commercial licensing requirements of the original Llama-3 carefully beforehand. Developers are responsible for the compliance of the models they use and should seek legal advice if necessary. This project is not responsible for any results or consequential losses arising from the use of these models.
Question 5: Why was the model trained with LoRA instead of full-parameter training?
Answer: This decision is based on several factors: 1) training cost and efficiency; 2) Llama has now evolved through three generations and its Chinese capability has gradually improved, so LoRA-based incremental training can quickly strengthen its Chinese understanding and generation; 3) we benchmarked several community Chinese Llama-3 models and found that fully fine-tuned models are not superior to those trained with PEFT methods. Using LoRA is therefore the result of balancing these factors.
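As a rough illustration of the PEFT approach, the sketch below attaches LoRA adapters to a base model with the Hugging Face peft library. The rank, scaling factor, and target modules are illustrative assumptions, not the project's actual training configuration.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model (illustrative ID; gated access may be required).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

lora_cfg = LoraConfig(
    r=64,                 # adapter rank (illustrative, not the project's setting)
    lora_alpha=128,       # scaling factor (illustrative)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of weights is trainable
```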
Question 6: Why does Llama-3-Chinese perform poorly in conversation?
Answer: Llama-3-Chinese is a base model, not a chat model; it is intended mainly for secondary fine-tuning and similar downstream uses. For interactive scenarios such as dialogue and question answering, please use the Instruct version.
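The sketch below shows one way to query an instruction-tuned checkpoint interactively with transformers, using the tokenizer's chat template. The model ID and prompt are illustrative; substitute the Instruct checkpoint you actually downloaded.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hfl/llama-3-chinese-8b-instruct"  # illustrative; use your local Instruct checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "请介绍一下中国的四大发明。"}]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```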
Question 7: Why does the instruction model reply that it is ChatGPT?
Answer: We did not add any identity data when training the instruction model, so the model's self-description depends mainly on the SFT training data. Because that data includes a large amount of data scraped from ChatGPT, the model tends to imitate ChatGPT's behavior and may therefore claim to be ChatGPT. Users need not pay too much attention to this. If needed, they can: 1) write a system prompt that gives the model its identity information; 2) construct their own identity instruction data and further fine-tune our model.
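The sketch below illustrates both workarounds. The system prompt text and the Alpaca-style JSON record are illustrative placeholders, not data shipped with this project, and the field names are an assumption rather than a required format.

```python
import json

# 1) Give the model an identity at inference time via a system message
#    (pass this message list to the chat template as in the example above).
messages = [
    {"role": "system", "content": "你是一个由XX团队基于Llama-3训练的中文智能助手。"},
    {"role": "user", "content": "你是谁？"},
]

# 2) Or construct identity instruction data for further fine-tuning
#    (an illustrative Alpaca-style record).
identity_sample = {
    "instruction": "你是谁？",
    "input": "",
    "output": "我是一个基于Llama-3训练的中文智能助手。",
}
print(json.dumps(identity_sample, ensure_ascii=False, indent=2))
```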