Report on LLM Research Findings
Question: Compile a report summarizing the findings from our research on Large Language Models (LLMs). The report aims to:
- Evaluate the strengths and weaknesses of each LLM candidate based on our evaluation process.
- Recommend the best-suited LLM for our project, explaining the reasons behind our choice.
- Ensure the selected LLM meets these basic requirements:
  - Available under a permissive license (MIT or Apache 2.0).
  - Supports the English language.
  - Allows fine-tuning with structured datasets relevant to our project needs.
Results:
In our project, we want to fine-tune a Large Language Model so that it answers questions about the projects of the Cloud Native Computing Foundation (CNCF) more accurately.
For that, it must be able to learn from data in a "prompt-response" format.
There are five points that need to be considered:
- The model should be able to process data in a "prompt-response" format (see the data sketch below).
- The model should be licensed under an open license such as MIT or Apache 2.0.
- The model should perform well on the most common LLM benchmarks.
- As we only have access to limited resources, the model should not be larger than necessary.
- The model should come with good documentation and usage resources.
The first two considerations are hard requirements; without them we cannot complete the project. The third and fourth are more flexible: we ideally want very good performance at an acceptable size. The last consideration is also very important to us, as good documentation will likely make it easier to write a transfer learning script.
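To make the first requirement concrete, below is a minimal sketch of what a single record in such a structured "prompt-response" dataset could look like. The JSONL layout and the field names `prompt`/`response` are our own working assumption, not something prescribed by any of the candidate models:

```python
import json

# Hypothetical example record; the field names "prompt" and "response"
# are our own convention for the structured dataset.
record = {
    "prompt": "What is the purpose of the CNCF project Kubernetes?",
    "response": (
        "Kubernetes is an open-source system for automating the "
        "deployment, scaling, and management of containerized applications."
    ),
}

# One JSON object per line ("JSONL") keeps the dataset easy to stream.
with open("cncf_qa.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")

with open("cncf_qa.jsonl") as f:
    pairs = [json.loads(line) for line in f]
print(pairs[0]["prompt"])
```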
We have considered the following models:
- Gemma
- Llama3
- Llama3-instruct
- Mistral
- Calme-7B-Instruct-v0.9
- Mixtral-8x22b-Instruct
In the following, we briefly describe each model and give our assessment.
Gemma is an open LLM family developed by Google. It comprises two versions: a 2 billion and a 7 billion parameter model. The former is designed for use on CPUs, while the latter is designed for GPUs. Instruction-tuned variants also exist, but these seem to perform significantly worse. Gemma is released under the Gemma license, which is an open license. It processes data in a "prompt-response" format. We want to use the 7b version, which performs well on the most important benchmarks, as the table below shows. There are a lot of resources on how to use it.
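As a brief illustration of that tooling, the following sketch loads the Gemma 7b weights through the Hugging Face transformers library and generates a short answer. This assumes transformers and accelerate are installed and that access to the gated google/gemma-7b checkpoint has been granted on the Hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "google/gemma-7b" is the base 7b checkpoint on the Hugging Face Hub;
# it is gated, so the Gemma license must be accepted there first.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", device_map="auto")

inputs = tokenizer("What does the CNCF do?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```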
Llama3 is an open LLM family developed by Meta. It comes in a 70b and an 8b parameter version, each available as a pre-trained and an instruction-tuned variant, and it is designed for use on GPUs. Llama3 is released under the Llama license, which is an open license; however, products with more than 700 million monthly active users require a separate license from Meta. It processes data in a "prompt-response" format. We would consider all of the Llama3 models, as they all fulfill our requirements. The benchmark results can be seen in the table below. There are a lot of resources on how to use them.
Mistral AI has developed an open-source model family comprising three models: a 7b parameter base model, an 8x7b parameter mixture-of-experts model, and an 8x22b parameter mixture-of-experts model. All are available as pre-trained and instruction fine-tuned variants, are designed for use on GPUs, and are released under the Apache 2.0 license, which is an open license. We would consider all of the Mistral models, as they all fulfill our requirements. The benchmark results can be seen in the table below. However, there seem to be slightly fewer resources for these models than for Gemma and Llama.
| Model | Model size | HuggingFace Avg | ARC | HellaSwag | MMLU |
|---|---|---|---|---|---|
| Gemma | 7b | 64.3 | 61 | 82.5 | 66 |
| Llama3 | 8b | 62.6 | 59.5 | 82.1 | 66.7 |
| Llama3-instruct | 70b | 77.8 | 71.42 | 85.7 | 80 |
| Llama3-instruct | 8b | 66.8 | 60.7 | 78.5 | 67.07 |
| Mistral-Instruct | 7b | 61 | 60 | 83 | 64 |
| Mixtral-8x7b-Instruct | 56b | 72.6 | 70.2 | 87.6 | 71.2 |
| Mixtral-8x22b-Instruct | 141b | 79.1 | 72.7 | 89 | 77.7 |
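The scores above could also be spot-checked locally. Below is a rough sketch using the EleutherAI lm-evaluation-harness; the task names and the API shown are from harness v0.4 and may differ in other versions:

```python
import lm_eval

# Evaluate the Gemma 7b base model on the three benchmarks from the table.
# Requires `pip install lm-eval`; downloads model weights and datasets.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=google/gemma-7b",
    tasks=["arc_challenge", "hellaswag", "mmlu"],
)
print(results["results"])
```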
Performance-wise, the Llama model family seems to be the most attractive choice, as it is the newest and scores best in the benchmarks. However, its license might be a problem for our industry partner, and since this point is very important to them, we will not use it. Mistral also shows decent performance, but it only exceeds the other models when significantly more parameters are used, and there are fewer resources on how to train these models than for Llama or Gemma. We therefore choose Gemma: there are plenty of resources on how to train it, it offers good performance at a reasonable size, and it fulfills our industry partner's license requirements.
That said, if a new open model with more desirable properties is released while we are working on this project, we will still consider switching to it.
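To give an idea of how the fine-tuning could look once the prompt-response data is ready, below is a rough sketch of parameter-efficient LoRA fine-tuning of Gemma 7b using the Hugging Face peft, transformers, and datasets libraries. The dataset file, the LoRA target modules, and all hyperparameters are illustrative assumptions, not final choices:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "google/gemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# LoRA trains small adapter matrices while the base weights stay frozen,
# which matches our limited-resource constraint. The target modules are
# an assumption; adjust them to the actual Gemma layer names if needed.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# "cncf_qa.jsonl" is the hypothetical prompt-response file sketched above.
dataset = load_dataset("json", data_files="cncf_qa.jsonl", split="train")

def tokenize(example):
    # Concatenate prompt and response into one causal-LM training sequence.
    text = example["prompt"] + "\n" + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=512)

dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gemma-cncf-lora",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=dataset,
    # mlm=False makes the collator set labels = input_ids for causal LM.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```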
- Link to Original Issue: Create a Report on LLM Research Findings Issue #21
- Original Assignee: Christian Wielenberg