@@ -103,7 +103,7 @@ Every model is written from scratch to maximize performance and remove layers of
| Phi 4 | 14B | Microsoft Research | [Abdin et al. 2024](https://arxiv.org/abs/2412.08905) |
| Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwen2.5/) |
| Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | [Hui, Binyuan et al. 2024](https://arxiv.org/abs/2409.12186) |
- | R1 Distll Llama | 8B, 70B | DeepSeek AI | [DeepSeek AI 2025](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) |
+ | R1 Distill Llama | 8B, 70B | DeepSeek AI | [DeepSeek AI 2025](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) |
| ... | ... | ... | ... |

<details>
@@ -143,7 +143,7 @@ Every model is written from scratch to maximize performance and remove layers of
| Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | [Hui, Binyuan et al. 2024](https://arxiv.org/abs/2409.12186) |
| Qwen2.5 Math | 1.5B, 7B, 72B | Alibaba Group | [An, Yang et al. 2024](https://arxiv.org/abs/2409.12122) |
| QwQ | 32B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwq-32b-preview/) |
- | R1 Distll Llama | 8B, 70B | DeepSeek AI | [DeepSeek AI 2025](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) |
+ | R1 Distill Llama | 8B, 70B | DeepSeek AI | [DeepSeek AI 2025](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) |
| SmolLM2 | 135M, 360M, 1.7B | Hugging Face | [Hugging Face 2024](https://github.com/huggingface/smollm) |
| Salamandra | 2B, 7B | Barcelona Supercomputing Centre | [BSC-LTC 2024](https://github.com/BSC-LTC/salamandra) |
| StableCode | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding) |