[Model Request] Cerebras' BTLM-3B-8K #4663
joseph777111 started this conversation in Ideas
Bittensor Language Model (BTLM-3B-8k-base) is a 3 billion parameter language model with an 8k context length, trained on 627B tokens of SlimPajama. BTLM-3B-8k-base sets a new standard for 3B parameter models, outperforming models trained on hundreds of billions more tokens and achieving comparable performance to open 7B parameter models. BTLM-3B-8k-base can also be quantized to 4-bit to fit on devices with as little as 3GB of memory. The model is made available under an Apache 2.0 license for commercial use.
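As a rough illustration of the 4-bit claim, the Hugging Face checkpoint can be tried with transformers plus bitsandbytes. The snippet below is a minimal sketch only, assuming a CUDA-capable machine and that the bitsandbytes 4-bit path works with BTLM's custom model code; it is not a conversion path for this repo.

```python
# Minimal sketch: load BTLM-3B-8k-base in 4-bit with bitsandbytes.
# Assumes a CUDA GPU and recent transformers/bitsandbytes versions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "cerebras/btlm-3b-8k-base"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # BTLM ships its architecture as remote code
)
```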
BTLM was trained by Cerebras in partnership with Opentensor on the newly unveiled Condor Galaxy 1 (CG-1) supercomputer, the first public deliverable of the G42-Cerebras strategic partnership.
BTLM-3B-8k uses an architecture similar to Cerebras-GPT, with the addition of SwiGLU nonlinearity, ALiBi position embeddings, and maximal update parameterization (muP). The model was trained for one epoch of SlimPajama-627B: 75% of training was performed at a 2k sequence length, and the final 25% at an 8k sequence length to enable long-sequence applications.
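Because ALiBi replaces learned position embeddings, the 8k context is exercised simply by feeding a long prompt; no special rope-scaling configuration is needed. A minimal generation sketch with transformers (full precision this time, same hedges as above) might look like:

```python
# Minimal sketch: bf16 load and generation; ALiBi lets the model attend
# over prompts up to the 8k training length without extra position config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cerebras/btlm-3b-8k-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "A brief summary of the SlimPajama dataset:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```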
https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/
https://huggingface.co/cerebras/btlm-3b-8k-base
https://arxiv.org/abs/2309.11568