[Model Request] Cerebras' BTLM-3B-8K #4663
joseph777111 started this conversation in Ideas
Bittensor Language Model (BTLM-3B-8k-base) is a 3 billion parameter language model with an 8k context length, trained on 627B tokens of SlimPajama. BTLM-3B-8k-base sets a new standard for 3B parameter models, outperforming models trained on hundreds of billions more tokens and achieving comparable performance to open 7B parameter models. BTLM-3B-8k-base can also be quantized to 4-bit to fit on devices with as little as 3GB of memory. The model is made available under an Apache 2.0 license for commercial use.
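As a rough illustration of the 4-bit claim, the Hugging Face checkpoint can be tried with transformers plus bitsandbytes. The snippet below is a minimal sketch only, assuming a CUDA-capable machine and that the bitsandbytes 4-bit path works with BTLM's custom model code; it is not a conversion path for this repo.

```python
# Minimal sketch: load BTLM-3B-8k-base in 4-bit with bitsandbytes.
# Assumes a CUDA GPU and recent transformers/bitsandbytes versions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "cerebras/btlm-3b-8k-base"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # BTLM ships its architecture as remote code
)
```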
BTLM was trained by Cerebras in partnership with Opentensor on the newly unveiled Condor Galaxy 1 (CG-1) supercomputer, the first public deliverable of the G42-Cerebras strategic partnership.
BTLM-3B-8k uses an architecture similar to Cerebras-GPT, with the addition of SwiGLU nonlinearity, ALiBi position embeddings, and maximal update parameterization (muP). The model was trained for one epoch of SlimPajama-627B: 75% of training was performed at a 2k sequence length, and the final 25% at an 8k sequence length to enable long-sequence applications.
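Because ALiBi replaces learned position embeddings, the 8k context is exercised simply by feeding a long prompt; no special rope-scaling configuration is needed. A minimal generation sketch with transformers (full precision this time, same hedges as above) might look like:

```python
# Minimal sketch: bf16 load and generation; ALiBi lets the model attend
# over prompts up to the 8k training length without extra position config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cerebras/btlm-3b-8k-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "A brief summary of the SlimPajama dataset:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```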
https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/
https://huggingface.co/cerebras/btlm-3b-8k-base
https://arxiv.org/abs/2309.11568