This repository provides an illustrative example of how to build a very simple single-headed transformer-based language model using PyTorch and SentencePiece. It follows a step-by-step approach:
- Tokenizing the data using BPE (Byte Pair Encoding).
- Embedding the tokens.
- Applying a single-head self-attention mechanism.
- Adding positional encoding and a small feed-forward network.
- Training on a tiny dataset from TinyStories.
- Generating new text from the trained model.
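The self-attention step in the list above can be sketched roughly as follows. This is a minimal illustrative version with a causal mask; class and variable names are assumptions, not necessarily identical to the repo's code:

```python
# Minimal single-head self-attention sketch (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadSelfAttention(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        # Separate linear projections for queries, keys, and values
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.scale = d_model ** 0.5

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Scaled dot-product attention scores: (batch, seq_len, seq_len)
        scores = q @ k.transpose(-2, -1) / self.scale
        # Causal mask: each position attends only to itself and earlier tokens
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

x = torch.randn(2, 5, 32)          # (batch, seq_len, d_model)
out = SingleHeadSelfAttention(32)(x)
print(out.shape)                   # same shape as the input
```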
Start from the simplest, simple_transformer.py, which implements a SingleHeadSelfAttention block. Then move to multihead_model_eval.py, which demonstrates the multi-head concept in a sequential manner, reusing the same SingleHeadSelfAttention block:
```python
multihead_attention = nn.ModuleList()
for _ in range(num_heads):
    attention_mod = SingleHeadSelfAttention(d_model)
    multihead_attention.append(attention_mod)
```
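One way those per-head outputs can then be combined is to concatenate them and project back to the model dimension. The sketch below is an assumption about how the loop is used (with a plain Linear standing in for the repo's SingleHeadSelfAttention block); the repo may combine heads differently:

```python
# Sketch: combine sequential head outputs by concatenation + projection.
import torch
import torch.nn as nn

d_model, num_heads = 32, 4
multihead_attention = nn.ModuleList()
for _ in range(num_heads):
    # Stand-in for the repo's SingleHeadSelfAttention block
    multihead_attention.append(nn.Linear(d_model, d_model))

# Project the concatenated head outputs back down to d_model
out_proj = nn.Linear(num_heads * d_model, d_model)

x = torch.randn(2, 5, d_model)
head_outputs = [head(x) for head in multihead_attention]   # num_heads tensors
combined = out_proj(torch.cat(head_outputs, dim=-1))       # back to (2, 5, d_model)
print(combined.shape)
```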
Next, multiheadattentionv2.py vectorises the loop above in a single MultiHeadSelfAttention class.
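The vectorised idea can be sketched as computing Q, K, V for all heads in one projection and reshaping, rather than looping over heads. This is an illustrative version, not necessarily identical to the repo's MultiHeadSelfAttention class:

```python
# Minimal vectorised multi-head self-attention sketch (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        # One projection computes Q, K, V for all heads at once
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        b, t, d = x.shape
        qkv = self.qkv(x).reshape(b, t, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)          # each (b, heads, t, head_dim)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, d)  # merge heads back
        return self.out(y)

x = torch.randn(2, 5, 32)
y = MultiHeadSelfAttention(32, 4)(x)
print(y.shape)
```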
Finally, multiheadattentionv3.py uses PyTorch's built-in torch.nn.MultiheadAttention class:

```python
multihead_attention = nn.MultiheadAttention(d_model, num_heads, dropout=0.1, batch_first=True)
```
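A minimal usage sketch for that built-in module, assuming batch-first tensors of shape (batch, seq_len, d_model) and a boolean causal mask (the dimensions here are illustrative):

```python
# Usage sketch for torch.nn.MultiheadAttention with a causal mask.
import torch
import torch.nn as nn

d_model, num_heads, seq_len = 32, 4, 5
multihead_attention = nn.MultiheadAttention(d_model, num_heads, dropout=0.1, batch_first=True)

x = torch.randn(2, seq_len, d_model)
# True above the diagonal blocks attention to future tokens
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
# Self-attention: the same tensor serves as query, key, and value
out, weights = multihead_attention(x, x, x, attn_mask=causal_mask)
print(out.shape)
```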
The dataset used is TinyStories, hosted on Hugging Face. It is an interesting dataset in its own right, introduced in the paper TinyStories: How Small Can Language Models Be and Still Speak Coherent English? (Eldan & Li, 2023), and specifically designed for very small models like ours.
Below is a sample from the last model, trained for three epochs on an NVIDIA GeForce RTX 3060 laptop GPU with 6 GB of VRAM. Not bad for a single attention block using the simplest decoding strategy (greedy argmax: always picking the highest-probability token in the vocabulary):
```
prompt = "Bloom lived in a big garden"
17-Feb-25 18:21:33 - Generated Text=Bloom lived in a big garden with many other animals. One day, Billy was playing with his friends, but he didn't want to share. He wanted to play with his friends, but he didn't want to share. He tried to take a nap, but he was too shy to share. He tried to take a nap, but he was too shy to take a nap. He was so sad and he didn't know what to do. He decided to take a nap.
```
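The greedy (argmax) decoding used for that sample can be sketched as a loop that repeatedly appends the single highest-probability next token. The toy "model" below is a deterministic stand-in, not the repo's trained transformer:

```python
# Greedy (argmax) decoding sketch with a toy stand-in model.
import torch

vocab_size, eos_id = 10, 0

def toy_model(tokens):
    # Fake logits of shape (1, seq_len, vocab_size); seeded so it is deterministic
    torch.manual_seed(tokens.shape[1])
    return torch.randn(1, tokens.shape[1], vocab_size)

tokens = torch.tensor([[3, 7]])                  # encoded prompt
for _ in range(8):
    logits = toy_model(tokens)
    next_id = logits[:, -1, :].argmax(dim=-1)    # greedy: pick the top token
    tokens = torch.cat([tokens, next_id.unsqueeze(0)], dim=1)
    if next_id.item() == eos_id:                 # stop on end-of-sequence
        break
print(tokens.shape)
```

In the real model the generated ids would be decoded back to text with the SentencePiece tokenizer.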
Not bad, I should say. More details are in this Medium article, but the code and especially the Colab notebooks are self-explanatory.