This is a from-scratch implementation of a decoder-only transformer model for generating answers to short questions.
- Decoder-only transformer model for causal language modeling
- Masked self-attention layer for causal attention (see the attention sketch after this list)
- Only 36M parameters
- BPE tokenizer trained from scratch with a vocabulary size of 20k (see the tokenizer sketch after this list)
- Trained on a subset of the GooAQ dataset with ~850k question-answer pairs (no pre-training)
- Supports greedy and top-p (nucleus) sampling at inference time (see the sampling sketch after this list)
- Super basic chatbot interface for interacting with the model, built with Streamlit
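The core idea behind the masked self-attention layer is that each position can only attend to itself and earlier positions. Here is a minimal single-head sketch of that mechanism (illustrative only, not the exact layer in this repo; the function name `causal_self_attention` is made up):

```python
import math

import torch
import torch.nn.functional as F


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: tensors of shape (batch, seq_len, d_head).
    Each position may only attend to itself and earlier positions.
    """
    seq_len = q.size(1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (batch, seq_len, seq_len)
    # Boolean mask that is True strictly above the diagonal, i.e. for future positions.
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))  # block attention to the future
    weights = F.softmax(scores, dim=-1)
    return weights @ v
```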
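The tokenizer is saved as a JSON file, which matches the format of the Hugging Face `tokenizers` library. Assuming that library is used (the repo may implement BPE differently), training a 20k-vocabulary BPE tokenizer with this project's special tokens (`[UNK]`, `[SEP]`, `[END]`, `[PAD]`; see the sequence format below) looks roughly like this; the corpus and output path are placeholders:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Build an empty BPE tokenizer with [UNK] as the unknown token.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Train on an iterator of raw question/answer strings.
trainer = BpeTrainer(vocab_size=20_000,
                     special_tokens=["[UNK]", "[SEP]", "[END]", "[PAD]"])
texts = ["is it possible to get a false negative flu test?", "..."]  # replace with the GooAQ subset
tokenizer.train_from_iterator(texts, trainer)

# Save to a JSON file; the path is illustrative.
tokenizer.save("temp/tokenizer.json")
```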
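At inference time, greedy decoding simply takes the most likely next token, while top-p (nucleus) sampling draws from the smallest set of most-likely tokens whose cumulative probability reaches p. A minimal sketch of the top-p step (illustrative, not the repo's actual sampling code):

```python
import torch
import torch.nn.functional as F


def sample_top_p(logits, p=0.9):
    """Sample one token id from the nucleus of the next-token distribution.

    logits: tensor of shape (vocab_size,) for the next-token position.
    """
    probs = F.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Zero out tokens that lie entirely outside the nucleus
    # (the most likely token is always kept).
    outside = cumulative - sorted_probs > p
    sorted_probs[outside] = 0.0
    sorted_probs /= sorted_probs.sum()
    idx = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_ids[idx].item()
```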
- Use the notebook `gpu_training_colab_notebook.ipynb` to train the model on Google Colab.
- Download the model checkpoint and the tokenizer JSON file and put them in the `temp` directory.
- Run `streamlit run chatbot.py` to start the chatbot interface.
- Get your questions answered by the wackiest chatbot you've ever seen!
The sequence format is as follows:

`q1 q2 ... qN [SEP] a1 a2 ... aM [END] [PAD] ... [PAD]`

with special tokens `[SEP]` for separating questions and answers, `[END]` for marking the end of the answer, and `[PAD]` for padding. The token `[UNK]` is used for out-of-vocabulary words.
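A minimal sketch of how one question-answer pair could be assembled into this format (the helper name and token ids below are made up for illustration, not the repo's actual preprocessing code):

```python
def build_sequence(question_ids, answer_ids, max_len, sep_id, end_id, pad_id):
    """Concatenate question and answer token ids into the training format
    q1 ... qN [SEP] a1 ... aM [END] [PAD] ... [PAD],
    truncating to max_len and right-padding with [PAD]."""
    ids = question_ids + [sep_id] + answer_ids + [end_id]
    ids = ids[:max_len]                      # truncate overly long pairs
    ids += [pad_id] * (max_len - len(ids))   # pad to a fixed length
    return ids


# Example with made-up token ids:
# build_sequence([5, 8, 13], [21, 34], max_len=8, sep_id=1, end_id=2, pad_id=0)
# -> [5, 8, 13, 1, 21, 34, 2, 0]
```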
The dataset is a subset of the GooAQ dataset.
Example questions and answers:
Q: is it possible to get a false negative flu test?
A: This variation in ability to detect viruses can result in some people who are infected with the flu having a negative rapid test result. (This situation is called a false negative test result.)
Q: are you not supposed to rinse after brushing teeth?
A: Don't rinse with water straight after toothbrushing Don't rinse your mouth immediately after brushing, as it'll wash away the concentrated fluoride in the remaining toothpaste. This dilutes it and reduces its preventative effects.
Here are some ideas for extending the project:
- Pre-training: Pre-train the model on a large corpus of text data to improve performance.
- Fine-tuning: Fine-tune the model on the question-answering task, prioritizing answer generation (e.g. by weighting the loss toward answer tokens).
- Hyperparameter tuning: Experiment with different hyperparameters to improve performance.
- Scaling up: Train a larger model with more parameters and a larger dataset.
Some websites and videos that are helpful for understanding transformers and self-attention:
- Decoder-Only Transformers: The Workhorse of Generative LLMs (Blog post)
- How Attention Mechanism Works in Transformer Architecture (YouTube) (Especially the parts on causal self-attention and GPT-2)
- How does the (decoder-only) transformer architecture work? (AI StackExchange)
- Attention in transformers, step-by-step (YouTube)
- Attention is all you need (The original transformers paper)
- Stack Overflow answer explaining the role of masking in attention layers
I made this project for the course "Deep Learning (INF265)" at the University of Bergen (UiB) in the spring of 2025.