
Cursed Chatbot – Decoder-Only Transformer for Text Generation

[Screenshot of the chatbot interface]

This is a from-scratch implementation of a decoder-only transformer model for generating answers to short questions.

  • Decoder-only transformer model for causal language modeling
  • Masked self-attention layer for causal attention (see the sketch after this list)
  • Only 36M parameters
  • BPE Tokenizer trained from scratch with a vocabulary size of 20k
  • Trained on a subset of the GooAQ dataset with ~850k question-answer pairs (no pre-training)
  • Supports greedy and top-p (nucleus) sampling at inference time
  • Super basic Streamlit-based chatbot interface for interacting with the model
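
The masked self-attention mentioned above is what makes the model decoder-only: each position may only attend to earlier positions, so the model can be trained with plain next-token prediction. Below is a minimal single-head sketch of that masking idea, not the exact layer used in this repo; tensor shapes and names are illustrative.

import torch
import torch.nn.functional as F

def causal_self_attention(x, w_q, w_k, w_v):
    # x: (batch, seq_len, d_model); w_q, w_k, w_v: (d_model, d_head)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # (batch, seq_len, seq_len)
    # Causal mask: token i may only attend to tokens 0..i, so all
    # "future" positions are set to -inf before the softmax.
    seq_len = x.size(1)
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return attn @ v                                     # (batch, seq_len, d_head)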

Quickstart

  1. Train the model on Google Colab using the notebook gpu_training_colab_notebook.ipynb.
  2. Download the model checkpoint and tokenizer JSON file and put them in the temp directory.
  3. Run streamlit run chatbot.py to start the chatbot interface.
  4. Get your questions answered by the wackiest chatbot you've ever seen!
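
At inference time (step 4), the model generates the answer one token at a time until it emits [END], picking each next token either greedily or with top-p (nucleus) sampling. The following is a hedged sketch of what the nucleus sampling step does; the function name and the default p are illustrative, not the repo's actual API.

import torch

def sample_top_p(logits, p=0.9):
    # logits: (vocab_size,) scores for the next token
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep the smallest set of most-likely tokens whose total probability
    # reaches p; the single most likely token is always kept.
    keep = (cumulative - sorted_probs) < p
    kept_probs = sorted_probs * keep
    kept_probs = kept_probs / kept_probs.sum()
    choice = torch.multinomial(kept_probs, num_samples=1)
    return sorted_ids[choice].item()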

Data Format

The sequence format is as follows:

q1 q2 ... qN [SEP] a1 a2 ... aM [END] [PAD] ... [PAD]

with special tokens [SEP] for separating questions and answers, [END] for marking the end of the answer, and [PAD] for padding. The token [UNK] is used for out-of-vocabulary words.
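
As a rough illustration, one question-answer pair could be packed into this format before training roughly as follows; the function and argument names are made up for this sketch, and the actual preprocessing lives in the repo's training code.

def build_sequence(question_ids, answer_ids, max_len, sep_id, end_id, pad_id):
    # question_ids / answer_ids: lists of BPE token ids from the trained tokenizer
    seq = question_ids + [sep_id] + answer_ids + [end_id]
    seq = seq[:max_len]                         # truncate overly long pairs
    seq += [pad_id] * (max_len - len(seq))      # pad to a fixed length
    return seq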

The dataset is a subset of the GooAQ dataset.

Example questions and answers:

Q: is it possible to get a false negative flu test?
A: This variation in ability to detect viruses can result in some people who are infected with the flu having a negative rapid test result. (This situation is called a false negative test result.)
Q: are you not supposed to rinse after brushing teeth?
A: Don't rinse with water straight after toothbrushing Don't rinse your mouth immediately after brushing, as it'll wash away the concentrated fluoride in the remaining toothpaste. This dilutes it and reduces its preventative effects.

Going Further

Here are some ideas for extending the project:

  • Pre-training: Pre-train the model on a large corpus of text data to improve performance.
  • Fine-tuning: Fine-tune the model on the question-answering task prioritizing answer-generation.
  • Hyperparameter tuning: Experiment with different hyperparameters to improve performance.
  • Scaling up: Train a larger model with more parameters and a larger dataset.

Learning Resources

Some websites and videos that are helpful for understanding transformers and self-attention:


I made this project for the course "Deep Learning (INF265)" at the University of Bergen (UiB) in the spring of 2025.
