JuniorGPT is a lightweight implementation of the GPT (Generative Pre-trained Transformer) architecture focused on generating text in the style of Shakespeare. This repository contains code to train a model on a subset of Shakespeare's works and then generate text resembling the Bard's style.
JuniorGPT is designed to:
- Load Shakespeare text data.
- Tokenize the data and create a vocabulary.
- Define and initialize a GPT-like architecture.
- Train the model.
- Generate Shakespearean-style text.
Requirements:
- Python 3.x
- PyTorch (a recent version)
- CUDA (optional, for GPU-accelerated training)
Usage:
- Place your Shakespeare dataset, named input.txt, in the root directory.
- Run the script; it will train the model and then generate text samples.
- Find the generated Shakespearean-style text in the output.txt file.
The script itself is organized into the following stages.

Initial Setup: hyperparameters, the GPU check, the manual seed, and other configuration details.
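A sketch of what this stage looks like; the specific values below are placeholders rather than the script's actual settings:

```python
import torch

# Placeholder hyperparameters; the script's actual values may differ.
batch_size = 32      # sequences per training batch
block_size = 128     # maximum context length
max_iters = 5000     # total training steps
eval_interval = 500  # how often to estimate the loss
learning_rate = 3e-4

torch.manual_seed(1337)  # placeholder seed, for reproducibility
device = 'cuda' if torch.cuda.is_available() else 'cpu'  # the GPU check
```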
Data Preparation:
- Loading the Shakespeare dataset.
- Tokenizing the text at the character level and building a vocabulary of its unique characters (sketched below).
- Splitting the data into training and validation sets.
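Tokenization is character-level: the vocabulary is simply the set of unique characters in the corpus, and each character maps to an integer id. A minimal sketch of this stage, assuming input.txt sits in the working directory and a 90/10 train/validation split (the script's actual ratio may differ):

```python
import torch

# Read the raw corpus.
with open('input.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# The vocabulary is the sorted set of unique characters.
chars = sorted(set(text))
vocab_size = len(chars)

# Character <-> integer lookup tables.
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}
encode = lambda s: [stoi[c] for c in s]             # string -> list of ids
decode = lambda ids: ''.join(itos[i] for i in ids)  # list of ids -> string

# Encode the whole corpus and split it (the 90/10 ratio is an assumption).
data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]
```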
Model Architecture:
- Definition of the sub-modules (MultiHeadAttention, FeedFoward, and Block) used by the main GPT model.
- The main model, GPTLanguageModel, defined in terms of these sub-modules; one block is sketched below.
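For orientation, here is a minimal sketch of how such sub-modules typically compose into a transformer block, using the class names listed above (including FeedFoward, spelled as in the script); the exact layer sizes, dropout placement, and other details of the actual model may differ:

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class Head(nn.Module):
    """One head of causal self-attention."""
    def __init__(self, n_embd, head_size, block_size, dropout):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask so each position attends only to the past.
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # scaled dot-product
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        wei = self.dropout(F.softmax(wei, dim=-1))
        return wei @ v

class MultiHeadAttention(nn.Module):
    """Several attention heads in parallel, projected back to n_embd."""
    def __init__(self, n_embd, n_head, block_size, dropout):
        super().__init__()
        head_size = n_embd // n_head
        self.heads = nn.ModuleList(
            [Head(n_embd, head_size, block_size, dropout) for _ in range(n_head)])
        self.proj = nn.Linear(n_embd, n_embd)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        out = torch.cat([h(x) for h in self.heads], dim=-1)
        return self.dropout(self.proj(out))

class FeedFoward(nn.Module):
    """Position-wise feed-forward network with a 4x expansion."""
    def __init__(self, n_embd, dropout):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd), nn.Dropout(dropout))

    def forward(self, x):
        return self.net(x)

class Block(nn.Module):
    """Transformer block: attention then MLP, each with a residual connection."""
    def __init__(self, n_embd, n_head, block_size, dropout):
        super().__init__()
        self.sa = MultiHeadAttention(n_embd, n_head, block_size, dropout)
        self.ffwd = FeedFoward(n_embd, dropout)
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x):
        x = x + self.sa(self.ln1(x))    # pre-norm residual attention
        x = x + self.ffwd(self.ln2(x))  # pre-norm residual MLP
        return x
```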
Training Loop:
- The model is trained using the AdamW optimizer.
- Training and validation loss are estimated every eval_interval steps; a condensed sketch follows.
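A condensed sketch of the loop, reusing the placeholders from the setup sketch and the train/val splits from the data-preparation sketch. It assumes the model returns (logits, loss) when targets are supplied, and an estimate_loss helper that averages the loss over a few random batches per split; both are assumptions based on similar character-level GPT scripts:

```python
import torch

model = GPTLanguageModel().to(device)  # the model described above
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

def get_batch(split):
    """Sample a random batch of (input, target) sequences from one split."""
    data = train_data if split == 'train' else val_data
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])          # inputs
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])  # targets, shifted by one
    return x.to(device), y.to(device)

for step in range(max_iters):
    if step % eval_interval == 0:
        losses = estimate_loss()  # assumed helper: mean train/val loss over a few batches
        print(f"step {step}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}")

    xb, yb = get_batch('train')
    logits, loss = model(xb, yb)  # assumed to return the loss alongside the logits
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```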
Text Generation:
- A context tensor is initialized with zeros.
- The model autoregressively generates Shakespearean-style text, which is saved to output.txt (sketched below).
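The generation stage in sketch form, assuming the model exposes a generate method that appends max_new_tokens sampled tokens to the running context (a common pattern in similar implementations; the method name, signature, and token count here are assumptions), and reusing decode from the tokenization sketch:

```python
import torch

# Seed the model with a single zero token as context.
context = torch.zeros((1, 1), dtype=torch.long, device=device)

# Autoregressively sample tokens, then decode them back into characters.
ids = model.generate(context, max_new_tokens=500)[0].tolist()  # assumed method/signature
sample = decode(ids)

print(sample)  # the sample echoed to the console
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write(sample)
```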
At the end of training, a sample of the generated text is printed to the console; the full output is also saved to output.txt in the root directory.