Building and training a bigram transformer neural network. This is a decoder-only model: it does not explicitly answer questions, but instead autocompletes Shakespeare, because I skipped the fine-tuning stages after pre-training. The dataset I used to train the model is also included in this repository.
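Once trained, generating text is just repeated next-token sampling. A minimal sketch of what inference could look like (assuming a trained `model` with a `generate` method, a character-level `decode` function, and a `device` string, following the conventions of the video linked below):

```python
import torch

# Assumed names (from the referenced video, not defined here): model, decode, device.
# Start generation from a single zero token and sample 500 new characters.
context = torch.zeros((1, 1), dtype=torch.long, device=device)
print(decode(model.generate(context, max_new_tokens=500)[0].tolist()))
```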
batch_size = 64 # how many independent sequences will we process in parallel?
block_size = 256 # what is the maximum context length for predictions?
max_iters = 5000 # total number of training iterations
eval_interval = 500 # how often (in iterations) to estimate train/val loss
learning_rate = 3e-4 # optimizer learning rate
eval_iters = 200 # how many batches to average when estimating the loss
n_embd = 384 # embedding (channel) dimension
n_head = 6 # number of attention heads per transformer block
n_layer = 6 # number of transformer blocks
dropout = 0.2 # dropout probability
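To give a sense of how these hyperparameters are used, here is a rough sketch of a single causal self-attention head in PyTorch, in the style of the video linked below (names and details are illustrative, with `head_size` typically set to `n_embd // n_head`):

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class Head(nn.Module):
    """One head of causal (masked) self-attention.
    n_embd, block_size and dropout come from the hyperparameters above."""
    def __init__(self, head_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # lower-triangular mask so each position can only attend to earlier positions
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)                                       # (B, T, head_size)
        q = self.query(x)                                     # (B, T, head_size)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5   # scaled dot-product scores (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        wei = F.softmax(wei, dim=-1)
        wei = self.dropout(wei)
        v = self.value(x)                                     # (B, T, head_size)
        return wei @ v                                        # (B, T, head_size)
```

The full model stacks `n_layer` blocks, each combining `n_head` of these heads with a feed-forward layer, residual connections, and layer normalization.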
This is the amazing video I used to learn how to actually implement a generative pre-trained transformer: https://www.youtube.com/watch?v=kCc8FmEb1nY
The research paper that started everything, "Attention Is All You Need": https://arxiv.org/pdf/1706.03762
Training: I trained this model on an NVIDIA Tesla T4 GPU, which allowed it to finish training in about 30 minutes.
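The loop driving that training run is a standard AdamW setup. A minimal sketch using the hyperparameters above (assuming a `model` that returns `(logits, loss)`, a `get_batch(split)` helper, and an `estimate_loss()` helper that averages the loss over `eval_iters` batches, as in the video):

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

for step in range(max_iters):
    # periodically report the average train/val loss
    if step % eval_interval == 0:
        losses = estimate_loss()
        print(f"step {step}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}")

    xb, yb = get_batch('train')            # (batch_size, block_size) inputs and shifted targets
    logits, loss = model(xb, yb)           # forward pass computes cross-entropy loss
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```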
To Do: Train this model on a different corpus of text data!
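Swapping in a different corpus mostly comes down to the data-loading step. A sketch of the character-level tokenization used here, assuming the new corpus is a plain-text file named `input.txt`:

```python
import torch

# Any plain-text file works; the vocabulary is just the set of characters it contains.
with open('input.txt', 'r', encoding='utf-8') as f:
    text = f.read()

chars = sorted(set(text))
vocab_size = len(chars)
stoi = {ch: i for i, ch in enumerate(chars)}            # character -> integer id
itos = {i: ch for i, ch in enumerate(chars)}            # integer id -> character
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: ''.join(itos[i] for i in ids)

data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))                                # 90/10 train/validation split
train_data, val_data = data[:n], data[n:]
```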
A diagram of the decoder-only architecture from the Transformer paper: