This is my first large language model (LLM). The aim of this project is to first train a model with ~180M parameters and, if that succeeds, a model with ~1-3B parameters. Currently I'm using the OpenWebText dataset.
My goal is to make this project readable, educational, and flexible.
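
To give a feel for what ~180M parameters could look like, here is a minimal sketch that counts the parameters of a GPT-2-style decoder with tied input/output embeddings. Everything in it is an assumption for illustration and not necessarily this project's configuration: the GPT-2-style layout, the vocabulary size (50257), the context length (1024), the helper name `gpt2_style_param_count`, and the example 20-layer, 768-wide setting.

```python
def gpt2_style_param_count(n_layer, d_model, vocab_size=50257, n_ctx=1024):
    """Count parameters of a GPT-2-style decoder (hypothetical helper).

    Assumes learned positional embeddings, a 4x MLP expansion, biases on
    all linear layers, and an output head tied to the token embedding.
    """
    # Token embedding plus learned positional embedding.
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Attention: fused QKV projection (3d^2 + 3d) and output projection (d^2 + d).
    attention = 4 * d_model**2 + 4 * d_model
    # MLP: d -> 4d (4d^2 + 4d) and 4d -> d (4d^2 + d).
    mlp = 8 * d_model**2 + 5 * d_model
    # Two LayerNorms per block, each with a scale and a bias vector.
    layer_norms = 4 * d_model
    per_block = attention + mlp + layer_norms
    # Final LayerNorm after the last block.
    final_ln = 2 * d_model
    return embeddings + n_layer * per_block + final_ln


if __name__ == "__main__":
    # Sanity check against the well-known GPT-2 small shape (12 layers, width 768).
    print(f"12 x 768: {gpt2_style_param_count(12, 768):,}")
    # One assumed shape that lands near the ~180M target.
    print(f"20 x 768: {gpt2_style_param_count(20, 768):,}")
```

With these assumptions, 12 layers at width 768 gives about 124M parameters and 20 layers at the same width gives about 181M. Other depth and width combinations can reach a similar total.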