A PyTorch implementation of the Transformer architecture as described in "Attention Is All You Need". This project includes a complete, modular implementation of the Transformer for English-to-Italian machine translation.
- Complete transformer architecture implementation
- Modular design with separate encoder and decoder components
- Multi-head attention mechanism
- Support for custom tokenization
- Training and inference scripts included
- Translation example implementation
- `model.py`: Core transformer architecture
- `train.py`: Training loop and utilities
- `translate.py`: Inference and translation script
- `dataset.py`: Data loading and preprocessing
- `config.py`: Configuration and hyperparameters
```bash
# Clone the repository
git clone https://github.com/nevernever69/Transformer-in-pytorch.git
cd Transformer-in-pytorch

# Install requirements
pip install -r requirements.txt

# Train the model
python train.py

# Translate a sentence
python translate.py
```
or create the directory for pre-trained weights:

```bash
mkdir -p opus_books_weights
```
## Download pre-trained weights and tokenizer files
- Instructions will be added here once the weights upload finishes.
```
Transformer
├── Encoder (6 layers)
│   ├── Multi-Head Attention
│   ├── Feed-Forward Network
│   └── Layer Normalization
└── Decoder (6 layers)
    ├── Masked Multi-Head Attention
    ├── Multi-Head Attention
    ├── Feed-Forward Network
    └── Layer Normalization
```
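One encoder layer from the diagram above could be sketched as follows; this is a hedged approximation using `torch.nn.MultiheadAttention` with the paper's default sizes, not the repo's own `model.py` classes:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    # One of the 6 encoder layers: self-attention then a feed-forward
    # network, each wrapped in a residual connection + layer norm
    # (post-norm, as in "Attention Is All You Need").
    def __init__(self, d_model=512, h=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, h, dropout=dropout,
                                          batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # Self-attention sublayer with residual + norm
        attn_out, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Feed-forward sublayer with residual + norm
        ff_out = self.ff(x)
        return self.norm2(x + self.dropout(ff_out))
```

The decoder layers add a masked self-attention sublayer and a cross-attention sublayer over the encoder output, but follow the same residual/norm pattern.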
## Training

```
Processing Epoch 00: 100% 3638/3638 [23:45<00:00, 2.55it/s, loss=6.048]
Processing Epoch 01: 100% 3638/3638 [23:47<00:00, 2.55it/s, loss=5.207]
Processing Epoch 02: 100% 3638/3638 [23:47<00:00, 2.55it/s, loss=4.183]
```
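The loss values above are cross-entropy over target tokens. A hedged sketch of a single training step, where the model, vocabulary size, and padding id are stand-ins for the repo's actual objects in `train.py` and `config.py`:

```python
import torch
import torch.nn as nn

PAD_ID, VOCAB = 0, 100                   # stand-in hyperparameters
proj = nn.Linear(8, VOCAB)               # stand-in for the output projection
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_ID)  # ignore padding tokens
opt = torch.optim.Adam(proj.parameters(), lr=1e-4)

decoder_out = torch.randn(2, 5, 8)       # (batch, seq_len, d_model)
labels = torch.randint(1, VOCAB, (2, 5)) # target token ids
logits = proj(decoder_out)               # (batch, seq_len, vocab)

# Flatten (batch, seq) so every token position is one classification
loss = loss_fn(logits.view(-1, VOCAB), labels.view(-1))
loss.backward()
opt.step()
opt.zero_grad()
```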
## Machine translation

```
Using device: cpu
SOURCE: I am not a very good a student.
PREDICTED: Io non ho il il .
```
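The prediction above is produced autoregressively: the decoder emits one token at a time, and its masked multi-head attention uses a causal mask so each position attends only to earlier positions. A minimal sketch of that mask:

```python
import torch

def causal_mask(size):
    # Lower-triangular boolean mask: position i may attend only to
    # positions j <= i. Applied in the decoder's masked multi-head
    # attention during training and at each greedy-decoding step.
    return torch.tril(torch.ones(size, size)).bool()
```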
The model can be trained on any parallel corpus. The example implementation uses the Opus Books dataset from Hugging Face.
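To adapt the project to another corpus, the data only needs to be exposed as padded source/target token-id pairs. A hypothetical minimal wrapper (not the repo's `dataset.py`, which also handles tokenization and special tokens):

```python
import torch
from torch.utils.data import Dataset

class ParallelCorpus(Dataset):
    # Minimal parallel-corpus wrapper: (source_ids, target_ids) pairs,
    # truncated/padded to a fixed length. Tokenization is assumed to
    # happen upstream.
    def __init__(self, pairs, pad_id=0, max_len=16):
        self.pairs, self.pad_id, self.max_len = pairs, pad_id, max_len

    def _pad(self, ids):
        ids = ids[: self.max_len]
        return ids + [self.pad_id] * (self.max_len - len(ids))

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, i):
        src, tgt = self.pairs[i]
        return torch.tensor(self._pad(src)), torch.tensor(self._pad(tgt))
```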
Contributions are welcome! Feel free to submit pull requests or open issues for bugs and feature requests.
MIT License - feel free to use this code for your own projects!
If you find this implementation helpful, give it a star! ⭐️
- Umar Jamil for his "Transformer from Scratch" video
- CampusX and CodeEmporium for helping me understand Transformers