Transformer Implementation from Scratch 🚀

A PyTorch implementation of the Transformer architecture as described in "Attention Is All You Need". This project includes a complete, modular implementation of the Transformer applied to machine translation from English to Italian.

🌟 Features

  • Complete transformer architecture implementation
  • Modular design with separate encoder and decoder components
  • Multi-head attention mechanism
  • Support for custom tokenization
  • Training and inference scripts included
  • Translation example implementation

🛠️ Components

  • model.py: Core transformer architecture
  • train.py: Training loop and utilities
  • translate.py: Inference and translation script
  • dataset.py: Data loading and preprocessing
  • config.py: Configuration and hyperparameters
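
A rough sketch of how these pieces fit together is shown below. The helper names `get_config` and `build_transformer` are assumptions for illustration and may not match the actual interfaces in `config.py` and `model.py`:

```python
# Rough wiring of the components (the helper names below are assumptions;
# check config.py and model.py for the actual interfaces).
import torch
from config import get_config        # assumed: returns a dict of hyperparameters
from model import build_transformer  # assumed: factory for the full Transformer

config = get_config()
model = build_transformer(
    src_vocab_size=32000,            # placeholder vocabulary sizes
    tgt_vocab_size=32000,
    src_seq_len=config["seq_len"],
    tgt_seq_len=config["seq_len"],
    d_model=config["d_model"],
)
model.to("cuda" if torch.cuda.is_available() else "cpu")
```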

🚀 Quick Start

```bash
# Clone the repository
git clone https://github.com/nevernever69/Transformer-in-pytorch.git
cd Transformer-in-pytorch

# Install requirements
pip install -r requirements.txt

# Train the model
python train.py

# Translate a sentence
python translate.py
```

or

Load pre-trained weights:

```bash
# Create the necessary directory
mkdir -p opus_books_weights
```

## Download pre-trained weights and tokenizer files

- Instructions will be added here once the weights upload finishes.
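
Once the weights are published, loading a checkpoint into the model should look roughly like the following. The filename and checkpoint keys are assumptions and will need to be adjusted to the released files:

```python
# Load a pre-trained checkpoint into the model (the filename and checkpoint
# keys are assumptions; adjust once the weights are published).
import torch

checkpoint = torch.load("opus_books_weights/tmodel_02.pt", map_location="cpu")
# Checkpoints written by train.py are assumed to bundle the model state dict
# together with optimizer state and the epoch number.
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```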

📋 Model Architecture

```
Transformer
├── Encoder (6 layers)
│   ├── Multi-Head Attention
│   ├── Feed Forward Network
│   └── Layer Normalization
└── Decoder (6 layers)
    ├── Masked Multi-Head Attention
    ├── Multi-Head Attention
    ├── Feed Forward Network
    └── Layer Normalization
```
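
Both stacks are built around multi-head attention: the input is split into `h` heads, each head computes scaled dot-product attention, and the results are recombined. A simplified sketch of that block (the version in `model.py` also adds dropout and wraps each sub-layer in residual connections with layer normalization):

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Simplified multi-head attention: project q/k/v, split into h heads,
    apply scaled dot-product attention per head, then recombine."""
    def __init__(self, d_model: int, h: int):
        super().__init__()
        assert d_model % h == 0
        self.h, self.d_k = h, d_model // h
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        B = q.size(0)
        # Project and reshape to (batch, heads, seq_len, d_k).
        q = self.w_q(q).view(B, -1, self.h, self.d_k).transpose(1, 2)
        k = self.w_k(k).view(B, -1, self.h, self.d_k).transpose(1, 2)
        v = self.w_v(v).view(B, -1, self.h, self.d_k).transpose(1, 2)
        # Scaled dot-product attention, with optional masking
        # (padding mask in the encoder, causal mask in the decoder).
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).contiguous().view(B, -1, self.h * self.d_k)
        return self.w_o(out)
```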

Results

Training

```
Processing Epoch 00: 100% 3638/3638 [23:45<00:00,  2.55it/s, loss=6.048]
Processing Epoch 01: 100% 3638/3638 [23:47<00:00,  2.55it/s, loss=5.207]
Processing Epoch 02: 100% 3638/3638 [23:47<00:00,  2.55it/s, loss=4.183]
```

Machine translation

```
Using device: cpu
    SOURCE: I am not a very good a student.
 PREDICTED: Io non ho il  il  .
```
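
The prediction above is produced one token at a time. A rough sketch of greedy decoding, assuming the model exposes separate `encode`, `decode`, and `project` methods (these names are assumptions; `translate.py` contains the actual inference loop):

```python
# Greedy decoding sketch (the encode/decode/project method names are
# assumptions; see translate.py for the real inference loop).
import torch

def greedy_decode(model, src, src_mask, sos_id, eos_id, max_len, device):
    encoder_out = model.encode(src, src_mask)
    ys = torch.tensor([[sos_id]], dtype=torch.long, device=device)
    for _ in range(max_len - 1):
        decoder_out = model.decode(encoder_out, src_mask, ys, None)
        logits = model.project(decoder_out[:, -1])     # next-token logits
        next_id = logits.argmax(dim=-1, keepdim=True)  # most likely token
        ys = torch.cat([ys, next_id], dim=1)
        if next_id.item() == eos_id:                   # stop at end-of-sentence
            break
    return ys.squeeze(0)
```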

📚 Training Data

The model can be trained on any parallel corpus. The example implementation uses the Opus Books dataset from Hugging Face.
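
For reference, the English-Italian pairs of Opus Books can be pulled straight from the Hugging Face Hub and used to train a word-level tokenizer. A minimal sketch follows; the actual loading and tokenizer setup live in `dataset.py` and may differ:

```python
# Load the English-Italian split of Opus Books and train a word-level
# tokenizer over it (a sketch; dataset.py may organise this differently).
from datasets import load_dataset
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.trainers import WordLevelTrainer
from tokenizers.pre_tokenizers import Whitespace

raw = load_dataset("opus_books", "en-it", split="train")

def sentences(lang):
    # Each item holds a {"en": ..., "it": ...} translation pair.
    for item in raw:
        yield item["translation"][lang]

tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = WordLevelTrainer(special_tokens=["[UNK]", "[PAD]", "[SOS]", "[EOS]"],
                           min_frequency=2)
tokenizer.train_from_iterator(sentences("en"), trainer=trainer)
```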

🤝 Contributing

Contributions are welcome! Feel free to submit pull requests or open issues for bugs and feature requests.

📝 License

MIT License - feel free to use this code for your own projects!

⭐️ Show Your Support

If you find this implementation helpful, give it a star! ⭐️

Special Thanks

  • Umar Jamil for his Transformer from Scratch video
  • Campusx and CodeEmporium for helping me understand the Transformer
