This project implements a custom Transformer model from scratch and tests it on various datasets. We also compare its performance against PyTorch's `nn.Transformer`.
- Custom implementation of the Transformer architecture.
- Positional encoding, multi-head attention, and feedforward layers implemented manually (a positional-encoding sketch follows this list).
- Training and evaluation on multiple datasets:
  - WikiText-2: a dataset for language modeling tasks.
  - Multi30k (EN-DE): English-to-German translation.
  - Multi30k (EN-FR): English-to-French translation.
- BLEU score and loss metrics for performance comparison.
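Below is a minimal sketch of the sinusoidal positional encoding mentioned above, assuming a standard PyTorch-style implementation; the module name and defaults are illustrative rather than the project's exact code:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Adds fixed sinusoidal position information to token embeddings."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)                   # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)                    # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)                    # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))                     # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        return x + self.pe[:, : x.size(1)]
```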
We evaluate our implementation against PyTorch's `nn.Transformer` (a minimal baseline sketch follows the list below). The comparison includes:
- Training time.
- Model accuracy (BLEU scores).
- Convergence behavior.
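For the baseline, the custom model can be compared against an `nn.Transformer` instance configured with matching hyperparameters. The values below are illustrative (the "base" configuration from the original paper), not necessarily the exact settings used in this project:

```python
import torch.nn as nn

# PyTorch reference model with hyperparameters matched to the custom implementation
# (these specific values are assumptions for illustration).
baseline = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,
    dropout=0.1,
    batch_first=True,
)
```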
- WikiText-2:
  - Used for language modeling.
  - Tokenized and preprocessed using the `basic_english` tokenizer.
- Multi30k:
  - Two settings: English-to-German (EN-DE) and English-to-French (EN-FR).
  - Preprocessed using `spacy` tokenizers for English, German, and French (see the tokenizer sketch after this list).
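A sketch of how the tokenizers might be set up with `torchtext`; the exact spaCy model names below are assumptions, and each must be downloaded separately (e.g. `python -m spacy download de_core_news_sm`):

```python
from torchtext.data.utils import get_tokenizer

# WikiText-2: simple rule-based English tokenization.
basic_en = get_tokenizer("basic_english")

# Multi30k: spaCy tokenizers for the three languages (model names are assumptions).
tok_en = get_tokenizer("spacy", language="en_core_web_sm")
tok_de = get_tokenizer("spacy", language="de_core_news_sm")
tok_fr = get_tokenizer("spacy", language="fr_core_news_sm")

print(basic_en("The Transformer was trained on WikiText-2."))
print(tok_de("Ein Mann fährt mit dem Fahrrad."))
```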
Results are evaluated based on BLEU scores and loss. Detailed analysis can be found in the logs and plots generated during training.
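BLEU can be computed on tokenized model outputs, for example with `torchtext`'s built-in metric; the sentences below are a toy illustration only:

```python
from torchtext.data.metrics import bleu_score

# Candidates and references are lists of token lists; each candidate may have
# several references, hence the extra level of nesting around the reference.
candidates = [["ein", "mann", "fährt", "mit", "dem", "rad"]]
references = [[["ein", "mann", "fährt", "mit", "dem", "fahrrad"]]]

print(f"BLEU: {bleu_score(candidates, references):.4f}")
```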
- Clone the repository: `git clone https://github.com/username/transformer-project.git`
- Install dependencies: `pip install -r requirements.txt`
- Run the training script: `python train.py`
Jonathan Wang, Conny Zhou