This project implements a neural machine translation (NMT) model using an encoder-decoder architecture with attention mechanisms. It translates Italian sentences to English using a dataset of bilingual sentence pairs.
- Implements a Seq2Seq model with LSTM-based encoder and decoder
- Explores three attention scoring methods: dot, general, and concat (sketched after this list)
- Uses GloVe embeddings for the English vocabulary
- Evaluates performance using BLEU scores and attention visualizations
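The three scoring functions follow Luong-style attention. Below is a minimal sketch, assuming `dec_state` is a decoder hidden state of shape `(batch, units)` and `enc_outputs` the encoder outputs of shape `(batch, src_len, units)`; the layer names `W_general`, `W1`, `W2`, and `v` are illustrative, not identifiers from the notebook:

```python
import tensorflow as tf
from tensorflow.keras import layers

UNITS = 256

def dot_score(dec_state, enc_outputs):
    """Dot scoring: score(t, s) = h_t . h_s."""
    # dec_state: (batch, units); enc_outputs: (batch, src_len, units)
    return tf.einsum("bu,bsu->bs", dec_state, enc_outputs)

# W_general is a hypothetical trainable projection for general scoring.
W_general = layers.Dense(UNITS, use_bias=False)

def general_score(dec_state, enc_outputs):
    """General scoring: score(t, s) = h_t^T W h_s."""
    return tf.einsum("bu,bsu->bs", dec_state, W_general(enc_outputs))

# W1, W2, v are hypothetical layers for concat scoring, written as
# v^T tanh(W1 h_t + W2 h_s), equivalent to v^T tanh(W [h_t; h_s]).
W1 = layers.Dense(UNITS, use_bias=False)
W2 = layers.Dense(UNITS, use_bias=False)
v = layers.Dense(1, use_bias=False)

def concat_score(dec_state, enc_outputs):
    """Concat scoring: score(t, s) = v^T tanh(W [h_t; h_s])."""
    hidden = tf.tanh(W1(tf.expand_dims(dec_state, 1)) + W2(enc_outputs))
    return tf.squeeze(v(hidden), axis=-1)

def attend(scores, enc_outputs):
    """Softmax the scores over source positions, then form the context vector."""
    weights = tf.nn.softmax(scores, axis=-1)               # (batch, src_len)
    context = tf.einsum("bs,bsu->bu", weights, enc_outputs)
    return context, weights
```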
- Source: ManyThings.org
- Language Pair: Italian ↔ English
- Format: Tab-separated sentence pairs
- Preprocessing:
  - Sentence length capped at 20 tokens
  - Special tokens `<start>` and `<end>` added for decoder training
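Roughly, this preprocessing looks like the sketch below (the function names, cleaning regexes, and `ita.txt` path are illustrative; the notebook's exact rules may differ):

```python
import re

MAX_LEN = 20  # sentence length cap, in tokens

def clean(sentence: str) -> str:
    """Lowercase, split punctuation off words, and collapse whitespace."""
    sentence = sentence.lower().strip()
    sentence = re.sub(r"([?.!,])", r" \1 ", sentence)
    return re.sub(r"\s+", " ", sentence).strip()

def prepare_pair(italian: str, english: str) -> tuple[str, str] | None:
    """Clean both sides, tag the decoder target, and drop over-long pairs."""
    italian, english = clean(italian), clean(english)
    if len(italian.split()) > MAX_LEN or len(english.split()) > MAX_LEN:
        return None
    return italian, f"<start> {english} <end>"

# ManyThings files are tab-separated, English column first.
with open("ita.txt", encoding="utf-8") as f:
    rows = (line.split("\t")[:2] for line in f)
    pairs = [p for eng, ita in rows if (p := prepare_pair(ita, eng))]
```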
- Encoder:
  - Embedding layer (50d)
  - LSTM layer with 256 units
- Decoder:
  - Embedding layer (100d)
  - LSTM layer with 256 units
  - Attention mechanism (dot, general, or concat scoring)
- Output:
  - Dense layer with softmax activation over vocabulary
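Wired together in Keras, the architecture above looks roughly like this at training time (teacher forcing; the vocabulary sizes are placeholders, and the built-in `layers.Attention` stands in for the dot-scoring variant):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

UNITS = 256
SRC_VOCAB, TGT_VOCAB = 8000, 6000   # illustrative vocabulary sizes

# Encoder: 50-d embeddings -> LSTM(256); per-step outputs kept for attention.
enc_in = layers.Input(shape=(None,), name="source_tokens")
enc_emb = layers.Embedding(SRC_VOCAB, 50, mask_zero=True)(enc_in)
enc_out, enc_h, enc_c = layers.LSTM(UNITS, return_sequences=True,
                                    return_state=True)(enc_emb)

# Decoder: 100-d embeddings -> LSTM(256), seeded with the encoder's final state.
# In the project the target embedding is GloVe-initialized, e.g. via
# embeddings_initializer=tf.keras.initializers.Constant(glove_matrix).
dec_in = layers.Input(shape=(None,), name="target_tokens")
dec_emb = layers.Embedding(TGT_VOCAB, 100, mask_zero=True)(dec_in)
dec_out, _, _ = layers.LSTM(UNITS, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[enc_h, enc_c])

# Dot-product (Luong) attention over encoder outputs; the general and concat
# variants swap in the scoring functions sketched earlier.
context = layers.Attention()([dec_out, enc_out])
merged = layers.Concatenate()([dec_out, context])
probs = layers.Dense(TGT_VOCAB, activation="softmax")(merged)

model = Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```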
| Model Variant | BLEU Score |
|---|---|
| Encoder-Decoder (no attention) | 60.8 |
| Encoder-Decoder + Attention (dot scoring) | 65.8 |
| Encoder-Decoder + Attention (general scoring) | 66.6 |
| Encoder-Decoder + Attention (concat scoring) | 66.8 |
- Attention improved translation quality over the no-attention baseline by 5.0 to 6.0 BLEU
- Concat scoring achieved the highest BLEU score and produced the clearest alignment maps
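Since nltk is already a dependency, BLEU can be computed along these lines (a sketch; the single reference/hypothesis pair below is a placeholder for the notebook's decoded test set):

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Each reference entry is a list of acceptable tokenized translations;
# each hypothesis is one tokenized model output.
references = [[["you", "can", "do", "it", "too"]]]
hypotheses = [["you", "can", "do", "it", "too"]]

# Smoothing avoids zero scores when a higher-order n-gram never matches.
bleu = corpus_bleu(references, hypotheses,
                   smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {bleu * 100:.1f}")
```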
- Input: *anche voi riuscite a farlo* → Predicted: *you can do it too*
- Input: *non riguarda noi* → Predicted: *it is not about us*
(More examples and attention maps can be found in the notebook.)
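Predictions like the ones above come from greedy decoding: feed `<start>`, repeatedly take the argmax token, and stop at `<end>` or the length cap. A minimal sketch, assuming hypothetical inference models `encoder_model` and `decoder_step` plus `tgt_index`/`tgt_word` lookup tables (none of these are the notebook's actual names):

```python
import numpy as np

def translate(src_ids, encoder_model, decoder_step, tgt_index, tgt_word,
              max_len=20):
    """Greedy decoding: emit argmax tokens until <end> or max_len."""
    enc_out, h, c = encoder_model.predict(src_ids, verbose=0)
    token = np.array([[tgt_index["<start>"]]])
    words = []
    for _ in range(max_len):
        probs, h, c = decoder_step.predict([token, enc_out, h, c], verbose=0)
        next_id = int(np.argmax(probs[0, -1]))
        if tgt_word[next_id] == "<end>":
            break
        words.append(tgt_word[next_id])
        token = np.array([[next_id]])
    return " ".join(words)
```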
- Key Packages:
  - Python 3.10
  - tensorflow==2.19.0
  - tf-keras==2.19.0
  - keras==3.9.2
  - pandas==1.4.2
  - numpy==1.26.4
  - matplotlib==3.10.0
  - regex==2024.11.6
  - nltk==3.9.1
  - scikit-learn==1.1.1