
Transformer-based translator model

A straightforward implementation of a Transformer model for machine translation, trained on Portuguese-to-English data.

To install Ray on Apple Silicon, see: https://docs.ray.io/en/latest/ray-overview/installation.html#m1-mac-apple-silicon-support

This README only contains how-tos and project setup instructions. Later I intend to write a blog post on the Transformer architecture and the learnings from this project.

Model

The ~250k sentence pairs are tokenized with a BERT-style tokenizer using reduced vocabulary sizes of 28,576 (English) and 30,522 (Portuguese). The encoder-decoder Transformer uses the default parameters: 8 self-attention heads, 6 encoder/decoder layers, and a 512-dimensional embedding space.
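These hyperparameters happen to match the defaults of PyTorch's nn.Transformer, so a minimal sketch of the configuration (an illustration, not necessarily how this repo wires it up) looks like:

```python
import torch.nn as nn

# Vocab sizes from the BERT-style tokenizers described above.
POR_VOCAB = 30522   # source (Portuguese)
ENG_VOCAB = 28576   # target (English)
D_MODEL = 512

src_embed = nn.Embedding(POR_VOCAB, D_MODEL)
tgt_embed = nn.Embedding(ENG_VOCAB, D_MODEL)

# nn.Transformer's defaults already give 8 heads and 6 encoder/decoder layers.
transformer = nn.Transformer(
    d_model=D_MODEL,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    batch_first=True,
)

# Final projection from the 512-d decoder output to target-vocabulary logits.
generator = nn.Linear(D_MODEL, ENG_VOCAB)
```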

Scaling training

Ray is a framework that enables distributed training of the model on a Ray cluster. I used its cluster launcher CLI to deploy an auto-scaling cluster on AWS; the configuration is in aws-cluster.yml.

ray up aws-cluster.yml

Ray spins up an EC2 head node, which takes a few minutes to become available. When a job is dispatched, this node provisions the worker nodes and sets up the environment dependencies. From my local machine I simply attached to the head node with:

ray attach -p 8265 aws-cluster.yml

The port forwarding exposes the cluster dashboard at http://localhost:8265, and the same address is used to submit new jobs. Submission could be done with the CLI, but it's more convenient to do it from a Python script (submit_train.py).
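The repo's submit_train.py isn't reproduced here, but a minimal submission script using Ray's Job Submission SDK against that address looks roughly like this (the entrypoint and runtime_env contents are assumptions):

```python
from ray.job_submission import JobSubmissionClient

# Talk to the dashboard exposed by the port forwarding above.
client = JobSubmissionClient("http://localhost:8265")

# Ray uploads the working directory to the cluster and installs the
# listed pip packages before running the entrypoint.
job_id = client.submit_job(
    entrypoint="python train_model.py",        # assumed entrypoint
    runtime_env={
        "working_dir": ".",
        "pip": ["torch", "transformers"],      # assumed dependencies
    },
)
print(f"Submitted job {job_id}; follow it at http://localhost:8265")
```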

Run ray down aws-cluster.yml to shut down the cluster. Some residual AWS resources may remain; make sure to remove them.

Train and infer

Use train_model.py for non-distributed training and infer_from_artifact.ipynb for inference testing.
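Inference is the usual autoregressive loop; a minimal greedy-decoding sketch (the notebook may differ, and the model/tokenizer interfaces here are assumptions) is:

```python
import torch

def greedy_translate(model, src_ids, bos_id, eos_id, max_len=64):
    """Feed the growing target sequence back into the decoder until EOS."""
    model.eval()
    tgt_ids = torch.tensor([[bos_id]], device=src_ids.device)
    with torch.no_grad():
        for _ in range(max_len):
            logits = model(src_ids, tgt_ids)              # (1, tgt_len, vocab)
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            tgt_ids = torch.cat([tgt_ids, next_id], dim=1)
            if next_id.item() == eos_id:
                break
    return tgt_ids.squeeze(0).tolist()                    # token ids to detokenize
```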

Samples and observations

With relatively little training, what was this simple model able to learn?

Surprisingly, it captures grammatical structure quite well. Some examples:

eles conseguiram ir ao evento? -> did they get to the event?

eu não iria à festa se eu fosse ele -> i wouldn't go to the party if i were him.

a casa do rei é um... -> the king's house is a...

Alignment

Languages can have different word-ordering rules; noun/adjective order is one example. A neural translation model needs positional context to learn those rules (as an RNN does through its sequential structure), and attention models are better at learning them thanks to long-range retrieval and the multi-head approach:

o pequeno gato cinza -> the little gray cat.

In some cases it deviates from the original meaning:

a grande casa amarela -> the big house is yellow. ❌

Nonsense

The model also produces outright wrong translations:

você que inventou de inventar -> you've made fun of protein.

papel e caneta na mão -> paper is a pen in the hand.

A Europa é o principal destino turístico do mundo -> europe is the main tourist tourist tourist of the world.

This happens more often with complex or long sentences. The small dataset (~250k pairs) certainly constrains the model's vocabulary and generalization, as does the decision to train on short sentences only. More training steps could surely improve some of these deficits, but the overall objective of the project was achieved: validating the implementation itself.
