This is an implementation of the transformer architecture from scratch. I use PyTorch so it can run on the GPU, but the transformer logic, including self-attention and multi-head attention, is implemented from scratch.
If you want to run and modify this, you only need to install PyTorch, nothing else.
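For orientation, the heart of the from-scratch attention logic is scaled dot-product attention, which multi-head attention applies once per head. Below is a minimal sketch of that computation; the function name and shapes are illustrative, not the repo's exact code:

import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, num_heads, seq_len, head_dim)
    d_k = q.size(-1)
    # similarity of every query with every key, scaled by sqrt(head_dim)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    # softmax over the key dimension turns scores into attention weights
    weights = torch.softmax(scores, dim=-1)
    # weighted sum of the values
    return weights @ v

Multi-head attention splits the embedding into num_heads chunks, applies this once per head, and concatenates the results. Example usage of the encoder: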
import torch

# args: (batch_size, embed_size, seq_len, num_heads, forward_expand, drop_out_prob, max_seq_len)
model = Encoder(1000, 256, 10, 8, 4, 0.1, 50)
# dummy batch of token IDs; note that randint's upper bound is exclusive,
# so this fills the tensor with token ID 0
x = torch.randint(0, 1, (1000, 10))
output = model(x)
# input shape: torch.Size([1000, 10])
# output shape: torch.Size([1000, 10, 256])
# NOTE: you would typically only use the last embedding
# which would make the output shape: [1000, 256]
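To keep only the last position's embedding, slice the sequence dimension (the variable name below is just illustrative):

last = output[:, -1, :]
# last shape: torch.Size([1000, 256])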
At this time I have only implemented the encoder part of the transformer. When I get time, I will also implement the decoder part of the architecture, which should not require much more effort since the building blocks are already in place.