This architecture combines the Transformer network with convolution kernels. We investigate whether this combination improves performance and training speed in Neural Machine Translation.
You can read more about the theory in documentation.pdf.
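
To give a rough feel for what a convolution kernel does on a token sequence, here is a minimal sketch in plain Python. It is not this project's implementation. The function names (conv1d_over_sequence, residual_conv_block), the single shared kernel, the zero padding, and the residual connection are all assumptions made for illustration. How the convolution is actually combined with the Transformer layers is described in documentation.pdf.

```python
from typing import List

Vector = List[float]


def conv1d_over_sequence(seq: List[Vector], kernel: List[float]) -> List[Vector]:
    """Slide one shared 1D kernel along the time axis of a sequence.

    Hypothetical helper: every embedding dimension uses the same kernel
    weights, and positions outside the sequence are treated as zeros, so
    the output has the same length as the input.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel width must be odd for symmetric padding")
    half = len(kernel) // 2
    dim = len(seq[0]) if seq else 0
    out: List[Vector] = []
    for t in range(len(seq)):
        mixed = [0.0] * dim
        for k, weight in enumerate(kernel):
            src = t + k - half  # neighbouring position covered by this tap
            if 0 <= src < len(seq):  # skip taps that fall in the zero padding
                for d in range(dim):
                    mixed[d] += weight * seq[src][d]
        out.append(mixed)
    return out


def residual_conv_block(seq: List[Vector], kernel: List[float]) -> List[Vector]:
    """Add the convolved sequence back onto the input (assumed residual wiring)."""
    conv = conv1d_over_sequence(seq, kernel)
    return [[x + c for x, c in zip(xs, cs)] for xs, cs in zip(seq, conv)]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]  # three toy token embeddings
    print(residual_conv_block(tokens, [0.5, 0.0, 0.5]))
```

Unlike self-attention, which lets every position look at every other position, the kernel only mixes each token with its immediate neighbours, so its cost grows linearly with sequence length.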