This repository was archived by the owner on Nov 1, 2024. It is now read-only.

Description
I am trying to extract word embeddings for the tokens in various tokenized (.tok) files. I have preprocessed the datasets using the preprocessing pipeline suggested in TransCoder. I have also trained the model, and I could alternatively use the pretrained TransCoder model, to extract the embedding matrix and the embedding vectors of the tokens in each tokenized file.
The authors plotted a t-SNE visualization of cross-lingual token embeddings, which they obtained by encoding programming language tokens into TransCoder's lookup table.
Could the authors explain how this was done? I would also like to extract the embeddings of these tokens.
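
To make the question concrete, here is a minimal sketch of what a lookup-table encoding of tokens might look like. It is only an illustration under stated assumptions: the vocabulary is a toy list, the matrix is random rather than the trained TransCoder embedding weights, and the names `build_toy_lookup` and `embed_tokens` are hypothetical and not part of TransCoder.

```python
import random


def build_toy_lookup(vocab, dim, seed=0):
    """Build a token -> row index map and a random embedding matrix.

    Assumption: in a real setup, both would come from the trained model
    (its dictionary and its embedding weights), not from random values.
    """
    rng = random.Random(seed)
    word2id = {tok: i for i, tok in enumerate(vocab)}
    matrix = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in vocab]
    return word2id, matrix


def embed_tokens(line, word2id, matrix, unk="<unk>"):
    """Map each whitespace-separated token of one tokenized line to its row.

    Tokens missing from the vocabulary fall back to the row of `unk`.
    """
    unk_id = word2id[unk]
    tokens = line.split()
    return [(tok, matrix[word2id.get(tok, unk_id)]) for tok in tokens]


# Toy vocabulary standing in for the real one (hypothetical).
vocab = ["<unk>", "def", "return", "int", "(", ")"]
word2id, matrix = build_toy_lookup(vocab, dim=4)

# One line as it might appear in a tokenized file.
pairs = embed_tokens("def foo ( )", word2id, matrix)
for tok, vec in pairs:
    print(tok, [round(x, 3) for x in vec])
```

If this is roughly the procedure, the resulting vectors (one per vocabulary token) could then be passed to any t-SNE implementation for plotting. What I am missing is how to obtain the real dictionary and embedding matrix from the trained or pretrained model.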