MultiHeadAttention's use_causal_mask is broken #21284
Comments
Hi @pfekin, thanks for reporting this.
I'm sorry for the size - given the nature of the problem it's hard to make the script shorter.
You need to install the necessary libraries beforehand. Also, you might need to set HF_TOKEN (the Hugging Face API token environment variable) as a Colab secret. If you want, you can replace prepare_wikitext with:
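Something along these lines should work (a sketch only: the original prepare_wikitext isn't shown here, so the prepare_imdb name and the list-of-strings return value are assumptions; it needs tensorflow-datasets installed):

```python
# Hypothetical replacement for prepare_wikitext: loads raw review text from the
# IMDB dataset via tensorflow_datasets, so no Hugging Face API key is needed.
# Assumes the rest of the script expects a list of raw text strings.
import tensorflow_datasets as tfds

def prepare_imdb(split="train"):
    # "imdb_reviews" is hosted with TensorFlow Datasets and needs no token.
    ds = tfds.load("imdb_reviews", split=split, as_supervised=True)
    return [text.numpy().decode("utf-8") for text, _ in ds]
```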
This swaps the Wikitext-2 dataset (hosted on Hugging Face) for the IMDB dataset (hosted on TensorFlow, so it does not require an API key), but it's larger and takes longer to return results. On Wikitext-2 I'm getting val_accuracy: 0.6750 after 10 epochs on a randomly generated validation dataset.
There is leakage of forward embeddings when the MultiHeadAttention layer is called without an explicit mask and with use_causal_mask=True instead. I get over 0.99 accuracy on a randomly generated validation dataset on Colab.
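A minimal sketch of the kind of check involved (not the full Colab script; the model size, vocabulary, and hyperparameters are made up for illustration): train a tiny next-token model on random sequences. With a working causal mask, validation accuracy should stay near chance (about 1/vocab_size); accuracy approaching 1.0 means positions can see future tokens.

```python
import numpy as np
import keras
from keras import layers

vocab_size, seq_len, d_model = 100, 32, 64

# Random token sequences: the target at position i is the input token at i+1,
# so the only way to predict it is to look ahead in the sequence.
data = np.random.randint(1, vocab_size, size=(2048, seq_len + 1))
inputs, targets = data[:, :-1], data[:, 1:]

tokens = keras.Input(shape=(seq_len,), dtype="int32")
h = layers.Embedding(vocab_size, d_model)(tokens)
# use_causal_mask=True should prevent each position from attending to later ones.
h = layers.MultiHeadAttention(num_heads=4, key_dim=d_model // 4)(h, h, use_causal_mask=True)
logits = layers.Dense(vocab_size)(h)
model = keras.Model(tokens, logits)

model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["sparse_categorical_accuracy"],
)
# With a correct causal mask, val accuracy stays around 1/vocab_size = 0.01.
# Val accuracy near 1.0 means future tokens leak through the attention layer.
model.fit(inputs, targets, validation_split=0.2, epochs=5, batch_size=64)
```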