Hello, thank you very much for your work!
I have two questions:
1. Why does temp_attention_mask need to be reshaped to (bs, 1, 1, seq)? Why doesn't a shape of (bs, seq) work?
2. For attention_mask, my reading of the code is that padded positions get a mask value of -10000 and non-padded positions get 0. Is that understanding correct?
Why is it done this way?
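For context, here is a minimal sketch of the pattern the question seems to describe, assuming the repository follows the common PyTorch-style additive attention mask; the tensor names and shapes come from the question, everything else is illustrative:

```python
import torch

bs, seq, num_heads = 2, 5, 4

# attention_mask: 1 for real tokens, 0 for padding (shape: bs, seq)
attention_mask = torch.tensor([[1, 1, 1, 0, 0],
                               [1, 1, 1, 1, 0]])

# Reshape to (bs, 1, 1, seq) so the mask broadcasts over the head and query
# dimensions, then convert it to an additive mask: 0 where tokens are kept,
# -10000 where they are padded (mirroring the behavior described above).
temp_attention_mask = attention_mask[:, None, None, :].to(torch.float32)
temp_attention_mask = (1.0 - temp_attention_mask) * -10000.0

# Attention scores have shape (bs, num_heads, seq, seq); adding the mask
# drives the scores at padded key positions to a large negative value, so
# softmax assigns them near-zero probability.
scores = torch.randn(bs, num_heads, seq, seq)
probs = torch.softmax(scores + temp_attention_mask, dim=-1)
print(probs[0, 0, 0])  # weights at the padded positions are ~0
```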