Replies: 2 comments 1 reply
- It's used in chunk-aware transformers. cc @VahidooX
- The caching is used during inference when the cache-aware streaming Conformer is being used. During training, it is skipped.
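  For reference, here is a rough sketch of how a key/value cache is commonly handled in chunk-wise streaming attention: the cached left-context frames are prepended to the current chunk's keys and values before attention, and the cache is then refreshed with the most recent frames. This is not NeMo's actual implementation; the function name, tensor layout, and `cache_size` parameter below are assumptions made purely for illustration.

  ```python
  from typing import Optional, Tuple

  import torch


  def update_cache_sketch(key: torch.Tensor,
                          value: torch.Tensor,
                          cache: Optional[torch.Tensor],
                          cache_size: int) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
      """Hypothetical cache update for chunk-wise streaming attention.

      key, value: (batch, chunk_len, d_model) projections for the current chunk.
      cache:      (2, batch, <=cache_size, d_model) past key/value frames,
                  or None on the first chunk.
      Returns the keys/values extended with the cached left context, plus a
      refreshed cache holding up to `cache_size` of the most recent frames.
      """
      if cache is not None:
          # Extend the attention span with the cached left-context frames.
          key = torch.cat([cache[0], key], dim=1)
          value = torch.cat([cache[1], value], dim=1)
      # Keep only the most recent frames as the cache for the next chunk.
      new_cache = torch.stack([key[:, -cache_size:], value[:, -cache_size:]])
      return key, value, new_cache


  # Example: two consecutive 4-frame chunks with an 8-frame left-context cache.
  if __name__ == "__main__":
      batch, d_model, cache_size = 2, 16, 8
      cache = None
      for _ in range(2):
          k = torch.randn(batch, 4, d_model)
          v = torch.randn(batch, 4, d_model)
          k, v, cache = update_cache_sketch(k, v, cache, cache_size)
          print(k.shape, cache.shape)
  ```

  During training on full utterances there is no context to carry over between calls, which is consistent with the comment above that the cache path is skipped outside of streaming inference.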
- I am currently working with the MultiHeadAttention class and found the update_cache function. As far as I understand, it does nothing at the moment and is a template for the future. Am I right? If so, can you explain what this function will do?
https://github.com/NVIDIA/NeMo/blob/9f94649b9111b7d20fb3770b76ccccc2a2633b1f/nemo/collections/asr/parts/submodules/multi_head_attention.py#L154-L165