Replies: 2 comments 1 reply
- It's used in chunk-aware transformers. cc @VahidooX
- The caching is used during inference when the cache-aware streaming Conformer is being used. During training, it is skipped.
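  For reference, here is a rough sketch of how a key/value cache is commonly handled in chunk-wise streaming attention: the cached left-context frames are prepended to the current chunk's keys and values before attention, and the cache is then refreshed with the most recent frames. This is not NeMo's actual implementation; the function name, tensor layout, and `cache_size` parameter below are assumptions made purely for illustration.

  ```python
  from typing import Optional, Tuple

  import torch


  def update_cache_sketch(key: torch.Tensor,
                          value: torch.Tensor,
                          cache: Optional[torch.Tensor],
                          cache_size: int) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
      """Hypothetical cache update for chunk-wise streaming attention.

      key, value: (batch, chunk_len, d_model) projections for the current chunk.
      cache:      (2, batch, <=cache_size, d_model) past key/value frames,
                  or None on the first chunk.
      Returns the keys/values extended with the cached left context, plus a
      refreshed cache holding up to `cache_size` of the most recent frames.
      """
      if cache is not None:
          # Extend the attention span with the cached left-context frames.
          key = torch.cat([cache[0], key], dim=1)
          value = torch.cat([cache[1], value], dim=1)
      # Keep only the most recent frames as the cache for the next chunk.
      new_cache = torch.stack([key[:, -cache_size:], value[:, -cache_size:]])
      return key, value, new_cache


  # Example: two consecutive 4-frame chunks with an 8-frame left-context cache.
  if __name__ == "__main__":
      batch, d_model, cache_size = 2, 16, 8
      cache = None
      for _ in range(2):
          k = torch.randn(batch, 4, d_model)
          v = torch.randn(batch, 4, d_model)
          k, v, cache = update_cache_sketch(k, v, cache, cache_size)
          print(k.shape, cache.shape)
  ```

  During training on full utterances there is no context to carry over between calls, which is consistent with the comment above that the cache path is skipped outside of streaming inference.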
- I am currently working with the MultiHeadAttention class and found the update_cache function. As far as I understand, it does nothing at the moment and is a template for the future. Am I right? If so, can you explain what this function will do?
https://github.com/NVIDIA/NeMo/blob/9f94649b9111b7d20fb3770b76ccccc2a2633b1f/nemo/collections/asr/parts/submodules/multi_head_attention.py#L154-L165