What is the benefit of storing hidden states and attention weights? #101
-
Thanks for your project! After reading your paper, I see that MemOS can store the KV cache, hidden states, and attention weights. I understand that the KV cache can be used to reload conversation history, but I cannot understand the purpose of storing hidden states and attention weights. Could you help explain that? Thanks!
Replies: 2 comments
-
@MarrytheToilet, please try to answer this user's question.
-
Thank you for your question.
In our classification of memory types, we divide them into three categories: plaintext memory, activation memory, and parameter memory. Among these, activation memory can include forms such as KV caches, hidden states, and attention weights.
In the current version of MemOS, the KV cache is the primary form of activation memory, owing to its stable performance, interpretability, and the maturity of research enabling effective and fast integration. Other forms of activation memory, such as hidden states and attention weights, are still under active research and are not yet mature enough for stable engineering use. A minimal sketch of KV-cache reuse follows below.
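To make the KV-cache point concrete, here is a minimal sketch of reloading a cached conversation prefix with Hugging Face Transformers so the history does not have to be re-encoded. This is not MemOS's actual API; the model name and prompts are placeholders chosen only for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a small stand-in model for illustration, not what MemOS prescribes.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Encode the conversation history once and keep its key/value states.
history = "User: What is MemOS?\nAssistant: MemOS is a memory OS for LLMs.\n"
history_ids = tokenizer(history, return_tensors="pt").input_ids
with torch.no_grad():
    out = model(history_ids, use_cache=True)
past_key_values = out.past_key_values  # the reusable "activation memory"

# Later, continue the conversation without re-encoding the history:
# only the new turn is fed in, together with the cached states.
new_turn = "User: Why store the KV cache?\nAssistant:"
new_ids = tokenizer(new_turn, return_tensors="pt").input_ids
with torch.no_grad():
    out = model(new_ids, past_key_values=past_key_values, use_cache=True)
next_token_id = out.logits[:, -1, :].argmax(dim=-1)
print(tokenizer.decode(next_token_id[0]))
```

In a real system the cached states would be serialized to storage and reloaded across sessions, which is exactly why the KV cache is a convenient and well-understood form of activation memory.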
That said, recent studies have demonstrated that hidden states can serve as carriers of memory. For example, by injecting steering vectors into specific layers during inference, a model can be guided to generate content in a particular style, an approach often understood as controlling abstract or stylistic memory forms [1][2]. However, neither hidden states nor attention weights are ready for large-scale deployment yet, which is why the current development of MemOS focuses primarily on KV-cache-based acceleration of inference and compatibility with frameworks like Hugging Face's Transformers. A sketch of the steering-vector idea follows below.
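Here is a loose sketch of the steering-vector idea from [1] (contrastive activation addition): a direction is computed as the difference between hidden states on a contrastive pair of prompts, then added back into the residual stream of one layer during generation. The model, layer index, prompts, and scaling factor below are illustrative assumptions, not MemOS code and not the exact setup of [1] (which used Llama 2).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model; [1] demonstrated this on Llama 2
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LAYER = 6       # which transformer block to steer (a hyperparameter)
STRENGTH = 4.0  # how strongly to push along the steering direction

def last_token_hidden(prompt: str) -> torch.Tensor:
    """Hidden state of the final token at the output of block LAYER."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    # hidden_states[0] is the embedding output, so index LAYER + 1
    # corresponds to the output of block LAYER.
    return hs[LAYER + 1][0, -1, :]

# The steering vector is the activation difference on a contrastive pair.
steering_vec = (last_token_hidden("I love this. It is wonderful.")
                - last_token_hidden("I hate this. It is terrible."))

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; output[0] is the residual stream.
    steered = output[0] + STRENGTH * steering_vec
    return (steered,) + output[1:]

# Inject the vector on every forward pass through the chosen block.
handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
try:
    ids = tokenizer("The movie was", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=20, do_sample=False)
    print(tokenizer.decode(out[0]))
finally:
    handle.remove()  # always detach the hook afterwards
```

The point is only that a hidden-state direction can act as a reusable, injectable piece of "memory"; making this as stable and manageable as the KV cache is what remains open research.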
References
[1] Panickssery, N., Gabrieli, N., Schulz, J., Tong, M., Hubinger, E., & Turner, A. M. (2024). Steering Llama 2 via Contrastive Activation Addition. arXiv:2312.06681. https://arxiv.org/abs/2312.06681