Efficient Streaming Language Models with Attention Sinks #3443
logikstate started this conversation in Ideas (1 comment, 3 replies)
- New paper with example code claims huge context with minimal changes: https://github.com/mit-han-lab/streaming-llm
- Hmm, either you misunderstood it or I did. https://github.com/mit-han-lab/streaming-llm#faq says that it doesn't actually increase the context length. As I understand it, this is basically #3377, a more graceful way to do …
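
For anyone skimming the paper, here is a minimal sketch of what that FAQ answer means in practice: the KV cache keeps the first few "attention sink" tokens plus a rolling window of the most recent tokens, so generation can stream indefinitely while the effective context stays bounded. Class and parameter names below are illustrative assumptions, not the streaming-llm repo's actual API.

```python
from collections import deque

# Sketch of a StreamingLLM-style cache policy (illustrative only; the class
# and parameter names are assumptions, not the streaming-llm repo's API).
# Idea: always keep the first `n_sink` positions ("attention sinks") plus a
# rolling window of the most recent `window` positions. The cache size is
# bounded, so the model streams forever, but it never sees more context.
class SinkCache:
    def __init__(self, n_sink: int = 4, window: int = 1020):
        self.n_sink = n_sink
        self.window = window
        self.sinks = []                      # KV entries for the first tokens
        self.recent = deque(maxlen=window)   # rolling window of recent entries

    def append(self, kv_entry):
        """Add the KV entry for one newly processed token."""
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv_entry)
        else:
            self.recent.append(kv_entry)     # deque evicts the oldest entry

    def current(self):
        """KV entries the next attention step actually attends over."""
        return self.sinks + list(self.recent)


# Usage: the cache stays at n_sink + window entries no matter how many tokens
# are processed, i.e. "infinite streaming" rather than a longer context.
cache = SinkCache(n_sink=4, window=8)
for t in range(100):
    cache.append(f"kv_{t}")
print(len(cache.current()))   # 12
print(cache.current()[:4])    # ['kv_0', 'kv_1', 'kv_2', 'kv_3']  (sinks kept)
```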