
Commit 7bdf6b4

committed on 2024-11-06
1 parent 115456c · commit 7bdf6b4

File tree

2 files changed: +21 −2 lines


papers/list.json

Lines changed: 9 additions & 0 deletions
@@ -1,4 +1,13 @@
 [
+  {
+    "title": "Efficient Streaming Language Models with Attention Sinks",
+    "author": "Guangxuan Xiao et al",
+    "year": "2024",
+    "topic": "llm, kv cache, attention",
+    "venue": "ICLR",
+    "description": "This paper introduces StreamingLLM, a framework that enables large language models to process infinitely long text sequences efficiently without fine-tuning, based on a key insight about \"attention sinks.\" The authors discover that LLMs allocate surprisingly high attention scores to initial tokens regardless of their semantic relevance, which they explain is due to the softmax operation requiring attention scores to sum to one - even when a token has no strong matches in context, the model must distribute attention somewhere, and initial tokens become natural \"sinks\" since they're visible to all subsequent tokens during autoregressive training. Building on this insight, StreamingLLM maintains just a few initial tokens (as attention sinks) along with a sliding window of recent tokens, achieving up to 22.2x speedup compared to baselines while maintaining performance on sequences up to 4 million tokens long. Additionally, they show that incorporating a dedicated learnable \"sink token\" during model pre-training can further improve streaming capabilities by providing an explicit token for collecting excess attention.",
+    "link": "https://arxiv.org/pdf/2309.17453"
+  },
   {
     "title": "MagicPIG: LSH Sampling for Efficient LLM Generation",
     "author": "Zhuoming Chen et al",

papers_read.html

Lines changed: 12 additions & 2 deletions
@@ -75,10 +75,10 @@ <h1>Here's where I keep a list of papers I have read.</h1>
       I typically use this to organize papers I found interesting. Please feel free to do whatever you want with it. Note that this is not every single paper I have ever read, just a collection of ones that I remember to put down.
     </p>
     <p id="paperCount">
-      So far, we have read 155 papers. Let's keep it up!
+      So far, we have read 156 papers. Let's keep it up!
     </p>
     <small id="searchCount">
-      Your search returned 155 papers. Nice!
+      Your search returned 156 papers. Nice!
     </small>

     <div class="search-inputs">
@@ -105,6 +105,16 @@ <h1>Here's where I keep a list of papers I have read.</h1>
     </thead>
     <tbody>

+      <tr>
+        <td>Efficient Streaming Language Models with Attention Sinks</td>
+        <td>Guangxuan Xiao et al</td>
+        <td>2024</td>
+        <td>llm, kv cache, attention</td>
+        <td>ICLR</td>
+        <td>This paper introduces StreamingLLM, a framework that enables large language models to process infinitely long text sequences efficiently without fine-tuning, based on a key insight about &quot;attention sinks.&quot; The authors discover that LLMs allocate surprisingly high attention scores to initial tokens regardless of their semantic relevance, which they explain is due to the softmax operation requiring attention scores to sum to one - even when a token has no strong matches in context, the model must distribute attention somewhere, and initial tokens become natural &quot;sinks&quot; since they&#x27;re visible to all subsequent tokens during autoregressive training. Building on this insight, StreamingLLM maintains just a few initial tokens (as attention sinks) along with a sliding window of recent tokens, achieving up to 22.2x speedup compared to baselines while maintaining performance on sequences up to 4 million tokens long. Additionally, they show that incorporating a dedicated learnable &quot;sink token&quot; during model pre-training can further improve streaming capabilities by providing an explicit token for collecting excess attention.</td>
+        <td><a href="https://arxiv.org/pdf/2309.17453" target="_blank">Link</a></td>
+      </tr>
+
       <tr>
         <td>MagicPIG: LSH Sampling for Efficient LLM Generation</td>
         <td>Zhuoming Chen et al</td>
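The description added in both files outlines the StreamingLLM cache policy: keep a handful of initial "attention sink" tokens plus a sliding window of the most recent tokens, evicting everything in between. Below is a minimal Python sketch of that eviction rule; the class and parameter names (StreamingKVCache, n_sinks, window) and the default sizes are illustrative assumptions, not taken from the paper's released code.

# Minimal sketch of the eviction rule described above (illustrative names,
# not the authors' implementation): keep the first n_sinks tokens forever,
# plus a sliding window of the most recent tokens, and evict the rest.
class StreamingKVCache:
    def __init__(self, n_sinks=4, window=1020):
        self.n_sinks = n_sinks          # initial "attention sink" tokens, never evicted
        self.window = window            # number of most recent tokens kept
        self.keys, self.values = [], []

    def append(self, k, v):
        """Add the KV pair of a newly decoded token, then trim to the budget."""
        self.keys.append(k)
        self.values.append(v)
        if len(self.keys) > self.n_sinks + self.window:
            # Evict the oldest non-sink entry; the sinks at the front stay put.
            del self.keys[self.n_sinks]
            del self.values[self.n_sinks]

With this policy the cache size stays constant (n_sinks + window entries) no matter how long the stream grows, which is what lets the model run on very long sequences without the quadratic cost of keeping every past token.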
