papers/list.json
18 additions & 0 deletions

@@ -1,4 +1,22 @@
 [
+  {
+    "title": "Recurrent Drafter for Fast Speculative Decoding in Large Language Models",
+    "author": "Yunfei Cheng et al",
+    "year": "2024",
+    "topic": "speculative decoding, drafting, llm",
+    "venue": "Arxiv",
+    "description": "This paper introduces ReDrafter (Recurrent Drafter), which uses an RNN as the draft model and conditions it on the LLM's hidden states. It uses beam search to explore candidate sequences, then applies a dynamic tree attention algorithm that removes duplicated prefixes among the candidates to improve the speedup. The draft model is also trained via knowledge distillation from the LLM to better align its predictions with the LLM's.",
+    "link": "https://arxiv.org/pdf/2403.09919"
+  },
+  {
+    "title": "QuIP: 2-Bit Quantization of Large Language Models With Guarantees",
+    "author": "Jerry Chee et al",
+    "year": "2024",
+    "topic": "quantization, block-wise",
+    "venue": "Arxiv",
+    "description": "QuIP (quantization with incoherence processing) is based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights being even in magnitude and from the directions in which they are rounded being unaligned with the coordinate axes.",
+    "link": "https://arxiv.org/pdf/2307.13304"
+  },
   {
     "title": "BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference",
papers_read.html
22 additions & 2 deletions

@@ -16,10 +16,10 @@ <h1>Here's where I keep a list of papers I have read.</h1>
 I typically use this to organize papers I found interesting. Please feel free to do whatever you want with it. Note that this is not every single paper I have ever read, just a collection of ones that I remember to put down.
 </p>
 <p id="paperCount">
-  So far, we have read 203 papers. Let's keep it up!
+  So far, we have read 205 papers. Let's keep it up!
 </p>
 <small id="searchCount">
-  Your search returned 203 papers. Nice!
+  Your search returned 205 papers. Nice!
 </small>

 <div class="search-inputs">
@@ -46,6 +46,26 @@ <h1>Here's where I keep a list of papers I have read.</h1>
 </thead>
 <tbody>

+  <tr>
+    <td>Recurrent Drafter for Fast Speculative Decoding in Large Language Models</td>
+    <td>Yunfei Cheng et al</td>
+    <td>2024</td>
+    <td>speculative decoding, drafting, llm</td>
+    <td>Arxiv</td>
+    <td>This paper introduces ReDrafter (Recurrent Drafter), which uses an RNN as the draft model and conditions it on the LLM's hidden states. It uses beam search to explore candidate sequences, then applies a dynamic tree attention algorithm that removes duplicated prefixes among the candidates to improve the speedup. The draft model is also trained via knowledge distillation from the LLM to better align its predictions with the LLM's.</td>
+    <td><a href="https://arxiv.org/pdf/2403.09919">link</a></td>
+  </tr>
+  <tr>
+    <td>QuIP: 2-Bit Quantization of Large Language Models With Guarantees</td>
+    <td>Jerry Chee et al</td>
+    <td>2024</td>
+    <td>quantization, block-wise</td>
+    <td>Arxiv</td>
+    <td>QuIP (quantization with incoherence processing) is based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights being even in magnitude and from the directions in which they are rounded being unaligned with the coordinate axes.</td>
+    <td><a href="https://arxiv.org/pdf/2307.13304">link</a></td>
+  </tr>
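
To make the incoherence intuition in the QuIP entry concrete, here is a toy NumPy sketch. It is not the paper's method: QuIP uses structured random orthogonal multiplies plus Hessian-driven LDLQ adaptive rounding, whereas this sketch uses dense QR-based rotations and plain nearest rounding, just to show the effect of quantizing in a rotated basis.

import numpy as np

def random_orthogonal(n, rng):
    # QR of a Gaussian matrix, with column signs fixed so Q is Haar-distributed.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

def quantize_2bit(w):
    # Round to a 4-level (2-bit) uniform grid spanning the tensor's range.
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / 3.0
    return lo + scale * np.round((w - lo) / scale)

def quip_like_quantize(W, rng):
    m, n = W.shape
    U, V = random_orthogonal(m, rng), random_orthogonal(n, rng)
    W_rot = U @ W @ V.T          # incoherence processing: rotate the weights
    Q = quantize_2bit(W_rot)     # quantize in the rotated basis
    return U.T @ Q @ V           # rotate back for use at inference time

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)) * np.logspace(0, -2, 64)  # uneven column scales
err_plain = np.linalg.norm(W - quantize_2bit(W))
err_rot = np.linalg.norm(W - quip_like_quantize(W, rng))
print(f"plain rounding error {err_plain:.3f} vs rotated {err_rot:.3f}")

Rotating with random orthogonal matrices spreads the few large coordinates across many dimensions, so the 2-bit grid covers a smaller dynamic range and typical entries round more accurately; QuIP's guarantees formalize this under its incoherence assumptions.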