
Commit 488a3b7

Committed on 2025-01-11
1 parent 20ba793 commit 488a3b7

File tree

3 files changed: +41 -3 lines changed


index.html

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ <h1>Where?</h1>
 </section>
 <section>
 <h1>When?</h1>
-Last time this was edited was 2025-01-09 (YYYY/MM/DD).
+Last time this was edited was 2025-01-11 (YYYY/MM/DD).
 </section>
 <footer>
 <small><a href="misc.html">misc</a></small>

papers/list.json

Lines changed: 18 additions & 0 deletions
@@ -1,4 +1,22 @@
 [
+{
+"title": "Recurrent Drafter for Fast Speculative Decoding in Large Language Models",
+"author": "Yunfei Cheng et al",
+"year": "2024",
+"topic": "speculative decoding, drafting, llm",
+"venue": "Arxiv",
+"description": "This paper introduces ReDrafter (Recurrent Drafter), which uses an RNN as the draft model and conditions on the LLM's hidden states. They use beam search to explore the candidate sequences and then apply a dynamic tree attention algorithm to remove duplicated prefixes among the candidates, improving the speedup. They also train via knowledge distillation from the LLM to improve the alignment of the draft model's predictions with those of the LLM.",
+"link": "https://arxiv.org/pdf/2403.09919"
+},
+{
+"title": "QuIP: 2-Bit Quantization of Large Language Models With Guarantees",
+"author": "Jerry Chee et al",
+"year": "2024",
+"topic": "quantization, block-wise",
+"venue": "Arxiv",
+"description": "QuIP (quantization with incoherence processing) is a method based on the insight that quantization benefits from incoherent weight and Hessian matrices, meaning that it benefits from the weights being even in magnitude and from the directions in which they are rounded being unaligned with the coordinate axes.",
+"link": "https://arxiv.org/pdf/2307.13304"
+},
 {
 "title": "BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference",
 "author": "Wonsuk Jang et al",

papers_read.html

Lines changed: 22 additions & 2 deletions
@@ -16,10 +16,10 @@ <h1>Here's where I keep a list of papers I have read.</h1>
 I typically use this to organize papers I found interesting. Please feel free to do whatever you want with it. Note that this is not every single paper I have ever read, just a collection of ones that I remember to put down.
 </p>
 <p id="paperCount">
-So far, we have read 203 papers. Let's keep it up!
+So far, we have read 205 papers. Let's keep it up!
 </p>
 <small id="searchCount">
-Your search returned 203 papers. Nice!
+Your search returned 205 papers. Nice!
 </small>

 <div class="search-inputs">
@@ -46,6 +46,26 @@ <h1>Here's where I keep a list of papers I have read.</h1>
 </thead>
 <tbody>

+<tr>
+<td>Recurrent Drafter for Fast Speculative Decoding in Large Language Models</td>
+<td>Yunfei Cheng et al</td>
+<td>2024</td>
+<td>speculative decoding, drafting, llm</td>
+<td>Arxiv</td>
+<td>This paper introduces ReDrafter (Recurrent Drafter), which uses an RNN as the draft model and conditions on the LLM&#x27;s hidden states. They use beam search to explore the candidate sequences and then apply a dynamic tree attention algorithm to remove duplicated prefixes among the candidates, improving the speedup. They also train via knowledge distillation from the LLM to improve the alignment of the draft model&#x27;s predictions with those of the LLM.</td>
+<td><a href="https://arxiv.org/pdf/2403.09919" target="_blank">Link</a></td>
+</tr>
+
+<tr>
+<td>QuIP: 2-Bit Quantization of Large Language Models With Guarantees</td>
+<td>Jerry Chee et al</td>
+<td>2024</td>
+<td>quantization, block-wise</td>
+<td>Arxiv</td>
+<td>QuIP (quantization with incoherence processing) is a method based on the insight that quantization benefits from incoherent weight and Hessian matrices, meaning that it benefits from the weights being even in magnitude and from the directions in which they are rounded being unaligned with the coordinate axes.</td>
+<td><a href="https://arxiv.org/pdf/2307.13304" target="_blank">Link</a></td>
+</tr>
+
 <tr>
 <td>BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference</td>
 <td>Wonsuk Jang et al</td>
