papers/list.json
18 additions & 0 deletions

@@ -1,4 +1,22 @@
 [
+  {
+    "title": "Recurrent Drafter for Fast Speculative Decoding in Large Language Models",
+    "author": "Yunfei Cheng et al",
+    "year": "2024",
+    "topic": "speculative decoding, drafting, llm",
+    "venue": "Arxiv",
+    "description": "This paper introduces ReDrafter (Recurrent Drafter), which uses an RNN as the draft model and conditions it on the LLM's hidden states. It uses beam search to explore candidate sequences, then applies a dynamic tree attention algorithm that removes duplicated prefixes among the candidates to improve the speedup. The draft model is also trained via knowledge distillation from the LLM to better align its predictions with the LLM's.",
+    "link": "https://arxiv.org/pdf/2403.09919"
+  },
+  {
+    "title": "QuIP: 2-Bit Quantization of Large Language Models With Guarantees",
+    "author": "Jerry Chee et al",
+    "year": "2024",
+    "topic": "quantization, block-wise",
+    "venue": "Arxiv",
+    "description": "QuIP (quantization with incoherence processing) is based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights being even in magnitude and from the directions in which they are rounded being unaligned with the coordinate axes.",
+    "link": "https://arxiv.org/pdf/2307.13304"
+  },
   {
     "title": "BlockDialect: Block-wise Fine-grained Mixed Format for Energy-Efficient LLM Inference",
papers_read.html
22 additions & 2 deletions

@@ -16,10 +16,10 @@ <h1>Here's where I keep a list of papers I have read.</h1>
 I typically use this to organize papers I found interesting. Please feel free to do whatever you want with it. Note that this is not every single paper I have ever read, just a collection of ones that I remember to put down.
 </p>
 <p id="paperCount">
-  So far, we have read 203 papers. Let's keep it up!
+  So far, we have read 205 papers. Let's keep it up!
 </p>
 <small id="searchCount">
-  Your search returned 203 papers. Nice!
+  Your search returned 205 papers. Nice!
 </small>

 <div class="search-inputs">
@@ -46,6 +46,26 @@ <h1>Here's where I keep a list of papers I have read.</h1>
 </thead>
 <tbody>

+  <tr>
+    <td>Recurrent Drafter for Fast Speculative Decoding in Large Language Models</td>
+    <td>Yunfei Cheng et al</td>
+    <td>2024</td>
+    <td>speculative decoding, drafting, llm</td>
+    <td>Arxiv</td>
+    <td>This paper introduces ReDrafter (Recurrent Drafter), which uses an RNN as the draft model and conditions it on the LLM's hidden states. It uses beam search to explore candidate sequences, then applies a dynamic tree attention algorithm that removes duplicated prefixes among the candidates to improve the speedup. The draft model is also trained via knowledge distillation from the LLM to better align its predictions with the LLM's.</td>
+    <td><a href="https://arxiv.org/pdf/2403.09919">link</a></td>
+  </tr>
+  <tr>
+    <td>QuIP: 2-Bit Quantization of Large Language Models With Guarantees</td>
+    <td>Jerry Chee et al</td>
+    <td>2024</td>
+    <td>quantization, block-wise</td>
+    <td>Arxiv</td>
+    <td>QuIP (quantization with incoherence processing) is based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights being even in magnitude and from the directions in which they are rounded being unaligned with the coordinate axes.</td>
+    <td><a href="https://arxiv.org/pdf/2307.13304">link</a></td>
+  </tr>
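
To make the incoherence intuition in the QuIP entry concrete, here is a toy NumPy sketch. It is not the paper's method: QuIP uses structured random orthogonal multiplies plus Hessian-driven LDLQ adaptive rounding, whereas this sketch uses dense QR-based rotations and plain nearest rounding, just to show the effect of quantizing in a rotated basis.

import numpy as np

def random_orthogonal(n, rng):
    # QR of a Gaussian matrix, with column signs fixed so Q is Haar-distributed.
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

def quantize_2bit(w):
    # Round to a 4-level (2-bit) uniform grid spanning the tensor's range.
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / 3.0
    return lo + scale * np.round((w - lo) / scale)

def quip_like_quantize(W, rng):
    m, n = W.shape
    U, V = random_orthogonal(m, rng), random_orthogonal(n, rng)
    W_rot = U @ W @ V.T          # incoherence processing: rotate the weights
    Q = quantize_2bit(W_rot)     # quantize in the rotated basis
    return U.T @ Q @ V           # rotate back for use at inference time

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)) * np.logspace(0, -2, 64)  # uneven column scales
err_plain = np.linalg.norm(W - quantize_2bit(W))
err_rot = np.linalg.norm(W - quip_like_quantize(W, rng))
print(f"plain rounding error {err_plain:.3f} vs rotated {err_rot:.3f}")

Rotating with random orthogonal matrices spreads the few large coordinates across many dimensions, so the 2-bit grid covers a smaller dynamic range and typical entries round more accurately; QuIP's guarantees formalize this under its incoherence assumptions.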