
Commit 5c958cb

Updated on 2024-08-25
1 parent a453a28 commit 5c958cb

File tree

2 files changed: +11 -2 lines changed


index.html

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ <h3>
 When?
 </h3>
 <p>
-Last time this was edited was 2024-08-23 (YYYY/MM/DD).
+Last time this was edited was 2024-08-25 (YYYY/MM/DD).
 </p>
 <small><a href="misc.html">misc</a></small>
 </body>

papers/list.json

Lines changed: 10 additions & 1 deletion
@@ -1,11 +1,20 @@
 [
+  {
+    "title": "LoRA: Low-Rank Adaptation of Large Language Models",
+    "author": "Edward Hu et al",
+    "year": "2021",
+    "topic": "low rank adaptation, lora, llm, fine-tuning",
+    "venue": "Arxiv",
+    "description": "Fine-tuning large models is expensive because we update all of the original parameters. Taking inspiration from Aghajanyan et al., 2020 (pre-trained language models have a low \"intrinsic dimension\"), the authors hypothesize that the weight updates also have low intrinsic rank. Thus, they decompose Delta W = BA, where B and A are low-rank matrices; only A and B are trainable. They initialize A with a Gaussian and B as zero, so Delta W = BA is zero initially. They then optimize and find this method to be more efficient in terms of both time and space.",
+    "link": "https://arxiv.org/pdf/2106.09685"
+  },
   {
     "title": "Learning to Compress Prompts with Gist Tokens",
     "author": "Jesse Mu et al",
     "year": "2023",
     "topic": "llms, prompting, compression, tokens",
     "venue": "NeurIPS",
-    "description": "The authors describe a method of using a distilling function G (similar to a hypernet) that is able to compress LM prompts into a smaller set of \"gist\" tokens. These tokens can then be cached and reused. The neat trick is that they reuse the LM itself as G, so gisting itself incurs no additional training cost. Note that in their \"Failure Cases\" section, they mention \"... While it is unclear why only the gist models exhibit this behavior (i.e. the fail example behavior), these issues can likely be mitigated with more careful sampling techniques...",
+    "description": "The authors describe a method of using a distilling function G (similar to a hypernet) that is able to compress LM prompts into a smaller set of \"gist\" tokens. These tokens can then be cached and reused. The neat trick is that they reuse the LM itself as G, so gisting itself incurs no additional training cost. Note that in their \"Failure Cases\" section, they mention \"... While it is unclear why only the gist models exhibit this behavior (i.e. the fail example behavior), these issues can likely be mitigated with more careful sampling techniques.",
     "link": "https://arxiv.org/pdf/2304.08467"
   },
   {
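
To make the decomposition described in the new LoRA entry concrete, here is a minimal PyTorch sketch, not the authors' code: the pre-trained weight W stays frozen, and only the low-rank factors B and A are trained, with A Gaussian-initialized and B zero-initialized so Delta W = BA starts at zero. The class name, rank, and layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update Delta W = B @ A."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # pre-trained W stays frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # Gaussian init
        self.B = nn.Parameter(torch.zeros(out_features, rank))        # zero init, so Delta W = 0 at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + b + x (B A)^T; only A and B receive gradients
        return self.base(x) + x @ (self.B @ self.A).T

layer = LoRALinear(64, 64, rank=8)
x = torch.randn(2, 64)
print(layer(x).shape)  # torch.Size([2, 64])

Because only the small factors A and B are optimized, the gradients and optimizer state are far smaller than for full fine-tuning, which is where the space savings mentioned in the entry come from.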

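The gist-token entry can be illustrated similarly. The masking detail below is an assumption about how the compression is enforced (it is not spelled out in the entry): tokens after the gist span are prevented from attending back to the raw prompt, so the prompt's information has to flow through the gist tokens' activations, whose key/value cache can then be stored and reused. The sequence layout and sizes are made up for illustration.

import torch

def gist_attention_mask(prompt_len: int, num_gist: int, rest_len: int) -> torch.Tensor:
    """Boolean (seq, seq) mask, True = may attend. Layout: [prompt | gist | rest]."""
    seq = prompt_len + num_gist + rest_len
    mask = torch.tril(torch.ones(seq, seq, dtype=torch.bool))  # standard causal mask
    gist_end = prompt_len + num_gist
    # Positions after the gist tokens are blinded to the original prompt,
    # forcing the prompt to be compressed into the gist tokens.
    mask[gist_end:, :prompt_len] = False
    return mask

print(gist_attention_mask(prompt_len=3, num_gist=2, rest_len=2).int())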