
Commit 5c958cb

Updated on 2024-08-25
1 parent a453a28 commit 5c958cb

File tree

2 files changed: +11 -2 lines changed


index.html

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ <h3>
 When?
 </h3>
 <p>
-Last time this was edited was 2024-08-23 (YYYY/MM/DD).
+Last time this was edited was 2024-08-25 (YYYY/MM/DD).
 </p>
 <small><a href="misc.html">misc</a></small>
 </body>

papers/list.json

Lines changed: 10 additions & 1 deletion
@@ -1,11 +1,20 @@
 [
+  {
+    "title": "LoRA: Low-Rank Adaptation of Large Language Models",
+    "author": "Edward Hu et al",
+    "year": "2021",
+    "topic": "low rank adaptation, lora, llm, fine-tuning",
+    "venue": "Arxiv",
+    "description": "Fine-tuning large models is expensive because we update all of the original parameters. Taking inspiration from Aghajanyan et al., 2020 (pre-trained language models have a low \"intrinsic dimension\"), the authors hypothesize that the weight updates also have low intrinsic rank. Thus, they decompose Delta W = BA, where B and A are low-rank matrices; only A and B are trainable. They initialize A with a Gaussian and B as zero, so Delta W = BA is zero initially. They then optimize and find this method to be more efficient in terms of both time and space.",
+    "link": "https://arxiv.org/pdf/2106.09685"
+  },
   {
     "title": "Learning to Compress Prompts with Gist Tokens",
     "author": "Jesse Mu et al",
     "year": "2023",
     "topic": "llms, prompting, compression, tokens",
     "venue": "NeurIPS",
-    "description": "The authors describe a method of using a distilling function G (similar to a hypernet) that is able to compress LM prompts into a smaller set of \"gist\" tokens. These tokens can then be cached and reused. The neat trick is that they reuse the LM itself as G, so gisting itself incurs no additional training cost. Note that in their \"Failure Cases\" section, they mention \"... While it is unclear why only the gist models exhibit this behavior (i.e. the fail example behavior), these issues can likely be mitigated with more careful sampling techniques...",
+    "description": "The authors describe a method of using a distilling function G (similar to a hypernet) that is able to compress LM prompts into a smaller set of \"gist\" tokens. These tokens can then be cached and reused. The neat trick is that they reuse the LM itself as G, so gisting itself incurs no additional training cost. Note that in their \"Failure Cases\" section, they mention \"... While it is unclear why only the gist models exhibit this behavior (i.e. the fail example behavior), these issues can likely be mitigated with more careful sampling techniques.",
     "link": "https://arxiv.org/pdf/2304.08467"
   },
   {
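
To make the decomposition described in the new LoRA entry concrete, here is a minimal PyTorch sketch, not the authors' code: the pre-trained weight W stays frozen, and only the low-rank factors B and A are trained, with A Gaussian-initialized and B zero-initialized so Delta W = BA starts at zero. The class name, rank, and layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update Delta W = B @ A."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # pre-trained W stays frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # Gaussian init
        self.B = nn.Parameter(torch.zeros(out_features, rank))        # zero init, so Delta W = 0 at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + b + x (B A)^T; only A and B receive gradients
        return self.base(x) + x @ (self.B @ self.A).T

layer = LoRALinear(64, 64, rank=8)
x = torch.randn(2, 64)
print(layer(x).shape)  # torch.Size([2, 64])

Because only the small factors A and B are optimized, the gradients and optimizer state are far smaller than for full fine-tuning, which is where the space savings mentioned in the entry come from.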

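The gist-token entry can be illustrated similarly. The masking detail below is an assumption about how the compression is enforced (it is not spelled out in the entry): tokens after the gist span are prevented from attending back to the raw prompt, so the prompt's information has to flow through the gist tokens' activations, whose key/value cache can then be stored and reused. The sequence layout and sizes are made up for illustration.

import torch

def gist_attention_mask(prompt_len: int, num_gist: int, rest_len: int) -> torch.Tensor:
    """Boolean (seq, seq) mask, True = may attend. Layout: [prompt | gist | rest]."""
    seq = prompt_len + num_gist + rest_len
    mask = torch.tril(torch.ones(seq, seq, dtype=torch.bool))  # standard causal mask
    gist_end = prompt_len + num_gist
    # Positions after the gist tokens are blinded to the original prompt,
    # forcing the prompt to be compressed into the gist tokens.
    mask[gist_end:, :prompt_len] = False
    return mask

print(gist_attention_mask(prompt_len=3, num_gist=2, rest_len=2).int())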