
Commit a453a28

Committed on 2024-08-23
1 parent 7c194aa commit a453a28

File tree

1 file changed: +9 additions, -0 deletions


papers/list.json

Lines changed: 9 additions & 0 deletions
@@ -1,4 +1,13 @@
 [
+  {
+    "title": "Learning to Compress Prompts with Gist Tokens",
+    "author": "Jesse Mu et al",
+    "year": "2023",
+    "topic": "llms, prompting, compression, tokens",
+    "venue": "NeurIPS",
+    "description": "The authors describe a method of using a distilling function G (similar to a hypernet) that is able to compress LM prompts into a smaller set of \"gist\" tokens. These tokens can then be cached and reused. The neat trick is that they reuse the LM itself as G, so gisting itself incurs no additional training cost. Note that in their \"Failure Cases\" section, they mention \"... While it is unclear why only the gist models exhibit this behavior (i.e. the failure example behavior), these issues can likely be mitigated with more careful sampling techniques ...\"",
+    "link": "https://arxiv.org/pdf/2304.08467"
+  },
   {
     "title": "Once-For-All: Train One Network and Specialize it For Efficient Deployment",
     "author": "Han Cai et al",

0 commit comments
