
Commit a88941c

Updated on 2024-08-26
1 parent 0b563a0 commit a88941c

File tree

2 files changed, +10 -1 lines changed


index.html

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ <h3>
 When?
 </h3>
 <p>
-Last time this was edited was 2024-08-25 (YYYY/MM/DD).
+Last time this was edited was 2024-08-26 (YYYY/MM/DD).
 </p>
 <small><a href="misc.html">misc</a></small>
 </body>

papers/list.json

Lines changed: 9 additions & 0 deletions
@@ -1,4 +1,13 @@
 [
+  {
+    "title": "Distilling the Knowledge in a Neural Network",
+    "author": "Geoffrey Hinton et al",
+    "year": "2015",
+    "topic": "distillation, ensemble, MoE",
+    "venue": "arXiv",
+    "description": "The first proposal of knowledge distillation. The main interesting point I found was that they raise the temperature of the softmax to allow for softer targets. This allows for understanding which 2's look like 3's (in an MNIST example). Basically, this adds a sort of regularization, since more information can be carried in these softer targets compared to a single 0 or 1. They also propose the idea of training an ensemble of models and then learning a smaller, distilled model. The biological example of a clumsy larva that then becomes a more specialized bug was good.",
+    "link": "https://arxiv.org/pdf/1503.02531"
+  },
   {
     "title": "HyperTuning: Toward Adapting Large Language Models without Back-propagation",
     "author": "Jason Phang et al",
