
Commit a88941c

Updated on 2024-08-26
1 parent 0b563a0 commit a88941c

File tree

2 files changed, +10 -1 lines changed


index.html

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ <h3>
 When?
 </h3>
 <p>
-Last time this was edited was 2024-08-25 (YYYY/MM/DD).
+Last time this was edited was 2024-08-26 (YYYY/MM/DD).
 </p>
 <small><a href="misc.html">misc</a></small>
 </body>

papers/list.json

Lines changed: 9 additions & 0 deletions
@@ -1,4 +1,13 @@
 [
+  {
+    "title": "Distilling the Knowledge in a Neural Network",
+    "author": "Geoffrey Hinton et al",
+    "year": "2015",
+    "topic": "distillation, ensemble, MoE",
+    "venue": "arXiv",
+    "description": "The first proposal of knowledge distillation. The main interesting point I found was that they raise the temperature of the softmax to allow for softer targets. This allows for understanding which 2's look like 3's (in an MNIST example). Basically, this adds a sort of regularization, since more information can be carried in these softer targets compared to a single 0 or 1. They also propose the idea of training an ensemble of models and then learning a smaller, distilled model. The biological example of a clumsy larva that then becomes a more specialized bug was good.",
+    "link": "https://arxiv.org/pdf/1503.02531"
+  },
   {
     "title": "HyperTuning: Toward Adapting Large Language Models without Back-propagation",
     "author": "Jason Phang et al",
