
Commit 5332473

committed on 2025-03-04
1 parent: 4a2cbd6

File tree

2 files changed: +78, -2 lines changed

papers/list.json

Lines changed: 36 additions & 0 deletions
@@ -1,4 +1,40 @@
 [
+  {
+    "title": "Born Again Networks",
+    "author": "Tommaso Furlanello et al",
+    "year": "2018",
+    "topic": "knowledge distillation, self distillation",
+    "venue": "ICML",
+    "description": "This paper introduces Born-Again Neural Networks (BANs), a novel approach where knowledge distillation is applied to train student models with identical architectures to their teachers. Surprisingly, these student models consistently outperform their teachers on both computer vision and language modeling tasks, even when the student and teacher have the exact same architecture and capacity. The authors experiment with multiple generations of BANs, showing that performance continues to improve (though with diminishing returns), and explore the effect of \"dark knowledge\" by testing variations where they either weight examples by teacher confidence or permute non-argmax outputs. Their framework also allows for cross-architecture knowledge transfer, such as training ResNet students from DenseNet teachers, resulting in state-of-the-art performance on CIFAR datasets and demonstrating that the benefits of knowledge distillation extend beyond model compression.",
+    "link": "https://arxiv.org/pdf/1805.04770"
+  },
+  {
+    "title": "The Curious Case of Neural Text Degeneration",
+    "author": "Ari Holtzman et al",
+    "year": "2020",
+    "topic": "nucleus sampling, generation",
+    "venue": "ICLR",
+    "description": "This work identifies fundamental problems with traditional text generation methods: beam search creates repetitive text while pure sampling produces incoherent content. As a solution, the authors propose Nucleus Sampling, which dynamically truncates the probability distribution to include only the most likely tokens that constitute the vast majority of the probability mass, avoiding both repetition and incoherence issues. Through extensive evaluations comparing perplexity, vocabulary distribution, self-similarity, and human judgments, they demonstrate that Nucleus Sampling produces text that is both high-quality and diverse, closely matching human-written text distributions. The authors also make the important observation that human language rarely maximizes probability, suggesting that language models which optimize for likelihood may inherently struggle to generate natural text.",
+    "link": "https://arxiv.org/pdf/1904.09751"
+  },
+  {
+    "title": "Data-Free Knowledge Distillation for Deep Neural Networks",
+    "author": "Raphael Gontijo Lopes et al",
+    "year": "2017",
+    "topic": "knowledge distillation, network inversion",
+    "venue": "Arxiv",
+    "description": "This paper introduces a novel method for data-free knowledge distillation, which enables the compression of deep neural networks without requiring access to the original training dataset. The authors propose using various forms of activation metadata collected during the initial model training to reconstruct synthetic datasets that can then be used to train smaller student networks. They explore different approaches to creating this activation metadata, including top-layer statistics, all-layers statistics with dropout filters, and spectral methods based on graph Fourier transforms. Experimental results on MNIST and CelebA datasets demonstrate that their spectral methods can achieve compression rates of approximately 50% with minimal accuracy loss, making this approach valuable for scenarios where the original training data cannot be shared due to privacy concerns, storage limitations, or proprietary restrictions.",
+    "link": "https://arxiv.org/pdf/1710.07535"
+  },
+  {
+    "title": "Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion",
+    "author": "Hongxu Yin et al",
+    "year": "2020",
+    "topic": "knowledge distillation, network inversion",
+    "venue": "CVPR",
+    "description": "The paper introduces DeepInversion, a method that synthesizes realistic, high-fidelity images from a trained CNN without requiring access to the original training data by using information stored in batch normalization layers. The authors further enhance this technique with Adaptive DeepInversion, which improves image diversity by maximizing Jensen-Shannon divergence between teacher and student network outputs. With these methods, the paper demonstrates three data-free applications: network pruning, knowledge transfer between models, and continual learning for adding new classes to existing networks. The synthesized images show impressive realism and generalize well across different model architectures, enabling knowledge distillation and other tasks that typically require the original training dataset.",
+    "link": "https://openaccess.thecvf.com/content_CVPR_2020/papers/Yin_Dreaming_to_Distill_Data-Free_Knowledge_Transfer_via_DeepInversion_CVPR_2020_paper.pdf"
+  },
   {
     "title": "Sequence-Level Knowledge Distillation",
     "author": "Yoon Kim et al",

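The Nucleus Sampling entry added above describes the truncation rule only in prose. As a quick reference, here is a minimal sketch of that rule in Python/NumPy; the function name, the threshold p=0.9, and the toy distribution are illustrative assumptions, not values taken from the paper.

    import numpy as np

    def nucleus_sample(probs, p=0.9, rng=None):
        """Sample from the smallest set of tokens whose cumulative probability
        mass reaches p (the 'nucleus'), renormalized within that set."""
        rng = rng or np.random.default_rng()
        order = np.argsort(probs)[::-1]                   # tokens by descending probability
        cumulative = np.cumsum(probs[order])
        cutoff = int(np.searchsorted(cumulative, p)) + 1  # smallest prefix with mass >= p
        nucleus = order[:cutoff]
        renormalized = probs[nucleus] / probs[nucleus].sum()
        return int(rng.choice(nucleus, p=renormalized))

    # Toy usage: with p=0.9 the low-probability tail is never sampled.
    probs = np.array([0.5, 0.3, 0.1, 0.07, 0.03])
    print(nucleus_sample(probs, p=0.9))

Setting p=1.0 recovers pure sampling, the degenerate case the paper contrasts against, while shrinking p pushes the sampler toward greedy decoding.
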
papers_read.html

Lines changed: 42 additions & 2 deletions
@@ -16,10 +16,10 @@ <h1>Here's where I keep a list of papers I have read.</h1>
     I typically use this to organize papers I found interesting. Please feel free to do whatever you want with it. Note that this is not every single paper I have ever read, just a collection of ones that I remember to put down.
   </p>
   <p id="paperCount">
-    So far, we have read 231 papers. Let's keep it up!
+    So far, we have read 235 papers. Let's keep it up!
   </p>
   <small id="searchCount">
-    Your search returned 231 papers. Nice!
+    Your search returned 235 papers. Nice!
   </small>

   <div class="search-inputs">
@@ -46,6 +46,46 @@ <h1>Here's where I keep a list of papers I have read.</h1>
   </thead>
   <tbody>

+  <tr>
+    <td>Born Again Networks</td>
+    <td>Tommaso Furlanello et al</td>
+    <td>2018</td>
+    <td>knowledge distillation, self distillation</td>
+    <td>ICML</td>
+    <td>This paper introduces Born-Again Neural Networks (BANs), a novel approach where knowledge distillation is applied to train student models with identical architectures to their teachers. Surprisingly, these student models consistently outperform their teachers on both computer vision and language modeling tasks, even when the student and teacher have the exact same architecture and capacity. The authors experiment with multiple generations of BANs, showing that performance continues to improve (though with diminishing returns), and explore the effect of &quot;dark knowledge&quot; by testing variations where they either weight examples by teacher confidence or permute non-argmax outputs. Their framework also allows for cross-architecture knowledge transfer, such as training ResNet students from DenseNet teachers, resulting in state-of-the-art performance on CIFAR datasets and demonstrating that the benefits of knowledge distillation extend beyond model compression.</td>
+    <td><a href="https://arxiv.org/pdf/1805.04770" target="_blank">Link</a></td>
+  </tr>
+
+  <tr>
+    <td>The Curious Case of Neural Text Degeneration</td>
+    <td>Ari Holtzman et al</td>
+    <td>2020</td>
+    <td>nucleus sampling, generation</td>
+    <td>ICLR</td>
+    <td>This work identifies fundamental problems with traditional text generation methods: beam search creates repetitive text while pure sampling produces incoherent content. As a solution, the authors propose Nucleus Sampling, which dynamically truncates the probability distribution to include only the most likely tokens that constitute the vast majority of the probability mass, avoiding both repetition and incoherence issues. Through extensive evaluations comparing perplexity, vocabulary distribution, self-similarity, and human judgments, they demonstrate that Nucleus Sampling produces text that is both high-quality and diverse, closely matching human-written text distributions. The authors also make the important observation that human language rarely maximizes probability, suggesting that language models which optimize for likelihood may inherently struggle to generate natural text.</td>
+    <td><a href="https://arxiv.org/pdf/1904.09751" target="_blank">Link</a></td>
+  </tr>
+
+  <tr>
+    <td>Data-Free Knowledge Distillation for Deep Neural Networks</td>
+    <td>Raphael Gontijo Lopes et al</td>
+    <td>2017</td>
+    <td>knowledge distillation, network inversion</td>
+    <td>Arxiv</td>
+    <td>This paper introduces a novel method for data-free knowledge distillation, which enables the compression of deep neural networks without requiring access to the original training dataset. The authors propose using various forms of activation metadata collected during the initial model training to reconstruct synthetic datasets that can then be used to train smaller student networks. They explore different approaches to creating this activation metadata, including top-layer statistics, all-layers statistics with dropout filters, and spectral methods based on graph Fourier transforms. Experimental results on MNIST and CelebA datasets demonstrate that their spectral methods can achieve compression rates of approximately 50% with minimal accuracy loss, making this approach valuable for scenarios where the original training data cannot be shared due to privacy concerns, storage limitations, or proprietary restrictions.</td>
+    <td><a href="https://arxiv.org/pdf/1710.07535" target="_blank">Link</a></td>
+  </tr>
+
+  <tr>
+    <td>Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion</td>
+    <td>Hongxu Yin et al</td>
+    <td>2020</td>
+    <td>knowledge distillation, network inversion</td>
+    <td>CVPR</td>
+    <td>The paper introduces DeepInversion, a method that synthesizes realistic, high-fidelity images from a trained CNN without requiring access to the original training data by using information stored in batch normalization layers. The authors further enhance this technique with Adaptive DeepInversion, which improves image diversity by maximizing Jensen-Shannon divergence between teacher and student network outputs. With these methods, the paper demonstrates three data-free applications: network pruning, knowledge transfer between models, and continual learning for adding new classes to existing networks. The synthesized images show impressive realism and generalize well across different model architectures, enabling knowledge distillation and other tasks that typically require the original training dataset.</td>
+    <td><a href="https://openaccess.thecvf.com/content_CVPR_2020/papers/Yin_Dreaming_to_Distill_Data-Free_Knowledge_Transfer_via_DeepInversion_CVPR_2020_paper.pdf" target="_blank">Link</a></td>
+  </tr>
+
   <tr>
     <td>Sequence-Level Knowledge Distillation</td>
     <td>Yoon Kim et al</td>

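The Born Again Networks entry added above centers on a single training signal: a student network, here with the same architecture as its teacher, is fit to the teacher's softened output distribution. Below is a minimal sketch of that loss in PyTorch; the temperature, the mixing weight alpha, and the function name are illustrative assumptions, not any single paper's exact recipe.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        """Blend hard-label cross-entropy with KL divergence against the
        teacher's temperature-softened predictions (the 'dark knowledge')."""
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
        kd = F.kl_div(log_soft_student, soft_teacher,
                      reduction="batchmean") * temperature ** 2
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kd + (1.0 - alpha) * ce

    # In the born-again setting the student shares the teacher's architecture;
    # the teacher is frozen and only the student is updated.
    student_logits = torch.randn(4, 10, requires_grad=True)  # batch of 4, 10 classes
    teacher_logits = torch.randn(4, 10)
    labels = torch.randint(0, 10, (4,))
    distillation_loss(student_logits, teacher_logits, labels).backward()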