
Commit 1ad3fa8

Committed on 2025-02-21 (1 parent: ff4ec94)

File tree: 3 files changed (+136, -53 lines)


.DS_Store

0 bytes, binary file not shown.

index.html

Lines changed: 91 additions & 53 deletions
@@ -4,71 +4,109 @@
     <meta charset="UTF-8">
     <meta name="viewport" content="width=device-width, initial-scale=1.0">
     <title>Lex Whalen</title>
-    <link rel="stylesheet" href="styles.css">
     <style>
         body {
-            font-family: Arial, sans-serif;
+            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
             line-height: 1.6;
-            margin: 0;
-            padding: 20px;
-            background-color: #f9f9f9;
+            max-width: 900px;
+            margin: 0 auto;
+            padding: 2rem;
+            color: #333;
+        }
+
+        .content-wrapper {
+            overflow: hidden;
+        }
+
+        .figure-wrapper {
+            float: right;
+            margin: 0 0 1rem 2rem;
+            text-align: center;
         }
         .profile-pic {
+            width: 300px;
+            height: auto;
+            margin-bottom: 0.5rem;
+        }
+        figcaption {
+            font-size: 0.9rem;
+            color: #666;
+        }
+
+        h1 {
+            font-size: 1.5rem;
+            margin: 1rem 0 0.5rem;
+        }
+
+        h1:first-child {
+            margin-top: 0;
+        }
+
+        p {
+            margin: 0 0 1rem;
+        }
+
+        a {
+            color: #4299e1;
+            text-decoration: none;
+        }
+
+        a:hover {
+            text-decoration: underline;
+        }
+
+        small {
+            color: #666;
             display: block;
-            margin: 0 auto;
-            border-radius: 50%;
-            width: 150px;
-            height: 150px;
-            object-fit: cover;
-            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
+            margin-bottom: 1rem;
         }
-        .center {
-            max-width: 800px;
-            margin: 0 auto;
-            padding: 20px;
+
+        .bottom-content {
+            clear: both;
+            padding-top: 1rem;
         }
-        .who-section img {
-            float: right; /* Float the image to the right */
-            margin-left: 20px; /* Add some spacing between text and image */
-            margin-top: -45px; /* Adjust the image upward */
-            width: 150px;
-            height: auto;
+
+        footer {
+            margin-top: 2rem;
         }
 
+        @media (max-width: 600px) {
+            .figure-wrapper {
+                float: none;
+                margin: 0 auto 1rem;
+                width: 200px;
+            }
+        }
     </style>
 </head>
 <body>
-    <div class="center">
-        <section>
-            <h1>Who?</h1>
-            <div class="who-section">
-                <img src="pictures/myself_profile_pic_bw.png" alt="Lex Whalen" class="profile-pic">
-
-                <p>My name is Lex Whalen. You can see a picture of me <a href="picture.html">here</a>. For more info, my LinkedIn is <a href="https://www.linkedin.com/in/lxaw/">here</a>.</p>
-            </div>
-        </section>
-        <section>
-            <h1>What?</h1>
-            <p>PhD student @ GT. My goal is to help us achieve and improve super-human intelligence at an affordable price, both to our pockets and to our world. Bonus points if we can vastly improve society as we do it.</p>
-            <small>I also claim to have a <a href="personal_interests.html">life outside of research</a>.</small>
-        </section>
-        <section>
-            <h1>Why?</h1>
-            <p>The purpose of this website is to store <a href="cv/main.pdf">my CV</a>, a <a href="papers_read.html">list of searchable papers</a>, and a list of <a href="books_read.html">books I've read</a>.</p>
-        </section>
-        <section>
-            <h1>Where?</h1>
-            <p>You can contact me by emailing at [first letter of first name lowercase][last name lowercase]7@gatech.edu.</p>
-            <p>You can also check out my <a href="https://github.com/lxaw">GitHub</a>.</p>
-            <p>As long as the project is interesting, I am willing to go anywhere and learn anything.</p>
-        </section>
-        <section>
-            <h1>When?</h1>
-            Last time this was edited was 2025-02-01 (YYYY/MM/DD).
-        </section>
-        <footer>
-            <small><a href="misc.html">misc</a></small>
-        </footer>
+    <div class="content-wrapper">
+        <figure class="figure-wrapper">
+            <img src="pictures/myself_nagasaki.png" alt="Lex Whalen" class="profile-pic">
+            <figcaption>Nagasaki, Dec. 2024</figcaption>
+        </figure>
+
+        <h1>Who?</h1>
+        <p>My name is Lex Whalen. You can see a picture of me <a href="picture.html">here</a>. For more info, my LinkedIn is <a href="https://www.linkedin.com/in/lxaw/">here</a>.</p>
+
+        <h1>What?</h1>
+        <p>PhD student @ GT. My goal is to help us achieve and improve super-human intelligence at an affordable price, both to our pockets and to our world. Bonus points if we can vastly improve society as we do it.</p>
+        <small>I also claim to have a <a href="personal_interests.html">life outside of research</a>.</small>

+        <h1>Why?</h1>
+        <p>The purpose of this website is to store <a href="cv/main.pdf">my CV</a>, a <a href="papers_read.html">list of searchable papers</a>, and a list of <a href="books_read.html">books I've read</a>.</p>
+
+        <h1>Where?</h1>
+        <p>You can contact me by emailing at [first letter of first name lowercase][last name lowercase]7@gatech.edu.</p>
+        <p>You can also check out my <a href="https://github.com/lxaw">GitHub</a>.</p>
+        <p>As long as the project is interesting, I am willing to go anywhere and learn anything.</p>
+
+        <h1>When?</h1>
+        Last time this was edited was 2025-02-21 (YYYY/MM/DD).
     </div>
+
+    <footer>
+        <small><a href="misc.html">misc</a></small>
+    </footer>
 </body>
-</html>
+</html>

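A note on the layout change in index.html above: the old image placement (a floated .who-section img nudged upward with a negative top margin) is replaced by a plain float-containment pattern. Below is a minimal, self-contained sketch of how the new classes interact, reusing the class names from the diff; the markup and placeholder text are illustrative only, not the site's actual page.

<!DOCTYPE html>
<html>
<head>
  <style>
    /* overflow: hidden makes the wrapper establish a new block formatting context,
       so its height grows to contain the floated figure instead of collapsing. */
    .content-wrapper { overflow: hidden; }

    /* The figure floats right; following content wraps along its left edge. */
    .figure-wrapper { float: right; margin: 0 0 1rem 2rem; }

    /* Anything that must start below the floated figure clears it explicitly. */
    .bottom-content { clear: both; }
  </style>
</head>
<body>
  <div class="content-wrapper">
    <figure class="figure-wrapper">
      <img src="pictures/myself_nagasaki.png" alt="Lex Whalen" width="300">
    </figure>
    <p>Body text wraps to the left of the floated figure.</p>
    <div class="bottom-content">Content that should always start below the figure.</div>
  </div>
</body>
</html>

The added media query complements this: below 600px it sets float: none on .figure-wrapper, so the figure stacks above the text instead of being wrapped.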
papers/list.json

Lines changed: 45 additions & 0 deletions
@@ -1,4 +1,49 @@
 [
+    {
+        "title": "DistiLLM: Towards Streamlined Distillation for Large Language Models",
+        "author": "Jongwoo Ko et al",
+        "year": "2024",
+        "topic": "knowledge distillation, llm",
+        "venue": "ICML",
+        "description": "This paper introduces DISTILLM, a new knowledge distillation framework for large language models that addresses critical limitations in efficiency and effectiveness through two key components: a novel skew Kullback-Leibler divergence loss with strong theoretical foundations, and an adaptive off-policy approach that efficiently utilizes student-generated outputs. The skew KLD is mathematically proven to provide more stable gradients and minimal approximation errors compared to standard KLD objectives, while the adaptive off-policy approach uses a replay buffer to dramatically improve sample efficiency and reduce training time. Through extensive experiments on various tasks like instruction-following and text summarization, DISTILLM achieves state-of-the-art performance while requiring up to 4.3x less training time than existing methods. The framework demonstrates strong scalability across different model sizes (120M to 13B parameters) and model families, making it a practical solution for compressing large language models.",
+        "link": "https://arxiv.org/pdf/2402.03898"
+    },
+    {
+        "title": "MiniLLM: Knowledge Distillation of Large Language Models",
+        "author": "Yuxian Gu et al",
+        "year": "2024",
+        "topic": "knowledge distillation, llm",
+        "venue": "ICLR",
+        "description": "This paper introduces MiniLLM, a novel approach to knowledge distillation for large language models that uses reverse Kullback-Leibler divergence (KLD) instead of forward KLD, which prevents student models from overestimating low-probability regions of the teacher's distribution. The authors develop an optimization approach using policy gradient with three key improvements: single-step decomposition to reduce variance, teacher-mixed sampling to prevent reward hacking, and length normalization to eliminate length bias. Through extensive experiments on various model sizes (120M to 13B parameters) and instruction-following tasks, they demonstrate that MiniLLM produces more precise responses with higher quality, lower exposure bias, better calibration, and improved long-text generation compared to baselines. Most importantly, they show the approach scales effectively across different model families and sizes while requiring significantly fewer training tokens than traditional methods, making it a practical solution for compressing large language models.",
+        "link": "https://arxiv.org/pdf/2306.08543"
+    },
+    {
+        "title": "Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective",
+        "author": "Helong Zhou et al",
+        "year": "2021",
+        "topic": "knowledge distillation",
+        "venue": "ICLR",
+        "description": "The paper analyzes knowledge distillation through the lens of bias-variance tradeoff, discovering that soft labels create a sample-wise tradeoff where some training examples reduce variance at the cost of increased bias while others have different effects. The authors identify \"regularization samples\" where distillation primarily acts as a regularizer and find that their quantity negatively correlates with model performance in standard knowledge distillation. To address this, they propose \"weighted soft labels\" that adaptively weight each training sample's contribution to optimally balance the bias-variance tradeoff, leading to improved distillation performance. The key insight is that regularization samples shouldn't be completely ignored but rather have their influence carefully modulated through weighting, which the authors validate through both theoretical analysis and extensive experiments establishing new state-of-the-art results.",
+        "link": "https://arxiv.org/pdf/2102.00650"
+    },
+    {
+        "title": "The Unreasonable Ineffectiveness of the Deeper Layers",
+        "author": "Andrey Gromov et al",
+        "year": "2024",
+        "topic": "layer analysis, pruning",
+        "venue": "Arxiv",
+        "description": "This paper presents an empirical study of layer pruning in large language models, demonstrating that many layers can be removed without significant performance degradation until a critical threshold. The authors introduce a novel pruning approach that identifies optimal layers to remove by analyzing the similarity between layer representations, combined with a small amount of parameter-efficient finetuning to \"heal\" the model after pruning. They discover that LLMs are surprisingly robust to removing up to half of their layers, suggesting either that current pretraining methods don't fully utilize deeper layers or that shallow layers play a crucial role in storing knowledge. A key finding is that while question-answering performance shows a sharp transition after removing critical layers, autoregressive loss changes smoothly, indicating an interesting disconnect between these different measures of model capability.",
+        "link": "https://arxiv.org/pdf/2403.17887"
+    },
+    {
+        "title": "LLM Pruning and Distillation in Practice: The Minitron Approach",
+        "author": "Sharath Turuvekere Sreenivas et al",
+        "year": "2024",
+        "topic": "llm, pruning, distillation",
+        "venue": "Arxiv",
+        "description": "The paper introduces an improved approach to LLM model compression by combining structured pruning with knowledge distillation, notably adding a \"teacher correction\" phase that allows the teacher model to adapt to new data distributions when the original pretraining dataset is unavailable. The authors explore two distinct pruning strategies - depth pruning (removing entire layers) and width pruning (reducing hidden/attention/MLP dimensions), along with a new task-based saliency criteria for depth pruning. They demonstrate this approach by successfully compressing Mistral NeMo 12B and Llama 3.1 8B models to 8B and 4B parameters respectively, using significantly fewer training tokens than training from scratch. The methodology is particularly valuable because it removes the dependency on accessing the original pretraining dataset, making it more practical for compressing proprietary models.",
+        "link": "https://arxiv.org/pdf/2408.11796"
+    },
     {
         "title": "Direct Preference Optimization: Your Language Model is Secretly a Reward Model",
         "author": "Rafael Rafailov et al",

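A side note on the distillation objectives named in the new DistiLLM and MiniLLM entries, spelled out for reference. This is a paraphrase, with p the teacher's next-token distribution, q_theta the student's, and alpha the skew coefficient; the papers apply sequence-level versions of these losses, so see the linked PDFs for the exact formulations.

\[
\begin{aligned}
\text{forward KLD (standard KD):}\quad & D_{\mathrm{KL}}(p \,\|\, q_\theta) = \sum_{y} p(y \mid x)\,\log\frac{p(y \mid x)}{q_\theta(y \mid x)} \\
\text{reverse KLD (MiniLLM):}\quad & D_{\mathrm{KL}}(q_\theta \,\|\, p) = \sum_{y} q_\theta(y \mid x)\,\log\frac{q_\theta(y \mid x)}{p(y \mid x)} \\
\text{skew KLD (DistiLLM):}\quad & \mathrm{SKL}^{(\alpha)}(p \,\|\, q_\theta) = D_{\mathrm{KL}}\bigl(p \,\|\, \alpha\, p + (1-\alpha)\, q_\theta\bigr)
\end{aligned}
\]

Reverse KLD is mode-seeking, so the student is not pushed to cover the teacher's low-probability regions (the MiniLLM entry's point), while mixing q_theta into the reference distribution in the skew KLD keeps the log-ratio bounded, which is one intuition for the more stable gradients the DistiLLM entry mentions.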
0 commit comments
