Commit bd810ba

committed on 2024-11-04
1 parent 0bc688c commit bd810ba

File tree

index.html
papers/list.json
papers_read.html

3 files changed: +22 -3 lines changed


index.html

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ <h1>Where?</h1>
   </p>
   <h1>When?</h1>
   <p>
-    Last time this was edited was 2024-11-03 (YYYY/MM/DD).
+    Last time this was edited was 2024-11-04 (YYYY/MM/DD).
   </p>
   <small><a href="misc.html">misc</a></small>
 </div>

papers/list.json

Lines changed: 9 additions & 0 deletions
@@ -1,4 +1,13 @@
 [
+  {
+    "title": "ESPACE: Dimensionality Reduction of Activations for Model Compression",
+    "author": "Charbel Sakr et al",
+    "year": "2024",
+    "topic": "llm, dimensionality reduction, activations, compression",
+    "venue": "NeurIPS",
+    "description": "Instead of decomposing weight matrices as done in previous work, ESPACE reduces the dimensionality of activation tensors by projecting them onto a pre-calibrated set of principal components using a static projection matrix P, where for an activation x, its projection is x̃ = PPᵀx. The projection matrix P is carefully constructed (via an eigendecomposition of activation statistics) to preserve the most important components while reducing dimensionality, exploiting natural redundancies in activation patterns that arise, by the Central Limit Theorem, when the sequence and batch dimensions are stacked. During training, the weights remain uncompressed and fully trainable (maintaining model expressivity), while at inference time each weight matrix can be pre-multiplied with the projection matrix to form WᵀP, achieving compression through the associativity of matrix multiplication: Y = WᵀX ≈ Wᵀ(PPᵀX) = (WᵀP)(PᵀX). This activation-centric approach is fundamentally different from previous methods: it maintains full model expressivity during training while still achieving compression at inference time, and it exploits natural statistical redundancies in activation patterns rather than compressing the weights directly.",
+    "link": "https://arxiv.org/pdf/2410.05437"
+  },
   {
     "title": "Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model",
     "author": "Chunting Zhou et al",

papers_read.html

Lines changed: 12 additions & 2 deletions
@@ -75,10 +75,10 @@ <h1>Here's where I keep a list of papers I have read.</h1>
     I typically use this to organize papers I found interesting. Please feel free to do whatever you want with it. Note that this is not every single paper I have ever read, just a collection of ones that I remember to put down.
   </p>
   <p id="paperCount">
-    So far, we have read 150 papers. Let's keep it up!
+    So far, we have read 151 papers. Let's keep it up!
   </p>
   <small id="searchCount">
-    Your search returned 150 papers. Nice!
+    Your search returned 151 papers. Nice!
   </small>

   <div class="search-inputs">
@@ -105,6 +105,16 @@ <h1>Here's where I keep a list of papers I have read.</h1>
   </thead>
   <tbody>

+    <tr>
+      <td>ESPACE: Dimensionality Reduction of Activations for Model Compression</td>
+      <td>Charbel Sakr et al</td>
+      <td>2024</td>
+      <td>llm, dimensionality reduction, activations, compression</td>
+      <td>NeurIPS</td>
+      <td>Instead of decomposing weight matrices as done in previous work, ESPACE reduces the dimensionality of activation tensors by projecting them onto a pre-calibrated set of principal components using a static projection matrix P, where for an activation x, its projection is x̃ = PPᵀx. The projection matrix P is carefully constructed (via an eigendecomposition of activation statistics) to preserve the most important components while reducing dimensionality, exploiting natural redundancies in activation patterns that arise, by the Central Limit Theorem, when the sequence and batch dimensions are stacked. During training, the weights remain uncompressed and fully trainable (maintaining model expressivity), while at inference time each weight matrix can be pre-multiplied with the projection matrix to form WᵀP, achieving compression through the associativity of matrix multiplication: Y = WᵀX ≈ Wᵀ(PPᵀX) = (WᵀP)(PᵀX). This activation-centric approach is fundamentally different from previous methods: it maintains full model expressivity during training while still achieving compression at inference time, and it exploits natural statistical redundancies in activation patterns rather than compressing the weights directly.</td>
+      <td><a href="https://arxiv.org/pdf/2410.05437" target="_blank">Link</a></td>
+    </tr>
+
     <tr>
       <td>Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model</td>
       <td>Chunting Zhou et al</td>

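The ESPACE entry added above describes projecting activations onto pre-calibrated principal components and then folding the projection into the weights at inference time. Below is a minimal NumPy sketch of that idea, not the paper's implementation: the shapes, the random stand-in for calibration activations, and the names (X_calib, W_folded, d, k) are illustrative assumptions.

# Sketch of the ESPACE projection trick under assumed shapes (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 512, 128, 4096                 # hidden dim, kept components, calibration samples

# Stand-in for collected activation statistics; in practice these come from a calibration set.
X_calib = rng.standard_normal((d, n))

# Build the static projection P (d x k) from the eigendecomposition of the
# activation covariance, keeping the k leading eigenvectors.
cov = X_calib @ X_calib.T / n
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
P = eigvecs[:, -k:]                      # top-k principal directions, orthonormal columns

W = rng.standard_normal((d, d))          # weight matrix, left uncompressed during training
x = rng.standard_normal((d, 1))          # a single activation vector

# Training-time view: weights untouched, activation replaced by its projection PPᵀx.
y_train = W.T @ (P @ (P.T @ x))

# Inference-time view: fold the projection into the weights once (WᵀP),
# so only the smaller k-dimensional products remain per token.
W_folded = W.T @ P
y_infer = W_folded @ (P.T @ x)

# Associativity makes the two paths numerically identical.
assert np.allclose(y_train, y_infer)

The point of the fold is that WᵀP can be computed once offline, so inference only pays for the k-dimensional projection Pᵀx and the smaller GEMM, which is where the compression comes from under this sketch's assumptions.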