Commit bd810ba

committed on 2024-11-04
1 parent 0bc688c commit bd810ba

File tree

index.html
papers/list.json
papers_read.html

3 files changed: +22 -3 lines changed


index.html

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ <h1>Where?</h1>
   </p>
   <h1>When?</h1>
   <p>
-    Last time this was edited was 2024-11-03 (YYYY/MM/DD).
+    Last time this was edited was 2024-11-04 (YYYY/MM/DD).
   </p>
   <small><a href="misc.html">misc</a></small>
 </div>

papers/list.json

Lines changed: 9 additions & 0 deletions
@@ -1,4 +1,13 @@
 [
+  {
+    "title": "ESPACE: Dimensionality Reduction of Activations for Model Compression",
+    "author": "Charbel Sakr et al",
+    "year": "2024",
+    "topic": "llm, dimensionality reduction, activations, compression",
+    "venue": "NeurIPS",
+    "description": "Instead of decomposing weight matrices as done in previous work, ESPACE reduces the dimensionality of activation tensors by projecting them onto a pre-calibrated set of principal components using a static projection matrix P, where for an activation x, its projection is x̃ = PPᵀx. The projection matrix P is carefully constructed (via an eigendecomposition of activation statistics) to preserve the most important components while reducing dimensionality, exploiting natural redundancies in activation patterns that arise, by the Central Limit Theorem, when the sequence and batch dimensions are stacked. During training, the weights remain uncompressed and fully trainable (maintaining model expressivity), while at inference time each weight matrix can be pre-multiplied with the projection matrix to form WᵀP, achieving compression through the associativity of matrix multiplication: Y = WᵀX ≈ Wᵀ(PPᵀX) = (WᵀP)(PᵀX). This activation-centric approach is fundamentally different from previous methods: it maintains full model expressivity during training while still achieving compression at inference time, and it exploits natural statistical redundancies in activation patterns rather than compressing the weights directly.",
+    "link": "https://arxiv.org/pdf/2410.05437"
+  },
   {
     "title": "Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model",
     "author": "Chunting Zhou et al",

papers_read.html

Lines changed: 12 additions & 2 deletions
@@ -75,10 +75,10 @@ <h1>Here's where I keep a list of papers I have read.</h1>
     I typically use this to organize papers I found interesting. Please feel free to do whatever you want with it. Note that this is not every single paper I have ever read, just a collection of ones that I remember to put down.
   </p>
   <p id="paperCount">
-    So far, we have read 150 papers. Let's keep it up!
+    So far, we have read 151 papers. Let's keep it up!
   </p>
   <small id="searchCount">
-    Your search returned 150 papers. Nice!
+    Your search returned 151 papers. Nice!
   </small>

   <div class="search-inputs">
@@ -105,6 +105,16 @@ <h1>Here's where I keep a list of papers I have read.</h1>
   </thead>
   <tbody>

+    <tr>
+      <td>ESPACE: Dimensionality Reduction of Activations for Model Compression</td>
+      <td>Charbel Sakr et al</td>
+      <td>2024</td>
+      <td>llm, dimensionality reduction, activations, compression</td>
+      <td>NeurIPS</td>
+      <td>Instead of decomposing weight matrices as done in previous work, ESPACE reduces the dimensionality of activation tensors by projecting them onto a pre-calibrated set of principal components using a static projection matrix P, where for an activation x, its projection is x̃ = PPᵀx. The projection matrix P is carefully constructed (via an eigendecomposition of activation statistics) to preserve the most important components while reducing dimensionality, exploiting natural redundancies in activation patterns that arise, by the Central Limit Theorem, when the sequence and batch dimensions are stacked. During training, the weights remain uncompressed and fully trainable (maintaining model expressivity), while at inference time each weight matrix can be pre-multiplied with the projection matrix to form WᵀP, achieving compression through the associativity of matrix multiplication: Y = WᵀX ≈ Wᵀ(PPᵀX) = (WᵀP)(PᵀX). This activation-centric approach is fundamentally different from previous methods: it maintains full model expressivity during training while still achieving compression at inference time, and it exploits natural statistical redundancies in activation patterns rather than compressing the weights directly.</td>
+      <td><a href="https://arxiv.org/pdf/2410.05437" target="_blank">Link</a></td>
+    </tr>
+
     <tr>
       <td>Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model</td>
       <td>Chunting Zhou et al</td>

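The ESPACE entry added above describes projecting activations onto pre-calibrated principal components and then folding the projection into the weights at inference time. Below is a minimal NumPy sketch of that idea, not the paper's implementation: the shapes, the random stand-in for calibration activations, and the names (X_calib, W_folded, d, k) are illustrative assumptions.

# Sketch of the ESPACE projection trick under assumed shapes (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 512, 128, 4096                 # hidden dim, kept components, calibration samples

# Stand-in for collected activation statistics; in practice these come from a calibration set.
X_calib = rng.standard_normal((d, n))

# Build the static projection P (d x k) from the eigendecomposition of the
# activation covariance, keeping the k leading eigenvectors.
cov = X_calib @ X_calib.T / n
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
P = eigvecs[:, -k:]                      # top-k principal directions, orthonormal columns

W = rng.standard_normal((d, d))          # weight matrix, left uncompressed during training
x = rng.standard_normal((d, 1))          # a single activation vector

# Training-time view: weights untouched, activation replaced by its projection PPᵀx.
y_train = W.T @ (P @ (P.T @ x))

# Inference-time view: fold the projection into the weights once (WᵀP),
# so only the smaller k-dimensional products remain per token.
W_folded = W.T @ P
y_infer = W_folded @ (P.T @ x)

# Associativity makes the two paths numerically identical.
assert np.allclose(y_train, y_infer)

The point of the fold is that WᵀP can be computed once offline, so inference only pays for the k-dimensional projection Pᵀx and the smaller GEMM, which is where the compression comes from under this sketch's assumptions.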