
Commit d85ac94

Committed on 2024-11-07
1 parent 557f6a0 commit d85ac94

File tree

3 files changed, +21 -2 lines changed


.DS_Store

0 Bytes
Binary file not shown.

papers/list.json

Lines changed: 9 additions & 0 deletions
@@ -1,4 +1,13 @@
 [
+    {
+        "title": "Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks",
+        "author": "Tim Salimans et al",
+        "year": "2016",
+        "topic": "normalization, gradient descent",
+        "venue": "Arxiv",
+        "description": "This paper introduces weight normalization, a simple reparameterization technique that decouples a neural network's weight vectors into their direction and magnitude by expressing w = (g/||v||)v, where g is a scalar and v is a vector. The key insight is that this decoupling improves optimization by making the conditioning of the gradient better - the direction and scale of weight updates can be learned somewhat independently, which helps avoid problems with pathological curvature in the optimization landscape. While inspired by batch normalization, weight normalization is deterministic and doesn't add noise to gradients or create dependencies between minibatch examples, making it well-suited for scenarios like reinforcement learning and RNNs where batch normalization is problematic. The authors also propose a data-dependent initialization scheme where g and bias terms are initialized to normalize the initial pre-activations of neurons, helping ensure good scaling of activations across layers at the start of training.",
+        "link": "https://arxiv.org/pdf/1602.07868"
+    },
     {
         "title": "Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models",
         "author": "Tuomas Kynkäänniemi et al",

papers_read.html

Lines changed: 12 additions & 2 deletions
@@ -75,10 +75,10 @@ <h1>Here's where I keep a list of papers I have read.</h1>
         I typically use this to organize papers I found interesting. Please feel free to do whatever you want with it. Note that this is not every single paper I have ever read, just a collection of ones that I remember to put down.
     </p>
     <p id="paperCount">
-        So far, we have read 159 papers. Let's keep it up!
+        So far, we have read 160 papers. Let's keep it up!
     </p>
     <small id="searchCount">
-        Your search returned 159 papers. Nice!
+        Your search returned 160 papers. Nice!
     </small>
 
     <div class="search-inputs">
@@ -105,6 +105,16 @@ <h1>Here's where I keep a list of papers I have read.</h1>
     </thead>
     <tbody>
 
+        <tr>
+            <td>Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks</td>
+            <td>Tim Salimans et al</td>
+            <td>2016</td>
+            <td>normalization, gradient descent</td>
+            <td>Arxiv</td>
+            <td>This paper introduces weight normalization, a simple reparameterization technique that decouples a neural network&#x27;s weight vectors into their direction and magnitude by expressing w = (g/||v||)v, where g is a scalar and v is a vector. The key insight is that this decoupling improves optimization by making the conditioning of the gradient better - the direction and scale of weight updates can be learned somewhat independently, which helps avoid problems with pathological curvature in the optimization landscape. While inspired by batch normalization, weight normalization is deterministic and doesn&#x27;t add noise to gradients or create dependencies between minibatch examples, making it well-suited for scenarios like reinforcement learning and RNNs where batch normalization is problematic. The authors also propose a data-dependent initialization scheme where g and bias terms are initialized to normalize the initial pre-activations of neurons, helping ensure good scaling of activations across layers at the start of training.</td>
+            <td><a href="https://arxiv.org/pdf/1602.07868" target="_blank">Link</a></td>
+        </tr>
+
         <tr>
             <td>Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models</td>
             <td>Tuomas Kynkäänniemi et al</td>
