"description": "This paper introduces hyper-connections, which is a novel alternative to residual connections. Basically, they introduce learnable depth and width connections.",
+ "link": "https://arxiv.org/pdf/2409.19606"
+ },
  {
  "title": "Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising",
  "author": "Gongfan Fang et al",
@@ -1050,7 +1059,7 @@
  "topic": "q-learning, reinforcement learning",
  "venue": "Arxiv",
  "description": "The authors present the first deep learning model that can learn complex control policies, and they teach it to play Atari 2600 games using Q-learning. Their goal was to create one net that can play as many games as possible.",
- "link": "TODO"
+ "link": "https://arxiv.org/pdf/1312.5602"
  },
  {
  "title": "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding",
@@ -1059,7 +1068,7 @@
  "topic": "quantization, encoding, pruning",
  "venue": "ICML",
  "description": "A three-pronged approach to compressing nets. They prune networks, then quantize and share weights, and then apply Huffman encoding.",
- "link": "TODO"
+ "link": "https://arxiv.org/pdf/1510.00149"
  },
  {
  "title": "Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1",
@@ -1068,7 +1077,7 @@
  "topic": "quantization, efficiency, binary",
  "venue": "Arxiv",
  "description": "Introduction of training Binary Neural Networks, or nets with binary weights and activations. They also present experiments on deterministic vs stochastic binarization. They use the deterministic one for the most part, except for activations.",
- "link": "TODO"
+ "link": "https://arxiv.org/pdf/1602.02830"
  },
  {
  "title": "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks",
@@ -1077,16 +1086,7 @@
  "topic": "efficiency, scaling",
  "venue": "ICML",
  "description": "A study of model scaling is presented. They propose a compound coefficient to uniformly scale all dimensions of depth/width/resolution; for instance, if you want to use 2^{N} more compute resources, you scale each dimension by their coefficients raised to N. They also quantify the relationship between width, depth, and resolution.",
- "link": "TODO"
- },
- {
- "title": "2-in-1 Accelerator: Enabling Random Precision Switch for Winning Both Adversarial Robustness and Efficiency",
- "author": "Yonggan Fu et al",
- "year": "2021",
- "topic": "precision, adversarial, efficiency",
- "venue": "ACM",
- "description": "Introduction of a Random Precision Switch algorithm that has potential for defending against adversarial attacks while promoting efficiency.",
- "link": "TODO"
+ "link": "https://arxiv.org/pdf/1905.11946"
  },
  {
  "title": "The wake-sleep algorithm for unsupervised neural networks",
papers_read.html (14 additions, 14 deletions)
@@ -46,6 +46,16 @@ <h1>Here's where I keep a list of papers I have read.</h1>
  </thead>
  <tbody>

+ <tr>
+ <td>Hyper-Connections</td>
+ <td>Defa Zhu et al</td>
+ <td>2024</td>
+ <td>residual connections, hyper-connections</td>
+ <td>Arxiv</td>
+ <td>This paper introduces hyper-connections, a novel alternative to residual connections. Basically, they introduce learnable depth and width connections.</td>
  <td>Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising</td>
  <td>Gongfan Fang et al</td>
@@ -1213,7 +1223,7 @@ <h1>Here's where I keep a list of papers I have read.</h1>
  <td>q-learning, reinforcement learning</td>
  <td>Arxiv</td>
  <td>The authors present the first deep learning model that can learn complex control policies, and they teach it to play Atari 2600 games using Q-learning. Their goal was to create one net that can play as many games as possible.</td>
@@ -1233,7 +1243,7 @@ <h1>Here's where I keep a list of papers I have read.</h1>
  <td>quantization, efficiency, binary</td>
  <td>Arxiv</td>
  <td>Introduction of training Binary Neural Networks, or nets with binary weights and activations. They also present experiments on deterministic vs stochastic binarization. They use the deterministic one for the most part, except for activations.</td>
@@ -1243,17 +1253,7 @@ <h1>Here's where I keep a list of papers I have read.</h1>
  <td>efficiency, scaling</td>
  <td>ICML</td>
  <td>A study of model scaling is presented. They propose a compound coefficient to uniformly scale all dimensions of depth/width/resolution; for instance, if you want to use 2^{N} more compute resources, you scale each dimension by their coefficients raised to N. They also quantify the relationship between width, depth, and resolution.</td>
- <td><a href="TODO" target="_blank">Link</a></td>
- </tr>
-
- <tr>
- <td>2-in-1 Accelerator: Enabling Random Precision Switch for Winning Both Adversarial Robustness and Efficiency</td>
- <td>Yonggan Fu et al</td>
- <td>2021</td>
- <td>precision, adversarial, efficiency</td>
- <td>ACM</td>
- <td>Introduction of a Random Precision Switch algorithm that has potential for defending against adversarial attacks while promoting efficiency.</td>