Commit 0295746

Committed on 2024-10-29
1 parent d55e19f commit 0295746

File tree

3 files changed (+15, -15 lines)


index.html

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ <h1>Where?</h1>
   </p>
   <h1>When?</h1>
   <p>
-    Last time this was edited was 2024-10-28 (YYYY/MM/DD).
+    Last time this was edited was 2024-10-29 (YYYY/MM/DD).
   </p>
   <small><a href="misc.html">misc</a></small>
 </div>

papers/list.json

Lines changed: 7 additions & 7 deletions
@@ -1,12 +1,12 @@
 [
   {
-    "title": "MaskGIT: Masked Generative Image Transformer",
-    "author": "Huiwen Chang et al",
-    "year": "2022",
-    "topic": "generative models, masking, image transformer",
-    "venue": "Arxiv",
-    "description": "The MaskGIT paper introduces a novel bidirectional transformer architecture for image generation that can predict multiple image tokens in parallel, rather than generating them sequentially like previous methods. They develop a new iterative decoding strategy where the model predicts all masked tokens simultaneously at each step, keeps the most confident predictions, and refines the remaining tokens over multiple iterations using a decreasing mask scheduling function. The approach significantly outperforms previous transformer-based methods in both generation quality and speed on ImageNet, while maintaining good diversity in the generated samples. The bidirectional nature of their model enables flexible image editing applications like inpainting, outpainting, and class-conditional object manipulation without requiring any architectural changes or task-specific training.",
-    "link": "https://arxiv.org/pdf/2202.04200"
+    "title": "Neural Discrete Representation Learning",
+    "author": "Aaron van den Oord et al",
+    "year": "2017",
+    "topic": "generative models, vae",
+    "venue": "NeurIPS",
+    "description": "The key innovation of this paper is the introduction of the Vector Quantised-Variational AutoEncoder (VQ-VAE), which combines vector quantization with VAEs to learn discrete latent representations instead of continuous ones. Unlike previous approaches to discrete latent variables which struggled with high variance or optimization challenges, VQ-VAE uses a simple but effective nearest-neighbor lookup system in the latent space, along with a straight-through gradient estimator, to learn meaningful discrete codes. This approach allows the model to avoid the common posterior collapse problem where latents are ignored when paired with powerful decoders, while still maintaining good reconstruction quality comparable to continuous VAEs. The discrete nature of the latent space enables the model to focus on capturing important high-level features that span many dimensions in the input space (like objects in images or phonemes in speech) rather than local details, and these discrete latents can then be effectively modeled using powerful autoregressive priors for generation.",
+    "link": "https://arxiv.org/pdf/1711.00937"
   },
   {
     "title": "Improved Precision and Recall Metric for Assessing Generative Models",
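The VQ-VAE description above mentions a nearest-neighbor lookup in the latent space plus a straight-through gradient estimator. A minimal NumPy sketch of just the lookup step (function name and shapes are my own, not from the paper):

```python
import numpy as np

def quantize(z_e, codebook):
    """Map each encoder output vector to its nearest codebook entry.

    z_e:      (n, d) continuous encoder outputs
    codebook: (K, d) learned embedding vectors
    Returns (indices, z_q): discrete codes and their quantized vectors.
    """
    # Squared Euclidean distance from every z_e row to every codebook row
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)   # nearest-neighbor lookup
    z_q = codebook[indices]          # discrete latent vectors
    return indices, z_q
```

In an autodiff framework the straight-through trick is typically written as `z_q = z_e + stop_gradient(z_q - z_e)`, so decoder gradients flow back to the encoder as if the quantization step were the identity.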

papers_read.html

Lines changed: 7 additions & 7 deletions
@@ -106,13 +106,13 @@ <h1>Here's where I keep a list of papers I have read.</h1>
 <tbody>

 <tr>
-  <td>MaskGIT: Masked Generative Image Transformer</td>
-  <td>Huiwen Chang et al</td>
-  <td>2022</td>
-  <td>generative models, masking, image transformer</td>
-  <td>Arxiv</td>
-  <td>The MaskGIT paper introduces a novel bidirectional transformer architecture for image generation that can predict multiple image tokens in parallel, rather than generating them sequentially like previous methods. They develop a new iterative decoding strategy where the model predicts all masked tokens simultaneously at each step, keeps the most confident predictions, and refines the remaining tokens over multiple iterations using a decreasing mask scheduling function. The approach significantly outperforms previous transformer-based methods in both generation quality and speed on ImageNet, while maintaining good diversity in the generated samples. The bidirectional nature of their model enables flexible image editing applications like inpainting, outpainting, and class-conditional object manipulation without requiring any architectural changes or task-specific training.</td>
-  <td><a href="https://arxiv.org/pdf/2202.04200" target="_blank">Link</a></td>
+  <td>Neural Discrete Representation Learning</td>
+  <td>Aaron van den Oord et al</td>
+  <td>2017</td>
+  <td>generative models, vae</td>
+  <td>NeurIPS</td>
+  <td>The key innovation of this paper is the introduction of the Vector Quantised-Variational AutoEncoder (VQ-VAE), which combines vector quantization with VAEs to learn discrete latent representations instead of continuous ones. Unlike previous approaches to discrete latent variables which struggled with high variance or optimization challenges, VQ-VAE uses a simple but effective nearest-neighbor lookup system in the latent space, along with a straight-through gradient estimator, to learn meaningful discrete codes. This approach allows the model to avoid the common posterior collapse problem where latents are ignored when paired with powerful decoders, while still maintaining good reconstruction quality comparable to continuous VAEs. The discrete nature of the latent space enables the model to focus on capturing important high-level features that span many dimensions in the input space (like objects in images or phonemes in speech) rather than local details, and these discrete latents can then be effectively modeled using powerful autoregressive priors for generation.</td>
+  <td><a href="https://arxiv.org/pdf/1711.00937" target="_blank">Link</a></td>
 </tr>

 <tr>
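The MaskGIT description in the removed rows above outlines a decoding loop: predict every masked token at once, commit the most confident predictions, and re-mask the rest under a decreasing schedule. A toy sketch of that loop (the `predict` callback stands in for the bidirectional transformer; names, the cosine schedule choice, and shapes are my own assumptions):

```python
import numpy as np

def iterative_decode(predict, seq_len, steps=8):
    """Toy sketch of MaskGIT-style confidence-based iterative decoding.

    predict(tokens) must return a (seq_len, vocab) array of token
    probabilities; masked positions in `tokens` hold -1.
    """
    tokens = np.full(seq_len, -1)
    for t in range(1, steps + 1):
        masked = np.flatnonzero(tokens == -1)
        if masked.size == 0:
            break
        probs = predict(tokens)          # predict all positions in parallel
        best = probs.argmax(axis=1)      # candidate token per position
        conf = probs.max(axis=1)         # model confidence per position
        # Decreasing mask schedule: fraction still masked after step t
        keep_masked = int(np.floor(np.cos(np.pi * t / (2 * steps)) * seq_len))
        n_unmask = max(masked.size - keep_masked, 1)
        # Commit the most confident masked predictions; re-mask the rest
        order = masked[np.argsort(-conf[masked])]
        tokens[order[:n_unmask]] = best[order[:n_unmask]]
    return tokens
```

At the final step the cosine schedule reaches zero, so every remaining masked position is committed and the loop terminates with a fully decoded sequence.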

0 commit comments