This paper follows the initial idea of iGPT: use an autoregressive objective to pre-train vision transformers on image patches, without any supervision or labels, using a pixel-level regression loss.
Moreover, AIM (the model proposed in this paper) introduces two architectural modifications, prefix attention and MLP prediction heads, and discusses the correlation between the pre-training objective and downstream performance.
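For the slides, prefix attention might be easiest to convey with a tiny mask example. This is a minimal sketch (assuming NumPy; the function name is illustrative, not from the paper's code): the first `prefix_len` patches attend to each other bidirectionally, while the remaining patches keep the causal pattern, which better matches the bidirectional attention used downstream.

```python
import numpy as np

def prefix_causal_mask(seq_len, prefix_len):
    """Boolean attention mask: True means position i may attend to position j."""
    # Standard causal mask: each position attends only to itself and earlier positions.
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Prefix attention: positions inside the prefix attend to each other
    # bidirectionally; positions after the prefix stay causal.
    mask[:prefix_len, :prefix_len] = True
    return mask

mask = prefix_causal_mask(seq_len=6, prefix_len=3)
```

With `prefix_len=3`, patch 0 can see patch 2 (bidirectional inside the prefix), while patch 3 still cannot see patch 5, so the autoregressive loss is only computed on the non-prefix patches.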
I think this could be great additional information for the iGPT part. If possible, I can help you with making the slides.