Recommendation to Enrich Computer Vision Part Content with Latest Research #6

@frinkleko

Thank you so much for making such wonderful slides.

I would like to recommend enriching Module 4, chapter "Computer Vision" (pages 141-153), by incorporating insights from the newly published paper "Scalable Pre-Training of Large Autoregressive Image Models".

This paper follows the initial idea of iGPT: pre-training vision transformers with an autoregressive objective on image patches, without any supervision or labels, using a pixel-level regression loss.

Moreover, AIM (the model proposed in this paper) introduces two architectural modifications, prefix attention and MLP prediction heads, and discusses the correlation between the pre-training objective and downstream performance.
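
To make the suggestion more concrete, here is a rough sketch of the prefix attention mask and the next-patch regression objective. It is an illustrative NumPy example, not the paper's implementation: the function names `prefix_causal_mask` and `next_patch_regression_loss`, the per-patch normalization of the targets, and the choice to score only patches outside the prefix are all assumptions made for illustration.

```python
import numpy as np


def prefix_causal_mask(num_patches: int, prefix_len: int) -> np.ndarray:
    """Return a boolean mask where mask[i, j] is True if patch i may attend to patch j.

    Hypothetical helper: the first `prefix_len` patches attend to each other
    bidirectionally, and every later patch attends causally (to itself and
    to all earlier patches).
    """
    if not 0 <= prefix_len < num_patches:
        raise ValueError("prefix_len must satisfy 0 <= prefix_len < num_patches")
    # Standard causal (lower-triangular) mask.
    mask = np.tril(np.ones((num_patches, num_patches), dtype=bool))
    # Open up full attention inside the prefix block.
    mask[:prefix_len, :prefix_len] = True
    return mask


def next_patch_regression_loss(pred: np.ndarray, patches: np.ndarray, prefix_len: int) -> float:
    """Mean squared error between predictions and the *next* patch.

    pred, patches: arrays of shape (batch, num_patches, patch_dim).
    pred[:, k] is treated as the prediction for patches[:, k + 1].
    Targets are normalized per patch (an assumption of this sketch).
    """
    targets = patches[:, 1:]
    mean = targets.mean(axis=-1, keepdims=True)
    std = targets.std(axis=-1, keepdims=True)
    targets = (targets - mean) / (std + 1e-6)
    # Squared error per position, averaged over the pixel dimension.
    sq_err = ((pred[:, :-1] - targets) ** 2).mean(axis=-1)
    # Position k predicts patch k + 1, so skip targets that lie inside the prefix.
    start = max(prefix_len - 1, 0)
    return float(sq_err[:, start:].mean())


if __name__ == "__main__":
    print(prefix_causal_mask(num_patches=4, prefix_len=2).astype(int))
    rng = np.random.default_rng(0)
    patches = rng.random((2, 4, 12))
    pred = np.zeros_like(patches)
    print(next_patch_regression_loss(pred, patches, prefix_len=2))
```

Something along these lines could sit next to the iGPT slides to show how the attention mask and the loss differ from a plain causal setup.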

I think this could be a great addition to the iGPT part. If possible, I can help make the slides.
