Source code question: image encoding (#230)

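In image preprocessing, the placeholder tokens for the global features are appended first, followed by those for the local features. The code is as follows: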
                tokenized_image = ([image_token_id] * num_queries_base + [image_token_id]) * num_queries_base
                tokenized_image += [image_token_id] 
                if width_crop_num > 1 or height_crop_num > 1:
                    tokenized_image += ([image_token_id] * (num_queries * width_crop_num) + [image_token_id]) * (
                                num_queries * height_crop_num)

In the DeepEncoder stage, however, the local features are concatenated first and the global features second. The code is as follows:
global_local_features = torch.cat([local_features, global_features, self.view_seperator[None, :]], dim=0)
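
To make the comparison concrete, here is a minimal sketch that rebuilds the placeholder list from the preprocessing snippet and counts its parts. It is not the repository's code path: the helper name `placeholder_layout`, the placeholder id `0`, and the sizes (16, 10, 2, 3) are assumptions chosen only for illustration.

```python
def placeholder_layout(num_queries_base, num_queries,
                       width_crop_num, height_crop_num,
                       image_token_id=0):
    """Return the three parts of the placeholder list, in preprocessing order.

    Hypothetical helper: mirrors the list arithmetic of the snippet above.
    """
    # Global view: num_queries_base rows, each with one extra trailing token.
    global_part = ([image_token_id] * num_queries_base
                   + [image_token_id]) * num_queries_base
    # Single extra token appended after the global view.
    separator = [image_token_id]
    # Local crops: only present when the image is split into several tiles.
    local_part = []
    if width_crop_num > 1 or height_crop_num > 1:
        local_part = ([image_token_id] * (num_queries * width_crop_num)
                      + [image_token_id]) * (num_queries * height_crop_num)
    return global_part, separator, local_part


# Assumed example sizes, not taken from the issue.
g, s, l = placeholder_layout(16, 10, 2, 3)

preprocess_order = g + s + l                    # global, extra token, local
encoder_order_len = len(l) + len(g) + len(s)    # local, global, extra token

print(len(g), len(s), len(l), len(preprocess_order), encoder_order_len)
```

In this sketch every placeholder carries the same id, so the list fixes only the total number of image positions, and that total is the same for either ordering. Whether that is the reason the two stages can use different orders is exactly what I would like confirmed.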

Is my understanding correct? If so, why are the two orders different?
