Source code question: image encoding #230

@KyleWang-Hunter

In image preprocessing, the global features are concatenated first and the local features second. The code is as follows:
tokenized_image = ([image_token_id] * num_queries_base + [image_token_id]) * num_queries_base
tokenized_image += [image_token_id]
if width_crop_num > 1 or height_crop_num > 1:
    tokenized_image += ([image_token_id] * (num_queries * width_crop_num) + [image_token_id]) * (
        num_queries * height_crop_num)

In the DeepEncoder stage, however, the local features are concatenated first and the global features second. The code is as follows:
global_local_features = torch.cat([local_features, global_features, self.view_seperator[None, :]], dim=0)
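
To make the comparison concrete, here is a minimal sketch that only counts slots and does not run the model. `placeholder_block_sizes` follows the preprocessing snippet above term by term. `feature_block_sizes` is my assumption about the tensor shapes in the `torch.cat` line: I assume each feature grid carries one extra column per row, matching the extra `[image_token_id]` per row in the placeholders, and that `self.view_seperator[None, :]` contributes a single row. Both function names and the example sizes are hypothetical.

```python
def placeholder_block_sizes(num_queries_base, num_queries, width_crop_num, height_crop_num):
    """Sizes of the blocks appended to tokenized_image, in append order."""
    # Block 1: num_queries_base rows, each num_queries_base slots plus one extra slot.
    sizes = [(num_queries_base + 1) * num_queries_base]
    # The single image_token_id appended right after block 1.
    sizes.append(1)
    if width_crop_num > 1 or height_crop_num > 1:
        # Block 2: (num_queries * height_crop_num) rows,
        # each (num_queries * width_crop_num) slots plus one extra slot.
        sizes.append((num_queries * width_crop_num + 1) * (num_queries * height_crop_num))
    return sizes


def feature_block_sizes(num_queries_base, num_queries, width_crop_num, height_crop_num):
    """Assumed row counts of [local_features, global_features, view separator].

    Models only the torch.cat line quoted above, so it assumes crops are present.
    The per-row extra column is an assumption, not something the snippet shows.
    """
    local_rows = (num_queries * width_crop_num + 1) * (num_queries * height_crop_num)
    global_rows = (num_queries_base + 1) * num_queries_base
    separator_rows = 1
    return [local_rows, global_rows, separator_rows]


# Illustrative sizes only, not taken from the repository.
args = dict(num_queries_base=16, num_queries=10, width_crop_num=2, height_crop_num=3)
placeholders = placeholder_block_sizes(**args)
features = feature_block_sizes(**args)

print("placeholder blocks:", placeholders)
print("feature blocks:    ", features)
print("same total:", sum(placeholders) == sum(features))
```

Under these assumptions the two sides have the same total length but their blocks appear in a different order. Every placeholder slot holds the same `image_token_id`, so the placeholder side records only how many slots there are; whether the block order matters depends on how the features are written into those slots, which neither snippet shows.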

Is my understanding correct? If so, why is it done this way?
