Description
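In image preprocessing, the placeholders for the global features are concatenated first, followed by those for the local features. The code is: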
tokenized_image = ([image_token_id] * num_queries_base + [image_token_id]) * num_queries_base
tokenized_image += [image_token_id]
if width_crop_num > 1 or height_crop_num > 1:
    tokenized_image += ([image_token_id] * (num_queries * width_crop_num) + [image_token_id]) * (
        num_queries * height_crop_num)
In the DeepEncoder stage, however, the local features are concatenated first, followed by the global features. The code is:
global_local_features = torch.cat([local_features, global_features, self.view_seperator[None, :]], dim=0)
Is my understanding correct? If so, why are the two orders different?
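For reference, here is a minimal sketch of what the preprocessing snippet produces. It only mirrors the lines quoted above. The function names and the concrete values (`image_token_id = 0`, `num_queries_base = 16`, `num_queries = 10`, a 2 x 2 crop grid) are illustrative assumptions, not taken from the repository. It also assumes that each view's features carry one extra token per row, as the placeholder layout implies.

```python
def build_image_placeholders(image_token_id, num_queries_base, num_queries,
                             width_crop_num, height_crop_num):
    """Mirror the quoted preprocessing code: global block, one separator slot, local block."""
    # Global view: num_queries_base rows, each with num_queries_base tokens plus one extra token.
    tokens = ([image_token_id] * num_queries_base + [image_token_id]) * num_queries_base
    # One slot for the view separator.
    tokens += [image_token_id]
    # Local views are only added when the image is split into more than one crop.
    if width_crop_num > 1 or height_crop_num > 1:
        tokens += ([image_token_id] * (num_queries * width_crop_num) + [image_token_id]) * (
            num_queries * height_crop_num)
    return tokens


def expected_feature_count(num_queries_base, num_queries, width_crop_num, height_crop_num):
    """Count rows in torch.cat([local, global, separator], dim=0) under the assumption above."""
    global_count = (num_queries_base + 1) * num_queries_base
    local_count = 0
    if width_crop_num > 1 or height_crop_num > 1:
        local_count = (num_queries * width_crop_num + 1) * (num_queries * height_crop_num)
    separator_count = 1
    return local_count + global_count + separator_count


# Illustrative values only (assumptions, not read from the model config).
image_token_id = 0
tokens = build_image_placeholders(image_token_id, 16, 10, 2, 2)
feature_count = expected_feature_count(16, 10, 2, 2)

print(len(tokens), feature_count)   # lengths of the placeholder and feature sequences
print(set(tokens))                  # distinct ids present in the placeholder sequence
```

Every entry in `tokenized_image` is the same `image_token_id`, so the placeholder sequence itself does not record which block is global and which is local. Only its length is fixed. If the encoder output is written into these slots in sequence, the effective order would be set by the `torch.cat` call alone. That is an observation from the snippets, not a confirmed answer.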