[BUG] naflexvit_so400m_patch16_siglip has undocumented different default pos_embed_interp_mode of "bicubic" instead of "bilinear" #2542

Open
@redhottensors

Description

Updates

Per further discussion, the difference is intentional but undocumented. It deviates from the reference implementation in Google Big Vision, which uses "bilinear".


Original Report

Fix location:

cfg = NaFlexVitCfg(

This causes the default to be "bicubic":

pos_embed_interp_mode: str = 'bicubic' # Interpolation mode for position embedding resizing

Reference code showing "bilinear" interpolation:
https://github.com/google-research/big_vision/blob/0127fb6b337ee2a27bf4e54dea79cff176527356/big_vision/models/proj/image_text/naflex_vit.py#L67
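To illustrate why the mode matters, here is a minimal sketch that resizes the same position-embedding grid with both modes and measures how far the results diverge. It uses plain `torch.nn.functional.interpolate` and is not timm's or Big Vision's actual resize code; the grid size, channel count, target size, and the helper name `resize_pos_embed` are all assumptions made for the demo.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical learned position-embedding grid: 16x16 positions, 8 channels.
# Real models use much wider embeddings; these sizes are illustrative only.
pos_embed = torch.randn(1, 8, 16, 16)  # (batch, channels, height, width)


def resize_pos_embed(grid, size, mode):
    """Resize a (1, C, H, W) grid to `size` using the given interpolation mode."""
    return F.interpolate(grid, size=size, mode=mode, align_corners=False)


target = (12, 20)  # non-square target grid, assumed for the demo
bilinear = resize_pos_embed(pos_embed, target, "bilinear")
bicubic = resize_pos_embed(pos_embed, target, "bicubic")

# The two modes produce the same shape but different values.
max_abs_diff = (bilinear - bicubic).abs().max().item()
print(f"output shape: {tuple(bilinear.shape)}")
print(f"max abs difference: {max_abs_diff:.4f}")
```

Because the resized position embeddings are added to every patch token, even small per-element differences propagate through all subsequent layers.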

After changing the mode to "bilinear", timm is able to forward SigLIP2 NaFlex with cosine similarity above 0.9999 at each intermediate output.
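A check of this kind could look roughly like the sketch below. It is written in pure Python for clarity and is not the script used for this report; the function names, the sample activations, and the assumption that each intermediate output has been flattened to a list of floats are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def all_intermediates_match(reference, candidate, threshold=0.9999):
    """True if every pair of intermediate outputs exceeds the threshold."""
    if len(reference) != len(candidate):
        raise ValueError("models produced a different number of intermediates")
    return all(
        cosine_similarity(ref, out) > threshold
        for ref, out in zip(reference, candidate)
    )


# Hypothetical flattened activations from two intermediate layers.
reference = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
close = [[1.0001, 2.0, 3.0], [0.5, -1.0001, 2.0]]
far = [[1.0, 2.0, 3.0], [2.0, -1.0, 0.5]]

print(all_intermediates_match(reference, close))
print(all_intermediates_match(reference, far))
```

Comparing every intermediate output rather than only the final embedding helps localize where two implementations start to diverge.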


Labels

bug: Something isn't working
