Updates
Per further discussion, the difference is intentional but undocumented. It remains a deviation from the reference implementation in Google Big Vision.
Original Report
Fix location: pytorch-image-models/timm/models/naflexvit.py, line 1767 (commit a7c5368)
This causes the default interpolation mode to be "bicubic".
Reference code showing "bilinear" interpolation:
https://github.com/google-research/big_vision/blob/0127fb6b337ee2a27bf4e54dea79cff176527356/big_vision/models/proj/image_text/naflex_vit.py#L67
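To illustrate why the interpolation mode matters, here is a minimal sketch that resamples a random position-embedding grid with both modes and compares the results. The helper `resample_pos_embed`, the grid shape, and the target size are hypothetical and chosen only for illustration; this is not the code from either timm or Big Vision, and it makes no claim about which `align_corners` or antialiasing settings those implementations use.

```python
import torch
import torch.nn.functional as F


def resample_pos_embed(pos_embed, new_size, mode):
    """Resample an (H, W, C) grid to new_size = (H', W') with the given mode.

    Hypothetical helper for illustration only.
    """
    # F.interpolate expects (N, C, H, W), so move channels first and add a batch dim.
    grid = pos_embed.permute(2, 0, 1).unsqueeze(0)
    out = F.interpolate(grid, size=new_size, mode=mode, align_corners=False)
    # Back to (H', W', C).
    return out.squeeze(0).permute(1, 2, 0)


torch.manual_seed(0)
pos_embed = torch.randn(16, 16, 8)  # assumed 16x16 grid with 8 channels

bilinear = resample_pos_embed(pos_embed, (12, 20), "bilinear")
bicubic = resample_pos_embed(pos_embed, (12, 20), "bicubic")

# The two modes produce different embeddings from the same input grid.
max_abs_diff = (bilinear - bicubic).abs().max().item()
cos = F.cosine_similarity(bilinear.flatten(), bicubic.flatten(), dim=0).item()
print(f"max abs diff: {max_abs_diff:.4f}, cosine similarity: {cos:.4f}")
```

Random data exaggerates the gap compared with smooth learned embeddings, but any mismatch introduced at the position-embedding stage carries through every later layer.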
After making this change, TIMM is able to forward siglip2 naflex with cosine similarity above 0.9999 at each intermediate output.
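A per-intermediate check of this kind could be sketched as follows. This is not the script used for the report: the function names and the toy vectors are hypothetical, and in practice each entry would be a flattened activation captured from the reference and from timm at the same point in the network.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def compare_intermediates(reference, candidate, threshold=0.9999):
    """Return (index, similarity, passed) for each pair of intermediates."""
    if len(reference) != len(candidate):
        raise ValueError("both models must yield the same number of intermediates")
    report = []
    for i, (ref, cand) in enumerate(zip(reference, candidate)):
        sim = cosine_similarity(ref, cand)
        report.append((i, sim, sim > threshold))
    return report


# Toy data: the first pair nearly matches, the second clearly diverges.
reference = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
candidate = [[1.0, 2.0, 3.001], [0.5, -1.0, -2.0]]

report = compare_intermediates(reference, candidate)
for index, sim, passed in report:
    print(f"intermediate {index}: cosine={sim:.6f} {'ok' if passed else 'MISMATCH'}")
```

Checking every intermediate rather than only the final output helps locate the first layer at which the two implementations diverge.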