Add Swin Transformer Backbone #2298

AdonaiVera · 2025-06-13T14:13:31Z

Add Swin Transformer Backbone

This PR adds SwinTransformerBackbone, based on the paper "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows". It is the first step toward supporting Grounding DINO in keras-hub, which uses Swin as its main image encoder.

This idea has been discussed in previous threads:

keras-cv issue #2114: a community request to support Grounding DINO, which requires Swin.
keras-hub issue #2117: a request to support Swin-UNETR, a model originally designed for 3D medical image segmentation that uses Swin Transformers for feature extraction.

This PR is still a draft. I’m still working on:

- Fixing some issues with float16 and float32
- Preparing a Colab notebook to:
  - Convert pretrained weights from the original repo
  - Compare the outputs with the original model
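
For the output comparison, the check could look roughly like the sketch below. This is only an illustration under stated assumptions: `compare_outputs` is a hypothetical helper (not part of keras-hub or the original repo), the arrays are random stand-ins rather than real model outputs, and the shape and tolerances are arbitrary. The float16 round trip is included only to show why a looser tolerance is usually needed for half precision.

```python
import numpy as np


def compare_outputs(reference, candidate, rtol=1e-5, atol=1e-5):
    """Hypothetical helper: return (max_abs_diff, ok) for two same-shape arrays."""
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    if reference.shape != candidate.shape:
        raise ValueError(
            f"shape mismatch: {reference.shape} vs {candidate.shape}"
        )
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    # np.allclose checks |candidate - reference| <= atol + rtol * |reference|.
    ok = bool(np.allclose(candidate, reference, rtol=rtol, atol=atol))
    return max_abs_diff, ok


# Stand-in data (assumption): random features, NOT real Swin outputs.
rng = np.random.default_rng(0)
reference_features = rng.standard_normal((1, 49, 768)).astype(np.float32)

# Simulate the precision loss of a float16 run by round-tripping the values.
candidate_features = reference_features.astype(np.float16).astype(np.float32)

# A tolerance suited to float32 is too strict for half precision...
strict_diff, strict_ok = compare_outputs(reference_features, candidate_features)
# ...so a looser (arbitrarily chosen) tolerance is used for the float16 case.
loose_diff, loose_ok = compare_outputs(
    reference_features, candidate_features, rtol=1e-2, atol=1e-2
)
print(f"max abs diff: {loose_diff:.6f}")
print(f"strict check passed: {strict_ok}, loose check passed: {loose_ok}")
```

In the notebook, the two arrays would instead come from running the same preprocessed image through the original model and the converted Keras backbone.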

Let me know if this direction makes sense, or if you have any comments or suggestions. Thanks!

AdonaiVera added 3 commits June 2, 2025 09:17

Add backbone, layer and test of swin transform - image encoder

fafcbb2

First iteration of swin transformer layers in keras hub models

69019f1

Deleted Swin Transformer files, remove typo

874c524