Merge pull request #2 from mahmoodlab/more_models

guillaumejaume · web-flow · commit 7e2a7401aa8d · 2025-02-18T13:11:11.000-05:00
Support for additional patch encoders
diff --git a/README.md b/README.md
@@ -15,7 +15,7 @@ This project was developed by the [Mahmood Lab](https://faisal.ai/) at Harvard M
 
 - **Tissue Segmentation**: Extract tissue from background using a DeepLabv3 model (supports H&E, IHC, penmark and artifact removal, etc.).
 - **Patch Extraction**: Extract tissue patches of any size and magnification.
-- **Patch Feature Extraction**: Extract patch embeddings from tissue patches using 13 popular foundation models, including [UNI](https://www.nature.com/articles/s41591-024-02857-3), [CONCH](https://www.nature.com/articles/s41591-024-02856-4), [Virchow](https://www.nature.com/articles/s41591-024-03141-0), [H-Optimus-0](https://github.com/bioptimus/releases/tree/main/models/h-optimus/v0) and many more...
+- **Patch Feature Extraction**: Extract patch embeddings from tissue patches using 20 popular foundation models, including [UNI](https://www.nature.com/articles/s41591-024-02857-3), [CONCH](https://www.nature.com/articles/s41591-024-02856-4), [Virchow](https://www.nature.com/articles/s41591-024-03141-0), [H-Optimus-0](https://github.com/bioptimus/releases/tree/main/models/h-optimus/v0) and many more...
 - **Slide Feature Extraction**: Extract slide embeddings from pre-extracted patch embeddings using 5 whole-slide foundation models, including [Threads](https://arxiv.org/abs/2501.16652) (coming soon!), [Titan](https://arxiv.org/abs/2411.19666), 
 [PRISM](https://arxiv.org/abs/2405.10254), [GigaPath](https://www.nature.com/articles/s41586-024-07441-w) and [CHIEF](https://www.nature.com/articles/s41586-024-07894-z). 
 
@@ -93,7 +93,7 @@ python run_single_slide.py --slide_path wsis/xxxx.svs --job_dir ./trident_proces
  - **Outputs**: 
    - Features are saved as h5 files in `./trident_processed/20x_256px/features_uni_v1`. (Shape: `(n_patches, feature_dim)`)
 
-Trident supports 13 patch encoders, loaded via a patch-level [`encoder_factory`](https://github.com/mahmoodlab/trident/blob/main/trident/patch_encoder_models/load.py#L14). Models requiring specific installations will return error messages with additional instructions. Gated models on HuggingFace require access requests.
+Trident supports 20 patch encoders, loaded via a patch-level [`encoder_factory`](https://github.com/mahmoodlab/trident/blob/main/trident/patch_encoder_models/load.py#L14). Models requiring specific installations will return error messages with additional instructions. Gated models on HuggingFace require access requests.
 
 - **UNI**: [MahmoodLab/UNI](https://huggingface.co/MahmoodLab/UNI)  (`--patch_encoder uni_v1`)
 - **UNIv2**: [MahmoodLab/UNI2-h](https://huggingface.co/MahmoodLab/UNI2-h)  (`--patch_encoder uni_v2`)
@@ -106,8 +106,11 @@ Trident supports 13 patch encoders, loaded via a patch-level [`encoder_factory`]
 - **Prov-Gigapath**: [prov-gigapath](https://huggingface.co/prov-gigapath/prov-gigapath)  (`--patch_encoder gigapath`)
 - **H-Optimus-0**: [bioptimus/H-optimus-0](https://huggingface.co/bioptimus/H-optimus-0)  (`--patch_encoder hoptimus0`)
 - **MUSK**: [xiangjx/musk](https://huggingface.co/xiangjx/musk)  (`--patch_encoder musk`)
+- **Kaiko**: Hosted on TorchHub  (`--patch_encoder` set to one of `kaiko-vits8`, `kaiko-vits16`, `kaiko-vitb8`, `kaiko-vitb16`, `kaiko-vitl14`)
+- **Lunit**: [1aurent/vit_small_patch8_224.lunit_dino](https://huggingface.co/1aurent/vit_small_patch8_224.lunit_dino)  (`--patch_encoder lunit-vits8`)
+- **Hibou**: [histai/hibou-L](https://huggingface.co/histai/hibou-L)  (`--patch_encoder hibou_l`)
 - **CTransPath-CHIEF**: Automatic download  (`--patch_encoder ctranspath`)
-- **ResNet50**: Pretrained on ImageNet via torchvision.  (`--patch_encoder resnet50`)
+- **ResNet50**: Hosted on torchvision.  (`--patch_encoder resnet50`)
 
 **Step 3b: Slide Feature Extraction:** Extracts slide embeddings using a slide encoder. Will also automatically extract patch embeddings. 
  - **Command**:
@@ -124,11 +127,12 @@ Trident supports 13 patch encoders, loaded via a patch-level [`encoder_factory`]
    - Features are saved as h5 files in `./trident_processed/20x_256px/slide_features_titan`. (Shape: `(feature_dim)`)
 
 Trident supports 5 slide encoders, loaded via a slide-level [`encoder_factory`](https://github.com/mahmoodlab/trident/blob/main/trident/slide_encoder_models/load.py#L14). Models requiring specific installations will return error messages with additional instructions. Gated models on HuggingFace require access requests.
-- **Threads**: Coming Soon! [MahmoodLab/threads](https://huggingface.co/MahmoodLab/threads) (`--slide_encoder threads`).
-- **Titan**: [MahmoodLab/TITAN](https://huggingface.co/MahmoodLab/TITAN) (`--slide_encoder titan`)
-- **PRISM**: [paige-ai/Prism](https://huggingface.co/paige-ai/Prism) (`--slide_encoder prism`)
-- **CHIEF**: [CHIEF](https://github.com/hms-dbmi/CHIEF) (`--slide_encoder chief`)
-- **GigaPath**: [prov-gigapath]()  (`--slide_encoder gigapath`)
+- **Threads**: Coming Soon! [MahmoodLab/threads](https://huggingface.co/MahmoodLab/threads) (`--slide_encoder threads`). Based on `conch_v15` with `512x512` @20x.
+- **Titan**: [MahmoodLab/TITAN](https://huggingface.co/MahmoodLab/TITAN) (`--slide_encoder titan`). Based on `conch_v15` with `512x512` @20x.
+- **PRISM**: [paige-ai/Prism](https://huggingface.co/paige-ai/Prism) (`--slide_encoder prism`). Based on `virchow` with `256x256` @20x.
+- **CHIEF**: [CHIEF](https://github.com/hms-dbmi/CHIEF) (`--slide_encoder chief`). Based on `ctranspath` with `256x256` @10x.
+- **GigaPath**: [prov-gigapath](https://huggingface.co/prov-gigapath/prov-gigapath)  (`--slide_encoder gigapath`). Based on `gigapath` with `256x256` @20x.
+- **Madeleine**: [MahmoodLab/madeleine](https://huggingface.co/MahmoodLab/madeleine) (`--slide_encoder madeleine`). Based on `conch_v1` with `256x256` @10x.
 
 > [!NOTE]
 > If you have a patient containing multiple slides, you have two ways for constructing whole-patient embeddings: processing each slide independently and taking the average of the slide features (late fusion) or pooling all patches together and processing that as a single "pseudo-slide" (early fusion). You can use Trident-generated slide embeddings in your own late fusion pipeline, or use Trident-generated patch embeddings in your own early fusion pipeline. For an implementation of both fusion strategies, please check out our sister repository [Patho-Bench](https://github.com/mahmoodlab/Patho-Bench).
diff --git a/run_batch_of_slides.py b/run_batch_of_slides.py
@@ -60,10 +60,12 @@ def parse_arguments():
     parser.add_argument('--patch_encoder', type=str, default='conch_v15', 
                         choices=['conch_v1', 'uni_v1', 'uni_v2', 'ctranspath', 'phikon', 
                                  'resnet50', 'gigapath', 'virchow', 'virchow2', 
-                                 'hoptimus0', 'phikon_v2', 'conch_v15', 'musk'], 
+                                 'hoptimus0', 'phikon_v2', 'conch_v15', 'musk', 'hibou_l',
+                                 'kaiko-vits8', 'kaiko-vits16', 'kaiko-vitb8', 'kaiko-vitb16',
+                                 'kaiko-vitl14', 'lunit-vits8'],
                         help='Patch encoder to use')
     parser.add_argument('--slide_encoder', type=str, default=None, 
-                        choices=['threads', 'titan', 'prism', 'gigapath', 'chief',
+                        choices=['threads', 'titan', 'prism', 'gigapath', 'chief', 'madeleine',
                                  'mean-virchow', 'mean-virchow2', 'mean-conch_v1', 'mean-conch_v15', 'mean-ctranspath',
                                  'mean-gigapath', 'mean-resnet50', 'mean-hoptimus0', 'mean-phikon', 'mean-phikon_v2',
                                  'mean-musk', 'mean-uni_v1', 'mean-uni_v2',  
diff --git a/run_single_slide.py b/run_single_slide.py
@@ -22,11 +22,13 @@ def parse_arguments():
     parser.add_argument("--gpu", type=int, default=0, help="GPU index to use for processing tasks")
     parser.add_argument("--slide_path", type=str, required=True, help="Path to the WSI file to process")
     parser.add_argument("--job_dir", type=str, required=True, help="Directory to store outputs")
-    parser.add_argument("--patch_encoder", type=str, default="uni_v1",
-                        choices=["conch_v1", "uni_v1", "uni_v2", "ctranspath", "phikon",
-                                 "resnet50", "gigapath", "virchow", "virchow2",
-                                 "hoptimus0", "phikon_v2", "conch_v15", "musk"],
-                        help="Patch encoder for feature extraction")
+    parser.add_argument('--patch_encoder', type=str, default='conch_v15', 
+                        choices=['conch_v1', 'uni_v1', 'uni_v2', 'ctranspath', 'phikon', 
+                                 'resnet50', 'gigapath', 'virchow', 'virchow2', 
+                                 'hoptimus0', 'phikon_v2', 'conch_v15', 'musk', 'hibou_l',
+                                 'kaiko-vits8', 'kaiko-vits16', 'kaiko-vitb8', 'kaiko-vitb16',
+                                 'kaiko-vitl14', 'lunit-vits8'],
+                        help='Patch encoder to use')
     parser.add_argument("--mag", type=int, choices=[5, 10, 20, 40], default=20,
                         help="Magnification at which patches/features are extracted")
     parser.add_argument("--patch_size", type=int, default=256, help="Patch size at which coords/features are extracted")
diff --git a/tests/test_patch_encoders.py b/tests/test_patch_encoders.py
@@ -78,6 +78,21 @@ def test_hoptimus0_forward(self):
         
     def test_musk_forward(self):
         self._test_encoder_forward('musk')
+    
+    def test_hibou_l_forward(self):
+        self._test_encoder_forward('hibou_l')
+    
+    def test_kaiko_forward(self):
+        self._test_encoder_forward('kaiko-vits8')
+        self._test_encoder_forward('kaiko-vits16')
+        self._test_encoder_forward('kaiko-vitb8')
+        self._test_encoder_forward('kaiko-vitb16')
+        self._test_encoder_forward('kaiko-vitl14')
+        
+    def test_lunitvits8_forward(self):
+        self._test_encoder_forward('lunit-vits8')
+    
+    
 
 if __name__ == '__main__':
     unittest.main()
diff --git a/tests/test_slide_encoders.py b/tests/test_slide_encoders.py
@@ -69,10 +69,17 @@ def test_slide_encoder_factory_with_valid_names(self):
             ('chief', CHIEFSlideEncoder),
             ('gigapath', GigaPathSlideEncoder),
             ('titan', TitanSlideEncoder),
+            ('madeleine', MadeleineSlideEncoder),
         ]:
             encoder = encoder_factory(model_name)
             self.assertIsInstance(encoder, expected_class)
 
+    def test_madeleine_encoder_initialization(self):
+        sample_batch = {
+            'features': torch.randn(1, 100, 512),
+        }
+        self._test_encoder_forward(MadeleineSlideEncoder(), sample_batch, torch.bfloat16)
+
     def test_slide_encoder_factory_invalid_name(self):
         print("\033[95m" + "Testing Slide Encoder Factory with invalid names" + "\033[0m")
         with self.assertRaises(ValueError):
diff --git a/trident/patch_encoder_models/load.py b/trident/patch_encoder_models/load.py
@@ -49,6 +49,20 @@ def encoder_factory(model_name, **kwargs):
         enc = Phikonv2InferenceEncoder
     elif model_name == 'musk':
         enc = MuskInferenceEncoder
+    elif model_name == 'hibou_l':
+        enc = HibouLInferenceEncoder
+    elif model_name == 'kaiko-vitb8':
+        enc = KaikoB8InferenceEncoder
+    elif model_name == 'kaiko-vitb16':
+        enc = KaikoB16InferenceEncoder
+    elif model_name == 'kaiko-vits8':
+        enc = KaikoS8InferenceEncoder
+    elif model_name == 'kaiko-vits16':
+        enc = KaikoS16InferenceEncoder
+    elif model_name == 'kaiko-vitl14':
+        enc = KaikoL14InferenceEncoder
+    elif model_name == 'lunit-vits8':
+        enc = LunitS8InferenceEncoder
     else:
         raise ValueError(f"Unknown encoder name {model_name}")
 
@@ -130,6 +144,7 @@ def forward(self, x):
                 return_global=self.return_global  
                 )[0]  # Forward pass yields (vision_cls, text_cls). We only need vision_cls.
 
+
 class Conchv1InferenceEncoder(BasePatchEncoder):
     
     def _build(self, with_proj = False, normalize = False, **kwargs):
@@ -235,6 +250,83 @@ def forward_features(self, x):
         return out
     
 
+class HibouLInferenceEncoder(BasePatchEncoder):
+    def _build(self, **kwargs):
+
+        from transformers import AutoModel
+        from torchvision.transforms import InterpolationMode
+
+        self.enc_name = 'hibou_l'
+        weights_path = get_weights_path('patch', self.enc_name)
+        
+        if os.path.exists(weights_path):
+            model = AutoModel.from_pretrained(weights_path)
+        else:
+            model = AutoModel.from_pretrained("histai/hibou-L", trust_remote_code=True)
+            os.makedirs(weights_path, exist_ok=True)
+            model.save_pretrained(weights_path)
+        
+        mean, std = get_constants('hibou')
+        eval_transform = get_eval_transforms(mean, std, target_img_size=224, interpolation=InterpolationMode.BICUBIC, max_size=None, antialias=True)
+        precision = torch.float32
+
+        return model, eval_transform, precision
+    
+    def forward(self, x):
+        out = self.forward_features(x)
+        out = out.pooler_output
+        return out
+    
+    def forward_features(self, x):
+        out = self.model(pixel_values=x)
+        return out
+    
+
+class KaikoInferenceEncoder(BasePatchEncoder):
+    MODEL_NAME = None  # set in subclasses
+
+    def _build(self, **kwargs):
+        from torchvision.transforms import InterpolationMode
+        self.enc_name = f"kaiko-{self.MODEL_NAME}"
+        weights_path = get_weights_path("patch", self.enc_name)
+
+        if os.path.exists(weights_path):
+            model = torch.load(weights_path, map_location="cpu", weights_only=False)
+        else:
+            model = torch.hub.load("kaiko-ai/towards_large_pathology_fms", self.MODEL_NAME, trust_repo=True)
+            os.makedirs(os.path.dirname(weights_path), exist_ok=True)
+            torch.save(model, weights_path)
+
+        mean, std = get_constants("kaiko")
+        eval_transform = get_eval_transforms(mean, std, target_img_size=224, center_crop=True, interpolation=InterpolationMode.BILINEAR, max_size=None, antialias=True)
+        precision = torch.float32
+
+        return model, eval_transform, precision
+
+    def forward(self, x):
+        return self.model(x)
+
+
+class KaikoS16InferenceEncoder(KaikoInferenceEncoder):
+    MODEL_NAME = "vits16"
+
+
+class KaikoS8InferenceEncoder(KaikoInferenceEncoder):
+    MODEL_NAME = "vits8"
+
+
+class KaikoB16InferenceEncoder(KaikoInferenceEncoder):
+    MODEL_NAME = "vitb16"
+
+
+class KaikoB8InferenceEncoder(KaikoInferenceEncoder):
+    MODEL_NAME = "vitb8"
+
+
+class KaikoL14InferenceEncoder(KaikoInferenceEncoder):
+    MODEL_NAME = "vitl14"
+
+
 class ResNet50InferenceEncoder(BasePatchEncoder):
     def _build(
         self, 
@@ -273,7 +365,27 @@ def forward_features(self, x):
             out = out[0]
         return out
                      
+
+class LunitS8InferenceEncoder(BasePatchEncoder):
+    def _build(self, **kwargs):
+        import timm
+        from timm.data import resolve_model_data_config
+        from timm.data.transforms_factory import create_transform
+
+        self.enc_name = 'lunit-vits8'
+
+        model = timm.create_model(
+            model_name="hf-hub:1aurent/vit_small_patch8_224.lunit_dino",
+            pretrained=True,
+        )
+
+        data_config = resolve_model_data_config(model)
+        eval_transform = create_transform(**data_config, is_training=False)
+        precision = torch.float32
+
+        return model, eval_transform, precision
     
+
 class UNIInferenceEncoder(BasePatchEncoder):
     def _build(
         self, 
diff --git a/trident/patch_encoder_models/local_ckpts.json b/trident/patch_encoder_models/local_ckpts.json
@@ -10,6 +10,13 @@
     "virchow2": "",
     "hoptimus0": "",
     "phikon_v2": "./phikon_v2",
+    "hibou_l": "./hibou_l",
+    "kaiko-vitb8": "./kaiko_b8",
+    "kaiko-vitb16": "./kaiko_b16",
+    "kaiko-vits8": "./kaiko_s8",
+    "kaiko-vits16": "./kaiko_s16",
+    "kaiko-vitl14": "./kaiko_l14",
+    "lunit-vits8": "./lunit_s8",
     "conch_v15": "./conchv1_5/pytorch_model_vision.bin",
     "custom_encoder": ""
 }
diff --git a/trident/patch_encoder_models/utils/constants.py b/trident/patch_encoder_models/utils/constants.py
@@ -2,6 +2,10 @@
 IMAGENET_STD = [0.229, 0.224, 0.225]
 OPENAI_MEAN = [0.48145466, 0.4578275, 0.40821073]
 OPENAI_STD = [0.26862954, 0.26130258, 0.27577711]
+HIBOU_MEAN = [0.7068, 0.5755, 0.722]
+HIBOU_STD = [0.195, 0.2316, 0.1816]
+KAIKO_MEAN = [0.5, 0.5, 0.5]
+KAIKO_STD = [0.5, 0.5, 0.5]
 NONE_MEAN = None
 NONE_STD = None
 
@@ -10,7 +14,11 @@ def get_constants(norm='imagenet'):
         return IMAGENET_MEAN, IMAGENET_STD
     elif norm == 'openai_clip':
         return OPENAI_MEAN, OPENAI_STD
+    elif norm == 'hibou':
+        return HIBOU_MEAN, HIBOU_STD
     elif norm == 'none':
         return NONE_MEAN, NONE_STD
+    elif norm == 'kaiko':
+        return KAIKO_MEAN, KAIKO_STD
     else:
-        raise ValueError(f"Invalid norm: {norm}")
+        raise ValueError(f"Invalid norm: {norm}")
diff --git a/trident/slide_encoder_models/load.py b/trident/slide_encoder_models/load.py
@@ -38,6 +38,8 @@ def encoder_factory(model_name, pretrained=True, freeze=True, **kwargs):
             enc = CHIEFSlideEncoder
         elif 'gigapath' in model_name:
             enc = GigaPathSlideEncoder
+        elif 'madeleine' in model_name:
+            enc = MadeleineSlideEncoder
         elif 'abmil' in model_name:
             enc = ABMILSlideEncoder
         else:
@@ -53,7 +55,8 @@ def encoder_factory(model_name, pretrained=True, freeze=True, **kwargs):
     'tcga': 'conch_v15',
     'prism': 'virchow',
     'chief': 'ctranspath',
-    'gigapath': 'gigapath'
+    'gigapath': 'gigapath',
+    'madeleine': 'conch_v1',
 }
 
 ####################################################################################################
@@ -286,6 +289,31 @@ def forward(self, batch, device='cuda'):
         return z
 
 
+class MadeleineSlideEncoder(BaseSlideEncoder):
+
+    def _build(self, pretrained=True, **kwargs):
+
+        assert pretrained, "MadeleineSlideEncoder has no non-pretrained models. Please load with pretrained=True."
+
+        self.enc_name = 'madeleine'
+        weights_path = get_weights_path('slide', self.enc_name)
+        embedding_dim = 512
+
+        try:
+            from madeleine.models.factory import create_model_from_pretrained
+        except ImportError:
+            traceback.print_exc()
+            raise Exception("Please install Madeleine using `pip install git+https://github.com/mahmoodlab/MADELEINE.git`")  
+        
+        model, precision = create_model_from_pretrained(weights_path)
+
+        return model, precision, embedding_dim
+    
+    def forward(self, x, device='cuda'):
+        z = self.model.encode_he(x['features'], device)
+        return z
+
+
 class ThreadsSlideEncoder(BaseSlideEncoder):
 
     def _build(self, pretrained=True, **kwargs):
@@ -297,7 +325,7 @@ def _build(self, pretrained=True, **kwargs):
         except:
             traceback.print_exc()
             raise Exception("Coming Soon! Thanks for your patience.")
-
+        
         return None, None, None
 
     def forward(self, batch, device='cuda', return_raw_attention=False):
@@ -351,6 +379,20 @@ def _build(self, model_name = 'mean-default', **kwargs):
             embedding_dim = 1024
         elif model_name == 'mean-musk':
             embedding_dim = 1024
+        elif model_name == 'mean-hibou_l':
+            embedding_dim = 1024
+        elif model_name == 'mean-kaiko-vits8':
+            embedding_dim = 384
+        elif model_name == 'mean-kaiko-vits16':
+            embedding_dim = 384
+        elif model_name == 'mean-kaiko-vitb8':
+            embedding_dim = 768
+        elif model_name == 'mean-kaiko-vitb16':
+            embedding_dim = 768
+        elif model_name == 'mean-kaiko-vitl14':
+            embedding_dim = 1024
+        elif model_name == 'mean-lunit-vits8':
+            embedding_dim = 384
         else:
             print(f"\033[93mWARNING: Could not automatically infer embedding_dim for mean encoder {self.enc_name}. Setting to None.\033[0m")
             embedding_dim = None
diff --git a/trident/slide_encoder_models/local_ckpts.json b/trident/slide_encoder_models/local_ckpts.json
@@ -1,3 +1,4 @@
 {
-    "chief": "./CHIEF"
+    "chief": "./CHIEF",
+    "madeleine": "./MADELEINE"
 }
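
The mean-pooling slide encoders are named `mean-<patch_encoder>` in the `--slide_encoder` choices, so the embedding dimension can be looked up from the patch encoder name instead of a second hand-written `elif` chain. The sketch below is one possible way to do that, not code from this PR: `PATCH_EMBEDDING_DIMS` and `infer_mean_embedding_dim` are hypothetical names, and the table only covers the encoders added here, with dimensions copied from the values in this diff.

```python
# Hypothetical lookup table (not part of Trident). Keys are the
# --patch_encoder names added in this PR; values are the embedding
# dimensions listed for them in trident/slide_encoder_models/load.py.
PATCH_EMBEDDING_DIMS = {
    "hibou_l": 1024,
    "kaiko-vits8": 384,
    "kaiko-vits16": 384,
    "kaiko-vitb8": 768,
    "kaiko-vitb16": 768,
    "kaiko-vitl14": 1024,
    "lunit-vits8": 384,
}

MEAN_PREFIX = "mean-"


def infer_mean_embedding_dim(model_name):
    """Return the embedding dim for a 'mean-<patch_encoder>' name.

    Returns None when the name lacks the 'mean-' prefix or the patch
    encoder is not in the table, mirroring the fallback in the diff.
    """
    if not model_name.startswith(MEAN_PREFIX):
        return None
    patch_encoder = model_name[len(MEAN_PREFIX):]
    return PATCH_EMBEDDING_DIMS.get(patch_encoder)


if __name__ == "__main__":
    for name in ("mean-kaiko-vits8", "mean-lunit-vits8", "mean-kaiko-vit8s"):
        print(name, infer_mean_embedding_dim(name))
```

Because the key is derived from the patch encoder name, a transposed spelling such as `mean-kaiko-vit8s` returns `None` rather than silently matching a branch that no caller can reach.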
