Issues in reproducing the TCGA BRCA results

First of all, I'd like to thank you for your amazing work. Your contributions to our field are outstanding.


### Issue
I am currently working on a project using CONCH (commit 02d6ac5) and wanted to make sure my WSI analysis pipeline is implemented correctly. I first performed a sanity check using the CRC100k validation set and achieved results similar to those reported in the paper. However, I am struggling to reproduce the TCGA BRCA results.

### Dataset Selection
I downloaded 75 slides each of IDC and ILC (morphology codes 8500/3 and 8520/3, respectively). Since I could not find further information about the case or slide IDs used in the paper, I did not apply any additional selection criteria.

### Embedding Extraction
I computed embeddings using [extract_slide_embeddings.py](https://github.com/mahmoodlab/MADELEINE/blob/main/bin/extract_slide_embeddings.py) from the MADELEINE repository (commit [`3a5e9c8`](https://github.com/mahmoodlab/MADELEINE/commit/4751969)). I am aware that a February update now recommends using [TRIDENT](https://github.com/mahmoodlab/TRIDENT) instead, so that will be my next step in debugging. I did not find details about the magnification levels used, so for now I computed embeddings at 40x, 10x, and 5x.
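Because the magnification is the main unknown here, the relation between a target magnification and the level-0 region that each patch covers is worth spelling out. The sketch below is only arithmetic: the function names and the 256-pixel patch size are assumptions for illustration, not values taken from the paper or from either repository, and the base magnification would have to be read from each slide's metadata.

```python
def downsample_factor(base_mag: float, target_mag: float) -> float:
    """Factor by which level-0 pixels must be downsampled to reach target_mag."""
    if base_mag <= 0 or target_mag <= 0:
        raise ValueError("magnifications must be positive")
    if target_mag > base_mag:
        # e.g. requesting 40x from a slide scanned at 20x would need upsampling
        raise ValueError("target magnification exceeds the scan magnification")
    return base_mag / target_mag


def level0_patch_size(patch_size: int, base_mag: float, target_mag: float) -> int:
    """Side length (in level-0 pixels) of the region covered by one patch."""
    return int(round(patch_size * downsample_factor(base_mag, target_mag)))


# Hypothetical example: 256-pixel patches (assumed) on a slide scanned at 40x.
for mag in (40, 10, 5):
    print(f"{mag}x: {level0_patch_size(256, 40, mag)} level-0 pixels per patch side")
```

If some slides in the cohort were scanned at a lower magnification than requested, the check in `downsample_factor` would flag them, which is one thing I still need to rule out for the 40x run.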

### Setup
Torch: 2.4.1+cu118 
GPU: Nvidia Tesla T4 (16 GB RAM)
CPU: 16-core AMD EPYC 7282
Operating System: Ubuntu 22.04.4 LTS

### Results
Using prompt ensembling with the prompts and templates provided in the paper and the [MIZero notebook](https://github.com/mahmoodlab/CONCH/blob/main/notebooks/MI-zeroshot_classification_example_ensemble.ipynb) (commit 02d6ac5), I obtained the following zero-shot classification results:

| Magnification | acc | bacc | weighted_f1 | roc_auc |
|---|---|---|---|---|
| 5x | 0.765 | 0.762 | 0.763 | 0.79 |
| 10x | 0.720 | 0.720 | 0.710 | 0.819 |
| 40x | 0.561 | 0.527 | 0.428 | 0.666 |
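To make it easier to spot where my pipeline might deviate, here is a minimal sketch of zero-shot slide scoring with prompt ensembling and top-K pooling, as I understand the general idea. It uses plain Python lists and toy 2-D vectors instead of real CONCH embeddings; all function names, the class order, and the value of `k` are hypothetical and are not the API of the CONCH repository or the notebook.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def ensemble_class_embedding(prompt_embeddings):
    """Average the normalized prompt embeddings of one class, then re-normalize."""
    normed = [l2_normalize(e) for e in prompt_embeddings]
    dim = len(normed[0])
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def topk_slide_scores(patch_embeddings, class_embeddings, k):
    """Per class: mean of the k highest patch-to-class cosine similarities."""
    patches = [l2_normalize(p) for p in patch_embeddings]
    scores = []
    for cls in class_embeddings:
        sims = sorted(
            (sum(a * b for a, b in zip(p, cls)) for p in patches), reverse=True
        )
        top = sims[: max(1, min(k, len(sims)))]
        scores.append(sum(top) / len(top))
    return scores


# Toy data (made up): two prompts per class, three patches from one slide.
class_names = ["IDC", "ILC"]  # assumed order
prompts = {
    "IDC": [[1.0, 0.0], [0.9, 0.1]],
    "ILC": [[0.0, 1.0], [0.1, 0.9]],
}
class_embs = [ensemble_class_embedding(prompts[name]) for name in class_names]
patch_embs = [[1.0, 0.1], [0.8, 0.2], [0.1, 1.0]]

slide_scores = topk_slide_scores(patch_embs, class_embs, k=2)
predicted = class_names[slide_scores.index(max(slide_scores))]
print(predicted, slide_scores)
```

If the actual procedure differs from this sketch (for example in how prompts are averaged or how `k` is chosen), that could explain part of the gap.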

### Questions
The results are lower than those reported in your paper. Could you please give me some input on the following?

1. Are these differences within the expected variance due to dataset differences?

2. At what magnification did you extract your embeddings?

3. Is there any WSI benchmark dataset available for testing the MIZero pipeline?

4. Could the discrepancy be due to using the older embedding script from MADELEINE instead of TRIDENT?
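Regarding question 1, a rough feel for the sampling noise alone can be had from a binomial confidence interval on the accuracy. The sketch below computes a Wilson score interval; the sample size of 150 is an assumption (75 slides per class, all successfully evaluated), and the interval says nothing about differences in cohort selection, only about finite-sample variance.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96):
    """Wilson score interval for a proportion p_hat observed on n samples."""
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Accuracies from the results above; n = 150 is assumed, not confirmed.
for mag, acc in (("5x", 0.765), ("10x", 0.720), ("40x", 0.561)):
    low, high = wilson_interval(acc, 150)
    print(f"{mag}: acc={acc:.3f}, approx. 95% interval [{low:.3f}, {high:.3f}]")
```

With only about 150 slides the intervals are fairly wide, so part of the gap could be sampling noise, but I would still like to confirm the magnification and cohort before drawing conclusions.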

Thanks again for your time and for making these tools available!
