Issues in reproducing the TCGA BRCA results #29

@Tomatenbiss

First of all, I'd like to thank you for your amazing work. Your contributions to our field are outstanding.

Issue

I am currently working on a project using CONCH (commit 02d6ac5) and wanted to make sure my WSI analysis pipeline is implemented correctly. As a sanity check, I first evaluated on the CRC100k validation set and achieved results similar to those reported in the paper. However, I am struggling to reproduce the TCGA BRCA results.

Dataset Selection

I downloaded 75 slides each of IDC and ILC (morphology codes 8500/3 and 8520/3, respectively). Since I could not find further information about the case or slide IDs used, I did not apply any additional selection criteria.

Embedding Extraction

I computed embeddings using create_embeddings.py from the MADELEINE repository (commit 3a5e9c8). I am aware that a February update now recommends using TRIDENT instead, so switching to it will be my next debugging step. I could not find details about the magnification levels used, so for now I computed embeddings at 40x, 10x, and 5x.
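
To make the magnification comparison concrete, here is a minimal sketch of how a target magnification maps to a patch footprint at the slide's base level. The function name `level0_patch_size`, the 256-pixel patch size, and the 40x base magnification are illustrative assumptions, not values taken from create_embeddings.py or the paper; in practice the base magnification would have to be read from each slide's metadata.

```python
def level0_patch_size(patch_size, base_magnification, target_magnification):
    """Return (footprint in base-level pixels, downsample factor).

    Hypothetical helper: assumes square patches and that the target
    magnification is not higher than the scan (base) magnification.
    """
    if target_magnification > base_magnification:
        raise ValueError("target magnification exceeds the scan magnification")
    downsample = base_magnification / target_magnification
    return round(patch_size * downsample), downsample


# Assumed values: 256-pixel patches, slide scanned at 40x.
for target in (40, 10, 5):
    footprint, factor = level0_patch_size(256, 40, target)
    print(f"{target}x: downsample {factor:g}, covers {footprint} px at base level")
```

Under these assumptions a 5x patch covers 64 times the tissue area of a 40x patch, so the three settings show the model very different amounts of context per patch.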

Setup

Torch: 2.4.1+cu118
GPU: Nvidia Tesla T4 (16 GB RAM)
CPU: 16-core AMD EPYC 7282
Operating System: Ubuntu 22.04.4 LTS

Results

Using prompt ensembling with the prompts and templates provided in the paper and the MIZero notebook (commit 02d6ac5), I obtained the following zero-shot classification results:

5x: acc: 0.765 bacc: 0.762 weighted_f1: 0.763 roc_auc: 0.79
10x: acc: 0.720 bacc: 0.720 weighted_f1: 0.710 roc_auc: 0.819
40x: acc: 0.561 bacc: 0.527 weighted_f1: 0.428 roc_auc: 0.666
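
For anyone comparing pipelines, the following is a minimal sketch of slide-level zero-shot scoring with prompt ensembling and top-K pooling, which is how I understand the MI-Zero approach in general terms. It is not the notebook's code: the function names, the toy 3-dimensional embeddings, and `topk=2` are all assumptions for illustration, and it uses plain Python to stay dependency-free where a real pipeline would use batched tensor operations.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ensemble_class_embedding(prompt_embeddings):
    """Prompt ensembling: normalize each prompt embedding, average, renormalize."""
    normed = [l2_normalize(e) for e in prompt_embeddings]
    dim = len(normed[0])
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def slide_scores(patch_embeddings, class_embeddings, topk):
    """Top-K pooling: per class, average the K highest patch-to-class cosine similarities."""
    patches = [l2_normalize(p) for p in patch_embeddings]
    scores = []
    for c in class_embeddings:
        sims = sorted(
            (sum(a * b for a, b in zip(p, c)) for p in patches), reverse=True
        )
        k = min(topk, len(sims))
        scores.append(sum(sims[:k]) / k)
    return scores


# Toy text embeddings (assumed): two prompts per class, class order [IDC, ILC].
idc_prompts = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]
ilc_prompts = [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
class_embeddings = [
    ensemble_class_embedding(idc_prompts),
    ensemble_class_embedding(ilc_prompts),
]

# Toy patch embeddings (assumed) for a single slide.
patches = [[0.8, 0.2, 0.0], [0.7, 0.1, 0.1], [0.1, 0.9, 0.0], [0.9, 0.1, 0.0]]

scores = slide_scores(patches, class_embeddings, topk=2)
predicted = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted)
```

In this formulation the slide-level prediction depends on the choice of K and on how many patches each magnification yields, so both are worth checking when results differ across 5x, 10x, and 40x.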

Questions

The results are lower than those reported in your paper. Could you please give me some input on the following?

  1. Are these differences within the expected variance due to dataset differences?

  2. At what magnification did you extract your embeddings?

  3. Is there any WSI benchmark dataset available for testing the MIZero pipeline?

  4. Could the discrepancy be due to using the older embedding script from MADELEINE instead of TRIDENT?

Thanks again for your time and for making these tools available!
