By Nicholas Konz*, Richard Osuala*, Preeti Verma, Yuwen Chen, Hanxue Gu, Haoyu Dong, Yaqian Chen, Andrew Marshall, Lidia Garrucho, Kaisar Kushibar, Daniel M. Lang, Gene S. Kim, Lars J. Grimm, John M. Lewin, James S. Duncan, Julia A. Schnabel, Oliver Diaz, Karim Lekadir and Maciej A. Mazurowski (* = equal contribution).
We provide an easy-to-use framework, accompanying our paper, for computing a variety of distance/similarity metrics between unpaired sets of medical images. For example, it can be used to evaluate the performance of image generative models in the medical imaging domain. The codebase includes implementations of several distance metrics for comparing image sets, as well as tools for evaluating the performance of generative models on various downstream tasks.
Included metrics:
- FRD (Fréchet Radiomic Distance)
- FID (Fréchet Inception Distance)
- Radiology FID/RadFID
- KID (Kernel Inception Distance)
- CMMD (CLIP Maximum Mean Discrepancy)
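The Fréchet-based metrics above (FRD, FID, RadFID) all reduce to the same quantity: the Fréchet distance between two Gaussians fitted to feature embeddings of the two image sets (radiomic features for FRD, InceptionV3 features for FID/RadFID). As a rough illustration (a minimal sketch, not the repository's implementation), that distance can be computed as:

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats1, feats2):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats1, feats2: arrays of shape (n_samples, n_features).
    """
    mu1, mu2 = feats1.mean(axis=0), feats2.mean(axis=0)
    sigma1 = np.cov(feats1, rowvar=False)
    sigma2 = np.cov(feats2, rowvar=False)
    diff = mu1 - mu2
    # Matrix square root of the covariance product; small imaginary
    # components from numerical error are discarded.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```

The distance is (near) zero for identical feature sets and grows with the mean/covariance mismatch between them.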
Thanks to the following repositories, which this framework utilizes and builds upon:
- frd-score
- pyradiomics
- gan-metrics-pytorch, which we modified to allow for computing RadFID.
- cmmd-pytorch
Please cite our paper if you use this framework in your work:
@article{konzosuala_frd2025,
  title={Fr\'echet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets},
  author={Konz, Nicholas and Osuala, Richard and Verma, Preeti and Chen, Yuwen and Gu, Hanxue and Dong, Haoyu and Chen, Yaqian and Marshall, Andrew and Garrucho, Lidia and Kushibar, Kaisar and Lang, Daniel M. and Kim, Gene S. and Grimm, Lars J. and Lewin, John M. and Duncan, James S. and Schnabel, Julia A. and Diaz, Oliver and Lekadir, Karim and Mazurowski, Maciej A.},
  year={2025},
  eprint={2412.01496},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.01496},
}
- First, install the required packages:

```bash
pip3 install -r requirements.txt
```

- Finally, RadFID requires the RadImageNet weights for the InceptionV3 model, which can be downloaded from RadImageNet's official source. Once downloaded, place the `InceptionV3.pt` checkpoint file into `src/gan-metrics-pytorch/models` and rename it to `RadImageNet_InceptionV3.pt`. Our code will take care of the rest.
You can compute various distance metrics between two sets of images using the following command:
```bash
bash compute_allmetrics.sh $IMAGE_FOLDER1 $IMAGE_FOLDER2 $METRICS
```

where `$IMAGE_FOLDER1` and `$IMAGE_FOLDER2` are the paths to the two folders containing the images you want to compare, and `$METRICS` is the list of metrics you want to compute (out of FRD, FID, RadFID, KID and CMMD), given as a single comma-separated string. For example, to compute only FRD and CMMD, you would run:

```bash
bash compute_allmetrics.sh $IMAGE_FOLDER1 $IMAGE_FOLDER2 FRD,CMMD
```
This will print out the computed distances to the terminal. For example, this can be used to evaluate the performance of a generative model by comparing the generated images to a set of real reference images.
As in our paper (Secs. 5.2 and 5.3), you can also evaluate how distance estimates and computation times change with the number of images used to compute the distance metrics. This can be done by running the `run_sample_efficiency.sh` script with the same arguments as `compute_allmetrics.sh` (see Basic Metric Computation), except that you now specify the sample sizes you want to use, provided as a single string with spaces separating each size. For example, to compute the distances for sample sizes of 10, 100, 500 and 1000 images, you can run:

```bash
bash run_sample_efficiency.sh $IMAGE_FOLDER1 $IMAGE_FOLDER2 "10 100 500 1000"
```
The distance values and computation times will be printed to the terminal.
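Conceptually, a sample-efficiency sweep of this kind just repeats the metric computation on random subsets of increasing size while timing each run. A minimal sketch of that loop (illustrative only; `sample_efficiency` and its `metric` callback are hypothetical names, not part of the codebase):

```python
import time
import numpy as np

def sample_efficiency(feats1, feats2, sizes, metric, seed=0):
    """For each sample size n, estimate `metric` on a random subset of n
    feature vectors from each set, recording the value and wall-clock time."""
    rng = np.random.default_rng(seed)
    results = []
    for n in sizes:
        idx1 = rng.choice(len(feats1), size=n, replace=False)
        idx2 = rng.choice(len(feats2), size=n, replace=False)
        t0 = time.perf_counter()
        value = metric(feats1[idx1], feats2[idx2])
        results.append((n, value, time.perf_counter() - t0))
    return results
```

Plotting the returned values and times against `n` shows how quickly each metric's estimate stabilizes, which is what Secs. 5.2 and 5.3 of the paper analyze.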
To evaluate the sensitivity of the distance metrics to image transformations (as in Sec. 5.4 of our paper), you can use the `transform_images.py` script. This script applies a set of transformations to a folder of images `$IMAGE_FOLDER` and saves the transformed images in separate folders. The transformations include Gaussian blur (kernel sizes of 5 and 9) and sharpness adjustment (sharpness factors of 0, 0.5 and 2). The script can be run with the following command:

```bash
python3 transform_images.py $IMAGE_FOLDER
```

where `$IMAGE_FOLDER` is the path to the folder containing the images you want to transform. The script will create new folders in the same directory as the input folder, one per transformation, named after the transformation type (e.g., `gaussian_blur`).
Transformed images for the input folder will be saved in additional folders within the same directory, one for each type of transformation. From here, the sensitivity of the distance metrics to a given transformation can be evaluated simply by computing the distance metrics between the non-transformed and transformed image folders (see Basic Metric Computation). For example, for Gaussian blur with kernel size 5, you can run:

```bash
bash compute_allmetrics.sh $IMAGE_FOLDER $IMAGE_FOLDER_TRANSFORMED $METRICS
```

where `$IMAGE_FOLDER_TRANSFORMED` is the path to the folder containing the transformed images: `{$IMAGE_FOLDER}_gaussian_blur5` in this case.
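For intuition, the two transformation families can be sketched directly on image arrays; the sharpness adjustment below mirrors the interpolation used by PIL's `ImageEnhance.Sharpness` (factor 0 = blurred, 1 = original, >1 = over-sharpened). The mapping from the script's kernel sizes to a Gaussian `sigma` is an assumption for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_and_sharpen(img, sigma=1.0, sharpness=2.0):
    """Illustrative versions of the two transformation families:
    Gaussian blur and sharpness adjustment (unsharp-mask style)."""
    blurred = gaussian_filter(img.astype(float), sigma=sigma)
    # Sharpness factor f interpolates between the blurred image (f=0),
    # the original (f=1), and an over-sharpened result (f>1).
    sharpened = blurred + sharpness * (img - blurred)
    return blurred, sharpened
```

Comparing metric values between original and transformed folders then shows which metrics react to (and which are invariant to) each perturbation strength.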
As in Sec. 4.2 of our paper, you can evaluate the correlation between the distance metrics and the performance of a downstream task (e.g., classification, segmentation, etc.) using the `correlate_downstream_task.py` script. For example, as in our paper, this can be used to evaluate image-to-image translation models, given a test set of translated images and a downstream task model evaluated on them.

To use this script, create a simple CSV file with the following columns:
- `distance`: the distance metric value (e.g., FRD) between the test images (e.g., generated/translated images) and the reference images
- `task_performance`: the performance of the downstream task model on the test images (e.g., Dice coefficient)
From here, you can run the script with the following command:

```bash
python3 correlate_downstream_task.py $CSV_FILE
```

where `$CSV_FILE` is the path to the CSV file you created. The script will compute the correlation between the distance metric and the downstream task performance using the Pearson linear correlation coefficient and the Spearman and Kendall nonlinear/rank correlation coefficients, and print the results to the terminal, including the p-values of the correlation tests (which indicate the statistical significance of the correlations). It will also plot the distance metric values against the downstream task performance, saving the scatter plot as `correlation_plot.png` in the same directory as the input CSV file.
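The script's core computation can be sketched as follows (an illustrative reimplementation, not the script itself; `correlate` is a hypothetical name):

```python
import csv
from scipy import stats

def correlate(csv_path):
    """Read the distance / task_performance CSV and compute the three
    correlation coefficients; each result also carries a p-value."""
    with open(csv_path) as f:
        rows = list(csv.DictReader(f))
    dist = [float(r["distance"]) for r in rows]
    perf = [float(r["task_performance"]) for r in rows]
    return {
        "pearson": stats.pearsonr(dist, perf),    # linear correlation
        "spearman": stats.spearmanr(dist, perf),  # rank correlation
        "kendall": stats.kendalltau(dist, perf),  # rank correlation
    }
```

A strong negative correlation (distance up, task performance down) indicates that the metric is predictive of downstream utility.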
The `ood_detection.py` script allows you to evaluate the ability of different feature representations to detect out-of-distribution (OOD) images in both threshold-free and threshold-based settings, as shown in Sec. 4.1 of our paper. This is computed given:
- A reference in-distribution image set (used to compute a reference feature distribution), for example, a model's training set.
- A test set of both in-distribution (ID) and out-of-distribution (OOD) images.
This script extracts feature embeddings (e.g., the standardized radiomic features used in FRD, or InceptionV3 features with ImageNet or RadImageNet weights, as evaluated in our paper), and evaluates:
- Threshold-independent performance: using AUC based on distance from the ID mean.
- Threshold-based detection: using a 95th percentile threshold on ID validation distances, to compute accuracy, TPR, TNR, and AUC.
To run the script, you can use the following command:

```bash
python3 ood_detection.py \
    --img_folder_ref_id ${IMAGE_FOLDER_REF_ID} \
    --img_folder_test_id ${IMAGE_FOLDER_TEST_ID} \
    --img_folder_test_ood ${IMAGE_FOLDER_TEST_OOD}
```

where:
- `${IMAGE_FOLDER_REF_ID}` is the path to the folder containing the reference in-distribution images.
- `${IMAGE_FOLDER_TEST_ID}` is the path to the folder containing the test in-distribution images.
- `${IMAGE_FOLDER_TEST_OOD}` is the path to the folder containing the test out-of-distribution images.
The various results will be printed to the terminal.
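For intuition, the evaluation protocol can be sketched as follows (an illustrative reimplementation operating on pre-extracted features, not the script itself; for brevity the detection threshold here is taken from the reference ID distances rather than a separate ID validation split):

```python
import numpy as np

def ood_scores(ref_feats, test_id_feats, test_ood_feats):
    """Score test images by Euclidean distance from the ID reference mean,
    then evaluate threshold-free (AUC) and threshold-based OOD detection."""
    mu = ref_feats.mean(axis=0)
    d_ref = np.linalg.norm(ref_feats - mu, axis=1)
    d_id = np.linalg.norm(test_id_feats - mu, axis=1)
    d_ood = np.linalg.norm(test_ood_feats - mu, axis=1)
    # Threshold-free: AUC via the Mann-Whitney pairwise comparison
    # (probability that an OOD sample scores higher than an ID sample).
    greater = (d_ood[:, None] > d_id[None, :]).mean()
    ties = (d_ood[:, None] == d_id[None, :]).mean()
    auc = greater + 0.5 * ties
    # Threshold-based: 95th percentile of in-distribution distances.
    thresh = np.percentile(d_ref, 95)
    tpr = (d_ood > thresh).mean()   # OOD correctly flagged
    tnr = (d_id <= thresh).mean()   # ID correctly retained
    acc = ((d_ood > thresh).sum() + (d_id <= thresh).sum()) / (len(d_id) + len(d_ood))
    return auc, acc, tpr, tnr
```

Running this with any of the feature extractors above lets you compare how separable OOD images are in each representation.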