Commit 9d215c3

chore: add intro to BENCHMARKS.md
1 parent 244c796 commit 9d215c3


BENCHMARKS.md

Lines changed: 92 additions & 3 deletions
# Benchmarks

**This is a fair and controlled comparison of the different cell/nuclei-segmentation models implemented in this library.**

### Background info

- **<span style="color:green">Cell/Nuclei-segmentation</span>** performance is benchmarked against the [**Pannuke**](https://arxiv.org/abs/2003.10778) and [**Lizard**](http://arxiv.org/abs/2108) datasets.
- **<span style="color:orange">Panoptic-segmentation</span>** performance is benchmarked against the **HGSOC** and **CIN2** datasets.
#### Segmentation Performance Metrics

- Panoptic Quality (PQ)
  - The bPQ (cell-type-unaware), mPQ (cell-type-aware), and cell-type-specific PQs are reported for all the models.
- Mean Intersection over Union (mIoU)
  - mIoU is also reported for the semantic-segmentation results of the panoptic-segmentation models.
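
For reference, per-class PQ is computed from matched instance pairs as PQ = (sum of TP IoUs) / (TP + 0.5·FP + 0.5·FN); bPQ is the cell-type-unaware version, while mPQ averages the per-type PQs. The sketch below is only illustrative (the function name and the IoU-matrix input are my own, not this library's API):

```python
import numpy as np

def panoptic_quality(ious: np.ndarray, n_pred: int, n_gt: int, iou_threshold: float = 0.5) -> float:
    """Illustrative PQ for one class.

    `ious` is an (n_gt, n_pred) matrix of IoUs between ground-truth and
    predicted instances. Pairs with IoU > 0.5 form a unique 1-to-1 matching (TPs).
    """
    matched = ious > iou_threshold                       # IoU > 0.5 guarantees unique matches
    tp = int(matched.sum())
    fn = n_gt - tp                                       # unmatched ground-truth instances
    fp = n_pred - tp                                     # unmatched predicted instances
    sq = ious[matched].sum() / tp if tp else 0.0         # segmentation quality
    dq = tp / (tp + 0.5 * fp + 0.5 * fn + 1e-8)          # detection quality
    return float(sq * dq)                                # PQ = SQ * DQ
```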
#### Latency Metrics for Multipart Models

Remember that these models are multipart: each model is composed of an encoder-decoder neural network and a post-processing pipeline. Thus, for all of the models, we report the following (a measurement sketch follows the list):

- Number of params in the encoder-decoder architecture
- Encoder-decoder FLOPS
- Encoder-decoder latency (img/s)
- Post-processing latencies (img/s)
- Total latency (img/s)
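
Below is a rough sketch of how the encoder-decoder part of these numbers could be collected in PyTorch; the `loader` batches and the `batch["image"]` key are placeholders, and FLOPS would be counted with a separate tool (e.g. `fvcore`), so treat this as an outline rather than the actual benchmarking code:

```python
import time
import torch

@torch.no_grad()
def benchmark_network(model: torch.nn.Module, loader, device: str = "cuda") -> dict:
    """Measure param count and average encoder-decoder throughput (img/s)."""
    model = model.to(device).eval()
    n_params = sum(p.numel() for p in model.parameters())

    n_imgs, elapsed = 0, 0.0
    for batch in loader:                        # e.g. the validation split
        imgs = batch["image"].to(device)        # placeholder batch format
        torch.cuda.synchronize()                # assumes a CUDA device
        start = time.perf_counter()
        _ = model(imgs)                         # forward pass only (no post-processing)
        torch.cuda.synchronize()
        elapsed += time.perf_counter() - start
        n_imgs += imgs.shape[0]

    return {"params": n_params, "img_per_sec": n_imgs / elapsed}
```

The post-processing latencies would be timed analogously, one pipeline stage at a time.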
Note that the post-processing pipelines are often composed of several parts. For nuclei/cell segmentation, the post-processing pipeline is composed of a nuclei-instance separation part and a cell-type majority-voting part. The latencies of these are benchmarked separately. For panoptic segmentation, the semantic-segmentation post-processing part is also benchmarked separately. **The reported latency metrics are an average over the validation split.**
#### Devices

The model latencies depend on the hardware. I'll benchmark the latencies on my laptop and on an HPC server.

- Laptop specs:
  - a worn-out NVIDIA GeForce RTX 2080 Mobile (8 GB VRAM)
  - Intel i7-9750H 6 x 6 cores @ 2.60 GHz (32 GiB RAM)
- HPC specs:
  - NVIDIA V100 (32 GB VRAM)
  - Xeon Gold 6230 2 x 20 cores @ 2.1 GHz (384 GiB RAM)
#### About the Datasets

**Pannuke** is the only dataset that contains fixed-sized (256x256) patches, so the benchmarking is straightforward and not affected by the hyperparameters of the post-processing pipelines. However, the **Lizard**, **HGSOC**, and **CIN2** datasets contain differently sized images. This means, firstly, that the patching strategy of the training data split will have an effect on the model performance, and secondly, that the inference requires a sliding-window approach. The segmentation performance is typically quite sensitive to the sliding-window hyperparameters, especially the `patch size` and `stride`. Thus, for these datasets, I also report the training-data patching strategy and grid-search the best sliding-window hyperparameters.
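
As an illustration of why `patch size` and `stride` matter, here is a simplified sliding-window inference sketch that averages the logits over overlapping windows; it is not this library's implementation, and it assumes the image is at least `patch_size` in both dimensions:

```python
import torch

def _positions(size: int, patch: int, stride: int) -> list:
    """Window start coordinates that cover [0, size) (assumes size >= patch)."""
    pos = list(range(0, size - patch + 1, stride))
    if pos[-1] != size - patch:
        pos.append(size - patch)                # make sure the image edge is covered
    return pos

@torch.no_grad()
def sliding_window_inference(model, image: torch.Tensor, patch_size: int = 256, stride: int = 128):
    """Run `model` over a (1, C, H, W) image in overlapping windows and average the logits."""
    _, _, H, W = image.shape
    logits, counts = None, None
    for y in _positions(H, patch_size, stride):
        for x in _positions(W, patch_size, stride):
            out = model(image[:, :, y:y + patch_size, x:x + patch_size])
            if logits is None:                  # lazy init once the class count is known
                logits = image.new_zeros((1, out.shape[1], H, W))
                counts = image.new_zeros((1, 1, H, W))
            logits[:, :, y:y + patch_size, x:x + patch_size] += out
            counts[:, :, y:y + patch_size, x:x + patch_size] += 1
    return logits / counts                      # average over the overlapping regions
```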
#### Data Splits

The **Pannuke** and **Lizard** datasets are divided into three splits. For these datasets, we report the mean of the 3-fold cross-validation. The **CIN2** and **HGSOC** datasets contain only a training split and a relatively small validation split; thus, for those datasets, we report the metrics on the validation split.
#### Regularization Methods

The models are regularized during training via multiple regularization techniques to tackle distribution shifts. The specific techniques (in addition to augmentations) used in this benchmark are listed below (a loss sketch follows the list):

- [Spectral decoupling](https://arxiv.org/abs/2011.09468)
- [Label Smoothing](https://arxiv.org/abs/1512.00567)
- [Spatially Varying Label Smoothing](https://arxiv.org/abs/2104.05788)
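
As a pointer for the first item, spectral decoupling only requires adding an L2 penalty on the output logits to the task loss; a minimal PyTorch sketch (the `lam` value is a placeholder, not a tuned setting from this benchmark):

```python
import torch

def spectral_decoupling_loss(task_loss: torch.Tensor, logits: torch.Tensor, lam: float = 0.01) -> torch.Tensor:
    """Add the spectral-decoupling penalty (lam/2 * ||logits||^2) to a task loss."""
    penalty = 0.5 * lam * (logits ** 2).mean()   # mean over batch and pixels
    return task_loss + penalty
```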
#### Pre-trained Backbone Encoders

All the models are trained/fine-tuned with an ImageNet pre-trained backbone encoder, which is naturally reported.
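
For illustration, such an encoder can be created with, e.g., `timm`; whether this library actually uses `timm` under the hood, and the `resnet50` choice, are assumptions here:

```python
import timm

# Example: a ResNet-50 feature extractor initialized with ImageNet weights.
encoder = timm.create_model(
    "resnet50",
    pretrained=True,       # load the ImageNet pre-trained weights
    features_only=True,    # return intermediate feature maps for the decoder skips
)
```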
#### Training Hyperparams

All the training hyperparameters are naturally reported.
#### Other Notes

Note that even if these benchmarks are not SOTA or differ from the original manuscripts, the reason is likely not the model architecture or the post-processing method (since these are the same here), but rather the model weight initialization, loss functions, training hyperparameters, regularization techniques, and other training tricks that affect model performance.
## Baseline models

### <span style="color:green">Cell/Nuclei-segmentation</span>

#### Results Pannuke

##### Training Set-up

| Param                  | Value                                     |
| ---------------------- | ----------------------------------------- |
| Optimizer              | [AdamP](https://arxiv.org/abs/2006.08217) |
| Auxiliary Branch Loss  | MSE-SSIM                                  |
| Type Branch Loss       | Focal-DICE                                |
| Encoder LR             | 0.00005                                   |
| Decoder LR             | 0.0005                                    |
| Scheduler              | Reduce on plateau                         |
| Batch Size             | 10                                        |
| Training Epochs        | 50                                        |
| Augmentations          | Blur, Hue Saturation                      |
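
A rough sketch of how this set-up could be wired in PyTorch is shown below; `AdamP` comes from the `adamp` package, while the `model.encoder`/`model.decoder` attribute names and the scheduler `patience` are assumptions, not values from this benchmark:

```python
import torch
from adamp import AdamP  # https://arxiv.org/abs/2006.08217, `pip install adamp`

def configure_optimizer(model: torch.nn.Module):
    """Set up AdamP with separate encoder/decoder LRs and a reduce-on-plateau scheduler."""
    optimizer = AdamP(
        [
            {"params": model.encoder.parameters(), "lr": 5e-5},  # Encoder LR from the table
            {"params": model.decoder.parameters(), "lr": 5e-4},  # Decoder LR from the table
        ]
    )
    # Stepped with the monitored validation metric; `patience` is a placeholder value.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", patience=5)
    return optimizer, scheduler
```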
#### Results Lizard

##### Training Set-up

Same as above.

##### Patching Set-up

##### Sliding-window Inference Hyperparams
