Clarification Needed: OOD vs. ID as Positive Class in Evaluation Metrics #289
-
Hi everyone,

We're currently working on OOD detection and have run into some confusion about the convention for defining the positive class (i.e., whether OOD or ID samples are labeled as '1') during evaluation. This directly affects the interpretation and comparability of key metrics such as FPR95 (False Positive Rate at 95% True Positive Rate). We've observed different conventions across prominent repositories: some treat ID samples as the positive class, while others treat OOD samples as positive.
This discrepancy has a significant impact on the FPR95 metric. Let's clarify its definition in this context:
FPR95 is the False Positive Rate measured at the operating point where the True Positive Rate is fixed at 95%.

- If ID is the positive class (1): the threshold is chosen so that 95% of ID samples are correctly accepted as ID, and FPR95 is the fraction of OOD samples that are wrongly accepted as ID.
- If OOD is the positive class (1): the threshold is chosen so that 95% of OOD samples are correctly detected as OOD, and FPR95 is the fraction of ID samples that are wrongly flagged as OOD.
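To make the difference concrete, here is a minimal numerical sketch (not taken from any particular repository; the Gaussian scores are made up as a stand-in for a real detector) that computes FPR@95TPR under both conventions on the same set of scores:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical detector scores: higher = "more in-distribution" (e.g., max softmax probability).
id_scores = rng.normal(loc=0.8, scale=0.1, size=10_000)
ood_scores = rng.normal(loc=0.5, scale=0.15, size=10_000)

def fpr_at_95tpr(pos_scores, neg_scores):
    """FPR at the threshold that still accepts 95% of the positive class.

    Convention: higher score => predicted positive.
    """
    thresh = np.percentile(pos_scores, 5)        # ~95% of positives score >= thresh
    return float(np.mean(neg_scores >= thresh))  # negatives wrongly predicted positive

# Convention A: ID is the positive class (score already means "ID-ness").
fpr95_id_positive = fpr_at_95tpr(id_scores, ood_scores)

# Convention B: OOD is the positive class; negate the score so higher => "OOD-ness".
fpr95_ood_positive = fpr_at_95tpr(-ood_scores, -id_scores)

print(f"FPR95 with ID  as positive: {fpr95_id_positive:.3f}")   # OOD accepted as ID
print(f"FPR95 with OOD as positive: {fpr95_ood_positive:.3f}")  # ID flagged as OOD
```

On the same synthetic scores the two conventions generally report different numbers, which is exactly the comparability problem described above.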
As you can see, the two interpretations yield entirely different numerical values and meanings for FPR95, which makes direct comparison across papers or benchmarks that use different conventions challenging and potentially misleading. Our questions are:

1. Which convention does OpenOOD adopt, and why?
2. Is there a convention you would recommend the community standardize on?
3. How should results be compared across papers that follow different conventions?
We believe clarifying this point is crucial for advancing reproducible research in OOD detection. Thank you for your time and insights!
-
Thank you for this post; this is indeed an accurate observation, and one we are aware of. Please see my answers below.
We intentionally chose to treat OOD as positive and ID as negative in OpenOOD v1.5 for conventional/historical reasons. In conventional ML (more specifically, conventional anomaly detection), it has been standard to treat the "abnormal" class as positive. This is also the practice adopted by the seminal paper for modern OOD detection on neural networks, "A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks". However, some early and well-known works chose the opposite setup for reasons that are unclear; later works followed them, and momentum built up around that choice. This is why the discrepancy exists and why it can be confusing.
In line with the discussion above, we (or at least I, personally) would recommend defining OOD as the positive class.
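For reference, here is a minimal sketch of how FPR95 can be computed under this recommended convention (OOD labeled 1, ID labeled 0) using scikit-learn's roc_curve; the helper name and the assumption that higher scores mean "more OOD" are illustrative and not part of OpenOOD's API:

```python
import numpy as np
from sklearn.metrics import roc_curve

def fpr_at_95tpr_ood_positive(id_scores, ood_scores):
    """FPR95 with OOD as the positive class (label 1) and ID as negative (label 0).

    `id_scores` / `ood_scores` are detector outputs where higher means "more OOD"
    (e.g., negative max softmax probability).
    """
    labels = np.concatenate([np.zeros(len(id_scores)), np.ones(len(ood_scores))])
    scores = np.concatenate([id_scores, ood_scores])
    fpr, tpr, _ = roc_curve(labels, scores)   # positive class = 1 = OOD
    idx = np.searchsorted(tpr, 0.95)          # first operating point with TPR >= 95%
    return fpr[idx]                           # fraction of ID samples flagged as OOD
```

Reading FPR95 off the ROC curve this way also keeps it consistent with AUROC computed from the same labels and scores.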
This is exactly why we put up OpenOOD in the first place! Beyond this discrepancy in the positive-class definition, there are many other paper-to-paper discrepancies in evaluation data, experimental setup, etc., which make direct comparison of reported numbers quite difficult. We hope that this project can ultimately motivate and lead toward a universal definition and setup for OOD evaluation.