Relevant article: Does cross-validation work in telling rankings apart? [10.1007/s10100-024-00932-1] #4
Replies: 3 comments
-
@jrash I think you would be best suited to answer this one.
-
@cwognum I'm new to this space, but I am very supportive of considering this issue. I've worked on rank-aggregation problems in biological network science, and using the relevant tests here will definitely matter. I would love to be part of the discussions and am happy to prepare material I'm familiar with as well. Hoeffding's D and the recent discussion on nonparametric tests of dependence (using generalizations of ranking) seem relevant to what might be optimal.
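For reference, a minimal sketch of Hoeffding's D as mentioned above, using the classical (SAS-style) formula and assuming no ties and n ≥ 5. SciPy has no built-in for this statistic, so it is computed directly from ranks; the data below are hypothetical and only illustrate the calculation.

```python
import numpy as np
from scipy.stats import rankdata

def hoeffding_d(x, y):
    """Hoeffding's D for paired samples x, y (classical/SAS formula).
    Assumes no ties and n >= 5; ranges from -0.5 to 1, with 1 = perfect dependence."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    r, s = rankdata(x), rankdata(y)                    # univariate ranks R_i, S_i
    # Q_i: bivariate rank = 1 + number of points with both coordinates strictly smaller
    q = 1 + np.array([np.sum((x < xi) & (y < yi)) for xi, yi in zip(x, y)])
    d1 = np.sum((q - 1) * (q - 2))
    d2 = np.sum((r - 1) * (r - 2) * (s - 1) * (s - 2))
    d3 = np.sum((r - 2) * (s - 2) * (q - 1))
    return 30.0 * ((n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3) / (
        n * (n - 1) * (n - 2) * (n - 3) * (n - 4))

# Hypothetical example: per-target scores from two methods whose dependence we want to test
rng = np.random.default_rng(0)
scores_a = rng.normal(size=30)
scores_b = 0.6 * scores_a + rng.normal(scale=0.5, size=30)
print(hoeffding_d(scores_a, scores_b))
```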
-
The paper shows that Dietterich's 5x2cv test had near-zero power in simulated cases using a ranking-based performance metric the authors developed, called Sum of Ranking Differences. The simulations were run with sample sizes of n = 7, 13, and 32. These sample sizes are unusually small for drug discovery, where typical data sets contain at least thousands of compounds; the authors provide an example from analytical chemistry. I would be surprised if you were to find a significant difference using a CV procedure with a data set that small, and in general I would not recommend a CV procedure for data sets of this size. This does raise a useful issue though @cwognum: perhaps we should advise against performing CV-based statistical testing when data sets are too small. I hadn't considered that people might try this with a data set this small…

Also, I noticed that the authors followed Dietterich's implementation exactly. He advocates using the difference from a single repeat in the numerator, instead of taking the difference of means across all repeats as is typically done in a t-test (see the sketch below). This could also lead to a loss of power, and this approach isn't used by other CV-based tests. When developing chemmodlab, we found it unconventional, so we perform a repeated-measures ANOVA and Tukey HSD in the standard way with all samples. We may want to stress this difference from Dietterich's t-test somewhere, even though it is somewhat implied by the recommended tests.

@mnarayan I am not aware of these types of rank aggregation metrics being used commonly in drug discovery. We expect our procedure to work for the most commonly used metrics in drug discovery, but I could see how these rank aggregation cases (where the number of variables is much larger than the number of samples) could become difficult. Good to be aware of.
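To make the distinction above concrete, here is a minimal sketch, in Python with hypothetical fold-level score differences, of Dietterich's 5x2cv paired t statistic (whose numerator uses only the difference from a single fold of the first replication) alongside a conventional paired t-test on all ten fold-level differences. It illustrates the two formulas only; chemmodlab's actual procedure is a repeated-measures ANOVA followed by Tukey's HSD, which is not shown here.

```python
import numpy as np
from scipy import stats

def dietterich_5x2cv_t(diffs):
    """Dietterich's 5x2cv paired t-test.
    diffs: shape (5, 2), the performance difference (model A - model B) on each
    fold of each of the 5 replications of 2-fold CV. Returns (t, two-sided p),
    referred to a t distribution with 5 degrees of freedom."""
    diffs = np.asarray(diffs, float)
    p_bar = diffs.mean(axis=1)                          # mean difference per replication
    s2 = ((diffs - p_bar[:, None]) ** 2).sum(axis=1)    # variance estimate per replication
    t = diffs[0, 0] / np.sqrt(s2.mean())                # numerator: a single fold difference
    return t, 2 * stats.t.sf(abs(t), df=5)

def mean_difference_t(diffs):
    """Conventional paired t-test using all 10 fold-level differences."""
    d = np.asarray(diffs, float).ravel()
    return stats.ttest_1samp(d, popmean=0.0)

# Hypothetical fold-level score differences from 5 replications of 2-fold CV
rng = np.random.default_rng(1)
diffs = rng.normal(loc=0.02, scale=0.05, size=(5, 2))
print(dietterich_5x2cv_t(diffs))
print(mean_difference_t(diffs))
```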
-
I'm posting this on behalf of Károly Héberger, who doesn't use GitHub.
The preprint [DOI: 10.26434/chemrxiv-2024-6dbwv-v2] touches on an important aspect, but the protocol is far from perfect.
Our paper [https://doi.org/10.1007/s10100-024-00932-1] examined the three relevant statistical tests for cross-validation (Wilcoxon, Dietterich, and Alpaydin). We established that Dietterich's test is the worst option, and none of the tests performs well in Type I error situations. Seven criteria showed the unambiguous superiority of the Wilcoxon test in Type II error situations.
I am not familiar with GitHub discussions. In any case, it requires signing up, which I resist. I would prefer a Zoom or Teams discussion with properly prepared discussion partners.
As we elaborated all the (main) scenarios, the new perspectives are obvious. Known statistical solutions (e.g., the Mallows model) are not better. The practical examples are also convincing.
As all three tests fail to reject H0 in Type I error situations, the elaboration of a new test is warranted, etc.
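Following up on the comparison above, here is a minimal sketch of the Wilcoxon signed-rank test applied to paired fold-level CV scores from two models. The scores are hypothetical and only illustrate the mechanics of the paired, nonparametric comparison, not the full evaluation protocol of the paper.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-fold CV scores for two models evaluated on the same folds
model_a = np.array([0.712, 0.680, 0.741, 0.702, 0.695, 0.732, 0.717, 0.702, 0.752, 0.708])
model_b = np.array([0.698, 0.671, 0.725, 0.707, 0.669, 0.721, 0.704, 0.690, 0.748, 0.700])

# Paired, nonparametric comparison: tests whether the median fold-level
# difference is zero, without assuming normality of the differences
stat, p_value = wilcoxon(model_a, model_b)
print(f"W = {stat:.1f}, p = {p_value:.3f}")
```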