Comments on Guidelines 1 and 4 from the Inductive Bio team #5
6 comments · 4 replies
-
Hi Inductive Bio team, love the detailed and constructive feedback! Thank you for taking the time to write this up.
I can be brief here: I think this is an excellent suggestion!
Same here! I think this is a great suggestion too. For classification problems, it would instead be a confusion matrix. I'll respond to your last suggestion in its own thread so we can discuss it in more detail!
-
This is an interesting one! @jrash and I actually discussed this as we were writing a first draft of the paper. Three thoughts come to mind here.
Does that make sense? Do you agree?
This is an interesting argument that reminds me of Sutton's The Bitter Lesson. On alternatives to 5x5 CV, I'll let @jrash comment!
-
Thanks for taking the time to read the paper carefully and to provide extensive feedback, we appreciate it! First, there are several questions about alternate approaches, so I want to be clear about the goal of the paper. We wanted to provide a clear set of recommendations that we expect to work well in general. We also wanted to avoid listing off options, which could further confuse the reader. Instead, we state that you may decide to deviate based on your needs, and that is fine as long as you are transparent about what you are doing. There is a lot to unpack here, but let me try to answer the two main questions you’re asking.

(1) Is 5-fold CV “good enough”? We argue in the paper that it is not. We use CV to sample performance distributions, which we then compare with statistical tests. For such a use case, 25 samples is the minimum sample size typically recommended for a t-test to perform well in terms of type I and type II error rates. The Tukey HSD test we recommend is an extension of the t-test. Clearly 5 samples is not enough for a statistical test, but maybe you are saying statistical testing is not necessary? If so, that is a separate discussion. We discuss in Section 2 why statistical testing is necessary for drug discovery data sets.

(2) If 5-fold CV is not “good enough,” is 5x5 CV the right alternative? We believe it is, in general. We describe this in Section 3.1.2, but to summarize the key points:
With respect to the added complexity, 5x5 CV is 2.5 times the cost of 10-fold CV, which is also common practice. This is a reasonable compute increase to obtain statistical rigor, and we think it is worth reducing the hyperparameter search or the number of training epochs to save compute budget for a rigorous methods comparison. That said, we do appreciate that this is a departure from current practice and will take some getting used to, which is why we provided code examples to make the transition as easy as possible. We will also touch on this again in our upcoming paper on generalization and splitting. I hope that helps! What I take away from this is that we can improve the writing to bring these points across more clearly, and we are open to suggestions. I will address some other questions in detail in follow-up posts.
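For readers following along, here is a minimal sketch of the kind of workflow we have in mind: 5x5 repeated CV to collect 25 per-fold scores per model, followed by a Tukey HSD comparison. The data, featurization, and models below are placeholders, and the code examples that accompany the paper may differ in the details.

```python
import numpy as np
from scipy.stats import tukey_hsd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import RepeatedKFold

# Placeholder data standing in for a small-molecule property dataset.
X, y = make_regression(n_samples=1000, n_features=50, noise=10.0, random_state=0)

models = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# 5 repeats x 5 folds = 25 per-fold scores per model.
cv = RepeatedKFold(n_splits=5, n_repeats=5, random_state=0)
scores = {name: [] for name in models}
for train_idx, test_idx in cv.split(X):
    for name, model in models.items():
        model.fit(X[train_idx], y[train_idx])
        scores[name].append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))

# Tukey HSD across the models' 25-score samples.
print(tukey_hsd(*[np.asarray(s) for s in scores.values()]))
```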
-
The Wager paper is about the asymptotic consistency of CV for estimating performance differences between models. It doesn’t go into estimating the uncertainty of performance estimates or statistical testing, so we need to be careful drawing conclusions from it about the method-comparison workflow in our paper. Bates et al. discuss the Wager paper in their asymptotics section. They say: “Wager (2020) show that for CV, comparing two models is a statistically easier task than estimating the prediction error, in some sense.” Wager shows that the difference in model performance in CV is a consistent estimator of the performance difference on a new test set, meaning that the estimator is unbiased and converges to the test-set difference as sample size increases. Interestingly, CV is not consistent for individual model performances. This makes inference for differences simpler in some ways (in terms of deriving confidence intervals, statistical tests, etc.). The Wager paper stops at asymptotics and does not go into statistical testing. It is certainly not saying that 5-fold CV is good enough for statistical tests. With a typical drug discovery data set size and a small number of CV samples, you can still easily see performance differences by chance. If Wager were to follow the progression of the Bates paper and derive a nested-CV test, that would likely require 200 repeats of nested CV, as Bates did.
You are right that the Bates paper and our simulation focus on estimates of individual model performance. We address this briefly in Appendix B; we should probably elaborate more, but we were trying to keep things accessible to our audience. Note that the last sentence of the Bates paper is: “We anticipate that nested CV can be extended to give valid confidence intervals for the difference in prediction error between two models.” Results for the comparison of methods in a t-test framework are actually implied by their paper. The paired t-test statistic is computed using the variance of the performance difference between models, which will be underestimated if the variance of the individual performances is underestimated. Our Tukey HSD procedure is an extension of the paired t-test. The results from the Bates paper (and our simulation) show that the variance for individual methods is underestimated by standard CV. This implies that the variance of the difference between models will also be underestimated, resulting in a poor statistical test with an elevated type I error rate. Since underestimation of variance is the main problem for statistical testing with CV, the Bates approach should lead to a valid statistical test, but such a test has not been derived yet. The focus of the simulation was to show that repeated CV can be used to approximate the results of Bates. Since statistical testing has already been worked out for repeated CV, and this generalizes easily to different performance metrics and splitting methods, we proceed with repeated CV. Since the testing results are implied by our simulation, I didn’t run a testing simulation, but perhaps we should run one to make things concrete for readers. I also thought we had more discussion of the above points in the simulation write-up, but I guess that got cut at some point. I will add more discussion there and perhaps some to the main text.
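To make the paired-variance point concrete, here is a rough, illustrative sketch (placeholder data and models, assuming scikit-learn and scipy). The paired t statistic is mean(diff) / (sd(diff) / sqrt(n)), so a downward-biased sd(diff) inflates the statistic and the type I error rate.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RepeatedKFold

# Placeholder regression task; in practice these would be per-fold metrics from
# the actual benchmark models.
X, y = make_regression(n_samples=800, n_features=30, noise=15.0, random_state=1)
cv = RepeatedKFold(n_splits=5, n_repeats=5, random_state=1)

rmse_a, rmse_b = [], []
for train_idx, test_idx in cv.split(X):
    for model, out in ((Ridge(alpha=1.0), rmse_a), (Lasso(alpha=0.1), rmse_b)):
        model.fit(X[train_idx], y[train_idx])
        out.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])) ** 0.5)

diffs = np.array(rmse_a) - np.array(rmse_b)
# The paired t-test is built on the variance of these per-fold differences; if that
# variance is underestimated (as with standard K-fold CV), the test rejects too often.
print("mean diff:", diffs.mean(), "sd of diff:", diffs.std(ddof=1))
print(ttest_rel(rmse_a, rmse_b))
```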
Drug program splitting: our example in the paper demonstrates handling clustered data where the clusters are defined by chemical scaffold. The same approach can be applied to grouping by drug program, though I would probably group by scaffold or cluster.

Time splitting: the main focus of this paper is on benchmarking in the literature. Most benchmarking data sets in the public domain are not time-stamped, but we often do have timestamps in industry. The time-split case is a special case where you might want to deviate from the guidelines somewhat; we have an upcoming paper on splitting where we will address this. For statistical testing, the key is to get at least 25 approximately independent samples that capture the data structure of interest. You can do this with a rolling time split and further divide the splits into folds for additional samples, for example.
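In scikit-learn terms, the two patterns above might look something like the sketch below. The group and timestamp columns are placeholders; this is illustrative rather than the exact procedure from the paper.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, TimeSeriesSplit

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 16))
groups = rng.integers(0, 40, size=n)     # e.g. scaffold/cluster or drug-program id
order = np.argsort(rng.uniform(size=n))  # stand-in for sorting rows by timestamp

# Grouped CV: no group appears in both train and test within a split.
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, groups=groups):
    assert not set(groups[train_idx]) & set(groups[test_idx])

# Rolling time split: each test window is later in time than its training data;
# dividing each window into 5 sub-folds would give 5 x 5 = 25 test samples.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X[order]):
    sub_folds = np.array_split(test_idx, 5)
```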
-
Thanks for the simulation suggestions; there were some good ideas in there. However, I do want to clarify that our goal is not to write a comprehensive simulation paper like Bates et al. did. We are primarily relying on others in the literature who have already run such experiments for statistical testing with repeated cross-validation (cited below). The simulation is intended more to show how the results translate to a drug discovery data set.
Addressed in the previous post.
Figures 1 and 2 are intended to demonstrate characteristics of the intervals in a way that is easy for the reader to visualize and understand. Figure 3 is the main result.
It is interesting that Bates et al. didn’t reach the target coverage for some metrics. In the end, though, our paper isn’t proposing nested CV; we only compare against it because it is the current SOTA. We are proposing repeated CV, and we show that we can replicate the Bates results with it, so we didn’t look into this further. Maybe we will if we have time. We provide the code for the simulation. Also, I’ll add more details on the data set; that was an oversight. References:
-
Hi @jrash, apologies for the late response here! This got lost in the shuffle of end-of-year busyness. I appreciate the thorough and thoughtful responses to our comments. If there's one thing on the cross-validation side that I'd still love a deeper dive into in the final paper, it's why 5x5 CV is expected to work much better than repeated random splitting. This is essentially the same question raised by @JacksonBurns in #9. If I'm following the 5x5 method right, then within each 5-fold CV there is no overlap across test sets, but across the different CV repeats there is overlap. And then, to my understanding, the 25 sets of metrics are pooled together for statistical analysis. So it seems like for any given sample you have 4 other samples where you achieve independence, but then 20 samples where you do not have any more independence than you would have in 25x random re-sampling of the data. It's not obvious to me why that would lead to much better statistical behavior. This is a purely intuitive argument, for which I've done no formal analysis, but if there were a way for this to be made clear and compelling in the paper, I think that would be a great addition!
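To spell out the intuition (making no claim about the resulting statistics), here's a quick, hypothetical check of the overlap structure with scikit-learn's RepeatedKFold: test folds within one repeat are disjoint, while test folds from different repeats overlap substantially.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

n = 1000
X = np.zeros((n, 1))  # only the indices matter here
cv = RepeatedKFold(n_splits=5, n_repeats=5, random_state=0)
test_sets = [set(test_idx) for _, test_idx in cv.split(X)]  # 25 test sets

def jaccard(a, b):
    return len(a & b) / len(a | b)

within, across = [], []
for i in range(len(test_sets)):
    for j in range(i + 1, len(test_sets)):
        same_repeat = (i // 5) == (j // 5)
        (within if same_repeat else across).append(jaccard(test_sets[i], test_sets[j]))

print("mean Jaccard overlap within a repeat:", np.mean(within))  # 0.0
print("mean Jaccard overlap across repeats:", np.mean(across))   # roughly 0.11
```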
-
Hi Polaris team, I’m sharing these comments on behalf of myself and the team at Inductive Bio, including @gunjanbaid, @pmaher86, @BryceEakin, and @benb111.
We’re excited to see the first paper take shape from the Polaris collaboration. These types of guideline papers can be really valuable for a field. We particularly loved the focus on relating model performance to specific decisions, looking at effect size, and considering the role of dynamic range in perceived model performance. It’s also fantastic to see a paper taking this approach of collecting community comments before going to publication.
We wanted to share two comments related to Guideline 1 (Performance sampling distribution) and one related to Guideline 4 (presenting the result).
We’ve shared some in-depth notes below - we hope these are helpful as the paper progresses into the peer review process, and we are always happy to have continued discussion either via GitHub or other venues!
Importance of appropriate non-random splitting approach
The manuscript currently addresses alternative (non-random) splitting approaches in the section “3.1.3 Cross-validation with advanced splits,” and indicates that this will be the focus of a future manuscript. We understand that it can be impossible to cover everything in one paper, but we think there should be a stronger disclaimer about how important this issue is.
To our minds, the use of random and scaffold splitting is one of the most important reasons that modeling results in the small-molecule ML literature can be so misleading. As we know this group of authors is aware, random and scaffold splitting can leave highly similar pairs of molecules split across train and test, and this often leads to performance metrics on a test set that are much higher than can be expected on truly novel compounds. We’ve written about this a bit here (but certainly aren’t the first).
Readers coming from the broader non-chemistry ML field may already have a good understanding of cross-validation and statistical testing but not the highly chemistry-specific issues with random splitting of datasets and how to address them. Without a clear disclaimer, there’s a risk that by following the guidelines “to a T”, practitioners could form a misplaced, statistically-strengthened confidence in how well their models are performing, only to have their models fail in practice due to their reliance on a misleading splitting approach.
Our thought would be to include this disclaimer directly in Guideline 1, e.g. (added text shown in bold):
We recommend using a 5x5 repeated cross-validation procedure to sample the performance distribution. This procedure suits typical dataset sizes used in small molecule property modeling (e.g., 500 - 100,000). The training set can be further split into a training and validation set if needed. **Care should be taken to consider how the choice of data-splitting approach might systematically over-estimate or under-estimate model performance.**
And then adding another paragraph or two to section 3.1.3 that specifically cites papers such as Sheridan (2013), Landrum et al. (2023), and Guo et al. (2024) so readers are aware of potential issues and can familiarize themselves with proposed solutions.
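As one hypothetical illustration of the concern (not taken from any of the cited papers), a quick diagnostic is to look at each test molecule's nearest-neighbor Tanimoto similarity to the training set; high values suggest the split will overstate performance on genuinely novel chemistry. The SMILES and split below are placeholders, assuming RDKit is available.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Placeholder molecules standing in for an actual train/test split.
train_smiles = ["CCO", "CCN", "c1ccccc1O", "CC(=O)OC1=CC=CC=C1C(=O)O"]
test_smiles = ["CCOC", "c1ccccc1N"]

def to_fps(smiles):
    mols = [Chem.MolFromSmiles(s) for s in smiles]
    return [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]

train_fps, test_fps = to_fps(train_smiles), to_fps(test_smiles)

# Nearest-neighbor similarity of each test molecule to the training set.
for smi, fp in zip(test_smiles, test_fps):
    nn_sim = max(DataStructs.BulkTanimotoSimilarity(fp, train_fps))
    print(f"{smi}: nearest-neighbor Tanimoto to train = {nn_sim:.2f}")
```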
Questions and comments regarding 5x5 repeated CV
Compared to 5-fold CV, the recommendation of 5x5 repeated CV represents a 5x increase in computational complexity for all experiments run in small-molecule ML, and is also a divergence from standard practice in other ML subfields. We’d like to see either a stronger justification for it or a simpler method proposed. Our comment on 5x5 repeated CV boils down to two questions: (1) is it possible that standard 5-fold CV, while imperfect, is “good enough” for model comparison? (2) if 5-fold CV is not good enough, is 5x5 repeated CV the best alternative?
CV Question 1: is 5-fold CV “good enough”?
The guidelines manuscript points to an interesting paper by Bates et al. that dives into the tendency for standard K-fold CV to have poor coverage in its estimates of model performance. But it’s unclear how damning this really is for standard K-fold CV as a tool for model comparison. The Bates paper notes that they are specifically interested in evaluating the error distribution of a particular trained model (what they call ErrXY) versus evaluating the differences between two types of models.
For small-molecule ML papers, readers are usually most interested in differences between models - e.g., they’re more interested in whether Model Type A is better than Model Type B at predicting solubility than on the exact error rate on the test set (in part because practitioners know that exact error rates will probably be different for their project due to data distribution shifts). Bates et al. note that this is a distinct use case and refer to this recent paper by Wager. The Wager paper performs an analysis showing that while standard CV is quite poor at estimating the error distribution of a single model, it’s actually quite good at telling you which of two models is better.
Based on this reading of the literature, it seems like standard CV may still be “good enough” as a tool to compare models, particularly if a standardized set of folds is published for each dataset so that all researchers can perform consistent comparisons. To convince the field that standard CV is not good enough, we’d like to see analyses showing that it does a poor job of picking the best choice among competing models, rather than that it has poor coverage for the error rate of a single model.
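For concreteness, the kind of analysis we have in mind looks roughly like the toy sketch below: simulate datasets where one model is truly better by a modest margin and count how often a single 5-fold CV ranks the two models correctly. Everything here (data generator, models, effect size) is invented purely to illustrate the experimental design, not a result.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

def cv_mae(model, X, y, seed):
    """Mean absolute error averaged over a single 5-fold CV."""
    errs = []
    for tr, te in KFold(n_splits=5, shuffle=True, random_state=seed).split(X):
        model.fit(X[tr], y[tr])
        errs.append(mean_absolute_error(y[te], model.predict(X[te])))
    return float(np.mean(errs))

correct, n_trials = 0, 50
for seed in range(n_trials):
    X, y = make_regression(n_samples=500, n_features=40, noise=20.0, random_state=seed)
    model_a = Ridge(alpha=1.0)    # assumed to be the truly better model here
    model_b = Ridge(alpha=500.0)  # heavily over-regularized, so truly worse on average
    correct += cv_mae(model_a, X, y, seed) < cv_mae(model_b, X, y, seed)

print(f"5-fold CV picked the better model in {correct}/{n_trials} trials")
```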
CV Question 2: if 5-fold CV is not “good enough,” is 5x5 CV the right alternative?
If 5-fold CV is shown to be not good enough, then we’d like to see more extensive comparison of alternatives before determining a new standard approach for the field. A few possibilities to consider are:
To test these alternatives, we’d like to see more thorough testing than is currently provided in the supplementary analysis. Some things we think could strengthen the supplemental analysis are:
The choice of the best cross-validation approach is a really difficult one, and it will become even more difficult when moving outside of random splitting. In a dataset containing timestamps to enable time splitting, or in a dataset consisting of several distinct drug programs that should be separated between train and test, it is not immediately clear how one would apply 5x5 repeated CV. Given this, our recommendation is to bias towards simplicity and towards CV approaches that other subfields of ML have aligned on. However, we hope the above questions will be helpful for strengthening the Guideline 1 section of the paper no matter what direction is chosen.
Inclusion of a predicted vs measured scatterplot in reporting results
We think the guidelines should recommend that a scatterplot of predicted vs measured values be included in paper results for at least the best-performing model, on either a single test-set split or on the aggregated test sets from cross-validation, and using an appropriate scale for the property, e.g. log scale for many chemical properties. While performance metrics and statistical tests are immensely useful, there are aspects of the model’s performance that are missed by these summaries but are made clear in this more “raw” representation of the model’s performance (consider the classic Anscombe’s quartet). As a few examples: are there notable outliers, or even a cluster of outliers? Is there a meaningful nonlinearity between predicted and measured values? Is there heteroskedasticity in the predictions? Do the predictions show low dynamic range relative to the measured values?
Figure 3 of the guidelines preprint is a great example of how useful this kind of scatterplot can be. The figure makes it immediately obvious that the experimental data is highly concentrated between 100 and 300 µM, in a way that none of the metrics show. Seeing this raises relevant follow-up questions, like “do model predictions around 1 µM tend to have larger errors than predictions around 100 µM?” We think that recommending that papers show these sorts of scatterplots as standard practice will strengthen the field.
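As a sketch of what the recommendation could look like in practice (placeholder data and model, assuming scikit-learn and matplotlib): pool the out-of-fold predictions from CV and plot predicted vs measured on a log scale with an identity line.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_predict

# Placeholder property spanning several orders of magnitude (modeled in log space).
X, y_raw = make_regression(n_samples=500, n_features=20, noise=5.0, random_state=0)
y = 10 ** (y_raw / y_raw.std())

# Out-of-fold predictions pooled across the CV test sets.
pred_log = cross_val_predict(
    GradientBoostingRegressor(random_state=0), X, np.log10(y),
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)

fig, ax = plt.subplots(figsize=(4, 4))
ax.scatter(y, 10 ** pred_log, s=10, alpha=0.5)
lims = [y.min(), y.max()]
ax.plot(lims, lims, "k--", lw=1)  # identity line for reference
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("measured")
ax.set_ylabel("predicted")
plt.show()
```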