Confusion matrix-based metrics #1660
base: main
Conversation
- Confusion matrices and MCC
- Fixes …still the default).
- add tiebreaker options
Very much support this PR! How do we get attention from a maintainer?
Have you tried importing maintainer_attention?
Hey @chrico-bu-uab, what is this PR trying to achieve? It seems like it's a special case of evaluation where you need to evaluate the whole set at once, not each item? If so, how would that interact with optimization? We typically recommend handling special cases like this in user code. Just make each dspy.Example carry a set of examples and handle that in your metric and your program. Then the normal dspy.Evaluate works just fine, and so do all optimizers.
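For concreteness, here is a minimal sketch of the pattern described above, assuming a simple text-classification setup. The field names (items, labels), the ClassifyBatch module, and the mcc_metric helper are illustrative only; they are not part of DSPy or of this PR.

```python
import dspy
from sklearn.metrics import matthews_corrcoef

# Each dspy.Example carries a whole batch of items plus their gold labels.
batched_devset = [
    dspy.Example(
        items=["great product", "terrible support", "okay I guess"],
        labels=["positive", "negative", "neutral"],
    ).with_inputs("items"),
    # ... one such Example per evaluation batch
]

class ClassifyBatch(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classify = dspy.Predict("text -> label")

    def forward(self, items):
        # Run the single-item classifier over every item carried by the Example.
        return dspy.Prediction(labels=[self.classify(text=t).label for t in items])

def mcc_metric(example, pred, trace=None):
    # Set-level metric: MCC over the whole batch inside this one Example.
    return matthews_corrcoef(example.labels, pred.labels)

# The standard evaluator (and therefore the optimizers) still only ever sees
# one Example and one metric value at a time, so nothing else has to change.
evaluate = dspy.Evaluate(devset=batched_devset, metric=mcc_metric, num_threads=4)
# evaluate(ClassifyBatch())
```

Because each batch lives inside a single Example, the set-level computation happens entirely inside the metric, and the usual per-example plumbing is untouched.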
(looks like importing maintainer_attention works)
I haven't tried making each example instance carry a set of examples yet. I still want separate predictions for each example, though, and to potentially parallelize their evaluations. It's just the metrics that need to be calculated all at once.

I'm not sure I understand the question about optimization. I mimicked the Evaluator and BootstrapFewShotWithRandomSearch classes pretty exactly (I think), so however those two interact with optimization should have transferred over to my classes.

I just thought confusion matrices are required often enough to warrant a dedicated framework, or even to be the default metric. I rarely use straight accuracy for any ML, but I may be old-school. I appreciate the feedback and understand if having the user handle this instead makes more sense for the community.
Hi @okhat, thanks also for that explanation. Would you be able to post some example code showing how to do this: "Just make each dspy.Example carry a set of examples and handle that in your metric and your program. Then the normal dspy.Evaluate works just fine, and so do all optimizers"?
Thanks for all the nice posts, and thanks to @chrico-bu-uab for the code with the new classes. If I understand correctly, the current optimizers optimize the mean, over the train/validation set, of a metric value computed on a per-example basis (call it the "sample metric"). It would be nice to have the optimizers compute other summary statistics over the sample metrics (for example, sensitivity, precision, F1 score, etc.). I see the issue with this: for many sample metrics it is not meaningful to compute summary statistics other than the mean (think of the semantic F1 score, for example). Still, it would make sense to support optimizing for other summary statistics, especially when datasets are imbalanced. Hope this makes sense and helps to pinpoint the issue; otherwise, please correct me.
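As a rough sketch of such a summary statistic computed outside the optimizers, assuming a binary classification devset whose examples expose text and label fields and a program that returns a .label per item (the summarize helper and field names are hypothetical):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

def summarize(program, devset, positive="positive"):
    # Collect gold and predicted labels for the whole devset, then compute
    # summary statistics that cannot be recovered from a per-example mean.
    gold = [ex.label for ex in devset]
    pred = [program(text=ex.text).label for ex in devset]
    return {
        "precision": precision_score(gold, pred, pos_label=positive),
        "recall": recall_score(gold, pred, pos_label=positive),  # a.k.a. sensitivity
        "f1": f1_score(gold, pred, pos_label=positive),
    }
```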
Hi @okhat, just pinging you on this. Is this something you have offhand? I don't see any obvious way to implement it. Thanks!
Since the existing frameworks don't allow for confusion matrix-based evaluators or optimization, I created additional evaluator and optimizer classes (Confusion and MCCBootstrapFewShotWithRandomSearch, respectively). These additions address #556.
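The interfaces of the new classes aren't shown in this thread, so the sketch below only illustrates the metric they are named after: the Matthews correlation coefficient computed from a confusion matrix, using scikit-learn (the gold/predicted labels here are made-up example data).

```python
import math
from sklearn.metrics import confusion_matrix, matthews_corrcoef

gold = ["pos", "neg", "neg", "pos", "neg"]
pred = ["pos", "neg", "pos", "pos", "neg"]

# Rows are gold labels, columns are predictions; with labels=["pos", "neg"]
# the entries unravel as TP, FN, FP, TN.
cm = confusion_matrix(gold, pred, labels=["pos", "neg"])
tp, fn, fp, tn = cm.ravel()

# Binary MCC: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
mcc_manual = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
mcc_sklearn = matthews_corrcoef(gold, pred)  # same value, ~0.667 here
```

Unlike plain accuracy, MCC is 0 for a classifier that always predicts the majority class, which is the motivation raised above for preferring it on imbalanced data.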