
Confusion matrix-based metrics #1660


Open
chrico-bu-uab wants to merge 38 commits into main

Conversation


@chrico-bu-uab commented on Oct 21, 2024

Since the existing frameworks don't allow for confusion matrix-based evaluators or optimization, I created additional evaluator and optimizer classes (Confusion and MCCBootstrapFewShotWithRandomSearch, respectively).

These additions address #556.
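
For readers unfamiliar with the motivation: a whole-set statistic like MCC cannot be averaged item by item the way accuracy can. A minimal sketch using standard scikit-learn calls (not the PR's implementation):

```python
# MCC is defined over the full confusion matrix, so it needs every
# (gold, pred) pair before it can return a single score; mean accuracy
# does not. Standard scikit-learn calls, not the code from this PR.
from sklearn.metrics import accuracy_score, matthews_corrcoef

gold  = ["spam", "ham", "ham", "spam", "ham"]
preds = ["spam", "ham", "spam", "spam", "ham"]

print(accuracy_score(gold, preds))     # 0.8, also the mean of per-item scores
print(matthews_corrcoef(gold, preds))  # ~0.67, only defined on the whole set
```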

chrico-bu-uab marked this pull request as draft on October 22, 2024 14:09
chrico-bu-uab marked this pull request as ready for review on October 22, 2024 15:26
@stevegbrooks

Very much support this PR! How do we get attention from a maintainer?

@chrico-bu-uab (Author) commented on Mar 5, 2025

Have you tried from dspy import maintainer_attention?

@okhat (Collaborator) commented on Mar 5, 2025

Hey @chrico-bu-uab , what is this PR trying to achieve? It seems like it's a special case of evaluation where you need to evaluate the whole set at once, not each item? If so, how would that interact with optimization?

We typically recommend handling special cases like this in user code. Just make each dspy.Example carry a set of examples and handle that in your metric and your program. Then the normal dspy.Evaluate works just fine, and so do all optimizers.
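
A rough sketch of the pattern described above, folding a whole labelled set into one dspy.Example so the stock evaluator sees a single item. The field name items, the inner dict structure, and the prediction's labels attribute are illustrative assumptions, not dspy conventions:

```python
import dspy
from sklearn.metrics import matthews_corrcoef

# One "outer" Example carries a whole labelled set; the field name "items"
# and the inner dict structure are made up for illustration.
items = [
    {"text": "free money now!!!", "label": "spam"},
    {"text": "meeting moved to 3pm", "label": "ham"},
    # ...
]
batch_example = dspy.Example(items=items).with_inputs("items")

def batch_mcc_metric(example, prediction, trace=None):
    # Assumes the program returns one label per inner item, aligned with
    # example.items (e.g. as prediction.labels) -- an assumption, not an API.
    gold = [item["label"] for item in example.items]
    return matthews_corrcoef(gold, prediction.labels)

# With the set folded into single Examples, the normal machinery applies.
evaluate = dspy.Evaluate(devset=[batch_example], metric=batch_mcc_metric)
```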

@okhat (Collaborator) commented on Mar 5, 2025

(looks like importing maintainer_attention works)

@chrico-bu-uab (Author)

I haven't tried making each example instance carry a set of examples yet. I still want separate predictions for each example, though, and to potentially parallelize their evaluations. It's just the metrics that need to be calculated all at once.

I'm not sure I understand the question about optimization. I mimicked the Evaluator and BootstrapFewShotWithRandomSearch classes pretty exactly (I think), so however those two interact with optimization should have transferred over to my classes.

I just thought confusion matrices are required often enough to warrant a dedicated framework, or even be the default metric. I rarely use straight accuracy for any ML. But I may be old-school.

I appreciate the feedback and understand if having the user handle this instead makes more sense for the community.
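
For contrast, a minimal sketch of the workflow described in this comment: predictions stay per-example (and can be parallelized), while the score is computed only once the whole set is in hand. The field names and the thread pool are illustrative, not the PR's code:

```python
# Predictions stay per-example and can run in parallel; only the metric
# needs the full set at once. Field names (text, label) are illustrative.
from concurrent.futures import ThreadPoolExecutor
from sklearn.metrics import matthews_corrcoef

def evaluate_whole_set(program, devset, num_threads=8):
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        preds = list(pool.map(lambda ex: program(text=ex.text).label, devset))
    gold = [ex.label for ex in devset]
    # Only now is the confusion-matrix statistic well defined.
    return matthews_corrcoef(gold, preds)
```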

@stevegbrooks

Hi @okhat - also thanks for that explanation.

Would you be able to post some example code to show how to do this: "Just make each dspy.Example carry a set of examples and handle that in your metric and your program. Then the normal dspy.Evaluate works just fine, and so do all optimizers."?

@piemasbi commented on Mar 6, 2025

Thanks for all the nice posts, and thanks to @chrico-bu-uab for the code with the new classes.

If I understand correctly, the current optimizers use the mean of the metric value (computed on a per-example basis; let's call it the "sample metric") over the train/validation set for the optimization. It would be nice if the optimizers could compute other summary statistics over the sample metrics (for example, sensitivity, precision, F1 score, etc.).

I see the issue with this: for many sample metrics it is not always meaningful to use a summary statistic other than the mean (think of the semantic F1 score, for example). Still, it would make sense to support optimizing for other summary statistics, especially when datasets are imbalanced.

Hope this makes sense and helps pinpoint the issue; otherwise please correct me.
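
To make the "summary statistics other than the mean" idea concrete, here is one hedged way to think about it: each sample metric returns raw counts, and precision/recall/F1 are formed only after pooling. This is an illustrative sketch of the aggregation idea, not an existing dspy or PR interface:

```python
# Sample metrics return raw counts; the summary statistic (micro-averaged
# precision/recall/F1) is computed only after pooling across the set.
from dataclasses import dataclass

@dataclass
class Counts:
    tp: int = 0
    fp: int = 0
    fn: int = 0

def sample_counts(gold, pred, positive="spam"):
    return Counts(
        tp=int(gold == positive and pred == positive),
        fp=int(gold != positive and pred == positive),
        fn=int(gold == positive and pred != positive),
    )

def pooled_f1(counts):
    tp = sum(c.tp for c in counts)
    fp = sum(c.fp for c in counts)
    fn = sum(c.fn for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```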

@stevegbrooks

> Hi @okhat - also thanks for that explanation.
>
> Would you be able to post some example code to show how to do this: "Just make each dspy.Example carry a set of examples and handle that in your metric and your program. Then the normal dspy.Evaluate works just fine, and so do all optimizers."?

Hi @okhat, just pinging you on this. Is this something you have offhand? I don't see any obvious way to implement it.

Thanks!
