Optionally aggregate metrics for custom step method models in Trainer.evaluate()
#21314
Conversation
Thanks for the PR. Is there an alternative way to solve this problem? Other backends don't have this issue.
Codecov Report — Attention: Patch coverage is …
Additional details and impacted files:
@@ Coverage Diff @@
## master #21314 +/- ##
==========================================
- Coverage 82.60% 82.55% -0.05%
==========================================
Files 565 565
Lines 54773 54825 +52
Branches 8508 8520 +12
==========================================
+ Hits 45244 45260 +16
- Misses 7439 7469 +30
- Partials 2090 2096 +6
@fchollet Could you elaborate? I think the issue exists independently of the backend. In terms of options, I believe we could also do something along the lines of …
Aggregation should be handled by stateful metrics. The idea being that at each test step, you update the metric state.
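For illustration, a minimal sketch of the stateful-metric pattern being suggested, using `keras.metrics.Mean` (the loss values are made up):

```python
import keras

# A stateful metric accumulates state across calls to update_state().
tracker = keras.metrics.Mean(name="loss")
for step_loss in [1.0, 0.5, 0.3]:  # e.g. one loss value per test step
    tracker.update_state(step_loss)

print(float(tracker.result()))  # mean over all steps, ≈ 0.6
tracker.reset_state()           # reset between evaluations
```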
@fchollet I agree that this could and should be handled by the metrics themselves. However, the current … In that case, should we fall back to a conversation about extending the support for arbitrary gradient-based learning in keras? I.e. …
You can just use a Metric for reduction across an epoch (or across one call to evaluate()).
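A minimal sketch of wiring that suggestion into a custom `test_step`, assuming the TensorFlow or PyTorch backend (where `test_step(self, data)` can update metric state in place); the model, layer, and loss are illustrative and not taken from this PR:

```python
import keras


class MyModel(keras.Model):
    """Illustrative model whose custom test_step relies on a stateful
    metric for aggregation (not code from this PR)."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.dense = keras.layers.Dense(1)
        # Stateful metric: its state accumulates across all test steps.
        self.loss_tracker = keras.metrics.Mean(name="loss")

    def call(self, x):
        return self.dense(x)

    def test_step(self, data):
        x, y = data
        y_pred = self(x, training=False)
        loss = keras.ops.mean(keras.losses.mean_squared_error(y, y_pred))
        # update_state() folds this batch into the running mean, so
        # evaluate() reports an epoch-level value, not the last batch.
        self.loss_tracker.update_state(loss)
        return {m.name: m.result() for m in self.metrics}

    @property
    def metrics(self):
        # Listing the tracker here lets Keras reset it at the start of
        # each evaluate() call.
        return [self.loss_tracker]
```

With this setup, `model.evaluate(x, y, return_dict=True)` should return the mean loss over all batches rather than the last batch's value.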
I did some more digging and could conclude that the issue only arises when … Closing this since it seems there is already a workaround, but I think it could be nice if keras removed the requirement for users to explicitly define an aggregation scheme for each metric and defaulted to using the mean.
Currently, models that use a custom `test_step` function do not support metric aggregation in `evaluate()`. Instead, the method simply returns the metrics from the last step:

keras/keras/src/trainers/trainer.py, Lines 1013 to 1015 in 785c9b0

This can lead to user confusion, e.g. in bayesflow-org/bayesflow#481.

This PR introduces a very primitive way to aggregate metrics, controlled by a boolean flag in the `evaluate` method: `aggregate`, which defaults to `False` to preserve backward compatibility. The current idea is to simply sum all metrics and divide by the total number of steps taken. This works as long as every step returns the same PyTree of numerical (tensor or primitive) metrics, consistent across all steps.

I understand that this code might still be too simple, lack extensibility for multi-device synchronization, or be inconsistent with other parts of the library. I am open to modifying the PR, given feedback.
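A rough sketch of the aggregation scheme described above, assuming each step returns a flat dict of scalar metrics rather than an arbitrary PyTree; the function name and structure are illustrative and not the actual diff:

```python
import keras


def aggregate_logs(per_step_logs):
    """Naively average per-step metric dicts: sum each metric over all
    steps and divide by the number of steps (illustrative only; the
    real change would live inside Trainer.evaluate())."""
    num_steps = len(per_step_logs)
    totals = {}
    for logs in per_step_logs:
        for name, value in logs.items():
            # Convert backend tensors (or Python numbers) to floats.
            totals[name] = totals.get(name, 0.0) + float(
                keras.ops.convert_to_numpy(value)
            )
    return {name: total / num_steps for name, total in totals.items()}


# Example: metrics from three test steps averaged into a single dict.
steps = [
    {"loss": 1.0, "mae": 0.5},
    {"loss": 0.5, "mae": 0.4},
    {"loss": 0.3, "mae": 0.3},
]
print(aggregate_logs(steps))  # loss ≈ 0.6, mae ≈ 0.4
```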