evaluate() shows incorrect metric values in progress bar vs returned results #21301


Open

ielenik opened this issue May 18, 2025 · 1 comment
ielenik commented May 18, 2025

When using model.evaluate(), the metric values displayed in the progress bar differ from the values returned by the method. Double averaging appears to be happening: each batch value is already an average over that batch, and the progress bar then averages these per-batch averages again instead of keeping a single running mean over all samples.
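
To illustrate the suspected mechanism, here is a minimal NumPy sketch of how averaging already-averaged running values diverges from a true mean over all samples. This is only an illustration of the effect, not the actual progress-bar code; the resulting number does not exactly match the 3.4545 shown below, so the precise aggregation inside Keras may differ:

import numpy as np

# Per-batch MAE values for the reproduction below (batch_size=1,
# targets are zero, so each batch's MAE is simply |x_k|).
per_batch_mae = np.arange(1, 11, dtype=float)  # 1.0, 2.0, ..., 10.0

# Correct aggregation: one mean over all samples.
true_mean = per_batch_mae.mean()  # 5.5

# Double averaging: take the running mean shown at each step,
# then average those running means again.
running_means = np.cumsum(per_batch_mae) / np.arange(1, 11)
mean_of_means = running_means.mean()  # 3.25

print(true_mean, mean_of_means)  # 5.5 vs. 3.25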
Code to reproduce:

import tensorflow as tf
import numpy as np

# Identity model: passes the input straight through to the output
model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda x: x)  # Lambda layer to pass input directly to output
])

# Compile the model with MAE as the metric
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Dummy data for evaluation
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.zeros_like(x)  # Dummy target values

results = model.evaluate(x, y, verbose=1, batch_size=1)
print("Evaluation results:", results)

Output:

10/10 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - loss: 17.8182 - mae: 3.4545
Evaluation results: [38.5, 5.5]

Expected behavior:
The metric values shown in the progress bar should match the final returned results (or at least be clearly documented if this difference is intentional).

Issue:

Progress bar shows loss: 17.8182, MAE: 3.4545

Returned values show loss: 38.5, MAE: 5.5

The correct values are the returned ones (38.5 and 5.5, respectively), since they match a manual calculation (see the check after this list)

The progress bar seems to be averaging already-averaged batch values
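
As a sanity check, the returned values can be re-derived directly with NumPy from the data in the reproduction above:

import numpy as np

x = np.arange(1, 11, dtype=float)  # same data as in the reproduction
y = np.zeros_like(x)

print("MSE:", np.mean((x - y) ** 2))   # 38.5 -> matches the returned loss
print("MAE:", np.mean(np.abs(x - y)))  # 5.5  -> matches the returned mae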

Environment:
TensorFlow 2.19

Additional notes:
This discrepancy can be confusing for users who rely on the progress bar metrics during evaluation.

@sonali-kumari1
Contributor

Hi @ielenik -

I have tested your code with the latest versions of Keras (3.9.2) and TensorFlow (2.19.0) in this gist, and I was able to reproduce the mismatch between the metric values shown in the progress bar and the values returned by the evaluate() method. However, when I tested with Keras (2.15.0) and TensorFlow (2.15.0) using this gist, the results were consistent between the progress bar and the final evaluation output. We will look into this and update you. Thanks!
