
evaluate() shows incorrect metric values in progress bar vs returned results #21301

Closed
@ielenik

Description


When using model.evaluate(), the metric values displayed in the progress bar differ from the values returned by the method. There appears to be double averaging: each per-batch value is already an average, and the progress bar then averages those averages again.

Code to reproduce:

import tensorflow as tf
import numpy as np

# Identity model: the Lambda layer passes the input straight through to the output
model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda x: x)
])

# Compile the model with MAE as the metric
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Dummy data for evaluation
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.zeros_like(x)  # Dummy target values

results = model.evaluate(x, y, verbose=1, batch_size=1)
print("Evaluation results:", results)

Output:

10/10 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - loss: 17.8182 - mae: 3.4545
Evaluation results: [38.5, 5.5]

Expected behavior:
The metric values shown in the progress bar should match the final returned results (or at least be clearly documented if this difference is intentional).

Issue:

Progress bar shows loss: 17.8182, MAE: 3.4545

Returned values show loss: 38.5, MAE: 5.5

The returned values (38.5 and 5.5) are the correct ones: with targets of zero, MSE = mean(x²) = 385/10 = 38.5 and MAE = mean(|x|) = 55/10 = 5.5, matching a manual calculation

The progress bar seems to be averaging already-averaged batch values
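The manual calculation behind the returned values can be checked directly with NumPy, independent of Keras:

```python
import numpy as np

# Same data as in the reproduction above
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.zeros_like(x)

mse = np.mean((x - y) ** 2)   # 385 / 10 = 38.5
mae = np.mean(np.abs(x - y))  # 55 / 10 = 5.5
print(mse, mae)               # -> 38.5 5.5
```

These match the list returned by evaluate(), confirming that the returned values, not the progress-bar values, are the true dataset-level metrics.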

Environment:
TensorFlow 2.19

Additional notes:
This discrepancy can be confusing for users who rely on the progress bar metrics during evaluation.
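To illustrate why averaging already-averaged batch values is wrong in general (a generic sketch with made-up batch sizes, not a trace of Keras internals), compare a sample-weighted mean against a mean of per-batch means:

```python
import numpy as np

# Per-sample absolute errors, split into batches of unequal size (hypothetical data)
batches = [np.array([1.0, 2.0, 3.0]), np.array([4.0]), np.array([5.0, 6.0])]

# Correct: weight each batch by its sample count (what a streaming metric should do)
total = sum(b.sum() for b in batches)
count = sum(b.size for b in batches)
true_mae = total / count  # 21 / 6 = 3.5

# Incorrect: average the already-averaged per-batch means
mean_of_means = np.mean([b.mean() for b in batches])  # (2.0 + 4.0 + 5.5) / 3 = 3.8333...

print(true_mae, mean_of_means)  # -> 3.5 3.8333...
```

The two only coincide when every batch has the same size and each batch value is a plain per-batch mean; any other averaging scheme (for example, re-averaging cumulative running means) drifts away from the true value, which is consistent with the discrepancy reported above.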
