keras.layers.Normalization#adapt() does not support keras.utils.PyDataset as input. #21300

Open
limzikiki opened this issue May 18, 2025 · 3 comments
Labels: stat:contributions welcome, type:Bug

Comments

@limzikiki

Expected behavior

When I create a custom PyDataset and pass it to the Normalization layer's adapt method, the mean and variance should be computed successfully, just as they are for a TensorFlow dataset.

Actual behavior

The adapt method does not handle PyDataset-like inputs and fails with an UnboundLocalError:

Traceback (most recent call last):
  File "/home/user/project/keras_bug_sample.py", line 14, in <module>
    normalizer.adapt(CustomDataset)
  File "/home/user/.pyenv/versions/3.12.8/lib/python3.12/site-packages/keras/src/layers/preprocessing/normalization.py", line 230, in adapt
    self.build(input_shape)
               ^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'input_shape' where it is not associated with a value

Steps to reproduce

import keras
import numpy as np  # needed by __getitem__ below

class CustomDataset(keras.utils.PyDataset):
    def __len__(self):
        return 100

    def __getitem__(self, idx):
        # Generate dummy data
        x = np.random.rand(32, 32, 3)
        y = np.random.randint(0, 10, size=(1,))
        return x, y

normalizer = keras.layers.Normalization()
normalizer.adapt(CustomDataset)
@sonali-kumari1
Contributor

Hi @limzikiki -
Thanks for reporting this. The error UnboundLocalError: cannot access local variable 'input_shape' where it is not associated with a value occurs because adapt() does not support custom datasets such as keras.utils.PyDataset. The adapt() method expects its data as a tf.data.Dataset, a NumPy array, or a backend-native eager tensor. When it receives a keras.utils.PyDataset, it cannot extract input_shape, so the variable is still unassigned when self.build(input_shape) is called, which raises the UnboundLocalError. To work around this, convert your custom dataset into a tf.data.Dataset and make sure it is properly batched so that the mean and variance are computed correctly.
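
Since adapt() also accepts a NumPy array, one possible alternative to the tf.data.Dataset conversion is to stack the batches into a single array first. The sketch below is only an illustration: stack_features and adapt_on_pydataset are hypothetical helper names, and it assumes the dataset fits in memory and that each item is a batch shaped like (x, y).

```python
import numpy as np


def stack_features(dataset):
    """Collect the feature part of every batch into one NumPy array."""
    batches = []
    for idx in range(len(dataset)):
        item = dataset[idx]
        # Assumption: a batch is (x, y) or (x, y, sample_weight); keep only x.
        x = item[0] if isinstance(item, (tuple, list)) else item
        batches.append(np.asarray(x))
    # Join the batches along the leading (batch) axis.
    return np.concatenate(batches, axis=0)


def adapt_on_pydataset(normalizer, dataset):
    """Call normalizer.adapt() on the stacked features (hypothetical helper)."""
    normalizer.adapt(stack_features(dataset))
    return normalizer
```

With Keras installed this would be called as adapt_on_pydataset(keras.layers.Normalization(), CustomDataset()), passing an instance of the dataset rather than the class itself.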

@limzikiki
Author

I ended up fixing it by subclassing the Normalization class and adding support for PyDataset inputs. I was wondering whether the Keras community would also benefit from this feature. In my solution, I fetch the first batch and then build the normalisation layer based on its shape. I don't know if that's the proper solution. If that's okay, I can open a PR with the changes.
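
The subclass itself is not shown in this thread, so the following is not that implementation. It is a plain NumPy sketch of the same idea under stated assumptions: read the feature shape from the first batch, then combine per-batch means and variances into running statistics. The names _features and streaming_mean_variance are hypothetical, and each item is assumed to be a batch shaped like (x, y).

```python
import numpy as np


def _features(item):
    # Assumption: a batch is (x, y) or (x, y, sample_weight); keep only x.
    return item[0] if isinstance(item, (tuple, list)) else item


def streaming_mean_variance(dataset, axis=-1):
    """Per-feature mean and population variance over all batches."""
    mean = var = None
    seen = 0  # number of values seen so far per feature
    for idx in range(len(dataset)):
        x = np.asarray(_features(dataset[idx]), dtype="float64")
        keep = axis % x.ndim
        reduce_axes = tuple(i for i in range(x.ndim) if i != keep)
        if mean is None:
            # The first batch fixes the shape of the statistics.
            mean = np.zeros(x.shape[keep])
            var = np.zeros(x.shape[keep])
        count = x.size // x.shape[keep]
        batch_mean = x.mean(axis=reduce_axes)
        batch_var = x.var(axis=reduce_axes)
        total = seen + count
        delta = batch_mean - mean
        # Merge the running and batch statistics (pooled variance update).
        var = (seen * var + count * batch_var) / total \
            + delta**2 * seen * count / total**2
        mean = mean + delta * count / total
        seen = total
    return mean, var
```

The resulting values could then be supplied through the mean and variance arguments of keras.layers.Normalization instead of calling adapt().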

@VarunS1997
Collaborator

Please go ahead and make a PR, @limzikiki -- thank you!

@VarunS1997 added the "stat:contributions welcome" label on May 22, 2025
@sachinprasadhs removed the "keras-team-review-pending" label on May 22, 2025