Skip to content

Sharded weights type error #2296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

laxmareddyp
Copy link
Collaborator

@laxmareddyp laxmareddyp commented Jun 11, 2025

Description of the change

Fix: Handle lists in weight_map for sharded weights

The _get_sharded_filenames method in preset_utils.py was raising a
TypeError when weight_map.values() contained lists. This occurred because
lists are unhashable and cannot be added directly to a set.

Reference

Colab Notebook

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

@laxmareddyp laxmareddyp changed the title fix-sharded-weights-typeerror fix sharded weights typeerror Jun 11, 2025
@laxmareddyp laxmareddyp changed the title fix sharded weights typeerror Sharded weights type error Jun 11, 2025
@sachinprasadhs sachinprasadhs added the kokoro:force-run Runs Tests on GPU label Jun 11, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jun 11, 2025
@james77777778
Copy link
Collaborator

@laxmareddyp could we add a test for this to prevent future breakage?

Here is an example of testing sharded weights:
https://github.com/keras-team/keras-hub/blob/master/keras_hub/src/utils/preset_utils_test.py#L22

@laxmareddyp laxmareddyp added the kokoro:force-run Runs Tests on GPU label Jun 13, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jun 13, 2025
Comment on lines 67 to 84
init_kwargs = {
"vocabulary_size": 1024,
"num_layers": 12,
"num_query_heads": 8,
"num_key_value_heads": 4,
"hidden_dim": 32,
"intermediate_dim": 64,
"head_dim": 4,
"sliding_window_size": 5,
"attention_logit_soft_cap": 50,
"final_logit_soft_cap": 30,
"layer_norm_epsilon": 1e-6,
"query_head_dim_normalize": False,
"use_post_ffw_norm": True,
"use_post_attention_norm": True,
"use_sliding_window_attention": True,
}
backbone = GemmaBackbone(**init_kwargs) # ~422KB
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to setUp and use it for all 3 test cases which we are doing here, so that our test setup will be lot more cleaner.

…l three relevant test cases to use the shared setup.
@laxmareddyp laxmareddyp added the kokoro:force-run Runs Tests on GPU label Jun 13, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Jun 13, 2025
Copy link
Collaborator

@sachinprasadhs sachinprasadhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants