Implement left padding #2242

Merged: 9 commits into keras-team:master on May 28, 2025

Conversation

@pass-lin (Contributor) commented May 2, 2025

from #2237
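
For context, a minimal sketch of the behavior this PR adds, assuming the new padding_side argument lands on keras_hub.layers.StartEndPacker as proposed here; the values and expected output mirror the test quoted later in this thread:

```python
import keras_hub

# Sketch only: `padding_side` is the new argument proposed in this PR.
# The start/end/pad values and the expected output are taken from the
# test discussed further down in this review.
packer = keras_hub.layers.StartEndPacker(
    sequence_length=7,
    start_value=1,
    end_value=2,
    pad_value=3,
    padding_side="left",  # the default stays "right"
)
output = packer([[5, 6, 7], [8, 9, 10, 11]])
# Expected: [[3, 3, 1, 5, 6, 7, 2], [3, 1, 8, 9, 10, 11, 2]]
# Pad tokens (3) now sit on the left, which is the layout that batched
# decoder-only generation typically expects.
```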

@mattdangerw (Member) left a comment

Thanks! A couple of comments.

@pass-lin (Contributor Author) commented May 9, 2025

@mattdangerw
Hey, please review my new update. It passes the tests from your Colab.

@sachinprasadhs sachinprasadhs added the kokoro:force-run Runs Tests on GPU label May 16, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label May 16, 2025
@pass-lin (Contributor Author) commented May 17, 2025

=========================== short test summary info ============================
FAILED keras_hub/src/models/gemma/gemma_causal_lm_test.py::GemmaCausalLMTest::test_flash_attention_call - AssertionError: Expected 'dot_product_attention' to have been called.
FAILED keras_hub/src/models/gemma3/gemma3_causal_lm_test.py::Gemma3CausalLMTest::test_text_flash_attention_call - AssertionError: Expected 'dot_product_attention' to have been called.
FAILED keras_hub/src/models/mixtral/mixtral_backbone_test.py::MixtralBackboneTest::test_backbone_basics - AttributeError: 'CachedMixtralAttention' object has no attribute 'dropout'. Did you mean: '_dropout'?
FAILED keras_hub/src/models/mixtral/mixtral_backbone_test.py::MixtralBackboneTest::test_saved_model - AttributeError: 'CachedMixtralAttention' object has no attribute 'dropout'. Did you mean: '_dropout'?
FAILED keras_hub/src/models/mixtral/mixtral_causal_lm_test.py::MixtralCausalLMTest::test_causal_lm_basics - AttributeError: 'CachedMixtralAttention' object has no attribute 'dropout'. Did you mean: '_dropout'?
FAILED keras_hub/src/models/mixtral/mixtral_causal_lm_test.py::MixtralCausalLMTest::test_early_stopping - AttributeError: 'CachedMixtralAttention' object has no attribute 'dropout'. Did you mean: '_dropout'?
FAILED keras_hub/src/models/mixtral/mixtral_causal_lm_test.py::MixtralCausalLMTest::test_generate - AttributeError: 'CachedMixtralAttention' object has no attribute 'dropout'. Did you mean: '_dropout'?
FAILED keras_hub/src/models/mixtral/mixtral_causal_lm_test.py::MixtralCausalLMTest::test_generate_compilation - AttributeError: 'CachedMixtralAttention' object has no attribute 'dropout'. Did you mean: '_dropout'?
FAILED keras_hub/src/models/mixtral/mixtral_causal_lm_test.py::MixtralCausalLMTest::test_saved_model - AttributeError: 'CachedMixtralAttention' object has no attribute 'dropout'. Did you mean: '_dropout'?
FAILED keras_hub/src/models/mixtral/mixtral_causal_lm_test.py::MixtralCausalLMTest::test_score_layer_intercept_fn_exfiltration - AttributeError: 'CachedMixtralAttention' object has no attribute 'dropout'. Did you mean: '_dropout'?
FAILED keras_hub/src/models/mixtral/mixtral_causal_lm_test.py::MixtralCausalLMTest::test_score_logits - AttributeError: 'CachedMixtralAttention' object has no attribute 'dropout'. Did you mean: '_dropout'?
FAILED keras_hub/src/models/mixtral/mixtral_causal_lm_test.py::MixtralCausalLMTest::test_score_loss - AttributeError: 'CachedMixtralAttention' object has no attribute 'dropout'. Did you mean: '_dropout'?
FAILED keras_hub/src/models/qwen_moe/qwen_moe_backbone_test.py::QwenMoeBackboneTest::test_auxiliary_loss - AttributeError: 'QwenMoeAttention' object has no attribute 'logit_soft_cap'
FAILED keras_hub/src/models/qwen_moe/qwen_moe_backbone_test.py::QwenMoeBackboneTest::test_backbone_basics - AttributeError: 'QwenMoeAttention' object has no attribute 'logit_soft_cap'
FAILED keras_hub/src/models/qwen_moe/qwen_moe_backbone_test.py::QwenMoeBackboneTest::test_saved_model - AttributeError: 'QwenMoeAttention' object has no attribute 'logit_soft_cap'
FAILED keras_hub/src/models/qwen_moe/qwen_moe_causal_lm_test.py::QwenMoeCausalLMTest::test_causal_lm_basics - AttributeError: 'QwenMoeAttention' object has no attribute 'logit_soft_cap'
FAILED keras_hub/src/models/qwen_moe/qwen_moe_causal_lm_test.py::QwenMoeCausalLMTest::test_early_stopping - AttributeError: 'QwenMoeAttention' object has no attribute 'logit_soft_cap'
FAILED keras_hub/src/models/qwen_moe/qwen_moe_causal_lm_test.py::QwenMoeCausalLMTest::test_flash_attention_call - AttributeError: 'QwenMoeAttention' object has no attribute 'logit_soft_cap'
FAILED keras_hub/src/models/qwen_moe/qwen_moe_causal_lm_test.py::QwenMoeCausalLMTest::test_generate - AttributeError: 'QwenMoeAttention' object has no attribute 'logit_soft_cap'
FAILED keras_hub/src/models/qwen_moe/qwen_moe_causal_lm_test.py::QwenMoeCausalLMTest::test_generate_compilation - AttributeError: 'QwenMoeAttention' object has no attribute 'logit_soft_cap'
FAILED keras_hub/src/models/qwen_moe/qwen_moe_causal_lm_test.py::QwenMoeCausalLMTest::test_generate_strip_prompt - AttributeError: 'QwenMoeAttention' object has no attribute 'logit_soft_cap'
FAILED keras_hub/src/models/qwen_moe/qwen_moe_causal_lm_test.py::QwenMoeCausalLMTest::test_saved_model - AttributeError: 'QwenMoeAttention' object has no attribute 'logit_soft_cap'
========== 22 failed, 1228 passed, 489 skipped in 5007.81s (1:23:27) =========

@sachinprasadhs These failures seem unrelated to my change, but they are worth raising in a new issue. I can fix this bug in #2257.

@pass-lin (Contributor Author)

@mattdangerw Could you please check if we meet the criteria for merging now?

expected_output = [[3, 3, 1, 5, 6, 7, 2], [3, 1, 8, 9, 10, 11, 2]]
self.assertAllEqual(output, expected_output)

def test_truncation_side_flips(self):
Member:

I'm not sure we need all of these. We might just need test_truncation and test_truncation_without_end_value for these next three tests.

Contributor Author:

I'm not sure we need all of these. We might just need test_truncation and test_truncation_without_end_value for these next three tests.

I cover both the left- and right-padding scenarios in all of these tests, so I think they are still necessary.

Member:

Let's at least remove the "side_flips" from the names; I don't think any reader would understand what that means. How about test_truncation and test_truncation_without_end_value?

@@ -139,6 +142,20 @@ def check_special_value_type(value, value_name):

self.pad_value = pad_value
self.return_padding_mask = return_padding_mask
self.padding_side = padding_side

def pad(self, x, shape, pad_value):
Member:

Pull this into a util in tensor_utils.py or something like that. We will also need it for the multi segment packer.

Contributor Author:

Pull this into a util in tensor_utils.py or something like that. We will also need it for the multi segment packer.

I don't think this is necessary. Only BERT-like models use multi-segment packers, and for BERT-like models there is no essential difference between left and right padding.

Member:

We want it first and foremost for uniformity of the API. But also, this is not just for BERT-like. Gemma3 and PaliGemma use it, for example.

Contributor Author:

We want it first and foremost for uniformity of the API. But also, this is not just for BERT-like. Gemma3 and PaliGemma use it, for example.

OK, I have implemented this. Please take a look.
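
As a reference for this thread, a shared helper along these lines could look roughly like the sketch below. It assumes the packers pad ragged inputs to a dense shape; the function name, signature, and the reverse-pad-reverse approach are illustrative, not necessarily what lands in tensor_utils.py:

```python
import tensorflow as tf


def pad(x, shape, pad_value, padding_side="right"):
    """Pad a `tf.RaggedTensor` to a dense `shape` on the chosen side.

    Illustrative sketch only; the real helper may differ.
    """
    if padding_side == "right":
        return x.to_tensor(shape=shape, default_value=pad_value)
    # Left padding: reverse each row, pad on the right, then reverse back.
    reversed_x = tf.reverse(x, axis=[-1])
    padded = reversed_x.to_tensor(shape=shape, default_value=pad_value)
    return tf.reverse(padded, axis=[-1])


x = tf.ragged.constant([[1, 5, 6, 7, 2], [1, 8, 9, 10, 11, 2]])
print(pad(x, shape=(2, 7), pad_value=3, padding_side="left"))
# [[3, 3, 1, 5, 6, 7, 2], [3, 1, 8, 9, 10, 11, 2]]
```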

@@ -147,3 +288,39 @@ def test_get_config(self):
}

self.assertEqual(config, {**config, **expected_config_subset})

def test_return_padding_mask_right_padding(self):
Member:

In keeping with the other tests, just leave these in the same test with a # right padding and # left padding comment.
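
To illustrate the suggested structure, a combined test might look like the sketch below. It is meant to sit in the existing StartEndPacker test class; the left-padded token output matches the expected values quoted earlier in this thread, while the left-padding mask values are an assumption (pad positions masked out):

```python
def test_return_padding_mask(self):
    input_data = [[5, 6, 7], [8, 9, 10, 11]]
    # Right padding (the existing default).
    packer = StartEndPacker(
        sequence_length=7,
        start_value=1,
        end_value=2,
        pad_value=3,
        return_padding_mask=True,
    )
    token_ids, padding_mask = packer(input_data)
    self.assertAllEqual(
        token_ids, [[1, 5, 6, 7, 2, 3, 3], [1, 8, 9, 10, 11, 2, 3]]
    )
    self.assertAllEqual(
        padding_mask, [[1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1, 0]]
    )
    # Left padding (new in this PR).
    packer = StartEndPacker(
        sequence_length=7,
        start_value=1,
        end_value=2,
        pad_value=3,
        return_padding_mask=True,
        padding_side="left",
    )
    token_ids, padding_mask = packer(input_data)
    self.assertAllEqual(
        token_ids, [[3, 3, 1, 5, 6, 7, 2], [3, 1, 8, 9, 10, 11, 2]]
    )
    self.assertAllEqual(
        padding_mask, [[0, 0, 1, 1, 1, 1, 1], [0, 1, 1, 1, 1, 1, 1]]
    )
```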

@mattdangerw (Member) left a comment

Thanks! Just a couple of last minor comments. This looks good.

@@ -124,6 +125,7 @@ def __init__(
sep_value=None,
pad_value=None,
truncate="round_robin",
padding_side="right",
Member:

Please add a docstring for the new argument.
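
For instance, a docstring entry for the new argument could read something like this (wording is only a suggestion):

```python
    padding_side: str. Whether to pad the input on the `"left"` or
        `"right"` side of each sequence. Defaults to `"right"`.
```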

@@ -163,6 +165,8 @@ def check_special_value_type(value, value_name):

self.pad_value = pad_value

Member:

nit: remove this empty line

@pass-lin (Contributor Author)

@mattdangerw Can my implementation be merged?

@mattdangerw mattdangerw added the kokoro:force-run Runs Tests on GPU label May 28, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label May 28, 2025
@mattdangerw mattdangerw merged commit c314f88 into keras-team:master May 28, 2025
10 checks passed