@ericm-db ericm-db commented Feb 18, 2025

What changes were proposed in this pull request?

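There are currently two bugs:

- The NoPrefixKeyStateEncoder adds an extra version byte to each row when UnsafeRow encoding is used: #47107
- Rows written with Avro encoding do not include a version byte: #48401

Neither bug has been released: both are triggered only when multiple column families are in use, and transformWithState, which will be released in Spark 4.0.0, is the only feature that uses them.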

This change fixes both of these bugs.

Why are the changes needed?

These changes are needed in order to conform with the expected state row encoding format.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests

Was this patch authored or co-authored using generative AI tooling?

No

@ericm-db ericm-db changed the title [WIP] Fixing NoPrefixKeyStateEncoder [SPARK-51249] Fixing the NoPrefixKeyStateEncoder and Avro encoding to use the correct number of version bytes Feb 19, 2025
@ericm-db ericm-db changed the title [SPARK-51249] Fixing the NoPrefixKeyStateEncoder and Avro encoding to use the correct number of version bytes [SPARK-51249][SS] Fixing the NoPrefixKeyStateEncoder and Avro encoding to use the correct number of version bytes Feb 19, 2025
@ericm-db
Contributor Author

@HeartSaVioR PTAL when you get a chance!

@HeartSaVioR
Contributor

@ericm-db
Test failure looks relevant. Could you please take a look?

@HeartSaVioR HeartSaVioR left a comment

+1

To be 100% sure that we haven't released the two bugs we discovered, could you please find the offending commits for both cases and update the PR description?

@HeartSaVioR
Contributor

I see the PR description is updated. Thanks! Merging to master/4.0.

HeartSaVioR pushed a commit that referenced this pull request Feb 21, 2025
…g to use the correct number of version bytes

### What changes were proposed in this pull request?

There are currently two bugs:
- The NoPrefixKeyStateEncoder adds an extra version byte to each row when UnsafeRow encoding is used: #47107
- Rows written with Avro encoding do not include a version byte: #48401

**Neither bug has been released: both are triggered only when multiple column families are in use, and transformWithState, which will be released in Spark 4.0.0, is the only feature that uses them.**

This change fixes both of these bugs.

### Why are the changes needed?

These changes are needed in order to conform with the expected state row encoding format.
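
To make the two failure modes concrete, here is a minimal sketch of a row format that carries exactly one leading version byte. It is not Spark's code: the names `encode_row` and `decode_row`, the one-byte prefix width, and the version value `0` are all assumptions chosen for illustration.

```python
# Illustrative sketch only. The function names, the one-byte prefix width,
# and the version value 0 are assumptions, not Spark's actual implementation.
VERSION = 0            # assumed version value
NUM_VERSION_BYTES = 1  # assumed width of the version prefix


def encode_row(payload: bytes) -> bytes:
    """Prefix the serialized row with exactly one version byte."""
    return bytes([VERSION]) + payload


def decode_row(encoded: bytes) -> bytes:
    """Validate the version prefix and return the row payload."""
    if len(encoded) < NUM_VERSION_BYTES:
        raise ValueError("encoded row is shorter than the version prefix")
    if encoded[0] != VERSION:
        raise ValueError(f"unexpected version byte: {encoded[0]}")
    return encoded[NUM_VERSION_BYTES:]


payload = b"\x2a\x01\x02"

# Expected format: one version byte, so the payload round-trips.
assert decode_row(encode_row(payload)) == payload

# Extra version byte: the prefix is applied twice, so the decoded row
# still carries a stray leading byte and no longer matches the payload.
assert decode_row(encode_row(encode_row(payload))) != payload

# Missing version byte: the first payload byte is read as the version.
# In this example it is not a valid version, so decoding fails.
try:
    decode_row(payload)
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```

In this sketch a reader that expects one version byte misreads a row in either case, which is why both writers need to agree on the same prefix width.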

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #49996 from ericm-db/SPARK-51249.

Lead-authored-by: Eric Marnadi <eric.marnadi@databricks.com>
Co-authored-by: Eric Marnadi <132308037+ericm-db@users.noreply.github.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 42ab97a)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Pajaraja pushed a commit to Pajaraja/spark that referenced this pull request Mar 6, 2025