-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-51249][SS] Fixing the NoPrefixKeyStateEncoder and Avro encoding to use the correct number of version bytes #49996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7d8bb16
to
7766778
Compare
7766778
to
8e5e6ed
Compare
...core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala
Outdated
Show resolved
Hide resolved
...core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala
Show resolved
Hide resolved
...e/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreSuite.scala
Show resolved
Hide resolved
@HeartSaVioR PTAL when you get a chance! |
@ericm-db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
To be 100% sure that we hadn't released two bugs we discovered, could you please find offending commits for both cases and update the PR description?
I see the PR description is updated. Thanks! Merging to master/4.0. |
…g to use the correct number of version bytes ### What changes were proposed in this pull request? There are currently two bugs: - The NoPrefixKeyStateEncoder adds an extra version byte to each row when UnsafeRow encoding is used: #47107 - Rows written with Avro encoding do not include a version byte: #48401 **Neither of these bugs have been released, since these bugs are only triggered with multiple column families, and transformWithState is only using it, which is going to be released for Spark 4.0.0.** This change fixes both of these bugs. ### Why are the changes needed? These changes are needed in order to conform with the expected state row encoding format. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #49996 from ericm-db/SPARK-51249. Lead-authored-by: Eric Marnadi <eric.marnadi@databricks.com> Co-authored-by: Eric Marnadi <132308037+ericm-db@users.noreply.github.com> Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com> (cherry picked from commit 42ab97a) Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
…g to use the correct number of version bytes ### What changes were proposed in this pull request? There are currently two bugs: - The NoPrefixKeyStateEncoder adds an extra version byte to each row when UnsafeRow encoding is used: apache#47107 - Rows written with Avro encoding do not include a version byte: apache#48401 **Neither of these bugs have been released, since these bugs are only triggered with multiple column families, and transformWithState is only using it, which is going to be released for Spark 4.0.0.** This change fixes both of these bugs. ### Why are the changes needed? These changes are needed in order to conform with the expected state row encoding format. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#49996 from ericm-db/SPARK-51249. Lead-authored-by: Eric Marnadi <eric.marnadi@databricks.com> Co-authored-by: Eric Marnadi <132308037+ericm-db@users.noreply.github.com> Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
What changes were proposed in this pull request?
There are currently two bugs:
Neither of these bugs have been released, since these bugs are only triggered with multiple column families, and transformWithState is only using it, which is going to be released for Spark 4.0.0.
This change fixes both of these bugs.
Why are the changes needed?
These changes are needed in order to conform with the expected state row encoding format.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests
Was this patch authored or co-authored using generative AI tooling?
No