fix: Make cast from float/double to decimal compatible with Spark #1915

Draft: wants to merge 9 commits into base: main


leung-ming
Contributor

@leung-ming leung-ming commented Jun 19, 2025

Which issue does this PR close?

Closes #1371

Built on #1914

Rationale for this change

Floating-point errors are magnified when casting a floating-point number to decimal, because scaling the binary value before rounding can push it across a rounding boundary:

println!("{}", 0.5153125_f64); // 0.5153125
println!("{:.42}", 0.5153125_f64); // 0.515312499999999951150186916493112221360207
println!("{}", 0.5153125_f64 * 1000000_f64); // 515312.49999999994
println!("{}", (0.5153125_f64 * 1000000_f64).round()); // 515312
println!("{:.42}", 515312.5_f64); // 515312.500000000000000000000000000000000000000000
println!("{}", 515312.5_f64.round()); // 515313

OpenJDK uses an algorithm called Schubfach to produce the shortest decimal representation of a floating-point number.
More information about this family of algorithms can be found in Drachennest.
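
To illustrate why a shortest-digits algorithm helps, here is a minimal sketch. It is not the code in this PR. It assumes the target behavior is HALF_UP rounding applied to the shortest round-tripping decimal digits, and it uses Rust's standard `{:e}` formatting as a stand-in for Schubfach or dragonbox. The names `shortest_decimal` and `cast_to_decimal` are hypothetical.

```rust
/// Hypothetical helper: split a finite f64 into (negative, digits, exp10)
/// so that |value| == digits * 10^exp10, where `digits` is the shortest
/// significand that round-trips. std's `{:e}` output is used as a stand-in
/// for a real Schubfach/dragonbox decimal interface.
fn shortest_decimal(value: f64) -> (bool, u128, i32) {
    let s = format!("{:e}", value.abs()); // e.g. "5.153125e-1"
    let (mantissa, exp) = s.split_once('e').unwrap();
    let exp: i32 = exp.parse().unwrap();
    let (int_part, frac_part) = mantissa.split_once('.').unwrap_or((mantissa, ""));
    let digits: u128 = format!("{int_part}{frac_part}").parse().unwrap();
    (value.is_sign_negative(), digits, exp - frac_part.len() as i32)
}

/// Hypothetical cast: returns the unscaled decimal value at `scale`,
/// rounding HALF_UP (assumed), or None for NaN, infinity, or overflow.
fn cast_to_decimal(value: f64, scale: u32) -> Option<i128> {
    if !value.is_finite() {
        return None;
    }
    let (negative, digits, exp10) = shortest_decimal(value);
    // unscaled = digits * 10^(exp10 + scale)
    let shift = exp10 + scale as i32;
    let unscaled: u128 = if shift >= 0 {
        digits.checked_mul(10u128.checked_pow(shift as u32)?)?
    } else {
        match 10u128.checked_pow((-shift) as u32) {
            // HALF_UP: add half the divisor, then truncate.
            Some(div) => (digits + div / 2) / div,
            // The value is far smaller than one unit at this scale.
            None => 0,
        }
    };
    if unscaled > i128::MAX as u128 {
        return None;
    }
    let unscaled = unscaled as i128;
    Some(if negative { -unscaled } else { unscaled })
}

fn main() {
    let v = 0.5153125_f64;
    // Naive approach: scale in binary floating point, then round.
    println!("{}", (v * 1_000_000_f64).round()); // 515312
    // Shortest-digits approach: round the decimal digits instead.
    println!("{:?}", cast_to_decimal(v, 6)); // Some(515313)
}
```

Rounding is done on the integer digits `5153125`, so the binary representation error of `0.5153125` never reaches the rounding step.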

What changes are included in this PR?

todo

How are these changes tested?

org.apache.comet.CometCastSuite
org.apache.comet.exec.CometAggregateSuite

@codecov-commenter

codecov-commenter commented Jun 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.95%. Comparing base (f09f8af) to head (4dfea01).
Report is 282 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1915      +/-   ##
============================================
+ Coverage     56.12%   58.95%   +2.83%     
- Complexity      976     1141     +165     
============================================
  Files           119      130      +11     
  Lines         11743    12863    +1120     
  Branches       2251     2420     +169     
============================================
+ Hits           6591     7584     +993     
- Misses         4012     4059      +47     
- Partials       1140     1220      +80     

@parthchandra
Contributor

FWIW there is a Rust Schubfach crate.

@andygrove
Member

Thanks for finding the dragonbox crate @leung-ming. Is there a reason we must add the code to Comet rather than use it as a dependency?

@leung-ming
Contributor Author

@parthchandra @andygrove
I checked ryu, schubfach, and dragonbox. They don't expose floating-point-to-decimal interfaces, only floating-point-to-string interfaces.
Someone encountered the same problem and made ryu_floating_decimal, but I don't know whether it is still maintained.

@parthchandra
Contributor

@leung-ming do you think the dragonbox module you've implemented can be contributed back to dragonbox? I would feel more confident if someone with a better fundamental understanding of the algorithm reviewed your work.
@andygrove the dragonbox crate has an Apache2-LLVM/Boost license. I think, but am not sure, that there should be no compatibility issue with the Apache license?

@leung-ming
Contributor Author

> @leung-ming do you think the dragonbox module you've implemented can be contributed back to dragonbox? I would feel more confident if someone with a better fundamental understanding of the algorithm reviewed your work.

I did not implement a new dragonbox. I just copied it and added 4 pub modifiers to expose the decimal interface.

@leung-ming
Contributor Author

leung-ming commented Jun 26, 2025

> @leung-ming do you think the dragonbox module you've implemented can be contributed back to dragonbox? I would feel more confident if someone with a better fundamental understanding of the algorithm reviewed your work.

How about wrapping the author's C++ reference implementation (its decimal interface is public) as a crate, just like native/hdfs? That may be more reliable and easier to maintain.

@parthchandra
Contributor

> I did not implement a new dragonbox. I just copied it and added 4 pub modifiers to expose the decimal interface.

Could this be done in the original crate? I understand that the original reason for making the methods private is probably that the algorithm is intended specifically for int/float to string (not decimal) conversion, and also that the Rust implementation appears to be for testing purposes. But if the reference implementation exposes the decimal interface, then perhaps the original author might be okay with making the Rust methods public too.

@leung-ming
Contributor Author

> Could this be done in the original crate? I understand that the original reason for making the methods private is probably that the algorithm is intended specifically for int/float to string (not decimal) conversion, and also that the Rust implementation appears to be for testing purposes. But if the reference implementation exposes the decimal interface, then perhaps the original author might be okay with making the Rust methods public too.

I'll try it.
