redo `sinpi` and `cospi` kernel polynomial approximation #59031
Conversation
My laptop is currently running a search for max ULP errors, which will last several hours.
@nsajko At least for Float16 and Float32, it shouldn't take nearly that long (especially if you use the x-axis symmetry).
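For readers following along, here is a minimal sketch (my own illustration, not the PR's harness) of the kind of exhaustive max-ULP-error search being discussed: for `Float32` one can walk every representable value in the kernel domain `[0, 0.25]` and compare against a `BigFloat` reference. The domain endpoints and the error definition are assumptions.

```julia
# Minimal sketch of an exhaustive max-ULP-error search for sinpi(::Float32)
# over the kernel domain [0, 0.25]; BigFloat serves as the reference.
function max_ulp_error_sinpi32(lo::Float32 = 0.0f0, hi::Float32 = 0.25f0)
    setprecision(BigFloat, 256)
    worst = 0.0
    x = lo
    while true
        ref = sinpi(BigFloat(x))  # high-precision reference value
        err = abs(Float64(BigFloat(sinpi(x)) - ref)) / Float64(eps(Float32(ref)))
        worst = max(worst, err)
        x == hi && break
        x = nextfloat(x)          # visit every Float32 in [lo, hi]
    end
    return worst
end
```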
Also, IMO doing doublefloat Float16 almost certainly isn't worth it. No hardware that I'm aware of has slow enough Float32 performance to make that worthwhile.
@vchuravy even there, are there systems where FP16 throughput is >4x FP32? I can see the argument for minimizing FP32 usage in FP16 impls, but going to FP16x2 seems unlikely to be better.
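For context, the "doublefloat" idea being weighed here is double-word arithmetic: carrying a value as an unevaluated sum of two `Float16` words. A minimal sketch of its basic building block, the Fast2Sum error-free transformation, is below; this is only my illustration of the technique the comments argue is not worthwhile for `Float16`.

```julia
# Fast2Sum error-free transformation: assuming |a| >= |b| and round-to-nearest,
# hi + lo equals a + b exactly, with hi the rounded sum and lo the rounding error.
# A "doublefloat" Float16 kernel would chain steps like this instead of simply
# evaluating in Float32 and rounding back.
function fast_two_sum(a::Float16, b::Float16)
    hi = a + b
    lo = b - (hi - a)
    return hi, lo
end
```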
My code is slower because I'm using …
The accuracy for …
A single test fails:
It tests the accuracy of the code at lines 1129 to 1136 in c1c091e.
It could be argued the test is overly strict; however, I'll go and reimplement it. EDIT: PR #59087
I believe that's the case already for TPUs?
Regarding …
* Make the `Float16` and `Float32` variants use the same evaluation scheme already used for `Float64`, instead of computing in a wider arithmetic and then rounding back to the narrower arithmetic. This of course results in some loss of accuracy; however, it seems more appropriate not to rely on other arithmetics. Even though the accuracy is worse, the result still seems, experimentally, to be faithful for each argument from the domain of the kernel, in each case except for `sinpi(::Float16)`, which has a max error of about 1.4 ULP.
* Regenerate the polynomials for `Float64`, too, improving accuracy. The degrees of the polynomials stay the same.

Closes JuliaLang#58977
A future PR or a future commit in this PR could also regenerate the polynomials for the wide kernels for `Float16` and `Float32`.
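As a rough illustration of the two evaluation schemes contrasted above, here is a sketch with placeholder Taylor-style coefficients; these are not the regenerated minimax polynomials, and neither function is the actual Base kernel.

```julia
# Placeholder coefficients for sinpi(x) ≈ x * (c1 + c2*x^2 + c3*x^4) on [0, 0.25];
# the real kernels use regenerated minimax polynomials.
const C = (3.1415927f0, -5.16771f0, 2.550164f0)

# Old scheme: evaluate in the wider Float64 arithmetic, then round back to Float32.
sinpi_kernel_wide(x::Float32) =
    Float32(evalpoly(Float64(x)^2, Float64.(C)) * Float64(x))

# New scheme: evaluate natively in Float32, mirroring what the Float64 kernel does.
sinpi_kernel_native(x::Float32) = evalpoly(x^2, C) * x
```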
Get the ULP error to below one.
The old `sinpi` value had an error of 0.56462383 ULP, now it's just 0.4353762 ULP. Given that the result is more accurate now, I just updated the doc test.
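The quoted per-value ULP errors can be measured with a check along these lines (my own sketch, not the PR's test code), comparing the `Float64` result against a `BigFloat` reference:

```julia
# ULP error of a Float64 result relative to a BigFloat reference, expressed in
# units of the last place of the reference value.
function ulp_error(f, x::Float64)
    ref = f(big(x))
    abs(Float64(big(f(x)) - ref)) / eps(Float64(ref))
end

# e.g. ulp_error(sinpi, x) for the doctest's argument x
```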
Relax the doc test: don't check exact floating-point equality.
Force-pushed from a8f1974 to 4af0f03.
* Make the `Float32` variants use the same evaluation scheme already used for `Float64`, instead of computing in a wider arithmetic and then rounding back to the narrower arithmetic. This of course results in some loss of accuracy; however, it seems more appropriate not to rely on other arithmetics. Even though the accuracy is worse, the result still seems, experimentally, to be faithful for each argument from the domain of the kernel, with the ULP error below `1`.
* Regenerate the polynomials for `Float64`, too, improving accuracy. The degrees of the polynomials stay the same.

Max errors over the domain (`0` to `0.25`) of the kernels: