
redo sinpi and cospi kernel polynomial approximation #59031


Draft · nsajko wants to merge 6 commits into master from sinpi_cospi_kernel_polynomial_approximations

Conversation

@nsajko nsajko (Contributor) commented Jul 18, 2025

  • Make the Float32 variants use the same evaluation scheme already used for Float64, instead of computing in a wider arithmetic and then rounding back to the narrower arithmetic. This of course costs some accuracy, but it seems more appropriate not to rely on other arithmetics. Even though the accuracy is worse, the result still appears experimentally to be faithful for every argument in the kernel's domain, with the ULP error below 1. (A sketch contrasting the two schemes follows this list.)

  • Regenerate the polynomials for Float64, too, improving accuracy. The degrees of the polynomials stay the same.
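
For illustration, here is a minimal sketch of the two schemes, assuming a hypothetical degree-2 sinpi kernel whose coefficients are just the truncated Taylor series of sin(πx) (the actual Base kernels use higher-degree minimax polynomials):

# Old scheme: evaluate in Float64, then round the result back to Float32.
function sinpi_kernel_wide(x::Float32)
    y = Float64(x)
    Float32(y * evalpoly(y^2, (3.141592653589793, -5.16771278004997, 2.5501640398773455)))
end

# New scheme: evaluate directly in Float32, mirroring the Float64 code path.
function sinpi_kernel_native(x::Float32)
    x * evalpoly(x^2, (3.1415927f0, -5.167713f0, 2.550164f0))
end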

Max errors over the domain (0 to 0.25) of the kernels:

before
  func = cospi
    type = Float32
      (a, b) = (0.0, 0.25)
      max = 0.50000286f0
    type = Float64
      (a, b) = (0.0, 0.25)
      max = 0.90289307f0
  func = sinpi
    type = Float32
      (a, b) = (0.0, 0.25)
      max = 0.5000706f0
    type = Float64
      (a, b) = (0.0, 0.25)
      max = 0.6906576f0

after
  func = cospi
    type = Float32
      (a, b) = (0.0, 0.25)
      max = 0.8907261f0
    type = Float64
      (a, b) = (0.0, 0.25)
      max = 0.9042959f0
  func = sinpi
    type = Float32
      (a, b) = (0.0, 0.25)
      max = 0.9644079f0
    type = Float64
      (a, b) = (0.0, 0.25)
      max = 0.6577196f0

@nsajko nsajko added the maths Mathematical functions label Jul 18, 2025
@nsajko nsajko (Contributor, Author) commented Jul 18, 2025

My laptop is currently running a search for max ULP errors which will last several hours.

@oscardssmith (Member) commented:

@nsajko At least for Float16 and Float32, it shouldn't take nearly that long (especially if you use the x axis symmetry):

julia> function testf(f::F, T) where F
           # Exhaustively scan every positive normal value of T (negatives are
           # covered by symmetry), tracking the worst ULP error of f against a
           # reference computed in the next wider type.
           maxulp = widen(zero(T))
           argmaxulp = T(NaN)
           for i in reinterpret(Unsigned, floatmin(T)):reinterpret(Unsigned, floatmax(T))
               x = reinterpret(T, i)
               fx = f(x)            # value under test
               fxbig = f(widen(x))  # wider-precision reference
               ulp = T(abs(fxbig - fx)) / eps(max(abs(fx), floatmin(T)))
               if ulp > maxulp
                   maxulp = ulp
                   argmaxulp = x
               end
           end
           return (argmaxulp, maxulp)
       end
testf (generic function with 1 method)

julia> @time testf(cospi, Float32)
 34.722836 seconds (3 allocations: 48 bytes)
(0.42245033f0, 0.50007045f0)
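
For Float16 the same exhaustive scan is essentially instant, since there are only 2^16 bit patterns to try:

testf(cospi, Float16)  # scans all positive normal Float16 inputs in well under a second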

@oscardssmith (Member) commented:

Also, IMO doing doublefloat Float16 almost certainly isn't worth it. No hardware that I'm aware of has slow enough Float32 performance to make that worthwhile.

@vchuravy (Member) commented:

Also, IMO doing doublefloat Float16 almost certainly isn't worth it. No hardware that I'm aware of has slow enough Float32 performance to make that worthwhile.

:trollface: future GPU/TPUs

@oscardssmith oscardssmith (Member) commented Jul 18, 2025

@vchuravy even there, are there systems where FP16 throughput is >4x FP32? I can see the argument for minimizing FP32 usage in FP16 impls, but going to FP16x2 seems unlikely to be better.

@nsajko nsajko (Contributor, Author) commented Jul 20, 2025

for Float16, Float32, it shouldn't take nearly that long

My code is slower because I'm using BigFloat to be extra sure of the results.
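
For reference, a BigFloat-based check along these lines (a hypothetical sketch, not the actual script) would look like:

setprecision(BigFloat, 256)  # plenty of guard bits for the reference values

# Hypothetical helper: ULP error of f(x) measured against a BigFloat reference.
function ulp_error(f::F, x::T) where {F, T<:AbstractFloat}
    fx = f(x)             # value under test
    ref = f(BigFloat(x))  # 256-bit reference
    T(abs(ref - fx)) / eps(max(abs(fx), floatmin(T)))
end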

@nsajko nsajko (Contributor, Author) commented Jul 20, 2025

The accuracy for cospi(::Float64) seems like it may actually have regressed due to the coefficients being regenerated with Sollya, from 0.90289307f0 ULP to 0.9042959f0 ULP. That said, it may just be a sampling artifact, since the Float64 space cannot be searched exhaustively.
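
(Since an exhaustive Float64 scan is infeasible, any such search samples the kernel domain; a hypothetical sketch of a random-sampling estimate:)

# Hypothetical: estimate the max ULP error of a Float64 kernel over [0, 0.25]
# by random sampling against a BigFloat reference.
function sample_maxulp(f::F, n::Integer) where F
    maxulp = 0.0
    for _ in 1:n
        x = 0.25 * rand()  # kernel domain is [0, 0.25]
        fx = f(x)
        ulp = Float64(abs(f(BigFloat(x)) - fx)) / eps(max(abs(fx), floatmin(Float64)))
        maxulp = max(maxulp, ulp)
    end
    return maxulp
end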

@nsajko nsajko (Contributor, Author) commented Jul 20, 2025

A single test fails:

Error in testset math:
Test Failed at /cache/build/tester-amdci5-9/julialang/julia-master/julia-c968bd880c/share/julia/test/math.jl:634
  Expression: ≈(cosc(0.14), -0.4517331883801308, rtol = 1.0e-15)
   Evaluated: -0.4517331883801315 ≈ -0.4517331883801308 (rtol=1.0e-15)
ERROR: LoadError: Test run finished with errors
in expression starting at /cache/build/tester-amdci5-9/julialang/julia-master/julia-c968bd880c/share/julia/test/runtests.jl:99

It tests the accuracy of cosc(0.14) for Float64. The relevant cosc method is currently implemented in terms of cospi and sinpi:

julia/base/special/trig.jl

Lines 1129 to 1136 in c1c091e

# hard-code Float64/Float32 Taylor series, with coefficients
# Float64.([(-1)^n*big(pi)^(2n)/((2n+1)*factorial(2n-1)) for n = 1:6])
_cosc(x::Union{Float64,ComplexF64}) =
    fastabs(x) < 0.14 ? x*evalpoly(x^2, (-3.289868133696453, 3.2469697011334144, -1.1445109447325053, 0.2091827825412384, -0.023460810354558236, 0.001781145516372852)) :
    isinf_real(x) ? zero(x) : ((pi*x)*cospi(x)-sinpi(x))/((pi*x)*x)
_cosc(x::Union{Float32,ComplexF32}) =
    fastabs(x) < 0.26f0 ? x*evalpoly(x^2, (-3.289868f0, 3.2469697f0, -1.144511f0, 0.20918278f0)) :
    isinf_real(x) ? zero(x) : ((pi*x)*cospi(x)-sinpi(x))/((pi*x)*x)
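
At x = 0.14 the fastabs(x) < 0.14 branch is not taken, so the failing value comes from the closed form, and hence directly from cospi and sinpi. The Taylor branch exists because that closed form cancels catastrophically as x approaches 0; a quick hypothetical demonstration:

# For tiny x, (pi*x)*cospi(x) and sinpi(x) agree in almost every bit, so the
# subtraction wipes out nearly all significant digits before the division.
x = 1e-8
direct = ((pi*x)*cospi(x) - sinpi(x)) / ((pi*x)*x)  # mostly rounding noise
series = -3.289868133696453 * x                     # leading Taylor term, -pi^2/3 * x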

At x = 0.14 this PR indeed returns a slightly less accurate result than master for sinpi: the error goes from 0.49043456f0 ULP to 0.5095654f0 ULP.

It could be argued that the test is overly strict; however, I'll reimplement cosc for Float32 and Float64 instead of relaxing it.

EDIT: PR #59087

@giordano (Member) commented:

even there, are there systems where FP16 throughput is >4x FP32?

I believe that's the case already for TPUs?

@nsajko nsajko (Contributor, Author) commented Jul 24, 2025

Regarding Float16, I basically gave up on that because getting the error to below 1 ULP without reaching out to a wider type seems challenging.

nsajko added 6 commits July 24, 2025 21:21
* Make the `Float16` and `Float32` variants use the same evaluation
  scheme already used for `Float64`, instead of computing in a wider
  arithmetic and then rounding back to the narrower arithmetic. This of
  course costs some accuracy, but it seems more appropriate not to rely
  on other arithmetics. Even though the accuracy is worse, the result
  still appears experimentally to be faithful for each argument in the
  kernel's domain in every case except `sinpi(::Float16)`, which has a
  max error of about 1.4 ULP.

* Regenerate the polynomials for `Float64`, too, improving accuracy.
  The degrees of the polynomials stay the same.

Closes JuliaLang#58977
A future PR or a future commit in this PR could also regenerate the
polynomials for the wide kernels for `Float16` and `Float32`.
The old `sinpi` value had an error of 0.56462383 ULP, now it's just
0.4353762 ULP. Given that the result is more accurate now, I just
updated the doc test.
Relax the doc test, don't check exact floating-point equality.
@nsajko nsajko force-pushed the sinpi_cospi_kernel_polynomial_approximations branch from a8f1974 to 4af0f03 on July 24, 2025 at 19:22