Skip to content

Improve floating-point Euclidean division for Float16 and Float32 #49637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions base/div.jl
Original file line number Diff line number Diff line change
Expand Up @@ -370,3 +370,9 @@ end
# NOTE: C89 fmod() and x87 FPREM implicitly provide truncating float division,
# so it is used here as the basis of float div().
div(x::T, y::T, r::RoundingMode) where {T<:AbstractFloat} = convert(T, round((x - rem(x, y, r)) / y))

# Vincent Lefèvre: "The Euclidean Division Implemented with a Floating-Point Division and a Floor"
# https://inria.hal.science/inria-00070403
# Theorem 1 implies that the following are exact if eps(x/y) <= 1
div(x::Float32, y::Float32, r::RoundingMode) = Float32(round(Float64(x) / Float64(y), r))
Copy link
Member

@giordano giordano May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this result in a call to llvm.rint.f64 in llvm code? I'm away from computer, can't test myself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes (if you use RoundNearest):

julia> ddiv(x::Float32, y::Float32, r::RoundingMode) = Float32(round(Float64(x) / Float64(y), r))
ddiv (generic function with 1 method)

julia> @code_llvm ddiv(1f0,2f0, RoundNearest)
;  @ REPL[11]:1 within `ddiv`
define float @julia_ddiv_281(float %0, float %1) #0 {
top:
; ┌ @ float.jl:236 within `Float64`
   %2 = fpext float %0 to double
   %3 = fpext float %1 to double
; └
; ┌ @ float.jl:386 within `/`
   %4 = fdiv double %2, %3
; └
; ┌ @ float.jl:370 within `round`
   %5 = call double @llvm.rint.f64(double %4)
; └
; ┌ @ float.jl:233 within `Float32`
   %6 = fptrunc double %5 to float
; └
  ret float %6
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have had problems with a target, which has only half and single precision floating point numbers, not implenting llvm.rint.f64. This is a problem with our sin(::Float32) and cos(::Float32) methods, but at least I can override those methods with calls to f32 llvm intrinsics, but with a quick search in llvm langref on the phone I couldn't find an intrinsic for this operation (I hope I missed it), which would make replacing this method much more cumbersome

Copy link
Member Author

@simonbyrne simonbyrne May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't a function in most languages (and isn't in C), so it probably doesn't have an LLVM intrinsic (otherwise we would have used that).

Do we have some flag that can detect whether the target has native Float64?

div(x::Float16, y::Float16, r::RoundingMode) = Float16(round(Float32(x) / Float32(y), r))
21 changes: 21 additions & 0 deletions test/numbers.jl
Original file line number Diff line number Diff line change
Expand Up @@ -1713,6 +1713,27 @@ end
@test cld(-1.1, 0.1) == div(-1.1, 0.1, RoundUp) == ceil(big(-1.1)/big(0.1)) == -11.0
@test fld(-1.1, 0.1) == div(-1.1, 0.1, RoundDown) == floor(big(-1.1)/big(0.1)) == -12.0
end
@testset "issue #49450" begin
@test div(514, Float16(0.75)) === Float16(685)
@test fld(514, Float16(0.75)) === Float16(685)
@test cld(515, Float16(0.75)) === Float16(687)

@test cld(1, Float16(0.000999)) === Float16(1001)
@test cld(2, Float16(0.001999)) === Float16(1001)
@test cld(3, Float16(0.002934)) === Float16(1023)
@test cld(4, Float16(0.003998)) === Float16(1001)
@test fld(5, Float16(0.004925)) === Float16(1015)

@test div(4_194_307, Float32(0.75)) === Float32(5_592_409)
@test fld(4_194_307, Float32(0.75)) === Float32(5_592_409)
@test cld(4_194_308, Float32(0.75)) === Float32(5_592_411)

@test fld(5, Float32(6.556511e-7)) === Float32(7_626_007)
@test fld(10, Float32(1.3113022e-6)) === Float32(7_626_007)
@test fld(11, Float32(1.4305115e-6)) === Float32(7_689_557)
@test cld(16, Float32(2.8014183e-6)) === Float32(5_711_393)
@test cld(17, Float32(2.2053719e-6)) === Float32(7_708_451)
end
end
@testset "return types" begin
for T in (Int8,Int16,Int32,Int64,Int128, UInt8,UInt16,UInt32,UInt64,UInt128)
Expand Down