Improve floating-point Euclidean division for Float16 and Float32 #49637

Open
wants to merge 3 commits into
base: master

simonbyrne
Member

Fixes #49450.

# Vincent Lefèvre: "The Euclidean Division Implemented with a Floating-Point Division and a Floor"
# https://inria.hal.science/inria-00070403
# Theorem 1 implies that the following are exact if eps(x/y) <= 1
div(x::Float32, y::Float32, r::RoundingMode) = Float32(round(Float64(x) / Float64(y), r))
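
As a rough illustration of the widening approach above, the following sketch compares it with dividing directly in Float32. The names `widened_div` and `naive_div` are hypothetical and not part of the PR, which defines the method on `div` itself. The inputs are chosen only because `0.1f0` is slightly larger than 0.1, so the exact quotient lies just below 100.

```julia
# Hypothetical name; the PR adds this as a method of `div` itself.
widened_div(x::Float32, y::Float32, r::RoundingMode) =
    Float32(round(Float64(x) / Float64(y), r))

# Naive variant for comparison: divide in Float32, then round.
naive_div(x::Float32, y::Float32, r::RoundingMode) = round(x / y, r)

x, y = 10f0, 0.1f0

# The Float32 quotient is already rounded up to 100 before `round` sees it.
println(naive_div(x, y, RoundDown))    # 100.0

# The Float64 quotient stays just below 100, so rounding down gives 99.
println(widened_div(x, y, RoundDown))  # 99.0

# Reference value computed in BigFloat.
println(floor(big(x) / big(y)))        # 99.0
```

The sketch covers a single input pair only; it does not demonstrate correctness over the whole Float32 range.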
Member

@giordano giordano May 4, 2023

Does this result in a call to llvm.rint.f64 in llvm code? I'm away from computer, can't test myself.

Member Author

Yes (if you use RoundNearest):

julia> ddiv(x::Float32, y::Float32, r::RoundingMode) = Float32(round(Float64(x) / Float64(y), r))
ddiv (generic function with 1 method)

julia> @code_llvm ddiv(1f0,2f0, RoundNearest)
;  @ REPL[11]:1 within `ddiv`
define float @julia_ddiv_281(float %0, float %1) #0 {
top:
; ┌ @ float.jl:236 within `Float64`
   %2 = fpext float %0 to double
   %3 = fpext float %1 to double
; └
; ┌ @ float.jl:386 within `/`
   %4 = fdiv double %2, %3
; └
; ┌ @ float.jl:370 within `round`
   %5 = call double @llvm.rint.f64(double %4)
; └
; ┌ @ float.jl:233 within `Float32`
   %6 = fptrunc double %5 to float
; └
  ret float %6
}

Member

I have had problems with a target that has only half- and single-precision floating-point numbers and does not implement llvm.rint.f64. This is already a problem with our sin(::Float32) and cos(::Float32) methods, but at least I can override those with calls to the f32 LLVM intrinsics. From a quick search of the LLVM LangRef on my phone I couldn't find an intrinsic for this operation (I hope I missed it), which would make replacing this method much more cumbersome.

Member Author

@simonbyrne simonbyrne May 4, 2023

It isn't a function in most languages (and isn't in C), so it probably doesn't have an LLVM intrinsic (otherwise we would have used that).

Do we have some flag that can detect whether the target has native Float64?

@nsajko
Contributor

nsajko commented May 5, 2023

Doesn't fix #49450 if it only helps for Float16 and Float32.

@simonbyrne
Member Author

Yes, but the edge cases for Float64 are much harder to hit. It would be good to have a fix though.

@nsajko
Contributor

nsajko commented May 5, 2023

It would be good to have a fix though.

We do, since last week: #49561

It was still not perfect for Float16 though, so now I'm incorporating your idea of widening to Float32 into that PR.

BTW, the algorithm you propose appears to be too simple: it behaves worse than master for RoundUp, for example, although it seems perfect for RoundNearest.

@nsajko
Contributor

nsajko commented May 5, 2023

BTW, the algorithm you propose appears to be too simple [...]

This was wrong, sorry, there was a bug in my experiment.

@aravindh-krishnamoorthy
Contributor

aravindh-krishnamoorthy commented May 5, 2023

@simonbyrne @yurivish Sorry, I'm new here but I searched through the code base and several packages.

Who are the potential users for fld(x,y) with floating-point x and y?

Unfortunately, I still do not understand why rounddown(x/y), where x/y is computed natively in the type of x and y, is not sufficient. It is guaranteed to return an integer no larger than x/y, as promised by fld. The documentation says the same.

It is also not clear to me why cosmetic results such as fld(10.0, 0.1) being equal to 100.0 must be supported. One knows that 0.1 cannot be represented exactly.

@simonbyrne
Member Author

julia> 10.0/0.1
100.0

However, the actual result is smaller:

julia> big(10.0)/0.1
99.99999999999999444888487687421760603063276150361782076232622353718522343744992

julia> mod(10.0,0.1)
0.09999999999999945

So we define

julia> fld(10.0, 0.1)
99.0
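
One way to check the claim above without any rounding is exact rational arithmetic. This is only a sketch: the variable names are illustrative, and `Rational{BigInt}` is used simply because it converts a Float64 to its exact value.

```julia
x, y = 10.0, 0.1

# Exact values of the two Float64 inputs as rationals.
xr = Rational{BigInt}(x)
yr = Rational{BigInt}(y)

# Exact quotient and its floor, with no floating-point rounding involved.
q = xr / yr
exact_floor = fld(numerator(q), denominator(q))

# Exact remainder; it must satisfy 0 <= remainder < y for a floored division.
exact_rem = xr - exact_floor * yr

println(exact_floor)             # 99
println(0 <= exact_rem < yr)     # true
println(x / y)                   # 100.0, the rounded Float64 quotient
```

Under this view, 99 is the only quotient that keeps the remainder inside `[0, y)`; taking 100 would require a negative remainder.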

nsajko added a commit to nsajko/julia that referenced this pull request May 5, 2023
Double-word arithmetic is used, except for rounding modes without
`rem`, which now get simple fallback implementations (on master their
`div` methods fail when called).

The idea to fix `Float16` by widening to `Float32` is taken from
Simon Byrne's JuliaLang#49637
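
The widening idea for `Float16` can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from either PR: the helper names (`widened_fld`, `naive_fld`, `scaled`, `exact_fld`, `count_mismatches`) are hypothetical, only `RoundDown` is covered, and the divisors in `ys` are an arbitrary small sample. The exact reference relies on every finite `Float16` being an integer multiple of 2^-24.

```julia
# Hypothetical helper names; not the code from either PR.

# Floor division for Float16, computed by widening to Float32.
widened_fld(x::Float16, y::Float16) =
    Float16(round(Float32(x) / Float32(y), RoundDown))

# Naive variant: divide in Float16, then round down.
naive_fld(x::Float16, y::Float16) = round(x / y, RoundDown)

# Every finite Float16 is an integer multiple of 2^-24, so scaling by 2^24
# gives an exact Int64, and `fld` on the scaled values is the exact floor.
scaled(x::Float16) = Int64(Float64(x) * 2.0^24)
exact_fld(x::Float16, y::Float16) = fld(scaled(x), scaled(y))

# Count inputs where `f` disagrees with the exact floor, over all finite
# Float16 numerators and the given nonzero divisors.
function count_mismatches(f, ys)
    bad = 0
    for bits in 0x0000:0xffff
        x = reinterpret(Float16, bits)
        isfinite(x) || continue
        for y in ys
            ref = exact_fld(x, y)
            # Keep only quotients up to 2048 == maxintfloat(Float16).
            abs(ref) <= 2048 || continue
            bad += (Float64(f(x, y)) != ref)
        end
    end
    return bad
end

ys = Float16[0.1, 0.3, 3, -7]
println("naive mismatches:   ", count_mismatches(naive_fld, ys))
println("widened mismatches: ", count_mismatches(widened_fld, ys))
```

For example, `Float16(3074) / Float16(3)` rounds to 1025 in `Float16`, while the exact floor is 1024; the widened variant keeps enough precision to avoid that rounding step.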

A script used for assessing the correctness of `div`:
```julia
using Random

accurate_div(x, y, r::RoundingMode) =
    div(BigFloat(x), BigFloat(y), r)

function count_wrong_floats(
    div_fun::Fun,
    r::RoundingMode,
    ::Type{F},
    ::Type{U},
    n::Int,
    m::Int,
) where {Fun <: Function, F, U}
    count_wrong_huge_quo = 0
    count_wrong_friendly_quo = 0

    count_total_huge_quo = 0
    count_total_friendly_quo = 0

    vec_x = zeros(U, m)
    vec_y = zeros(U, m)

    for i ∈ 1:n
        Random.rand!(vec_x)
        Random.rand!(vec_y)

        for (x_raw, y_raw) ∈ zip(vec_x, vec_y)
            x = reinterpret(F, x_raw)
            y = reinterpret(F, y_raw)

            (!isfinite(x) | !isfinite(y)) && continue

            quo_is_huge = (maxintfloat(F) < abs(x / y))

            acc_big = accurate_div(x, y, r)

            acc = F(acc_big)

            # Skip cases when the result isn't representable, the
            # correct result is not specified for this case, and
            # it's not clear what a user would expect either.
            (acc == acc_big) || continue

            if quo_is_huge
                count_total_huge_quo += true
            else
                count_total_friendly_quo += true
            end

            d = div_fun(x, y, r)

            is_ok = (d == acc) | (isnan(d) & isnan(acc))

            if !is_ok
                if quo_is_huge
                    count_wrong_huge_quo += true
                else
                    count_wrong_friendly_quo += true
                end
            end
        end
    end

    (
        bad_ratio_huge_quotient =
            count_wrong_huge_quo / count_total_huge_quo,
        bad_ratio_friendly_quotient =
            count_wrong_friendly_quo / count_total_friendly_quo,
        total_huge_quotient_count =
            count_total_huge_quo,
        total_friendly_quotient_count =
            count_total_friendly_quo,
    )
end

const float_types = (Float16, Float32, Float64)
const bits_types = (UInt16, UInt32, UInt64)

# not supported on master
const rounding_modes_other = (
    RoundNearestTiesAway, RoundNearestTiesUp
)

const rounding_modes = (
    RoundNearest, RoundUp, RoundDown, RoundFromZero, RoundToZero
)

function experiment(itcnt::Int, itcnt_inner::Int)
    for (F, U) ∈ zip(float_types, bits_types)
        println("$F $U")
        for rm ∈ rounding_modes
            println("  $rm")
            flush(stdout)
            res = count_wrong_floats(
                div, rm, F, U, itcnt, itcnt_inner
            )
            ch = res.total_huge_quotient_count
            cf = res.total_friendly_quotient_count
            rh = res.bad_ratio_huge_quotient
            rf = res.bad_ratio_friendly_quotient
            println("    quotients of huge magnitude:")
            println("      total count: $ch")
            println("      ratio of incorrect results: $rh")
            println("    quotients of more normal magnitude:")
            println("      total count: $cf")
            println("      ratio of incorrect results: $rf")
            flush(stdout)
        end
        println()
        flush(stdout)
    end
    nothing
end

experiment(20, 2^20)
```

Results on master:
```
Float16 UInt16
  RoundingMode{:Nearest}()
    quotients of huge magnitude:
      total count: 371923
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 15566588
      ratio of incorrect results: 0.0017671823780522746
  RoundingMode{:Up}()
    quotients of huge magnitude:
      total count: 375420
      ratio of incorrect results: 5.327366682648766e-6
    quotients of more normal magnitude:
      total count: 15567018
      ratio of incorrect results: 0.004671543387436181
  RoundingMode{:Down}()
    quotients of huge magnitude:
      total count: 375158
      ratio of incorrect results: 2.665543584303147e-6
    quotients of more normal magnitude:
      total count: 15565611
      ratio of incorrect results: 0.004713146178457113
  RoundingMode{:FromZero}()
    quotients of huge magnitude:
      total count: 376802
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 15564500
      ratio of incorrect results: 0.005449195284140191
  RoundingMode{:ToZero}()
    quotients of huge magnitude:
      total count: 374335
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 15563355
      ratio of incorrect results: 0.003891191841347833

Float32 UInt32
  RoundingMode{:Nearest}()
    quotients of huge magnitude:
      total count: 73547
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 1227091
      ratio of incorrect results: 0.0003218994905920188
  RoundingMode{:Up}()
    quotients of huge magnitude:
      total count: 73215
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 12272614
      ratio of incorrect results: 0.0009024972186039583
  RoundingMode{:Down}()
    quotients of huge magnitude:
      total count: 73355
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 12269080
      ratio of incorrect results: 0.0009091961255448657
  RoundingMode{:FromZero}()
    quotients of huge magnitude:
      total count: 73961
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 12273238
      ratio of incorrect results: 0.0009261614579624383
  RoundingMode{:ToZero}()
    quotients of huge magnitude:
      total count: 73327
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 12268024
      ratio of incorrect results: 0.0008781365279363653

Float64 UInt64
  RoundingMode{:Nearest}()
    quotients of huge magnitude:
      total count: 9807
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 11009917
      ratio of incorrect results: 4.677601111797664e-5
  RoundingMode{:Up}()
    quotients of huge magnitude:
      total count: 10015
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 11012903
      ratio of incorrect results: 0.00013012009639965048
  RoundingMode{:Down}()
    quotients of huge magnitude:
      total count: 10048
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 11011455
      ratio of incorrect results: 0.00012886580383791242
  RoundingMode{:FromZero}()
    quotients of huge magnitude:
      total count: 9902
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 11009214
      ratio of incorrect results: 0.00012843787031481085
  RoundingMode{:ToZero}()
    quotients of huge magnitude:
      total count: 9879
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 11009033
      ratio of incorrect results: 0.00013797760439086704
```

Results after this change (everything is correct):
```
Float16 UInt16
  RoundingMode{:Nearest}()
    quotients of huge magnitude:
      total count: 372015
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 15568185
      ratio of incorrect results: 0.0
  RoundingMode{:Up}()
    quotients of huge magnitude:
      total count: 375936
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 15564509
      ratio of incorrect results: 0.0
  RoundingMode{:Down}()
    quotients of huge magnitude:
      total count: 375737
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 15564872
      ratio of incorrect results: 0.0
  RoundingMode{:FromZero}()
    quotients of huge magnitude:
      total count: 377102
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 15563927
      ratio of incorrect results: 0.0
  RoundingMode{:ToZero}()
    quotients of huge magnitude:
      total count: 374618
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 15567337
      ratio of incorrect results: 0.0

Float32 UInt32
  RoundingMode{:Nearest}()
    quotients of huge magnitude:
      total count: 73567
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 12267866
      ratio of incorrect results: 0.0
  RoundingMode{:Up}()
    quotients of huge magnitude:
      total count: 73190
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 12271827
      ratio of incorrect results: 0.0
  RoundingMode{:Down}()
    quotients of huge magnitude:
      total count: 73491
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 12266797
      ratio of incorrect results: 0.0
  RoundingMode{:FromZero}()
    quotients of huge magnitude:
      total count: 73504
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 12269685
      ratio of incorrect results: 0.0
  RoundingMode{:ToZero}()
    quotients of huge magnitude:
      total count: 73552
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 12269727
      ratio of incorrect results: 0.0

Float64 UInt64
  RoundingMode{:Nearest}()
    quotients of huge magnitude:
      total count: 9750
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 11007512
      ratio of incorrect results: 0.0
  RoundingMode{:Up}()
    quotients of huge magnitude:
      total count: 10046
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 11009576
      ratio of incorrect results: 0.0
  RoundingMode{:Down}()
    quotients of huge magnitude:
      total count: 9939
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 11014458
      ratio of incorrect results: 0.0
  RoundingMode{:FromZero}()
    quotients of huge magnitude:
      total count: 9855
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 11007247
      ratio of incorrect results: 0.0
  RoundingMode{:ToZero}()
    quotients of huge magnitude:
      total count: 9798
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 11006320
      ratio of incorrect results: 0.0
```

Results for methods that weren't implemented at all on master (those
that miss a `rem` method):
```
Float16 UInt16
  RoundingMode{:NearestTiesAway}()
    quotients of huge magnitude:
      total count: 373333
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 15564325
      ratio of incorrect results: 0.0
  RoundingMode{:NearestTiesUp}()
    quotients of huge magnitude:
      total count: 374334
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 15567551
      ratio of incorrect results: 0.0

Float32 UInt32
  RoundingMode{:NearestTiesAway}()
    quotients of huge magnitude:
      total count: 73722
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 12271459
      ratio of incorrect results: 0.00301610427904294
  RoundingMode{:NearestTiesUp}()
    quotients of huge magnitude:
      total count: 73861
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 12273032
      ratio of incorrect results: 0.003042850373078144

Float64 UInt64
  RoundingMode{:NearestTiesAway}()
    quotients of huge magnitude:
      total count: 9981
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 11009598
      ratio of incorrect results: 0.00044415790658296515
  RoundingMode{:NearestTiesUp}()
    quotients of huge magnitude:
      total count: 9926
      ratio of incorrect results: 0.0
    quotients of more normal magnitude:
      total count: 11013978
      ratio of incorrect results: 0.00046023335074756825
```

Fixes JuliaLang#49450
@aravindh-krishnamoorthy
Contributor

julia> 10.0/0.1
100.0

However, the actual result is smaller:

julia> big(10.0)/0.1
99.99999999999999444888487687421760603063276150361782076232622353718522343744992

julia> mod(10.0,0.1)
0.09999999999999945

So we define

julia> fld(10.0, 0.1)
99.0

Ok, I just saw that this PR is about div. I was just confused. My issue was with the fld PR. I've no issues with div, sorry. Am getting old :)

@brenhinkeller brenhinkeller added the maths (Mathematical functions), float16, and bugfix (This change fixes an existing bug) labels Aug 6, 2023
@oscardssmith
Member

Sorry for letting this sit for so long.

@nsajko
Contributor

nsajko commented Feb 7, 2024

This is just a partial fix; it doesn't do anything for Float64. My competing PR is more thorough.

If this is merged it should not close the linked issue.

Labels
bugfix (This change fixes an existing bug), float16, maths (Mathematical functions)
Development

Successfully merging this pull request may close these issues.

Incorrect results from floating-point division (fld, cld, div)
6 participants