Looking at llvm/llvm-project#128450, I realised that our emulated Float16 FMA is inaccurate as well:
```julia
julia> for T in (Float16, Float32, Float64), f in (fma, muladd)
           @eval @show $(f)($(T)(0x1.400p+8), $(T)(0x1.008p+7), $(T)(0x1.000p-24))
       end
(fma)((Float16)(320.0), (Float16)(128.25), (Float16)(5.960464477539063e-8)) = Float16(4.102e4)
(muladd)((Float16)(320.0), (Float16)(128.25), (Float16)(5.960464477539063e-8)) = Float16(4.106e4)
(fma)((Float32)(320.0), (Float32)(128.25), (Float32)(5.960464477539063e-8)) = 41040.0f0
(muladd)((Float32)(320.0), (Float32)(128.25), (Float32)(5.960464477539063e-8)) = 41040.0f0
(fma)((Float64)(320.0), (Float64)(128.25), (Float64)(5.960464477539063e-8)) = 41040.000000059605
(muladd)((Float64)(320.0), (Float64)(128.25), (Float64)(5.960464477539063e-8)) = 41040.000000059605

julia> for T in (Float16, Float32, Float64), f in (fma, muladd)
           @eval @show $(f)($(T)(0x1.eb8p-12), $(T)(0x1.9p-11), $(T)(-0x1p-11))
       end
(fma)((Float16)(0.0004687309265136719), (Float16)(0.000762939453125), (Float16)(-0.00048828125)) = Float16(-0.0004878)
(muladd)((Float16)(0.0004687309265136719), (Float16)(0.000762939453125), (Float16)(-0.00048828125)) = Float16(-0.000488)
(fma)((Float32)(0.0004687309265136719), (Float32)(0.000762939453125), (Float32)(-0.00048828125)) = -0.00048792362f0
(muladd)((Float32)(0.0004687309265136719), (Float32)(0.000762939453125), (Float32)(-0.00048828125)) = -0.00048792362f0
(fma)((Float64)(0.0004687309265136719), (Float64)(0.000762939453125), (Float64)(-0.00048828125)) = -0.0004879236366832629
(muladd)((Float64)(0.0004687309265136719), (Float64)(0.000762939453125), (Float64)(-0.00048828125)) = -0.0004879236366832629

julia> versioninfo()
Julia Version 1.13.0-DEV.204
Commit b9ac28a645* (2025-03-12 09:49 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin23.4.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LLVM: libLLVM-19.1.7 (ORCJIT, apple-m1)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 4 virtual cores)
```
In both examples, the Float16 result of `fma` is 1 ULP off.
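The 1 ULP claim can be checked against an exact reference. This is a minimal sketch, assuming only that `BigFloat` at its default 256-bit precision holds `a*b + c` exactly for Float16 inputs (the product needs at most 22 significant bits and the Float16 exponent range is narrow). `is_correctly_rounded` is a hypothetical helper, not part of Base. The reported results are typed in by hand rather than recomputed, because `fma` itself behaves differently on different machines.

```julia
# Hypothetical helper (not in Base): is `r` a correctly rounded Float16
# value of the exact a*b + c?
function is_correctly_rounded(r::Float16, a::Float16, b::Float16, c::Float16)
    # With 256-bit BigFloat (the default precision) this is exact for
    # Float16 inputs, so nothing is rounded before the comparison.
    exact = big(a) * big(b) + big(c)
    # Midpoints between r and its neighbours; halving is exact in BigFloat.
    lo = (big(prevfloat(r)) + big(r)) / 2
    hi = (big(r) + big(nextfloat(r))) / 2
    # Inclusive bounds: at an exact tie both neighbours are accepted,
    # so this check does not enforce ties-to-even.
    return lo <= exact <= hi
end

# First example: the exact value is 41040 + 2^-24, just above the
# midpoint between the Float16 values 41024 and 41056.
a1, b1, c1 = Float16(0x1.400p+8), Float16(0x1.008p+7), Float16(0x1.000p-24)
is_correctly_rounded(Float16(41024), a1, b1, c1)  # reported fma result: false
is_correctly_rounded(Float16(41056), a1, b1, c1)  # reported muladd result: true

# Second example: the reported values are -(2^-11 - 2^-21) for fma
# and -(2^-11 - 2^-22) for muladd.
a2, b2, c2 = Float16(0x1.eb8p-12), Float16(0x1.9p-11), Float16(-0x1p-11)
is_correctly_rounded(Float16(-0x1p-11 + 0x1p-21), a2, b2, c2)  # false
is_correctly_rounded(Float16(-0x1p-11 + 0x1p-22), a2, b2, c2)  # true
```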
Note that on this CPU, which natively supports the FP16 extension, `muladd` gives the "right" result, unlike `fma`, which uses the emulated implementation because of #57783.
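One plausible mechanism for the 1 ULP error is double rounding. The sketch below assumes, without the issue stating it, that the emulated `fma` evaluates `a*b + c` in Float32 and then converts to Float16; `fma_via_float32` is a hypothetical name, not code from Julia Base. The product of two Float16 values is exact in Float32, but the sum is rounded once to Float32 and again to Float16. In both examples the Float32 sum lands exactly on a Float16 midpoint, where ties-to-even picks the wrong neighbour.

```julia
# Hypothetical model of an fma emulated through Float32 (an assumption
# about the emulation, not the actual Base implementation).
function fma_via_float32(a::Float16, b::Float16, c::Float16)
    p = Float32(a) * Float32(b)  # exact: at most 22 significant bits
    s = p + Float32(c)           # first rounding, to Float32
    return Float16(s)            # second rounding, to Float16
end

a1, b1, c1 = Float16(0x1.400p+8), Float16(0x1.008p+7), Float16(0x1.000p-24)
# c1 = 2^-24 is far below half a Float32 ULP at 41040 (2^-9), so the sum
# rounds to exactly 41040.0f0, the midpoint of 41024 and 41056;
# ties-to-even then picks 41024.
Float32(a1) * Float32(b1) + Float32(c1)  # 41040.0f0
fma_via_float32(a1, b1, c1)              # Float16(4.102e4)

a2, b2, c2 = Float16(0x1.eb8p-12), Float16(0x1.9p-11), Float16(-0x1p-11)
# Here the exact sum is itself a Float32 tie; ties-to-even rounds it onto
# a Float16 midpoint, which then rounds to the wrong neighbour.
fma_via_float32(a2, b2, c2)              # Float16(-0.0004878)
```

Under this model both `fma` values reported above are reproduced, which is consistent with double rounding but does not by itself confirm what the emulated code does.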