Slightly inaccurate emulated fma on Float16 #57784

@giordano

Looking at llvm/llvm-project#128450, I realised that our emulated Float16 FMA is inaccurate as well:

julia> for T in (Float16, Float32, Float64), f in (fma, muladd)
           @eval @show $(f)($(T)(0x1.400p+8), $(T)(0x1.008p+7), $(T)(0x1.000p-24))
       end
(fma)((Float16)(320.0), (Float16)(128.25), (Float16)(5.960464477539063e-8)) = Float16(4.102e4)
(muladd)((Float16)(320.0), (Float16)(128.25), (Float16)(5.960464477539063e-8)) = Float16(4.106e4)
(fma)((Float32)(320.0), (Float32)(128.25), (Float32)(5.960464477539063e-8)) = 41040.0f0
(muladd)((Float32)(320.0), (Float32)(128.25), (Float32)(5.960464477539063e-8)) = 41040.0f0
(fma)((Float64)(320.0), (Float64)(128.25), (Float64)(5.960464477539063e-8)) = 41040.000000059605
(muladd)((Float64)(320.0), (Float64)(128.25), (Float64)(5.960464477539063e-8)) = 41040.000000059605

julia> for T in (Float16, Float32, Float64), f in (fma, muladd)
           @eval @show $(f)($(T)(0x1.eb8p-12), $(T)(0x1.9p-11), $(T)(-0x1p-11))
       end
(fma)((Float16)(0.0004687309265136719), (Float16)(0.000762939453125), (Float16)(-0.00048828125)) = Float16(-0.0004878)
(muladd)((Float16)(0.0004687309265136719), (Float16)(0.000762939453125), (Float16)(-0.00048828125)) = Float16(-0.000488)
(fma)((Float32)(0.0004687309265136719), (Float32)(0.000762939453125), (Float32)(-0.00048828125)) = -0.00048792362f0
(muladd)((Float32)(0.0004687309265136719), (Float32)(0.000762939453125), (Float32)(-0.00048828125)) = -0.00048792362f0
(fma)((Float64)(0.0004687309265136719), (Float64)(0.000762939453125), (Float64)(-0.00048828125)) = -0.0004879236366832629
(muladd)((Float64)(0.0004687309265136719), (Float64)(0.000762939453125), (Float64)(-0.00048828125)) = -0.0004879236366832629

julia> versioninfo()
Julia Version 1.13.0-DEV.204
Commit b9ac28a645* (2025-03-12 09:49 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin23.4.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LLVM: libLLVM-19.1.7 (ORCJIT, apple-m1)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 4 virtual cores)

In both examples, the Float16 result of fma is 1 ULP off.

Note that on this CPU, which has native support for the fp16 extension, muladd gives the "right" result, unlike fma (which uses the emulated fma implementation because of #57783).
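
To make the 1 ULP claim easy to check, here is a minimal sketch for the first example. It compares the two printed Float16 candidates against the exact value computed in BigFloat. The last three lines assume that the emulated path computes the fma in Float32 and then rounds to Float16; that is consistent with the output above but is an assumption, not something confirmed here. The variable names are only for illustration.

```julia
# Inputs from the first example in the report.
x, y, z = Float16(0x1.400p+8), Float16(0x1.008p+7), Float16(0x1.000p-24)

# Exact value of x*y + z. BigFloat's default 256-bit precision is ample here.
exact = big(x) * big(y) + big(z)

# The two neighbouring Float16 values that were printed above.
lo = Float16(41024)    # shown as Float16(4.102e4), the fma result
hi = nextfloat(lo)     # 41056, shown as Float16(4.106e4), the muladd result

# Distance of each candidate from the exact value.
err_lo = abs(big(lo) - exact)
err_hi = abs(big(hi) - exact)
@show err_lo > err_hi        # hi is strictly closer, so hi is the correctly rounded result

# Assumed emulation path: one rounding to Float32, then a second one to Float16.
r32 = fma(Float32(x), Float32(y), Float32(z))
@show r32                    # the tiny addend is lost in this first rounding
@show Float16(r32) == lo     # the resulting tie rounds to even, landing on lo
```

Under that assumption, the exact result sits just above the midpoint between the two candidates, but the intermediate Float32 rounding puts it exactly on the midpoint, so the second rounding picks the even neighbour instead of the nearest one.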
