
Commit 918234c

Remove flush to zero from bf16
After closely analyzing Google Brain codebases, we decided that flushing subnormals to zero was the wrong thing to do. Intel and AMD probably designed VCVTNEPS2BF16 to always flush to zero for the wrong reasons; the flush should have been conditional on the FTZ bit in MXCSR, as it is for other opcodes. See ggml-org/llama.cpp#7843
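Here is a minimal sketch of the policy the message argues for: a float32-to-bf16 conversion where the subnormal flush is opt-in rather than unconditional. The function name `to_brain_cond_ftz` and its `ftz` parameter are hypothetical. On x86, a caller could derive `ftz` from MXCSR, for example with `_MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON` from `<xmmintrin.h>`. The sketch takes a plain flag instead so that it stays portable.

The NaN and rounding steps mirror the `to_brain()` lines visible in the diff. The exact NaN test is an assumption, because that line falls outside the shown hunks.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of to_brain(): subnormals are flushed only when the
 * caller requests it (e.g. because MXCSR.FTZ is set on x86). */
static inline uint16_t to_brain_cond_ftz(float s, bool ftz) {
    uint32_t i;
    memcpy(&i, &s, sizeof i);                  /* reinterpret the float bits */
    if ((i & 0x7fffffff) > 0x7f800000)         /* NaN (assumed test): */
        return (uint16_t)((i >> 16) | 64);     /* truncate, set the quiet bit */
    if (ftz && !(i & 0x7f800000))              /* zero/subnormal input + FTZ: */
        return (uint16_t)((i & 0x80000000) >> 16); /* keep only the sign */
    /* Otherwise round to nearest, ties to even. */
    return (uint16_t)((i + (0x7fff + ((i >> 16) & 1))) >> 16);
}
```

With the flag clear, `to_brain_cond_ftz(0x1p-133f, false)` yields `0x0001`, the smallest positive bf16 subnormal. With the flag set, the same input yields `0x0000`, which is what the removed code always returned.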
1 parent 4e6455e

1 file changed: 0 additions, 8 deletions

bf16.h

Lines changed: 0 additions & 8 deletions
```diff
@@ -52,10 +52,6 @@ static inline float from_brain(uint16_t h) {
 
 /**
  * Converts float32 to brain16.
- *
- * This function is binary identical to AMD Zen4 VCVTNEPS2BF16.
- * Subnormals shall be flushed to zero, and NANs will be quiet.
- * This code should vectorize nicely if using modern compilers.
  */
 static inline uint16_t to_brain(float s) {
     uint16_t h;
@@ -68,10 +64,6 @@ static inline uint16_t to_brain(float s) {
         h = (u.i >> 16) | 64; /* force to quiet */
         return h;
     }
-    if (!(u.i & 0x7f800000)) { /* subnormal */
-        h = (u.i & 0x80000000) >> 16; /* flush to zero */
-        return h;
-    }
     return (u.i + (0x7fff + ((u.i >> 16) & 1))) >> 16;
 }
 
```
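For reference, here is a self-contained sketch of what `to_brain()` computes after this commit, paired with a widening helper. Both names, `to_brain_after` and `brain_to_float`, are hypothetical. The lines between the two hunks are not shown, so the NaN test below is an assumption. `brain_to_float` also assumes that `from_brain()` simply places the 16 bits in the upper half of a float32.

With the flush removed, float32 subnormals now round to bf16 subnormals instead of collapsing to signed zero.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of to_brain() after this commit: NaNs are quieted; every other
 * input, subnormals included, is rounded to nearest, ties to even. */
static inline uint16_t to_brain_after(float s) {
    uint32_t i;
    memcpy(&i, &s, sizeof i);
    if ((i & 0x7fffffff) > 0x7f800000)     /* NaN (assumed test) */
        return (uint16_t)((i >> 16) | 64); /* set the quiet bit */
    /* Add 0x7fff, plus 1 more when the kept low bit is odd, so an exact
     * halfway case rounds toward the even neighbor. */
    return (uint16_t)((i + (0x7fff + ((i >> 16) & 1))) >> 16);
}

/* Hypothetical inverse: a bf16 value is the upper half of a float32. */
static inline float brain_to_float(uint16_t h) {
    uint32_t i = (uint32_t)h << 16;
    float f;
    memcpy(&f, &i, sizeof f);
    return f;
}
```

A few cases are worth checking by hand:

- `to_brain_after(0x1p-133f)` is `0x0001`, and it widens back to exactly `0x1p-133f`.
- The largest float32 subnormal, `0x0.fffffep-126f`, rounds up to `0x0080`, the smallest normal bf16.
- The halfway value `0x1.01p0f` rounds down to the even `0x3f80`, while `0x1.03p0f` rounds up to `0x3f82`.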
