Description
Flush-To-Zero (FTZ), where subnormal outputs are flushed to zero, and Denormal-As-Zero (DAZ), where subnormal inputs to the various IEEE functions are treated as zero, deserve serious consideration for support, ultimately not just here but also in the sibling testfloat-3 project.
Neither mode is defined by IEEE, but their practical relevance cannot be ignored, especially in the AI era, where performance matters and implementations commonly avoid the expense of long timing paths or extra hardware area for subnormal support. For float32 and beyond (float64, float80, float128), it is extremely common to drop such support. For float16 or micro-scaling formats, subnormal support can generally be expected.
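To make the requested semantics concrete, here is a minimal sketch of DAZ and FTZ on raw float32 bit patterns. The function names are hypothetical and not part of the SoftFloat API. The sketch assumes the flushed zero keeps the sign of the subnormal and that FTZ is applied to the already-rounded result; real hardware may differ in when tininess is detected and in which exception flags are raised.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not SoftFloat functions. */

/* A binary32 value is subnormal when its exponent field is all zeros
   and its fraction field is nonzero. */
static int f32_is_subnormal(uint32_t bits)
{
    return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}

/* Replace a subnormal with a zero of the same sign (assumed behavior);
   every other encoding passes through unchanged. */
static uint32_t f32_flush_subnormal(uint32_t bits)
{
    return f32_is_subnormal(bits) ? (bits & 0x80000000u) : bits;
}

/* DAZ: apply to each operand before the operation is performed. */
static uint32_t f32_daz(uint32_t operand_bits)
{
    return f32_flush_subnormal(operand_bits);
}

/* FTZ: apply to the rounded result after the operation. */
static uint32_t f32_ftz(uint32_t result_bits)
{
    return f32_flush_subnormal(result_bits);
}
```

In a library like this one, the two steps would presumably be gated by mode flags alongside the existing rounding-mode setting, so that default behavior stays IEEE-conformant.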
I also recommend implementing the micro-scaling (MX) standard formats, specified here:
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
This would give the project new relevance given the direction the industry is taking. IEEE is important, but FTZ and DAZ are de facto standards at this point, and the OCP MX spec is now also considered an industry standard.