Commit 3c0fcea

Merge pull request #40 from graphcore-research/doc
Doc fixes
2 parents 564c104 + 27bbc14 commit 3c0fcea

4 files changed: +114 -95 lines changed

docs/source/05-stochastic-rounding.ipynb

Lines changed: 4 additions & 4 deletions
@@ -464,7 +464,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Implementation\n",
+"## Implementation of SR\n",
 "\n",
 "The second part of this notebook goes deeper into the implementation of SR,\n",
 "and explores some subtleties that are not generally brought out in discussions of practical implementations. These subtleties might be summarized as\n",
@@ -474,7 +474,7 @@
 "\n",
"Note that these details are independent of the quality of the random number generator (RNG) — all of the issues discussed here happen with perfect RNGs.\n",
 "\n",
-"## Case 0: Infinite-precision inputs and real-valued random variables\n",
+"### Case 0: Infinite-precision inputs and real-valued random variables\n",
 "\n",
 "To begin our discussion, let's start with \"high-school\" rounding,\n",
 "where we implement round-to-nearest with code like\n",
@@ -493,7 +493,7 @@
 "but needs to change subtly when they are supplied in fixed precision, as is true \n",
 "in a floating point system.\n",
 " \n",
-"## Case 1: Infinite-precision inputs and limited-precision random variables\n",
+"### Case 1: Infinite-precision inputs and limited-precision random variables\n",
 "\n",
 "Let's assume that `rand()` produces only `S` bits of randomness at every call,\n",
 "i.e. that its implementation is something like\n",
@@ -868,7 +868,7 @@
 "Good news. `SRFast` (the curve on the left) seems to have fixed things...\n",
 "What could be wrong? Why is that not the default?\n",
 "\n",
-"## Case 2: Finite-precision inputs and limited-precision random variables\n",
+"### Case 2: Finite-precision inputs and limited-precision random variables\n",
 "\n",
 "The answer is that we are still modelling the inputs `v` as being infinite precision (well, they are float64 here, but that's pretty much infinite precision).\n",
 "\n",
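
The Case 0 and Case 1 text in the notebook hunks above can be illustrated with a minimal Python sketch. It is not taken from the notebook or from gfloat: the function names (`round_nearest`, `sr_real`, `sr_limited`, `expected_sr_limited`) are hypothetical, rounding is to the integer grid, and `floor(v + 0.5)` is only assumed to be the "high-school" rule the text refers to.

```python
import math
import random
from fractions import Fraction


def round_nearest(v):
    """'High-school' round-to-nearest (assumed form): add one half, then floor."""
    return math.floor(v + 0.5)


def sr_real(v, rng):
    """Case 0 sketch: stochastic rounding with a uniform sample in [0, 1)."""
    return math.floor(v + rng.random())


def sr_limited(v, srbits, rng):
    """Case 1 sketch: the random sample carries only `srbits` bits."""
    r = rng.getrandbits(srbits) / 2**srbits  # one of k / 2**srbits
    return math.floor(v + r)


def expected_sr_limited(v, srbits):
    """Exact mean of sr_limited over all 2**srbits equally likely samples."""
    n = 2**srbits
    total = sum(math.floor(Fraction(v) + Fraction(k, n)) for k in range(n))
    return Fraction(total, n)


if __name__ == "__main__":
    rng = random.Random(0)
    print(round_nearest(2.5), sr_real(0.3, rng), sr_limited(0.3, 2, rng))
    print(expected_sr_limited(0.25, 2), expected_sr_limited(0.3, 2))
```

Under these assumptions, `expected_sr_limited(0.25, 2)` is 1/4, so an input that is a multiple of 2^-S is rounded without bias. `expected_sr_limited(0.3, 2)` is also 1/4, which is below the input, even though every random sample is perfectly uniform over its 2^S values.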

docs/source/formats.rst

Lines changed: 65 additions & 0 deletions
@@ -5,6 +5,71 @@ Defined Formats

 .. module:: gfloat.formats

+Format parameters
+-----------------
+
+This table (from example notebook :doc:`value-stats <02-value-stats>`) shows how
+gfloat has been used to tabulate properties of various floating point formats.
+
+- name: Format
+- B: Bits in the format
+- P: Precision in bits
+- E: Exponent field width in bits
+- smallest: Smallest positive value
+- smallest_normal: Smallest positive normal value, n/a if no finite values are normal
+- max: Largest finite value
+- num_nans: Number of NaN values
+- num_infs: Number of infinities (2 or 0)
+
+========  ===  ===  ===  ===========  =================  ============  ===========  ======
+name      B    P    E    smallest     smallest_normal    max           num_nans     infs
+========  ===  ===  ===  ===========  =================  ============  ===========  ======
+ocp_e2m1  4    2    2    0.5          1                  6             0            0
+ocp_e2m3  6    4    2    0.125        1                  7.5           0            0
+ocp_e3m2  6    3    3    0.0625       0.25               28            0            0
+ocp_e4m3  8    4    4    ≈0.0019531   0.015625           448           2            0
+ocp_e5m2  8    3    5    ≈1.5259e-05  ≈6.1035e-05        57344         6            2
+p3109_p1  8    1    7    ≈2.1684e-19  ≈2.1684e-19        ≈9.2234e+18   1            2
+p3109_p2  8    2    6    ≈2.3283e-10  ≈4.6566e-10        ≈2.1475e+09   1            2
+p3109_p3  8    3    5    ≈7.6294e-06  ≈3.0518e-05        49152         1            2
+p3109_p4  8    4    4    ≈0.00097656  0.0078125          224           1            2
+p3109_p5  8    5    3    0.0078125    0.125              15            1            2
+p3109_p6  8    6    2    0.015625     0.5                3.875         1            2
+binary16  16   11   5    ≈5.9605e-08  ≈6.1035e-05        65504         2046         2
+bfloat16  16   8    8    ≈9.1835e-41  ≈1.1755e-38        ≈3.3895e+38   254          2
+binary32  32   24   8    ≈1.4013e-45  ≈1.1755e-38        ≈3.4028e+38   ≈1.6777e+07  2
+binary64  64   53   11   4.9407e-324  ≈2.2251e-308       ≈1.7977e+308  ≈9.0072e+15  2
+ocp_e8m0  8    1    8    ≈5.8775e-39  ≈5.8775e-39        ≈1.7014e+38   1            0
+ocp_int8  8    8    0    0.015625     n/a                ≈ 1.9844      0            0
+========  ===  ===  ===  ===========  =================  ============  ===========  ======
+
+In the above table, values which are not exact are indicated with the "≈" symbol.
+And here's the same table, but with values which don't render exactly as short floats
+printed as rationals times powers of 2:
+
+========  ===  ===  ===  ===========  =================  ========================================  ======================================  ======
+name      B    P    E    smallest     smallest_normal    max                                       num_nans                                infs
+========  ===  ===  ===  ===========  =================  ========================================  ======================================  ======
+ocp_e2m1  4    2    2    0.5          1                  6                                         0                                       0
+ocp_e2m3  6    4    2    0.125        1                  7.5                                       0                                       0
+ocp_e3m2  6    3    3    0.0625       0.25               28                                        0                                       0
+ocp_e4m3  8    4    4    2^-9         0.015625           448                                       2                                       0
+ocp_e5m2  8    3    5    2^-16        2^-14              57344                                     6                                       2
+p3109_p1  8    1    7    2^-62        2^-62              2^63                                      1                                       2
+p3109_p2  8    2    6    2^-32        2^-31              2^31                                      1                                       2
+p3109_p3  8    3    5    2^-17        2^-15              49152                                     1                                       2
+p3109_p4  8    4    4    2^-10        0.0078125          224                                       1                                       2
+p3109_p5  8    5    3    0.0078125    0.125              15                                        1                                       2
+p3109_p6  8    6    2    0.015625     0.5                3.875                                     1                                       2
+binary16  16   11   5    2^-24        2^-14              65504                                     2046                                    2
+bfloat16  16   8    8    2^-133       2^-126             255/128*2^127                             254                                     2
+binary32  32   24   8    2^-149       2^-126             16777215/8388608*2^127                    8388607/4194304*2^23                    2
+binary64  64   53   11   4.9407e-324  2^-1022            9007199254740991/9007199254740992*2^1024  4503599627370495/4503599627370496*2^53  2
+ocp_e8m0  8    1    8    2^-127       2^-127             2^127                                     1                                       0
+ocp_int8  8    8    0    0.015625     n/a                127/64*2^0                                0                                       0
+========  ===  ===  ===  ===========  =================  ========================================  ======================================  ======
+
+
 IEEE 754 Formats
 ----------------
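
For the IEEE 754 style rows of the table above (binary16, bfloat16, binary32, binary64), the tabulated columns follow from P and E alone. The sketch below shows that relationship in plain Python. `ieee_like_params` is a hypothetical helper, not part of the gfloat API, and it does not apply to the OCP or P3109 rows, which assign their special values differently.

```python
from fractions import Fraction


def ieee_like_params(p, e):
    """Properties of an IEEE 754 style binary format (hypothetical helper).

    p: precision in bits, including the implicit leading bit
    e: exponent field width in bits
    """
    emax = 2 ** (e - 1) - 1  # largest unbiased exponent
    emin = 1 - emax  # exponent of the smallest normal value
    two = Fraction(2)
    return {
        # Smallest subnormal: one unit in the last place at emin
        "smallest": two ** (emin - (p - 1)),
        "smallest_normal": two**emin,
        # All significand bits set at the largest exponent
        "max": (2 - two ** (1 - p)) * two**emax,
        # Exponent field all ones, nonzero trailing significand, either sign
        "num_nans": 2**p - 2,
    }


if __name__ == "__main__":
    for name, (p, e) in {"binary16": (11, 5), "bfloat16": (8, 8)}.items():
        print(name, ieee_like_params(p, e))
```

With these formulas, binary16 (P=11, E=5) gives smallest = 2^-24, smallest_normal = 2^-14, max = 65504 and 2046 NaNs, matching the corresponding row.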

docs/source/index.rst

Lines changed: 24 additions & 71 deletions
@@ -10,8 +10,23 @@ GFloat: Generic floating point formats in Python
 ================================================

 GFloat is designed to allow experimentation with a variety of floating-point
-formats in Python. Formats are parameterized by the primary IEEE-754 parameters
-of:
+formats in Python. Headline features:
+
+* A wide variety of floating point formats defined in :py:class:`gfloat.formats`
+
+  - IEEE 754, BFloat, OCP FP8 and MX, IEEE P3109
+
+* Conversion between floats under numerous rounding modes
+
+  - Scalar code is optimized for readability
+  - Array code is faster, and can operate on Numpy, JAX, or PyTorch arrays.
+
+* Notebooks useful for teaching and exploring float formats
+
+Provided Formats
+----------------
+
+Formats are parameterized by the primary IEEE-754 parameters of:

 * Width in bits (k)
 * Precision (p)
@@ -55,75 +70,13 @@ As well as block formats from |ocp_mx_link|.
 IEEE P3109
 </a>

-Supported rounding modes include:
-
-* Directed modes: Toward Zero, Toward Positive, Toward Negative
-* Round-to-nearest, with Ties to Even or Ties to Away
-* Stochastic rounding, with specified numbers of random bits
-
-
-Example
--------
-This table (from example notebook :doc:`value-stats <02-value-stats>`) shows how
-gfloat has been used to tabulate properties of various floating point formats.
-
-- name: Format
-- B: Bits in the format
-- P: Precision in bits
-- E: Exponent field width in bits
-- smallest: Smallest positive value
-- smallest_normal: Smallest positive normal value, n/a if no finite values are normal
-- max: Largest finite value
-- num_nans: Number of NaN values
-- num_infs: Number of infinities (2 or 0)
-
-========  ===  ===  ===  ===========  =================  ============  ===========  ======
-name      B    P    E    smallest     smallest_normal    max           num_nans     infs
-========  ===  ===  ===  ===========  =================  ============  ===========  ======
-ocp_e2m1  4    2    2    0.5          1                  6             0            0
-ocp_e2m3  6    4    2    0.125        1                  7.5           0            0
-ocp_e3m2  6    3    3    0.0625       0.25               28            0            0
-ocp_e4m3  8    4    4    ≈0.0019531   0.015625           448           2            0
-ocp_e5m2  8    3    5    ≈1.5259e-05  ≈6.1035e-05        57344         6            2
-p3109_p1  8    1    7    ≈2.1684e-19  ≈2.1684e-19        ≈9.2234e+18   1            2
-p3109_p2  8    2    6    ≈2.3283e-10  ≈4.6566e-10        ≈2.1475e+09   1            2
-p3109_p3  8    3    5    ≈7.6294e-06  ≈3.0518e-05        49152         1            2
-p3109_p4  8    4    4    ≈0.00097656  0.0078125          224           1            2
-p3109_p5  8    5    3    0.0078125    0.125              15            1            2
-p3109_p6  8    6    2    0.015625     0.5                3.875         1            2
-binary16  16   11   5    ≈5.9605e-08  ≈6.1035e-05        65504         2046         2
-bfloat16  16   8    8    ≈9.1835e-41  ≈1.1755e-38        ≈3.3895e+38   254          2
-binary32  32   24   8    ≈1.4013e-45  ≈1.1755e-38        ≈3.4028e+38   ≈1.6777e+07  2
-binary64  64   53   11   4.9407e-324  ≈2.2251e-308       ≈1.7977e+308  ≈9.0072e+15  2
-ocp_e8m0  8    1    8    ≈5.8775e-39  ≈5.8775e-39        ≈1.7014e+38   1            0
-ocp_int8  8    8    0    0.015625     n/a                ≈ 1.9844      0            0
-========  ===  ===  ===  ===========  =================  ============  ===========  ======
-
-In the above table, values which are not exact are indicated with the "≈" symbol.
-And here's the same table, but with values which don't render exactly as short floats
-printed as rationals times powers of 2:
-
-========  ===  ===  ===  ===========  =================  ========================================  ======================================  ======
-name      B    P    E    smallest     smallest_normal    max                                       num_nans                                infs
-========  ===  ===  ===  ===========  =================  ========================================  ======================================  ======
-ocp_e2m1  4    2    2    0.5          1                  6                                         0                                       0
-ocp_e2m3  6    4    2    0.125        1                  7.5                                       0                                       0
-ocp_e3m2  6    3    3    0.0625       0.25               28                                        0                                       0
-ocp_e4m3  8    4    4    2^-9         0.015625           448                                       2                                       0
-ocp_e5m2  8    3    5    2^-16        2^-14              57344                                     6                                       2
-p3109_p1  8    1    7    2^-62        2^-62              2^63                                      1                                       2
-p3109_p2  8    2    6    2^-32        2^-31              2^31                                      1                                       2
-p3109_p3  8    3    5    2^-17        2^-15              49152                                     1                                       2
-p3109_p4  8    4    4    2^-10        0.0078125          224                                       1                                       2
-p3109_p5  8    5    3    0.0078125    0.125              15                                        1                                       2
-p3109_p6  8    6    2    0.015625     0.5                3.875                                     1                                       2
-binary16  16   11   5    2^-24        2^-14              65504                                     2046                                    2
-bfloat16  16   8    8    2^-133       2^-126             255/128*2^127                             254                                     2
-binary32  32   24   8    2^-149       2^-126             16777215/8388608*2^127                    8388607/4194304*2^23                    2
-binary64  64   53   11   4.9407e-324  2^-1022            9007199254740991/9007199254740992*2^1024  4503599627370495/4503599627370496*2^53  2
-ocp_e8m0  8    1    8    2^-127       2^-127             2^127                                     1                                       0
-ocp_int8  8    8    0    0.015625     n/a                127/64*2^0                                0                                       0
-========  ===  ===  ===  ===========  =================  ========================================  ======================================  ======
+Rounding modes
+--------------
+
+Various rounding modes:
+* Directed modes: Toward Zero, Toward Positive, Toward Negative
+* Round-to-nearest, with Ties to Even or Ties to Away
+* Stochastic rounding, with specified numbers of random bits


 See Also

src/gfloat/types.py

Lines changed: 21 additions & 20 deletions
@@ -8,30 +8,31 @@ class RoundMode(Enum):
     """
     Enum for IEEE-754 rounding modes.

-    Result r is obtained from input v depending on rounding mode as follows
+    Result :math:`r` is obtained from input :math:`v` depending on rounding mode as follows
+
+    Notes on stochastic rounding:
+
+    StochasticFast implements a stochastic rounding scheme that is unbiased in
+    infinite precision, but biased when the quantity to be rounded is computed to
+    a finite precision.
+
+    StochasticFastest implements a stochastic rounding scheme that is biased
+    (the rounded value is on average farther from zero than the true value).
+
+    With a lot of SRbits (say 8 or more), these biases are negligible, and there
+    may be some efficiency advantage in using StochasticFast or StochasticFastest.
+
     """

-    TowardZero = 1  #: :math:`\max \{ r ~ s.t. ~ |r| \le |v| \}`
-    TowardNegative = 2  #: :math:`\max \{ r ~ s.t. ~ r \le v \}`
-    TowardPositive = 3  #: :math:`\min \{ r ~ s.t. ~ r \ge v \}`
+    TowardZero = 1  #: Return the largest :math:`r` such that :math:`|r| \le |v|`
+    TowardNegative = 2  #: Return the largest :math:`r` such that :math:`r \le v`
+    TowardPositive = 3  #: Return the smallest :math:`r` such that :math:`r \ge v`
     TiesToEven = 4  #: Round to nearest, ties to even
     TiesToAway = 5  #: Round to nearest, ties away from zero
-    Stochastic = 6  #: Stochastic rounding
-    StochasticFast = 7  #: Stochastic rounding - faster, but biased, see [Note 1].
-    StochasticFastest = 8  #: Stochastic rounding - incorrect, see [Note 1].
-    StochasticOdd = 9  #: Stochastic rounding, RTNO before comparison
-
-
-    # [Note 1]:
-    # StochasticFast implements a stochastic rounding scheme that is unbiased in
-    # infinite precision, but biased when the quantity to be rounded is computed to
-    # a finite precision.
-    #
-    # StochasticFastest implements a stochastic rounding scheme that is biased
-    # (the rounded value is on average farther from zero than the true value).
-    #
-    # With a lot of SRbits (say 8 or more), these biases are negligible, and there
-    # may be some efficiency advantage in using StochasticFast or StochasticFastest.
+    Stochastic = 6  #: Stochastic rounding, RTNE before comparison
+    StochasticOdd = 7  #: Stochastic rounding, RTNO before comparison
+    StochasticFast = 8  #: Stochastic rounding - faster, but biased
+    StochasticFastest = 9  #: Stochastic rounding - even faster, but more biased


 class FloatClass(Enum):
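
The five deterministic modes documented in the `RoundMode` hunk above can be illustrated by rounding to the integer grid. This is a sketch under stated assumptions, not gfloat's implementation: `Mode` and `round_to_int` are hypothetical names, only the member names are borrowed from `RoundMode`, and the stochastic modes are left out.

```python
import math
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in that borrows member names from RoundMode."""

    TowardZero = 1
    TowardNegative = 2
    TowardPositive = 3
    TiesToEven = 4
    TiesToAway = 5


def round_to_int(v, mode):
    """Round the float v to an integer under the given mode."""
    if mode is Mode.TowardZero:
        return math.trunc(v)
    if mode is Mode.TowardNegative:
        return math.floor(v)
    if mode is Mode.TowardPositive:
        return math.ceil(v)
    if mode is Mode.TiesToEven:
        return round(v)  # Python's round() sends ties to the even neighbour
    if mode is Mode.TiesToAway:
        whole = math.floor(abs(v))
        frac = abs(v) - whole
        magnitude = whole + (1 if frac >= 0.5 else 0)
        return int(math.copysign(magnitude, v))
    raise ValueError(f"unsupported mode: {mode}")


if __name__ == "__main__":
    for m in Mode:
        print(m.name, round_to_int(2.5, m), round_to_int(-2.5, m))
```

The two round-to-nearest modes differ only on exact ties: 2.5 goes to 2 under `TiesToEven` and to 3 under `TiesToAway`.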
