incorporate feedback from rkruppe and hsivonen

gnzlbg · gnzlbg · commit 422a4547be6c · 2018-03-23T10:05:56.000+01:00
diff --git a/text/0000-ppv.md b/text/0000-ppv.md
@@ -529,7 +529,8 @@ implementations:
 ##### Integer vector semantics
 
 The behavior of these operations for integer vectors is the same as that of the
-scalar integer types. That is: `panic!` on both overflow and division by zero.
+scalar integer types. That is: `panic!` on both overflow and division by zero in
+debug mode.
   
 ##### Floating-point semantics
 
@@ -543,7 +544,7 @@ All signed and unsigned integer vector types implement the whole set of `pub fn
 wrapping_{add,sub,mul,div,rem}(self, Self) -> Self` methods which, on overflow,
 produce the correct mathematical result modulo `2^n`.
 
-The `div` and `rem` method `panic!` on division by zero.
+The `div` and `rem` methods `panic!` on division by zero in debug mode.
 
 #### Unsafe wrapping arithmetic operations
 
@@ -559,7 +560,7 @@ All portable signed integer, unsigned integer, and floating-point vectors
 implement the following methods:
 
 ```rust
-impl {element_type}{lane_width}x{number_of_lanes} {   
+impl {element_type}{lane_width}x{number_of_lanes} {
 /// Lane-wise `min`.
 ///
 /// Returns a vector whose lanes contain the smallest 
@@ -828,19 +829,8 @@ AVX-512 vector types.
 
 ##### Semantics for floating-point numbers
 
-* `eq`: yields `true` if both operands are not a `QNAN` and `self` is equal to
-  `other`, yields `false` otherwise.
-* `gt`: yield `true` if both operands are not a `QNAN` and ``self`` is greater
-  than `other`, yields `false` otherwise.
-* `ge`: yields `true` if both operands are not a `QNAN` and `self` is greater
-  than or equal to `other`, yields `false` otherwise.
-* `lt`: yields `true` if both operands are not a `QNAN` and `self` is less than
-  `other`, yields `false` otherwise.
-* `le`: yields `true` if both operands are not a `QNAN` and `self` is less than
-  or equal to `other`, yields `false` otherwise.
-* `ne`: yields `true` if either operand is a `QNAN` or `self` is not equal to
-  `other`, yields `false` otherwise.
-  
+The semantics of the lane-wise comparisons for floating-point numbers are the
+same as in the scalar case.
   
 ### Portable vector shuffles
 
@@ -850,20 +840,33 @@ std::simd::shuffle!(...);
 ```
 
 The `shuffle!` macro returns a new vector that contains a shuffle of the elements in
-one or two input vectors. That is, there are two versions:
+one or two input vectors. There are two versions:
+
+ * `shuffle!(vec, indices)`: one-vector version
+ * `shuffle!(vec0, vec1, indices)`: two-vector version
+
+with the following preconditions:
+
+ * `vec`, `vec0`, and `vec1` must be portable packed SIMD vector types.
+ * `vec0` and `vec1` must have the same type. 
+ * `indices` must be a `const` array of type `[usize; N]` where `N` is any
+   power-of-two in range `(0, 2 * {vec,vec0,vec1}::lanes()]`.
+ * the values of `indices` must be in range `[0, vec::lanes())` for the one-vector
+   version, and in range `[0, 2 * vec0::lanes())` for the two-vector version.
+   
+On precondition violation a type error is produced.
 
- * `shuffle!(vec, [indices...])`: one-vector version
- * `shuffle!(vec0, vec1, [indices...])`: two-vector version
+The macro returns a new vector whose:
 
-In the two-vector version, both `vec0` and `vec1` must have the same type.
-The element type of the resulting vector is the element type of the input
-vector.
+* element type equals that of the input vectors, 
+* length equals `N`, that is, the length of the `indices` array
 
-The number of `indices` must be a power-of-two in range `[0, 64]` no longer
-than two times the number of lanes in the input vector. The length of the
-resulting vector equals the number of indices provided.
+The `i`-th element of `indices` with value `j` in range `[0, L)`, where `L` is the number of input lanes,
+stores the `j`-th element of the first vector into the `i`-th element of the result vector.
 
-Given a vector with `N` lanes, the indices in range `[0, N)` refer to the `N` elements in the vector. In the two-vector version, the indices in range `[N, 2*N)` refer to elements in the second vector.
+In the two-vector version, the `i`-th element of `indices` with value `j` in
+range `[L, 2L)`, where `L` is the number of input lanes, stores the `(j - L)`-th
+element of the second vector into the `i`-th element of the result vector.
 
 #### Example: shuffles
 
@@ -990,19 +993,21 @@ This RFC requires backends to provide generic vector types. Most backends suppor
 this in one form or another, but if one future backend does not, this RFC can be
 implemented on top of the architecture specific types.
 
-## Zero-overhead requirement for backends
+## Achieving zero-overhead is outside Rust's control
 
 A future architecture might have an instruction that performs multiple
 operations exposed by this API in one go, like `(a + b).wrapping_sum()` on an
-`f32x4` vector. The zero-overhead requirement makes it a bug if Rust does not
-generate optimal code for this situation.
+`f32x4` vector. If that expression does not produce optimal machine code, Rust
+has a performance bug.
 
 This is not a performance bug that can be easily worked around in `stdsimd` or
-`rustc`, making this almost certainly a performance bug in the backend.
+`rustc`, which almost certainly makes it a performance bug in the backend. Such
+performance bugs can be arbitrarily hard to fix, and fixing them might not
+always be worth it.
 
-It is reasonable to assume that every optimizing Rust backed will have a
-pattern-matching engine powerful enough to perform these
-transformations, but it is worth it to keep this requirement in mind.
+That is, while these APIs should make it possible for reasonably-designed
+optimizing Rust backends to achieve zero-overhead, zero-overhead can only be
+provided in practice on a best-effort basis.
 
 ## Performance of this API might vary dramatically
 
@@ -1266,11 +1271,11 @@ The vector types proposed in this RFC are packed, that is, their size is fixed
 at compile-time.
 
 Many modern architectures support vector operations of run-time size, often
-called scalable Vectors or scalable vectors. These include, amongst others, NecSX,
-ARM SVE, RISC-V Vectors. These architectures have traditionally relied on
-auto-vectorization combined with support for explicit vectorization annotations,
-but newer architectures like ARM SVE and RISC-V introduce explicit vectorization
-intrinsics. 
+called scalable vectors. These include, amongst others, NEC SX, ARM SVE, and
+RISC-V's Vector Extension Proposal. These architectures have traditionally
+relied on auto-vectorization combined with support for explicit vectorization
+annotations, but newer architectures like ARM SVE introduce explicit
+vectorization intrinsics.
 
 This is an example adapted from this [ARM SVE
 paper](https://developer.arm.com/hpc/arm-scalable-vector-extensions-and-application-to-machine-learning)
@@ -1307,8 +1312,11 @@ fn add_constant(dst: &mut [f64], src: &[f64], c: f64) {
 }
 ```
 
-RISC-V proposes a model similar in spirit, but not identical to the ARM SVE one.
-It would not be surprising if other popular architectures offered similar but not necessarily identical explicit vectorization models for scalable vectors in the future.
+The RISC-V vector extension proposal introduces a model similar in spirit to ARM
+SVE. These extensions are, however, not official yet, and it is currently
+unknown whether GCC and LLVM will expose explicit intrinsics for them. It would
+not be surprising if they did, or if similar scalable vector extensions were
+introduced in other architectures in the future.
 
 The main differences between scalable and portable vectors are that: