Commit 422a454

incorporate feedback from rkruppe and hsivonen
1 parent 8b69e2a commit 422a454

1 file changed: 48 additions, 40 deletions

text/0000-ppv.md

@@ -529,7 +529,8 @@ implementations:
 ##### Integer vector semantics
 
 The behavior of these operations for integer vectors is the same as that of the
-scalar integer types. That is: `panic!` on both overflow and division by zero.
+scalar integer types. That is: `panic!` on both overflow and division by zero in
+debug mode.
 
 ##### Floating-point semantics
 
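For scalar Rust, only the overflow panic depends on the build: overflow panics when overflow checks are enabled, which is the default for debug builds, while division by zero (and `i32::MIN / -1`) panics in every build mode. A minimal sketch, assuming plain `[i32; 4]` arrays as hypothetical stand-ins for a four-lane integer vector and hypothetical helper names, that applies the scalar `checked_add` and `checked_div` to each lane and returns `None` whenever some lane hits a case where the scalar operator would panic:

```rust
// Hypothetical stand-in for a four-lane integer vector: one array element per lane.
type I32x4 = [i32; 4];

/// Lane-wise addition with the scalar overflow check applied to every lane.
/// Returns `None` if any lane overflows, i.e. where scalar `+` panics
/// when overflow checks are enabled.
fn checked_add_lanes(a: I32x4, b: I32x4) -> Option<I32x4> {
    let mut out = [0; 4];
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x.checked_add(y)?;
    }
    Some(out)
}

/// Lane-wise division mirroring scalar `checked_div`: `None` if any lane
/// divides by zero or overflows (`i32::MIN / -1`), the cases where scalar
/// `/` panics in every build mode.
fn checked_div_lanes(a: I32x4, b: I32x4) -> Option<I32x4> {
    let mut out = [0; 4];
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x.checked_div(y)?;
    }
    Some(out)
}
```

Returning `Option` keeps the sketch testable; it only marks where the panicking operators described in this hunk would fire.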
@@ -543,7 +544,7 @@ All signed and unsigned integer vector types implement the whole set of `pub fn
 wrapping_{add,sub,mul,div,rem}(self, Self) -> Self` methods which, on overflow,
 produce the correct mathematical result modulo `2^n`.
 
-The `div` and `rem` method `panic!` on division by zero.
+The `div` and `rem` method `panic!` on division by zero in debug mode.
 
 #### Unsafe wrapping arithmetic operations
 
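A minimal sketch of the wrapping methods, assuming plain arrays as hypothetical stand-ins for `u8x4` and `i8x4` (so `n = 8`) and hypothetical helper names. Each lane uses the scalar `wrapping_*` method, so results are reduced modulo `2^8`; for signed lanes, `i8::MIN.wrapping_div(-1)` wraps back to `i8::MIN`. The scalar `wrapping_div` and `wrapping_rem` panic on a zero divisor in every build mode, not only in debug builds, and so does this sketch.

```rust
// Hypothetical stand-ins for `u8x4` and `i8x4`: one array element per lane.
type U8x4 = [u8; 4];
type I8x4 = [i8; 4];

/// Lane-wise `wrapping_add`: each lane is `(a + b) mod 2^8`.
fn wrapping_add_lanes(a: U8x4, b: U8x4) -> U8x4 {
    std::array::from_fn(|i| a[i].wrapping_add(b[i]))
}

/// Lane-wise `wrapping_mul`: each lane is `(a * b) mod 2^8`.
fn wrapping_mul_lanes(a: U8x4, b: U8x4) -> U8x4 {
    std::array::from_fn(|i| a[i].wrapping_mul(b[i]))
}

/// Lane-wise signed `wrapping_div`. `i8::MIN / -1` wraps to `i8::MIN`;
/// a zero divisor in any lane still panics, as the scalar method does.
fn wrapping_div_lanes(a: I8x4, b: I8x4) -> I8x4 {
    std::array::from_fn(|i| a[i].wrapping_div(b[i]))
}
```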
@@ -559,7 +560,7 @@ All portable signed integer, unsigned integer, and floating-point vectors
 implement the following methods:
 
 ```rust
-impl {element_type}{lane_width}x{number_of_lanes} {
+impl {element_type}{lane_width}x{number_of_lanes} {
     /// Lane-wise `min`.
     ///
     /// Returns a vector whose lanes contain the smallest
@@ -828,19 +829,8 @@ AVX-512 vector types.
 
 ##### Semantics for floating-point numbers
 
-* `eq`: yields `true` if both operands are not a `QNAN` and `self` is equal to
-`other`, yields `false` otherwise.
-* `gt`: yield `true` if both operands are not a `QNAN` and ``self`` is greater
-than `other`, yields `false` otherwise.
-* `ge`: yields `true` if both operands are not a `QNAN` and `self` is greater
-than or equal to `other`, yields `false` otherwise.
-* `lt`: yields `true` if both operands are not a `QNAN` and `self` is less than
-`other`, yields `false` otherwise.
-* `le`: yields `true` if both operands are not a `QNAN` and `self` is less than
-or equal to `other`, yields `false` otherwise.
-* `ne`: yields `true` if either operand is a `QNAN` or `self` is not equal to
-`other`, yields `false` otherwise.
-
+The semantics of the lane-wise comparisons for floating point numbers are the
+same as in the scalar case.
 
 ### Portable vector shuffles
 
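The replacement sentence defers to the scalar semantics. A minimal sketch, assuming `[f32; 4]` as a hypothetical stand-in for `f32x4`, `[bool; 4]` as a stand-in for the mask result, and hypothetical helper names: applying Rust's scalar operators lane by lane yields `false` from `==`, `<`, `<=`, `>`, and `>=` in any lane where either operand is NaN and `true` from `!=`, which is what the removed bullet list spelled out. As with scalars, `-0.0` and `0.0` compare equal.

```rust
// Hypothetical stand-in for `f32x4`: one array element per lane.
type F32x4 = [f32; 4];

/// Applies a scalar comparison to each pair of lanes, producing a
/// `[bool; 4]` in place of a mask vector.
fn lanewise_cmp(a: F32x4, b: F32x4, cmp: impl Fn(f32, f32) -> bool) -> [bool; 4] {
    std::array::from_fn(|i| cmp(a[i], b[i]))
}

// Each comparison simply reuses the scalar operator.
fn lanes_eq(a: F32x4, b: F32x4) -> [bool; 4] { lanewise_cmp(a, b, |x, y| x == y) }
fn lanes_ne(a: F32x4, b: F32x4) -> [bool; 4] { lanewise_cmp(a, b, |x, y| x != y) }
fn lanes_lt(a: F32x4, b: F32x4) -> [bool; 4] { lanewise_cmp(a, b, |x, y| x < y) }
fn lanes_ge(a: F32x4, b: F32x4) -> [bool; 4] { lanewise_cmp(a, b, |x, y| x >= y) }
```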
@@ -850,20 +840,33 @@ std::simd::shuffle!(...);
 ```
 
 The `shuffle!` macro returns a new vector that contains a shuffle of the elements in
-one or two input vectors. That is, there are two versions:
+one or two input vectors. There are two versions:
+
+* `shuffle!(vec, indices)`: one-vector version
+* `shuffle!(vec0, vec1, indices)`: two-vector version
+
+with the following preconditions:
+
+* `vec`, `vec0`, and `vec1` must be portable packed SIMD vector types.
+* `vec0` and `vec1` must have the same type.
+* `indices` must be a `const` array of type `[usize; N]` where `N` is any
+power-of-two in range `(0, 2 * {vec,vec0,vec1}::lanes()]`.
+* the values of `indices` must be in range `[0, N)` for the one-vector version,
+and in range `[0, 2N)` for the two-vector version.
+
+On precondition violation a type error is produced.
 
-* `shuffle!(vec, [indices...])`: one-vector version
-* `shuffle!(vec0, vec1, [indices...])`: two-vector version
+The macro returns a new vector whose:
 
-In the two-vector version, both `vec0` and `vec1` must have the same type.
-The element type of the resulting vector is the element type of the input
-vector.
+* element type equals that of the input vectors,
+* length equals `N`, that is, the length of the `indices` array
 
-The number of `indices` must be a power-of-two in range `[0, 64]` no longer
-than two times the number of lanes in the input vector. The length of the
-resulting vector equals the number of indices provided.
+The `i`-th element of `indices` with value `j` in range `[0, N)` stores the
+`j`-th element of the first vector into the `i`-th element of the result vector.
 
-Given a vector with `N` lanes, the indices in range `[0, N)` refer to the `N` elements in the vector. In the two-vector version, the indices in range `[N, 2*N)` refer to elements in the second vector.
+In the two-vector version, the `i`-th element of `indices` with value `j` in
+range `[N, 2N)` stores the `j - N`-th element of the second vector into the
+`i`-th element of the result vector.
 
 #### Example: shuffles
 
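A minimal sketch of these index rules, assuming plain arrays as hypothetical stand-ins for the vector types and ordinary functions `shuffle1`/`shuffle2` (hypothetical, not the RFC's macro). The sketch splits the two inputs at the input lane count `L`: index `j < L` reads lane `j` of the first vector, and `L <= j < 2L` reads lane `j - L` of the second. The committed text writes this split point as `N`, the length of `indices`; the two agree when the result has as many lanes as each input. Unlike the RFC, which turns precondition violations into type errors, the sketch panics at run time on an out-of-range index and does not enforce a power-of-two `N`.

```rust
/// Hypothetical model of the one-vector `shuffle!(vec, indices)`:
/// result lane `i` is `vec[indices[i]]`.
fn shuffle1<T: Copy, const L: usize, const N: usize>(vec: [T; L], indices: [usize; N]) -> [T; N] {
    std::array::from_fn(|i| vec[indices[i]])
}

/// Hypothetical model of the two-vector `shuffle!(vec0, vec1, indices)`.
/// Index `j < L` selects lane `j` of `vec0`; index `L <= j < 2L` selects
/// lane `j - L` of `vec1`. Indices `>= 2L` panic (out-of-bounds access).
fn shuffle2<T: Copy, const L: usize, const N: usize>(
    vec0: [T; L],
    vec1: [T; L],
    indices: [usize; N],
) -> [T; N] {
    std::array::from_fn(|i| {
        let j = indices[i];
        if j < L { vec0[j] } else { vec1[j - L] }
    })
}
```

For example, `shuffle2(['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], [0, 4, 1, 5])` interleaves the low halves of both inputs into `['a', 'e', 'b', 'f']`, and the result length always follows the length of `indices`.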
@@ -990,19 +993,21 @@ This RFC requires backends to provide generic vector types. Most backends suppor
 this in one form or another, but if one future backend does not, this RFC can be
 implemented on top of the architecture specific types.
 
-## Zero-overhead requirement for backends
+## Achieving zero-overhead is outside Rust's control
 
 A future architecture might have an instruction that performs multiple
 operations exposed by this API in one go, like `(a + b).wrapping_sum()` on an
-`f32x4` vector. The zero-overhead requirement makes it a bug if Rust does not
-generate optimal code for this situation.
+`f32x4` vector. If that expression does not produce optimal machine code, Rust
+has a performance bug.
 
 This is not a performance bug that can be easily worked around in `stdsimd` or
-`rustc`, making this almost certainly a performance bug in the backend.
+`rustc`, making this, almost certainly, a performance bug in the backend. These
+performance bugs can be arbitrarily hard to fix, and fixing these might not
+always be worth it.
 
-It is reasonable to assume that every optimizing Rust backed will have a
-pattern-matching engine powerful enough to perform these
-transformations, but it is worth it to keep this requirement in mind.
+That is, while these APIs should make it possible for reasonably-designed
+optimizing Rust backends to achieve zero-overhead, zero-overhead can only be
+provided in practice on a best-effort basis.
 
 ## Performance of this API might vary dramatically
 
@@ -1266,11 +1271,11 @@ The vector types proposed in this RFC are packed, that is, their size is fixed
 at compile-time.
 
 Many modern architectures support vector operations of run-time size, often
-called scalable Vectors or scalable vectors. These include, amongst others, NecSX,
-ARM SVE, RISC-V Vectors. These architectures have traditionally relied on
-auto-vectorization combined with support for explicit vectorization annotations,
-but newer architectures like ARM SVE and RISC-V introduce explicit vectorization
-intrinsics.
+called scalable Vectors or scalable vectors. These include, amongst others,
+NecSX, ARM SVE, and RISC-V's Vector Extension Proposal. These architectures have
+traditionally relied on auto-vectorization combined with support for explicit
+vectorization annotations, but newer architectures like ARM SVE introduce
+explicit vectorization intrinsics.
 
 This is an example adapted from this [ARM SVE
 paper](https://developer.arm.com/hpc/arm-scalable-vector-extensions-and-application-to-machine-learning)
@@ -1307,8 +1312,11 @@ fn add_constant(dst: &mut [f64], src: &[f64], c: f64) {
 }
 ```
 
-RISC-V proposes a model similar in spirit, but not identical to the ARM SVE one.
-It would not be surprising if other popular architectures offered similar but not necessarily identical explicit vectorization models for scalable vectors in the future.
+The RISC-V vector extension proposal introduces a model similar in spirit to ARM
+SVE. These extensions are, however, not official yet, and it is currently
+unknown whether GCC and LLVM will expose explicit intrinsics for them. It would
+not be surprising if they do, and it would not be surprising if similar scalable
+vector extensions are introduced in other architectures in the future.
 
 The main differences between scalable and portable vectors are that:
 