/// Returns a vector whose lanes contain the smallest
@@ -828,19 +829,8 @@ AVX-512 vector types.

##### Semantics for floating-point numbers

-* `eq`: yields `true` if both operands are not a `QNAN` and `self` is equal to
-`other`, yields `false` otherwise.
-* `gt`: yield `true` if both operands are not a `QNAN` and ``self`` is greater
-than `other`, yields `false` otherwise.
-* `ge`: yields `true` if both operands are not a `QNAN` and `self` is greater
-than or equal to `other`, yields `false` otherwise.
-* `lt`: yields `true` if both operands are not a `QNAN` and `self` is less than
-`other`, yields `false` otherwise.
-* `le`: yields `true` if both operands are not a `QNAN` and `self` is less than
-or equal to `other`, yields `false` otherwise.
-* `ne`: yields `true` if either operand is a `QNAN` or `self` is not equal to
-`other`, yields `false` otherwise.
-
+The semantics of the lane-wise comparisons for floating point numbers are the
+same as in the scalar case.

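As an illustration of what "the same as in the scalar case" implies, here is a minimal scalar sketch. It does not use the vector types proposed in the RFC: plain arrays stand in for vectors, and `lanewise` is a hypothetical helper introduced only for this example. It shows that a NaN lane compares `false` under `eq`, `lt`, and `le`, and `true` under `ne`, which matches the per-operation rules removed above.

```rust
/// Hypothetical helper (not part of the RFC): applies a scalar comparison
/// to each pair of lanes and collects the per-lane results.
fn lanewise<const N: usize>(a: [f32; N], b: [f32; N], cmp: fn(f32, f32) -> bool) -> [bool; N] {
    let mut out = [false; N];
    for i in 0..N {
        out[i] = cmp(a[i], b[i]);
    }
    out
}

fn main() {
    // Lanes 1 and 3 contain a NaN in at least one operand.
    let a = [1.0, f32::NAN, 3.0, f32::NAN];
    let b = [1.0, 2.0, 4.0, f32::NAN];

    // Scalar `==` is false whenever either operand is NaN.
    assert_eq!(lanewise(a, b, |x, y| x == y), [true, false, false, false]);
    // Scalar `!=` is true whenever either operand is NaN.
    assert_eq!(lanewise(a, b, |x, y| x != y), [false, true, true, true]);
    // Ordered comparisons are false whenever either operand is NaN.
    assert_eq!(lanewise(a, b, |x, y| x < y), [false, false, true, false]);
    assert_eq!(lanewise(a, b, |x, y| x <= y), [true, false, true, false]);
}
```
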
### Portable vector shuffles
@@ -850,20 +840,33 @@ std::simd::shuffle!(...);
```

The `shuffle!` macro returns a new vector that contains a shuffle of the elements in
-one or two input vectors. That is, there are two versions:
+one or two input vectors. There are two versions:
+
+* `shuffle!(vec, indices)`: one-vector version
+* `shuffle!(vec0, vec1, indices)`: two-vector version
+
+with the following preconditions:
+
+* `vec`, `vec0`, and `vec1` must be portable packed SIMD vector types.
+* `vec0` and `vec1` must have the same type.
+* `indices` must be a `const` array of type `[usize; N]` where `N` is any
+power-of-two in range `(0, 2 * {vec,vec0,vec1}::lanes()]`.
+* the values of `indices` must be in range `[0, N)` for the one-vector version,
+and in range `[0, 2N)` for the two-vector version.
+
+On precondition violation a type error is produced.

-* `shuffle!(vec, [indices...])`: one-vector version
-* `shuffle!(vec0, vec1, [indices...])`: two-vector version
+The macro returns a new vector whose:

-In the two-vector version, both `vec0` and `vec1` must have the same type.
-The element type of the resulting vector is the element type of the input
-vector.
+* element type equals that of the input vectors,
+* length equals `N`, that is, the length of the `indices` array

-The number of `indices` must be a power-of-two in range `[0, 64]` no longer
-than two times the number of lanes in the input vector. The length of the
-resulting vector equals the number of indices provided.
+The `i`-th element of `indices` with value `j` in range `[0, N)` stores the
+`j`-th element of the first vector into the `i`-th element of the result vector.

-Given a vector with `N` lanes, the indices in range `[0, N)` refer to the `N` elements in the vector. In the two-vector version, the indices in range `[N, 2*N)` refer to elements in the second vector.
+In the two-vector version, the `i`-th element of `indices` with value `j` in
+range `[N, 2N)` stores the `j - N`-th element of the second vector into the
+`i`-th element of the result vector.

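The index rules of the two `shuffle!` versions can be modelled with plain arrays. The sketch below is a scalar model only, not the macro itself: `shuffle1` and `shuffle2` are hypothetical names, arrays stand in for vector types, and a bad index panics at run time instead of producing the type error the text describes. Because the text uses `N` both for the length of `indices` and for the lane count, the sketch assumes `L` is the lane count of the inputs and `N` is the number of indices; the power-of-two requirement on `N` is not checked.

```rust
/// Scalar model of the one-vector version: result[i] = vec[indices[i]].
/// `L` is the assumed lane count of the input, `N` the number of indices.
fn shuffle1<T: Copy, const L: usize, const N: usize>(vec: [T; L], indices: [usize; N]) -> [T; N] {
    indices.map(|j| vec[j])
}

/// Scalar model of the two-vector version: an index below `L` selects from
/// the first vector, an index in `[L, 2L)` selects lane `j - L` of the second.
fn shuffle2<T: Copy, const L: usize, const N: usize>(
    vec0: [T; L],
    vec1: [T; L],
    indices: [usize; N],
) -> [T; N] {
    indices.map(|j| if j < L { vec0[j] } else { vec1[j - L] })
}

fn main() {
    let a = [10, 11, 12, 13];
    let b = [20, 21, 22, 23];

    // One-vector version: reverse the lanes, or broadcast lane 0 into 2 lanes.
    assert_eq!(shuffle1(a, [3, 2, 1, 0]), [13, 12, 11, 10]);
    assert_eq!(shuffle1(a, [0, 0]), [10, 10]);

    // Two-vector version: interleave the low halves of `a` and `b`.
    assert_eq!(shuffle2(a, b, [0, 4, 1, 5]), [10, 20, 11, 21]);

    // The result may be up to twice as long as the inputs.
    assert_eq!(
        shuffle2(a, b, [7, 6, 5, 4, 3, 2, 1, 0]),
        [23, 22, 21, 20, 13, 12, 11, 10]
    );
}
```

In this model the length of the result follows the length of `indices`, not the length of the inputs, which is the property the "length equals `N`" bullet describes.
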
#### Example: shuffles
@@ -990,19 +993,21 @@ This RFC requires backends to provide generic vector types. Most backends suppor
this in one form or another, but if one future backend does not, this RFC can be
implemented on top of the architecture specific types.

-## Zero-overhead requirement for backends
+## Achieving zero-overhead is outside Rust's control

A future architecture might have an instruction that performs multiple
operations exposed by this API in one go, like `(a + b).wrapping_sum()` on an
-`f32x4` vector. The zero-overhead requirement makes it a bug if Rust does not
-generate optimal code for this situation.
+`f32x4` vector. If that expression does not produce optimal machine code, Rust
+has a performance bug.

This is not a performance bug that can be easily worked around in `stdsimd` or
-`rustc`, making this almost certainly a performance bug in the backend.
+`rustc`, making this, almost certainly, a performance bug in the backend. These
+performance bugs can be arbitrarily hard to fix, and fixing these might not
+always be worth it.

-It is reasonable to assume that every optimizing Rust backed will have a
-pattern-matching engine powerful enough to perform these
-transformations, but it is worth it to keep this requirement in mind.
+That is, while these APIs should make it possible for reasonably-designed
+optimizing Rust backends to achieve zero-overhead, zero-overhead can only be
+provided in practice on a best-effort basis.

## Performance of this API might vary dramatically

@@ -1266,11 +1271,11 @@ The vector types proposed in this RFC are packed, that is, their size is fixed
at compile-time.

Many modern architectures support vector operations of run-time size, often
-called scalable Vectors or scalable vectors. These include, amongst others, NecSX,
-ARM SVE, RISC-V Vectors. These architectures have traditionally relied on
-auto-vectorization combined with support for explicit vectorization annotations,
-but newer architectures like ARM SVE and RISC-V introduce explicit vectorization
-intrinsics.
+called scalable Vectors or scalable vectors. These include, amongst others,
+NecSX, ARM SVE, and RISC-V's Vector Extension Proposal. These architectures have
+traditionally relied on auto-vectorization combined with support for explicit
+vectorization annotations, but newer architectures like ARM SVE introduce
RISC-V proposes a model similar in spirit, but not identical to the ARM SVE one.
-It would not be surprising if other popular architectures offered similar but not necessarily identical explicit vectorization models for scalable vectors in the future.
+The RISC-V vector extension proposal introduces a model similar in spirit to ARM
+SVE. These extensions are, however, not official yet, and it is currently
+unknown whether GCC and LLVM will expose explicit intrinsics for them. It would
+not be surprising if they do, and it would not be surprising if similar scalable
+vector extensions are introduced in other architectures in the future.

The main differences between scalable and portable vectors are that: