Commit 4fdf248

committed
Add comment to vector_mask_to_bitmask explaining why shifting before truncation is beneficial
1 parent 65d8a21 commit 4fdf248

File tree

1 file changed (+13, -0 lines changed)

compiler/rustc_codegen_llvm/src/intrinsic.rs

Lines changed: 13 additions & 0 deletions
@@ -965,6 +965,19 @@ fn generic_simd_intrinsic<'ll, 'tcx>(
     }};
 }
 
+/// Converts a vector mask, where each element has a bit width equal to the data elements it is used with,
+/// down to an i1 based mask that can be used by llvm intrinsics.
+///
+/// The rust simd semantics are that each element should either consist of all ones or all zeroes,
+/// but this information is not available to llvm. Truncating the vector effectively uses the lowest bit,
+/// but codegen for several targets is better if we consider the highest bit by shifting.
+///
+/// For x86 SSE/AVX targets this is beneficial since most instructions with mask parameters only consider the highest bit.
+/// So even though on llvm level we have an additional shift, in the final assembly there is no shift or truncate and
+/// instead the mask can be used as is.
+///
+/// For aarch64 and other targets there is a benefit because a mask from the sign bit can be more
+/// efficiently converted to an all ones / all zeroes mask by comparing whether each element is negative.
 fn vector_mask_to_bitmask<'a, 'll, 'tcx>(
     bx: &mut Builder<'a, 'll, 'tcx>,
     i_xn: &'ll Value,

0 commit comments