[TTI] Don't drop VP intrinsic args when delegating to non-vp equivalent #147677

lukel97 · 2025-07-09T09:34:47Z

Previously we only carried over the type arguments, which caused value-based costs to be inadvertently changed into type-based costs.

I'm just using vp.is.fpclass as an example intrinsic for now, since its type-based cost seems to differ from its value-based cost, whereas most normal intrinsics, e.g. min/max, have the same value-based and type-based cost.

We still need to handle the cost properly for is.fpclass in a second patch.

This is needed for an upcoming patch to handle the cost of llvm.experimental.vp.reverse, which suffers from the same problem.
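
For readers skimming the thread, the operand mapping this patch preserves can be sketched roughly as below. This is only an illustration under stated assumptions: `Operands` and `toNonVPOperands` are hypothetical names, and plain strings stand in for the `const Value *` and `Type *` entries that the real code handles through `ArrayRef`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an intrinsic's operand list; the real code works
// on ArrayRef<const Value *> and ArrayRef<Type *>.
using Operands = std::vector<std::string>;

// Strip the VP-only operands so the list lines up with the non-VP equivalent.
// - The trailing mask and EVL operands are always dropped.
// - For reductions whose non-VP form has no start value, the leading start
//   value is dropped as well.
Operands toNonVPOperands(Operands Ops, bool DropStartValue) {
  assert(Ops.size() >= 2 && "expected mask and EVL operands");
  Ops.resize(Ops.size() - 2);
  if (DropStartValue) {
    assert(!Ops.empty() && "expected a start value operand");
    Ops.erase(Ops.begin());
  }
  return Ops;
}
```

With this sketch, `{"x", "test", "mask", "evl"}` maps to `{"x", "test"}`, and a reduction's `{"start", "vec", "mask", "evl"}` maps to `{"vec"}` when the start value is dropped. The point of the patch is that the same trimming now applies to the values as well as the types.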

llvmbot · 2025-07-09T09:35:22Z

@llvm/pr-subscribers-llvm-analysis

Author: Luke Lau (lukel97)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/147677.diff

2 Files Affected:

(modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+10-3)
(modified) llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll (+106)

diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 2b9be43eadb7a..21da0311fa434 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1781,6 +1781,10 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
         assert(ICA.getArgTypes().size() >= 2 &&
                "Expected VPIntrinsic to have Mask and Vector Length args and "
                "types");
+
+        ArrayRef<const Value *> NewArgs = ArrayRef(ICA.getArgs());
+        if (!ICA.isTypeBasedOnly())
+          NewArgs = NewArgs.drop_back(2);
         ArrayRef<Type *> NewTys = ArrayRef(ICA.getArgTypes()).drop_back(2);
 
         // VPReduction intrinsics have a start value argument that their non-vp
@@ -1788,11 +1792,14 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
         // counterpart.
         if (VPReductionIntrinsic::isVPReduction(ICA.getID()) &&
             *FID != Intrinsic::vector_reduce_fadd &&
-            *FID != Intrinsic::vector_reduce_fmul)
+            *FID != Intrinsic::vector_reduce_fmul) {
+          if (!ICA.isTypeBasedOnly())
+            NewArgs = NewArgs.drop_front();
           NewTys = NewTys.drop_front();
+        }
 
-        IntrinsicCostAttributes NewICA(*FID, ICA.getReturnType(), NewTys,
-                                       ICA.getFlags());
+        IntrinsicCostAttributes NewICA(*FID, ICA.getReturnType(), NewArgs,
+                                       NewTys, ICA.getFlags());
         return thisT()->getIntrinsicInstrCost(NewICA, CostKind);
       }
     }
diff --git a/llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll b/llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll
index ea3c47dc34201..ad239a511d747 100644
--- a/llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll
@@ -1648,3 +1648,109 @@ define void @splice() {
   %splice_nxv2i1 = call <vscale x 2 x i1> @llvm.experimental.vp.splice.nxv2i1(<vscale x 2 x i1> zeroinitializer, <vscale x 2 x i1> zeroinitializer, i32 1, <vscale x 2 x i1> zeroinitializer, i32 poison, i32 poison)
   ret void
 }
+
+define void @is.fpclass() {
+; ARGBASED-LABEL: 'is.fpclass'
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %1 = call <2 x i1> @llvm.vp.is.fpclass.v2bf16(<2 x bfloat> undef, i32 0, <2 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %2 = call <4 x i1> @llvm.vp.is.fpclass.v4bf16(<4 x bfloat> undef, i32 0, <4 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 18 for instruction: %3 = call <8 x i1> @llvm.vp.is.fpclass.v8bf16(<8 x bfloat> undef, i32 0, <8 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 34 for instruction: %4 = call <16 x i1> @llvm.vp.is.fpclass.v16bf16(<16 x bfloat> undef, i32 0, <16 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %5 = call <2 x i1> @llvm.vp.is.fpclass.v2f16(<2 x half> undef, i32 0, <2 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %6 = call <4 x i1> @llvm.vp.is.fpclass.v4f16(<4 x half> undef, i32 0, <4 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 18 for instruction: %7 = call <8 x i1> @llvm.vp.is.fpclass.v8f16(<8 x half> undef, i32 0, <8 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 34 for instruction: %8 = call <16 x i1> @llvm.vp.is.fpclass.v16f16(<16 x half> undef, i32 0, <16 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %9 = call <2 x i1> @llvm.vp.is.fpclass.v2f32(<2 x float> undef, i32 0, <2 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %10 = call <4 x i1> @llvm.vp.is.fpclass.v4f32(<4 x float> undef, i32 0, <4 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 18 for instruction: %11 = call <8 x i1> @llvm.vp.is.fpclass.v8f32(<8 x float> undef, i32 0, <8 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 34 for instruction: %12 = call <16 x i1> @llvm.vp.is.fpclass.v16f32(<16 x float> undef, i32 0, <16 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %13 = call <2 x i1> @llvm.vp.is.fpclass.v2f64(<2 x double> undef, i32 0, <2 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %14 = call <4 x i1> @llvm.vp.is.fpclass.v4f64(<4 x double> undef, i32 0, <4 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 18 for instruction: %15 = call <8 x i1> @llvm.vp.is.fpclass.v8f64(<8 x double> undef, i32 0, <8 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 34 for instruction: %16 = call <16 x i1> @llvm.vp.is.fpclass.v16f64(<16 x double> undef, i32 0, <16 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %17 = call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2bf16(<vscale x 2 x bfloat> undef, i32 0, <vscale x 2 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %18 = call <vscale x 4 x i1> @llvm.vp.is.fpclass.nxv4bf16(<vscale x 4 x bfloat> undef, i32 0, <vscale x 4 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %19 = call <vscale x 8 x i1> @llvm.vp.is.fpclass.nxv8bf16(<vscale x 8 x bfloat> undef, i32 0, <vscale x 8 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %20 = call <vscale x 16 x i1> @llvm.vp.is.fpclass.nxv16bf16(<vscale x 16 x bfloat> undef, i32 0, <vscale x 16 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %21 = call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2f16(<vscale x 2 x half> undef, i32 0, <vscale x 2 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %22 = call <vscale x 4 x i1> @llvm.vp.is.fpclass.nxv4f16(<vscale x 4 x half> undef, i32 0, <vscale x 4 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %23 = call <vscale x 8 x i1> @llvm.vp.is.fpclass.nxv8f16(<vscale x 8 x half> undef, i32 0, <vscale x 8 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %24 = call <vscale x 16 x i1> @llvm.vp.is.fpclass.nxv16f16(<vscale x 16 x half> undef, i32 0, <vscale x 16 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %25 = call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2f32(<vscale x 2 x float> undef, i32 0, <vscale x 2 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %26 = call <vscale x 4 x i1> @llvm.vp.is.fpclass.nxv4f32(<vscale x 4 x float> undef, i32 0, <vscale x 4 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %27 = call <vscale x 8 x i1> @llvm.vp.is.fpclass.nxv8f32(<vscale x 8 x float> undef, i32 0, <vscale x 8 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %28 = call <vscale x 16 x i1> @llvm.vp.is.fpclass.nxv16f32(<vscale x 16 x float> undef, i32 0, <vscale x 16 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %29 = call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2f64(<vscale x 2 x double> undef, i32 0, <vscale x 2 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %30 = call <vscale x 4 x i1> @llvm.vp.is.fpclass.nxv4f64(<vscale x 4 x double> undef, i32 0, <vscale x 4 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %31 = call <vscale x 8 x i1> @llvm.vp.is.fpclass.nxv8f64(<vscale x 8 x double> undef, i32 0, <vscale x 8 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Invalid cost for instruction: %32 = call <vscale x 16 x i1> @llvm.vp.is.fpclass.nxv16f64(<vscale x 16 x double> undef, i32 0, <vscale x 16 x i1> undef, i32 undef)
+; ARGBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; TYPEBASED-LABEL: 'is.fpclass'
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %1 = call <2 x i1> @llvm.vp.is.fpclass.v2bf16(<2 x bfloat> undef, i32 0, <2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %2 = call <4 x i1> @llvm.vp.is.fpclass.v4bf16(<4 x bfloat> undef, i32 0, <4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %3 = call <8 x i1> @llvm.vp.is.fpclass.v8bf16(<8 x bfloat> undef, i32 0, <8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 65 for instruction: %4 = call <16 x i1> @llvm.vp.is.fpclass.v16bf16(<16 x bfloat> undef, i32 0, <16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %5 = call <2 x i1> @llvm.vp.is.fpclass.v2f16(<2 x half> undef, i32 0, <2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %6 = call <4 x i1> @llvm.vp.is.fpclass.v4f16(<4 x half> undef, i32 0, <4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %7 = call <8 x i1> @llvm.vp.is.fpclass.v8f16(<8 x half> undef, i32 0, <8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 65 for instruction: %8 = call <16 x i1> @llvm.vp.is.fpclass.v16f16(<16 x half> undef, i32 0, <16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %9 = call <2 x i1> @llvm.vp.is.fpclass.v2f32(<2 x float> undef, i32 0, <2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %10 = call <4 x i1> @llvm.vp.is.fpclass.v4f32(<4 x float> undef, i32 0, <4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %11 = call <8 x i1> @llvm.vp.is.fpclass.v8f32(<8 x float> undef, i32 0, <8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 65 for instruction: %12 = call <16 x i1> @llvm.vp.is.fpclass.v16f32(<16 x float> undef, i32 0, <16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %13 = call <2 x i1> @llvm.vp.is.fpclass.v2f64(<2 x double> undef, i32 0, <2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %14 = call <4 x i1> @llvm.vp.is.fpclass.v4f64(<4 x double> undef, i32 0, <4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %15 = call <8 x i1> @llvm.vp.is.fpclass.v8f64(<8 x double> undef, i32 0, <8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 65 for instruction: %16 = call <16 x i1> @llvm.vp.is.fpclass.v16f64(<16 x double> undef, i32 0, <16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %17 = call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2bf16(<vscale x 2 x bfloat> undef, i32 0, <vscale x 2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %18 = call <vscale x 4 x i1> @llvm.vp.is.fpclass.nxv4bf16(<vscale x 4 x bfloat> undef, i32 0, <vscale x 4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %19 = call <vscale x 8 x i1> @llvm.vp.is.fpclass.nxv8bf16(<vscale x 8 x bfloat> undef, i32 0, <vscale x 8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %20 = call <vscale x 16 x i1> @llvm.vp.is.fpclass.nxv16bf16(<vscale x 16 x bfloat> undef, i32 0, <vscale x 16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %21 = call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2f16(<vscale x 2 x half> undef, i32 0, <vscale x 2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %22 = call <vscale x 4 x i1> @llvm.vp.is.fpclass.nxv4f16(<vscale x 4 x half> undef, i32 0, <vscale x 4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %23 = call <vscale x 8 x i1> @llvm.vp.is.fpclass.nxv8f16(<vscale x 8 x half> undef, i32 0, <vscale x 8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %24 = call <vscale x 16 x i1> @llvm.vp.is.fpclass.nxv16f16(<vscale x 16 x half> undef, i32 0, <vscale x 16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %25 = call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2f32(<vscale x 2 x float> undef, i32 0, <vscale x 2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %26 = call <vscale x 4 x i1> @llvm.vp.is.fpclass.nxv4f32(<vscale x 4 x float> undef, i32 0, <vscale x 4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %27 = call <vscale x 8 x i1> @llvm.vp.is.fpclass.nxv8f32(<vscale x 8 x float> undef, i32 0, <vscale x 8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %28 = call <vscale x 16 x i1> @llvm.vp.is.fpclass.nxv16f32(<vscale x 16 x float> undef, i32 0, <vscale x 16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %29 = call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2f64(<vscale x 2 x double> undef, i32 0, <vscale x 2 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %30 = call <vscale x 4 x i1> @llvm.vp.is.fpclass.nxv4f64(<vscale x 4 x double> undef, i32 0, <vscale x 4 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %31 = call <vscale x 8 x i1> @llvm.vp.is.fpclass.nxv8f64(<vscale x 8 x double> undef, i32 0, <vscale x 8 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Invalid cost for instruction: %32 = call <vscale x 16 x i1> @llvm.vp.is.fpclass.nxv16f64(<vscale x 16 x double> undef, i32 0, <vscale x 16 x i1> undef, i32 undef)
+; TYPEBASED-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+  call <2 x i1> @llvm.vp.is.fpclass.v2bf16(<2 x bfloat> undef, i32 0, <2 x i1> undef, i32 undef)
+  call <4 x i1> @llvm.vp.is.fpclass.v4bf16(<4 x bfloat> undef, i32 0, <4 x i1> undef, i32 undef)
+  call <8 x i1> @llvm.vp.is.fpclass.v8bf16(<8 x bfloat> undef, i32 0, <8 x i1> undef, i32 undef)
+  call <16 x i1> @llvm.vp.is.fpclass.v16bf16(<16 x bfloat> undef, i32 0, <16 x i1> undef, i32 undef)
+  call <2 x i1> @llvm.vp.is.fpclass.v2f16(<2 x half> undef, i32 0, <2 x i1> undef, i32 undef)
+  call <4 x i1> @llvm.vp.is.fpclass.v4f16(<4 x half> undef, i32 0, <4 x i1> undef, i32 undef)
+  call <8 x i1> @llvm.vp.is.fpclass.v8f16(<8 x half> undef, i32 0, <8 x i1> undef, i32 undef)
+  call <16 x i1> @llvm.vp.is.fpclass.v16f16(<16 x half> undef, i32 0, <16 x i1> undef, i32 undef)
+  call <2 x i1> @llvm.vp.is.fpclass.v2f32(<2 x float> undef, i32 0, <2 x i1> undef, i32 undef)
+  call <4 x i1> @llvm.vp.is.fpclass.v4f32(<4 x float> undef, i32 0, <4 x i1> undef, i32 undef)
+  call <8 x i1> @llvm.vp.is.fpclass.v8f32(<8 x float> undef, i32 0, <8 x i1> undef, i32 undef)
+  call <16 x i1> @llvm.vp.is.fpclass.v16f32(<16 x float> undef, i32 0, <16 x i1> undef, i32 undef)
+  call <2 x i1> @llvm.vp.is.fpclass.v2f64(<2 x double> undef, i32 0, <2 x i1> undef, i32 undef)
+  call <4 x i1> @llvm.vp.is.fpclass.v4f64(<4 x double> undef, i32 0, <4 x i1> undef, i32 undef)
+  call <8 x i1> @llvm.vp.is.fpclass.v8f64(<8 x double> undef, i32 0, <8 x i1> undef, i32 undef)
+  call <16 x i1> @llvm.vp.is.fpclass.v16f64(<16 x double> undef, i32 0, <16 x i1> undef, i32 undef)
+  call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2bf16(<vscale x 2 x bfloat> undef, i32 0, <vscale x 2 x i1> undef, i32 undef)
+  call <vscale x 4 x i1> @llvm.vp.is.fpclass.nxv4bf16(<vscale x 4 x bfloat> undef, i32 0, <vscale x 4 x i1> undef, i32 undef)
+  call <vscale x 8 x i1> @llvm.vp.is.fpclass.nxv8bf16(<vscale x 8 x bfloat> undef, i32 0, <vscale x 8 x i1> undef, i32 undef)
+  call <vscale x 16 x i1> @llvm.vp.is.fpclass.nxv16bf16(<vscale x 16 x bfloat> undef, i32 0, <vscale x 16 x i1> undef, i32 undef)
+  call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2f16(<vscale x 2 x half> undef, i32 0, <vscale x 2 x i1> undef, i32 undef)
+  call <vscale x 4 x i1> @llvm.vp.is.fpclass.nxv4f16(<vscale x 4 x half> undef, i32 0, <vscale x 4 x i1> undef, i32 undef)
+  call <vscale x 8 x i1> @llvm.vp.is.fpclass.nxv8f16(<vscale x 8 x half> undef, i32 0, <vscale x 8 x i1> undef, i32 undef)
+  call <vscale x 16 x i1> @llvm.vp.is.fpclass.nxv16f16(<vscale x 16 x half> undef, i32 0, <vscale x 16 x i1> undef, i32 undef)
+  call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2f32(<vscale x 2 x float> undef, i32 0, <vscale x 2 x i1> undef, i32 undef)
+  call <vscale x 4 x i1> @llvm.vp.is.fpclass.nxv4f32(<vscale x 4 x float> undef, i32 0, <vscale x 4 x i1> undef, i32 undef)
+  call <vscale x 8 x i1> @llvm.vp.is.fpclass.nxv8f32(<vscale x 8 x float> undef, i32 0, <vscale x 8 x i1> undef, i32 undef)
+  call <vscale x 16 x i1> @llvm.vp.is.fpclass.nxv16f32(<vscale x 16 x float> undef, i32 0, <vscale x 16 x i1> undef, i32 undef)
+  call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2f64(<vscale x 2 x double> undef, i32 0, <vscale x 2 x i1> undef, i32 undef)
+  call <vscale x 4 x i1> @llvm.vp.is.fpclass.nxv4f64(<vscale x 4 x double> undef, i32 0, <vscale x 4 x i1> undef, i32 undef)
+  call <vscale x 8 x i1> @llvm.vp.is.fpclass.nxv8f64(<vscale x 8 x double> undef, i32 0, <vscale x 8 x i1> undef, i32 undef)
+  call <vscale x 16 x i1> @llvm.vp.is.fpclass.nxv16f64(<vscale x 16 x double> undef, i32 0, <vscale x 16 x i1> undef, i32 undef)
+  ret void
+}

github-actions · 2025-07-09T09:37:46Z

✅ With the latest revision this PR passed the undef deprecator.

artagnon

I'll get to this in a bit, but first, do a s/undef/poison/?

artagnon

Confused; questions follow.

artagnon · 2025-07-09T21:23:58Z

llvm/include/llvm/CodeGen/BasicTTIImpl.h

        if (VPReductionIntrinsic::isVPReduction(ICA.getID()) &&
            *FID != Intrinsic::vector_reduce_fadd &&
-            *FID != Intrinsic::vector_reduce_fmul)
+            *FID != Intrinsic::vector_reduce_fmul) {
+          if (!ICA.isTypeBasedOnly())
+            NewArgs = NewArgs.drop_front();
          NewTys = NewTys.drop_front();
+        }


Is this code covered? Maybe I'm missing something, but isn't vp.is.fpclass a non-reduction VP intrinsic?

lukel97 · 2025-07-10

Yeah, it's covered by the existing tests at llvm/test/Analysis/CostModel/RISCV/reduce-fadd/fmul.ll. It just so happens that there's no test diff with that change because the type-based and value-based costs are the same for these intrinsics, i.e. here's the code for the value costing path:

        case Intrinsic::vector_reduce_add:
        case Intrinsic::vector_reduce_mul:
        case Intrinsic::vector_reduce_and:
        case Intrinsic::vector_reduce_or:
        case Intrinsic::vector_reduce_xor:
        case Intrinsic::vector_reduce_smax:
        case Intrinsic::vector_reduce_smin:
        case Intrinsic::vector_reduce_fmax:
        case Intrinsic::vector_reduce_fmin:
        case Intrinsic::vector_reduce_fmaximum:
        case Intrinsic::vector_reduce_fminimum:
        case Intrinsic::vector_reduce_umax:
        case Intrinsic::vector_reduce_umin: {
          IntrinsicCostAttributes Attrs(IID, RetTy, Args[0]->getType(), FMF, I, 1);
          return getTypeBasedIntrinsicInstrCost(Attrs, CostKind);
        }

artagnon · 2025-07-09T21:25:20Z

llvm/include/llvm/CodeGen/BasicTTIImpl.h

@@ -1781,18 +1781,25 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
        assert(ICA.getArgTypes().size() >= 2 &&
               "Expected VPIntrinsic to have Mask and Vector Length args and "
               "types");
+
+        ArrayRef<const Value *> NewArgs = ArrayRef(ICA.getArgs());
+        if (!ICA.isTypeBasedOnly())


Why do we need isTypeBasedOnly? If it is type-based only, what's the harm in dropping front/back the args?

lukel97 · 2025-07-10

If it's type-based only then the args will be empty, so I think dropping the front and back will trap.
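
A minimal sketch of why the guard matters, assuming that a type-based-only query is one whose argument list is empty, as the reply above suggests. `Query` and `dropMaskAndEVL` are hypothetical names, and integers stand in for the operand values and types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of a cost query: Args is empty for a type-based-only
// query, while ArgTys is always populated.
struct Query {
  std::vector<int> Args;   // stand-ins for operand values
  std::vector<int> ArgTys; // stand-ins for operand types
  bool isTypeBasedOnly() const { return Args.empty(); }
};

// Drop the trailing mask and EVL entries. The argument list is only trimmed
// when it is populated; trimming an empty list would underflow.
Query dropMaskAndEVL(Query Q) {
  assert(Q.ArgTys.size() >= 2 && "expected mask and EVL types");
  Q.ArgTys.resize(Q.ArgTys.size() - 2);
  if (!Q.isTypeBasedOnly()) {
    assert(Q.Args.size() >= 2 && "expected mask and EVL args");
    Q.Args.resize(Q.Args.size() - 2);
  }
  return Q;
}
```

Without the `isTypeBasedOnly()` check, the second `resize` in this sketch would be reached with an empty `Args` and the size computation would wrap around, which is the toy analogue of dropping elements from an empty `ArrayRef`.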

artagnon · 2025-07-09T22:05:26Z

llvm/test/Analysis/CostModel/RISCV/vp-intrinsics.ll

I have a very basic question: why does the arg-based cost differ from the type-based cost for vp.is.fpclass? What information is present in the arguments over and above their types?

lukel97 · 2025-07-10

IIUC the type-based costing is needed for places like the LoopVectorizer, where we need to cost things that don't yet have Values materialised, e.g. VPRecipeBase::computeCost.

The value-based costing is used by other transforms that actually have a Value at hand, e.g. SLPVectorizer/LoopUnroll/VectorCombine etc.

And for some types of intrinsics and instructions, knowing the value of an operand can actually make the cost more accurate, e.g. if you know the index of an insertelement or the mask of a shufflevector, it can be much cheaper.
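
To make the distinction concrete, here is a toy sketch of a cost function that gives a cheaper answer when an operand value is available. Everything in it is an assumption for illustration: `toyInsertElementCost` is a hypothetical name, and the costs 1 and 3 are made up and do not correspond to any real target.

```cpp
#include <cassert>

// Hypothetical costs, chosen only to illustrate the idea; they do not
// correspond to any real target.
constexpr unsigned UnknownIndexCost = 3;
constexpr unsigned KnownIndexCost = 1;

// A type-based query passes nullptr because no operand value exists yet;
// a value-based query passes a pointer to the known constant index.
unsigned toyInsertElementCost(const unsigned *Index) {
  return Index ? KnownIndexCost : UnknownIndexCost;
}
```

If a delegating wrapper forwarded only the types, every query in this sketch would take the `nullptr` path and report the more conservative cost, which is the kind of silent downgrade the PR description refers to.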

An experimental_vp_reverse isn't exactly functionally the same as vector_reverse, so previously it wasn't getting picked up by the generic VP costing code that reuses the non-VP equivalents. But for costing purposes it's good enough, so we can reuse it. The RISC-V costs are still incorrect and are showing up as scalarized. llvm#147677 aims to fix part of this.

ElvisWang123

LGTM.

Just out of curiosity, do you know which part of the cost for is_fpclass makes the type-based query differ from the value-based one?

lukel97 added 2 commits July 9, 2025 17:29

Precommit test

ed70aed

llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Jul 9, 2025

lukel97 requested review from artagnon, mikhailramalho, arcbbb, ElvisWang123 and wangpc-pp July 9, 2025 09:35

artagnon reviewed Jul 9, 2025

undef -> poison

97a562f

artagnon reviewed Jul 9, 2025

lukel97 mentioned this pull request Jul 10, 2025

[TTI] Handle experimental.vp.reverse in BasicTTIImpl #147868

Open

ElvisWang123 approved these changes Jul 10, 2025
