[AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates #134016

Open: wants to merge 75 commits into base: main
Changes from 3 commits
Commits (75):
91eeaf0
Add the functional identity and feature queries.
AlexVlx Apr 2, 2025
8bf1168
Fix format.
AlexVlx Apr 2, 2025
3421292
Fix broken patch merge.
AlexVlx Apr 2, 2025
539c7e6
Add release notes.
AlexVlx Apr 2, 2025
5926b9f
(Hopefully) Final format fix.
AlexVlx Apr 2, 2025
4381d93
Remove stray space.
AlexVlx Apr 2, 2025
d18f64e
Remove unused header, fix borked test.
AlexVlx Apr 2, 2025
7880ff4
Stars everywhere.
AlexVlx Apr 2, 2025
719dfde
Fix format without line break.
AlexVlx Apr 2, 2025
36b69b4
Add host tests.
AlexVlx Apr 2, 2025
e327e15
Fit code examples within 80-char limit.
AlexVlx Apr 2, 2025
d35efc5
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Apr 14, 2025
5dee670
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Apr 15, 2025
888a080
Fix tests.
AlexVlx Apr 16, 2025
e35ac62
Fix test.
AlexVlx Apr 16, 2025
6c41ed2
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Apr 16, 2025
8683148
Merge branch 'zcfs' of https://github.com/AlexVlx/llvm-project; branc…
AlexVlx Apr 22, 2025
a9b3e85
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Apr 24, 2025
468a517
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Apr 29, 2025
1b8b57e
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx May 2, 2025
18b4af2
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx May 5, 2025
a8bca2f
Re-work implementation to return a target specific type.
AlexVlx May 6, 2025
716cc1f
Fix formatting.
AlexVlx May 6, 2025
79035a9
Delete spurious whitespace.
AlexVlx May 6, 2025
6945c2e
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx May 7, 2025
9a7e250
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx May 7, 2025
0f04dbc
Handle jumps into controlled sequences.
AlexVlx May 7, 2025
39a9d55
Fix formatting.
AlexVlx May 7, 2025
3fe116e
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx May 8, 2025
49c862a
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx May 12, 2025
f293f39
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx May 16, 2025
251476d
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx May 16, 2025
ebde49b
Start incorporating review feedback.
AlexVlx May 16, 2025
4bdd30e
Less `auto`.
AlexVlx May 16, 2025
a1b4a11
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx May 20, 2025
18841c1
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx May 22, 2025
76848d5
Print out valid AMDGCN processor identifiers.
AlexVlx May 22, 2025
e1bfdf3
Use boolean type for the predicate, even though it should never get e…
AlexVlx May 22, 2025
4f65468
Register pass early.
AlexVlx May 22, 2025
e940d42
Clarify builtins are also available in C.
AlexVlx May 22, 2025
46adb74
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx May 26, 2025
ca9521d
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jun 2, 2025
11dd570
Try to fix potentially erroneous indentation in note.
AlexVlx Jun 2, 2025
fab0d14
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jun 2, 2025
611ec0d
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jun 2, 2025
03b029f
Add test for returning a predicate.
AlexVlx Jun 2, 2025
32be1c0
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jun 2, 2025
012f74d
Fix formatting.
AlexVlx Jun 2, 2025
cd7c920
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jun 10, 2025
33bbe35
Add predicate expansion pass to LTO pipeline.
AlexVlx Jun 10, 2025
46e5a91
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jun 10, 2025
8c1e1f5
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jun 13, 2025
81a55d8
Try to fix odd but persistent doc generation error.
AlexVlx Jun 13, 2025
bbe17fa
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jun 17, 2025
c495630
Adopt suggestions.
AlexVlx Jun 18, 2025
420a19c
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jun 18, 2025
dc0221e
Implement some of the review suggestions.
AlexVlx Jun 23, 2025
7f88eb7
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jun 24, 2025
3b727b9
Clean up unreachable BBs.
AlexVlx Jun 24, 2025
42f5de6
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jul 3, 2025
246ff38
Fix formatting.
AlexVlx Jul 3, 2025
adb6469
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jul 3, 2025
6b368d5
Remove internal functions made unreachable by predicate expansion.
AlexVlx Jul 4, 2025
435ce05
Fix formatting, tweak use count.
AlexVlx Jul 4, 2025
2c2f78b
Fix formatting, again.
AlexVlx Jul 4, 2025
09115e2
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jul 6, 2025
f1d41a8
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jul 7, 2025
5d2c2f6
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jul 8, 2025
05033ea
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jul 8, 2025
b886c55
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jul 9, 2025
63d059d
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jul 9, 2025
495b567
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jul 10, 2025
8ef5c70
Merge branch 'main' of https://github.com/llvm/llvm-project into zcfs
AlexVlx Jul 10, 2025
8177876
Add warnings around unguarded builtin usage, suggesting `__builtin_am…
AlexVlx Jul 12, 2025
b4decc2
Fix formatting.
AlexVlx Jul 12, 2025
110 changes: 110 additions & 0 deletions clang/docs/LanguageExtensions.rst
@@ -4920,6 +4920,116 @@ If no address spaces names are provided, all address spaces are fenced.
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")

__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``__builtin_amdgcn_processor_is`` and ``__builtin_amdgcn_is_invocable`` provide
a functional mechanism for programmatically querying:

* the identity of the current target processor;
* the capability of the current target processor to invoke a particular builtin.

Collaborator:

CC @sarnex @jhuber6 as this relates to __has_builtin behavior somewhat and we've been in discussions about whether that should mean "I know about the builtin" or "I can actually call the builtin".


**Syntax**:

.. code-block:: c

  // When used as the predicate for a control structure
  bool __builtin_amdgcn_processor_is(const char*);
  bool __builtin_amdgcn_is_invocable(builtin_name);
  // Otherwise

Collaborator:

Is there a use case for this?

Contributor Author:

Is there a use case for the “Otherwise” scenario? No, it merely ensures that other uses are broken, since no void-to-anything conversion is possible. Apologies if you were asking a different question and I misunderstood.

Collaborator:

I'm not sure what you mean by "as the predicate for a control structure" here, such that there is an "otherwise". Why not just make this a diagnostic? Or have a reasonable value in those cases? Also, having the argument type change as well as the return value for __builtin_amdgcn_is_invocable is pretty novel.

Collaborator:

Re-answering my own question: later on you seem to be using 'predicate for a control structure' to mean 'the condition of an if/while/for'. However, why is it problematic to have someone check this and store it in a variable? Why is:

if (__builtin_amdgcn_processor_is("gfx1201")) fine, but:
bool b = __builtin_amdgcn_processor_is("gfx1201"); if (b) a problem?

Collaborator:

Also, what is the type in an unevaluated context? So what is decltype(__builtin_amdgcn_processor_is("gfx1201"))?

Contributor Author:

Type, when "observable", is always void. So e.g. decltype(__builtin_amdgcn_processor_is(...)), sizeof(__builtin_amdgcn_processor_is(...)), auto x = __builtin_amdgcn_processor_is(...); decltype(x) would always be void / errors. I will pick up the other two Qs in a more thorough reply.
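
The point about the observable type being `void` can be illustrated in standard C++ without the builtin. This is only a sketch: `query_target` is a hypothetical stand-in with a plain `void` return type, not the proposed builtin, and it says nothing about how Clang implements the context-dependent signature.

```cpp
#include <type_traits>

// Hypothetical stand-in for a query whose observable type is void.
// It is NOT __builtin_amdgcn_processor_is; it only mimics the void signature.
void query_target(const char *) {}

// decltype names void, so nothing can be declared with the result's type.
static_assert(std::is_void<decltype(query_target("gfx1201"))>::value,
              "observable type is void");

// void converts to no other type, so `bool b = query_target(...)` or passing
// the result to a function taking bool would be ill-formed.
static_assert(
    !std::is_convertible<decltype(query_target("gfx1201")), bool>::value,
    "void does not convert to bool");
```

Both checks are compile-time only; the translation unit compiling at all is the demonstration.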

Collaborator:

What about when that context is inside of an if?

Either way, I'm pretty against the void return type change part of this design. It seems like a poor design at that point. @AaronBallman can comment if he'd like, but I suspect he agrees with me.

Collaborator:

Yeah, I'm not comfortable with a signature that changes depending on context. That's a pretty novel design.

  void __builtin_amdgcn_processor_is(const char*);
  void __builtin_amdgcn_is_invocable(void);

**Example of use**:

.. code-block:: c++

  if (__builtin_amdgcn_processor_is("gfx1201") ||
      __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var))
    __builtin_amdgcn_s_sleep_var(x);

  if (!__builtin_amdgcn_processor_is("gfx906"))
    __builtin_amdgcn_s_wait_event_export_ready();
  else if (__builtin_amdgcn_processor_is("gfx1010") ||
           __builtin_amdgcn_processor_is("gfx1101"))
    __builtin_amdgcn_s_ttracedata_imm(1);

  while (__builtin_amdgcn_processor_is("gfx1101")) *p += x;

  do { *p -= x; } while (__builtin_amdgcn_processor_is("gfx1010"));

Collaborator:

Is the formatting OK on this line? GitHub is making this look awfully weird.

Also-also: This is an infinite loop, right? As you said this is never evaluated at runtime, the answer would be fixed?

Contributor Author:

This would indeed be an infinite loop iff the concrete target is gfx1010, where it'd expand into do { *p -= x; } while (true). Otherwise, it's a one trip loop. I've put these in for illustration only, I will tidy the tests though to ensure no infinite loops persist, thank you for pointing it out.
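
The trip-count behaviour described in this reply can be sketched in plain C++ by treating the resolved predicate as a compile-time constant. The names here are hypothetical: `Pred` stands in for whatever the predicate folds to on the concrete target, and `cap` exists only so that the `true` case terminates in a demo.

```cpp
// Hypothetical model of `do { ... } while (predicate)` once the predicate has
// been resolved to a constant. `cap` is an artificial bound for the demo; the
// real expansion for a matching target would be `while (true)`.
template <bool Pred>
int do_while_trips(int cap) {
  int trips = 0;
  do {
    ++trips;  // stands in for the loop body, e.g. `*p -= x;`
  } while (Pred && trips < cap);
  return trips;
}
```

With `Pred == false` the body runs exactly once; with `Pred == true` it runs until the artificial cap, which is the stand-in for "forever".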


  for (; __builtin_amdgcn_processor_is("gfx1201"); ++*p) break;

Collaborator:

same question about inf. loop here.

Contributor Author:

I don't think this'd be an infinite loop ever, it's either 0 trips or one trip, if the predicate is true we just break. Otherwise, the discussion from the previous inf loop applies.
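
A similar sketch for the `for (; predicate; ++*p) break;` form, again with the predicate modelled as a compile-time constant. `ForResult` and `for_with_break` are hypothetical names introduced only for this illustration.

```cpp
// Records how often the body was entered and the final value of the counter.
struct ForResult {
  int body_entries;
  int value;
};

// Hypothetical model of `for (; predicate; ++*p) break;` with the predicate
// already resolved to the constant Pred.
template <bool Pred>
ForResult for_with_break(int start) {
  int value = start;
  int entries = 0;
  for (; Pred; ++value) {
    ++entries;
    break;  // leaves the loop before the increment expression runs
  }
  return {entries, value};
}
```

In both cases the counter is left untouched: either the condition fails immediately, or the `break` exits before `++value` is evaluated.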


  if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready))
    __builtin_amdgcn_s_wait_event_export_ready();
  else if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_ttracedata_imm))
    __builtin_amdgcn_s_ttracedata_imm(1);

  do {
    *p -= x;
  } while (__builtin_amdgcn_is_invocable(__builtin_amdgcn_global_load_tr_b64_i32));

Collaborator:

Formatting on this one is weird too. Also, same question again, and again on 4972.

Contributor Author:

Please see my previous replies, hopefully they clarify matters.


  for (; __builtin_amdgcn_is_invocable(__builtin_amdgcn_permlane64); ++*p) break;

**Description**:

When used as the predicate value of the following control structures:

.. code-block:: c++

  if (...)
  while (...)
  do { } while (...)
  for (...)

be it directly, or as arguments to logical operators such as ``!, ||, &&``, the

Collaborator:

How about when used as an initializer in one of those? Consider:

if (auto b = __builtin_amdgcn_processor_is("gfx1201"); b && another_condition)

builtins return a boolean value that:

* indicates whether the current target matches the argument; the argument MUST
be a string literal and a valid AMDGPU target
* indicates whether the builtin function passed as the argument can be invoked
by the current target; the argument MUST be either a generic or AMDGPU
specific builtin name

Outside of these contexts, the builtins have a ``void`` returning signature
which prevents their misuse.

Collaborator:

What misuse here? If you want to catch other situations, just diagnose. But it seems that these values could be useful/possible outside of condition expressions.


**Example of invalid use**:

.. code-block:: c++

  void kernel(int* p, int x, bool (*pfn)(bool), const char* str) {
    if (__builtin_amdgcn_processor_is("not_an_amdgcn_gfx_id")) return;
    else if (__builtin_amdgcn_processor_is(str)) __builtin_trap();

    bool a = __builtin_amdgcn_processor_is("gfx906");
    const bool b = !__builtin_amdgcn_processor_is("gfx906");
    const bool c = !__builtin_amdgcn_processor_is("gfx906");
    bool d = __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
    bool e = !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
    const auto f =
        !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready)
        || __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
    const auto g =
        !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready)
        || !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var);
    __builtin_amdgcn_processor_is("gfx1201")
        ? __builtin_amdgcn_s_sleep_var(x) : __builtin_amdgcn_s_sleep(42);
    if (pfn(__builtin_amdgcn_processor_is("gfx1200")))
      __builtin_amdgcn_s_sleep_var(x);

    if (__builtin_amdgcn_is_invocable("__builtin_amdgcn_s_sleep_var")) return;
    else if (__builtin_amdgcn_is_invocable(x)) __builtin_trap();
  }

When invoked while compiling for a concrete target, the builtins are evaluated
early by Clang, and never produce any CodeGen effects / have no observable
side-effects in IR. Conversely, when compiling for AMDGCN flavoured SPIR-V,
which is an abstract target, a series of predicate values are implicitly
created. These predicates get resolved when finalizing the compilation process
for a concrete target, and shall reflect the latter's identity and features.
Thus, it is possible to author high-level code, in e.g. HIP, that is target
adaptive in a dynamic fashion, contrary to macro based mechanisms.
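
The late-resolution idea described above can be modelled outside the compiler. The sketch below is illustrative only: the `llvm.amdgcn.is.` and `llvm.amdgcn.has.` prefixes are taken from the CodeGen hunk in this PR, while `ConcreteTarget`, `resolve`, and the feature name used in the usage note are hypothetical, and the real required-feature strings may be more structured than a single name.

```cpp
#include <set>
#include <string>

// Predicate names, using the prefixes that appear in the CodeGen hunk.
std::string processorPredicate(const std::string &Proc) {
  return "llvm.amdgcn.is." + Proc;
}
std::string featurePredicate(const std::string &RequiredFeature) {
  return "llvm.amdgcn.has." + RequiredFeature;
}

// Hypothetical description of the concrete target picked at finalization.
struct ConcreteTarget {
  std::string Processor;
  std::set<std::string> Features;
};

// Hypothetical resolver: once the concrete target is known, each predicate
// folds to a constant. Assumes a "has" predicate carries one feature name.
bool resolve(const std::string &Predicate, const ConcreteTarget &T) {
  const std::string Is = "llvm.amdgcn.is.";
  const std::string Has = "llvm.amdgcn.has.";
  if (Predicate.compare(0, Is.size(), Is) == 0)
    return Predicate.substr(Is.size()) == T.Processor;
  if (Predicate.compare(0, Has.size(), Has) == 0)
    return T.Features.count(Predicate.substr(Has.size())) != 0;
  return false;  // unknown predicates are treated as false in this model
}
```

For instance, with `ConcreteTarget T{"gfx1201", {"feature-a"}}` (the feature name is made up), `resolve(processorPredicate("gfx1201"), T)` is true and `resolve(processorPredicate("gfx906"), T)` is false. A macro-based check, by contrast, would have been fixed when the abstract SPIR-V module was first compiled.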

ARM/AArch64 Language Extensions
-------------------------------
5 changes: 5 additions & 0 deletions clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -346,6 +346,11 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr")
BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n")
BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n")

// These are special FE only builtins intended for forwarding the requirements
// to the ME.
BUILTIN(__builtin_amdgcn_processor_is, "vcC*", "nctu")
BUILTIN(__builtin_amdgcn_is_invocable, "v", "nctu")

//===----------------------------------------------------------------------===//
// R600-NI only builtins.
//===----------------------------------------------------------------------===//
10 changes: 10 additions & 0 deletions clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -13054,4 +13054,14 @@ def err_acc_decl_for_routine
// AMDGCN builtins diagnostics
def err_amdgcn_global_load_lds_size_invalid_value : Error<"invalid size value">;
def note_amdgcn_global_load_lds_size_valid_value : Note<"size must be %select{1, 2, or 4|1, 2, 4, 12 or 16}0">;
def err_amdgcn_processor_is_arg_not_literal
    : Error<"the argument to __builtin_amdgcn_processor_is must be a string "
            "literal">;
def err_amdgcn_processor_is_arg_invalid_value
    : Error<"the argument to __builtin_amdgcn_processor_is must be a valid "
            "AMDGCN processor identifier; '%0' is not valid">;
def err_amdgcn_is_invocable_arg_invalid_value
    : Error<"the argument to __builtin_amdgcn_is_invocable must be either a "
            "target agnostic builtin or an AMDGCN target specific builtin; `%0`"
            " is not valid">;
} // end of sema component.
4 changes: 4 additions & 0 deletions clang/lib/Basic/Targets/SPIR.cpp
@@ -152,3 +152,7 @@ void SPIRV64AMDGCNTargetInfo::setAuxTarget(const TargetInfo *Aux) {
    Float128Format = DoubleFormat;
  }
}

bool SPIRV64AMDGCNTargetInfo::isValidCPUName(StringRef CPU) const {
  return AMDGPUTI.isValidCPUName(CPU);
}
4 changes: 4 additions & 0 deletions clang/lib/Basic/Targets/SPIR.h
@@ -432,6 +432,10 @@ class LLVM_LIBRARY_VISIBILITY SPIRV64AMDGCNTargetInfo final
  }

  bool hasInt128Type() const override { return TargetInfo::hasInt128Type(); }

  // This is only needed for validating arguments passed to
  // __builtin_amdgcn_processor_is
  bool isValidCPUName(StringRef Name) const override;
};

} // namespace targets
29 changes: 29 additions & 0 deletions clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -284,6 +284,18 @@ void CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
  Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
}

static Value *GetOrInsertAMDGPUPredicate(CodeGenFunction &CGF, Twine Name) {
  auto PTy = IntegerType::getInt1Ty(CGF.getLLVMContext());

  auto P = cast<GlobalVariable>(
      CGF.CGM.getModule().getOrInsertGlobal(Name.str(), PTy));
  P->setConstant(true);
  P->setExternallyInitialized(true);

  return CGF.Builder.CreateLoad(
      RawAddress(P, PTy, CharUnits::One(), KnownNonNull));
}

Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
                                              const CallExpr *E) {
  llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
@@ -585,6 +597,23 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
    llvm::Value *Env = EmitScalarExpr(E->getArg(0));
    return Builder.CreateCall(F, {Env});
  }
  case AMDGPU::BI__builtin_amdgcn_processor_is: {
    assert(CGM.getTriple().isSPIRV() &&
           "__builtin_amdgcn_processor_is should never reach CodeGen for "
           "concrete targets!");
    StringRef Proc = cast<clang::StringLiteral>(E->getArg(0))->getString();
    return GetOrInsertAMDGPUPredicate(*this, "llvm.amdgcn.is." + Proc);
  }
  case AMDGPU::BI__builtin_amdgcn_is_invocable: {
    assert(CGM.getTriple().isSPIRV() &&
           "__builtin_amdgcn_is_invocable should never reach CodeGen for "
           "concrete targets!");
    auto FD = cast<FunctionDecl>(
        cast<DeclRefExpr>(E->getArg(0))->getReferencedDeclOfCallee());
    StringRef RF =
        getContext().BuiltinInfo.getRequiredFeatures(FD->getBuiltinID());
    return GetOrInsertAMDGPUPredicate(*this, "llvm.amdgcn.has." + RF);
  }
  case AMDGPU::BI__builtin_amdgcn_read_exec:
    return EmitAMDGCNBallotForExec(*this, E, Int64Ty, Int64Ty, false);
  case AMDGPU::BI__builtin_amdgcn_read_exec_lo: