Commit dc37699

[SYCLLowerIR] Remove !amdgcn.annotations metadata (#14713)
The `!amdgcn.annotations` metadata was a SYCL-specific addition. The concept of annotations for AMDGPU makes it appear as if it's a mirror of NVVM annotations, when in fact it's just a kernel tagging mechanism. It is not a feature supported by AMD's drivers.

We don't need to rely on this, as the functions' calling conventions identify kernels. We also rely on the "sycl-device" module flag to restrict the passes to SYCL code.

This patch re-uses the existing `TargetHelpers` namespace to hide the target-specific logic behind a new class: the `KernelCache`. This provides a way of maintaining a cache of kernels, with optional annotation metadata (it could be expanded in the future with more types of payload). It also provides abstracted ways of handling certain RAUW operations on kernels, though currently only the minimum required to support the two existing patterns. The aim of this is to hide all concept of "annotations" from the passes, and make it an implementation detail of the `KernelCache`.

During this work, it was noticed that our handling of annotations was incomplete. NVVM annotations are not required to have exactly 3 operands, as the official documentation shows. An annotation is actually a list of pairs, any one of which may declare the function a kernel. Thus we may have missed valid kernels. Tests have been added to check for this.

The `GlobalOffset` pass was also treating "unsupported" architectures as AMDGPU architectures, so that has been tightened up and the tests have been updated to ensure they actually register as AMD modules.

LIT tests have been cleaned up somewhat, to remove unnecessary features like comments and function linkage types. Several LIT tests have been converted to use the `update_test_checks.py` or `update_llc_test_checks.py` scripts, where appropriate. These tools cannot currently emit checks for named metadata nor certain assembly features, so some tests must remain as they are.
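
The point about NVVM annotations being a list of pairs can be illustrated with a small standalone model. This is only a sketch and does not use LLVM: the `Annotation` struct and both function names are hypothetical stand-ins, and the "three operands only" check is an assumption about what an incomplete check might look like, not a copy of the previous code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one !nvvm.annotations operand of the shape
//   !{ptr @fn, !"key0", i32 v0, !"key1", i32 v1, ...}
// i.e. a function reference followed by any number of (key, value) pairs.
struct Annotation {
  std::string Function;
  std::vector<std::pair<std::string, int>> Properties;
};

// Scans every (key, value) pair; any one of them may declare a kernel.
bool declaresKernel(const Annotation &A) {
  for (const auto &P : A.Properties)
    if (P.first == "kernel" && P.second == 1)
      return true;
  return false;
}

// Assumed shape of an incomplete check: it only accepts annotations with
// exactly three operands (the function plus a single pair).
bool declaresKernelThreeOperandsOnly(const Annotation &A) {
  return A.Properties.size() == 1 && A.Properties[0].first == "kernel" &&
         A.Properties[0].second == 1;
}
```

In this model, an annotation carrying `("maxntidx", 64)` followed by `("kernel", 1)` is recognised by `declaresKernel` but rejected by the three-operand check, which is the kind of missed kernel the message describes.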
1 parent 10b3727 commit dc37699

44 files changed: +1000 -789 lines changed
clang/lib/CodeGen/Targets/AMDGPU.cpp

Lines changed: 0 additions & 39 deletions
@@ -317,12 +317,6 @@ class AMDGPUTargetCodeGenInfo : public TargetCodeGenInfo {
   bool shouldEmitStaticExternCAliases() const override;
   bool shouldEmitDWARFBitFieldSeparators() const override;
   void setCUDAKernelCallingConvention(const FunctionType *&FT) const override;
-
-private:
-  // Adds a NamedMDNode with GV, Name, and Operand as operands, and adds the
-  // resulting MDNode to the amdgcn.annotations MDNode.
-  static void addAMDGCNMetadata(llvm::GlobalValue *GV, StringRef Name,
-                                int Operand);
 };
 }

@@ -404,33 +398,6 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
   }
 }

-/// Helper function for AMDGCN and NVVM targets, adds a NamedMDNode with GV,
-/// Name, and Operand as operands, and adds the resulting MDNode to the
-/// AnnotationName MDNode.
-static void addAMDGCOrNVVMMetadata(const char *AnnotationName,
-                                   llvm::GlobalValue *GV, StringRef Name,
-                                   int Operand) {
-  llvm::Module *M = GV->getParent();
-  llvm::LLVMContext &Ctx = M->getContext();
-
-  // Get annotations metadata node.
-  llvm::NamedMDNode *MD = M->getOrInsertNamedMetadata(AnnotationName);
-
-  llvm::Metadata *MDVals[] = {
-      llvm::ConstantAsMetadata::get(GV), llvm::MDString::get(Ctx, Name),
-      llvm::ConstantAsMetadata::get(
-          llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), Operand))};
-  // Append metadata to annotations node.
-  MD->addOperand(llvm::MDNode::get(Ctx, MDVals));
-}
-
-
-void AMDGPUTargetCodeGenInfo::addAMDGCNMetadata(llvm::GlobalValue *GV,
-                                                StringRef Name, int Operand) {
-  addAMDGCOrNVVMMetadata("amdgcn.annotations", GV, Name, Operand);
-}
-
-
 /// Emits control constants used to change per-architecture behaviour in the
 /// AMDGPU ROCm device libraries.
 void AMDGPUTargetCodeGenInfo::emitTargetGlobals(
@@ -483,12 +450,6 @@ void AMDGPUTargetCodeGenInfo::setTargetAttributes(
   if (FD)
     setFunctionDeclAttributes(FD, F, M);

-  // Create !{<func-ref>, metadata !"kernel", i32 1} node for SYCL kernels.
-  const bool IsSYCLKernel =
-      FD && M.getLangOpts().SYCLIsDevice && FD->hasAttr<SYCLKernelAttr>();
-  if (IsSYCLKernel)
-    addAMDGCNMetadata(F, "kernel", 1);
-
   if (M.getContext().getTargetInfo().allowAMDGPUUnsafeFPAtomics())
     F->addFnAttr("amdgpu-unsafe-fp-atomics", "true");

clang/test/CodeGenSYCL/kernel-annotation.cpp

Lines changed: 1 addition & 4 deletions
@@ -23,15 +23,12 @@ class Functor {
 };

 // CHECK-SPIR-NOT: annotations =
+// CHECK-AMDGCN-NOT: annotations =

 // CHECK-NVPTX: nvvm.annotations = !{[[FIRST:![0-9]]], [[SECOND:![0-9]]]}
 // CHECK-NVPTX: [[FIRST]] = !{ptr @_ZTS7Functor, !"kernel", i32 1}
 // CHECK-NVPTX: [[SECOND]] = !{ptr @_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE0_clES2_E5foo_2, !"kernel", i32 1}

-// CHECK-AMDGCN: amdgcn.annotations = !{[[FIRST:![0-9]]], [[SECOND:![0-9]]]}
-// CHECK-AMDGCN: [[FIRST]] = !{ptr @_ZTS7Functor, !"kernel", i32 1}
-// CHECK-AMDGCN: [[SECOND]] = !{ptr @_ZTSZZ4mainENKUlRN4sycl3_V17handlerEE0_clES2_E5foo_2, !"kernel", i32 1}
-
 int main() {
   sycl::queue q;
   q.submit([&](sycl::handler &cgh) {

llvm/docs/AMDGPUUsage.rst

Lines changed: 0 additions & 27 deletions
@@ -15847,33 +15847,6 @@ track the usage for each kernel. However, in some cases careful organization of
 the kernels and functions in the source file means there is minimal additional
 effort required to accurately calculate GPR usage.

-SYCL Kernel Metadata
-====================
-
-This section describes the additional metadata that is inserted for SYCL
-kernels. As SYCL is a single source programming model functions can either
-execute on a host or a device (i.e. GPU). Device kernels are akin to kernel
-entry-points in GPU program. To mark an LLVM IR function as a device kernel
-function, we make use of special LLVM metadata. The AMDGCN back-end will look
-for a named metadata node called ``amdgcn.annotations``. This named metadata
-must contain a list of metadata that describe the kernel IR. For our purposes,
-we need to declare a metadata node that assigns the `"kernel"` attribute to the
-LLVM IR function that should be emitted as a SYCL kernel function. These
-metadata nodes take the form:
-
-.. code-block:: text
-
-  !{<function ref>, metadata !"kernel", i32 1}
-
-Consider the metadata generated by global-offset pass, showing a void kernel
-function `example_kernel_with_offset` taking one argument, a pointer to 3 i32
-integers:
-
-.. code-block:: llvm
-
-  !amdgcn.annotations = !{!0}
-  !0 = !{void ([3 x i32]*)* @_ZTS14example_kernel_with_offset, !"kernel", i32 1}
-
 Additional Documentation
 ========================

llvm/include/llvm/SYCLLowerIR/GlobalOffset.h

Lines changed: 4 additions & 21 deletions
@@ -24,10 +24,6 @@ class PassRegistry;
 /// with an offset parameter which will be threaded through from the kernel
 /// entry point.
 class GlobalOffsetPass : public PassInfoMixin<GlobalOffsetPass> {
-private:
-  using KernelPayload = TargetHelpers::KernelPayload;
-  using ArchType = TargetHelpers::ArchType;
-
 public:
   explicit GlobalOffsetPass() {}

@@ -41,7 +37,8 @@ class GlobalOffsetPass : public PassInfoMixin<GlobalOffsetPass> {
   /// appended to the name.
   ///
   /// \param Func Kernel to be processed.
-  void processKernelEntryPoint(Function *Func);
+  void processKernelEntryPoint(Function *Func,
+                               TargetHelpers::KernelCache &KCache);

   /// For a function containing a call instruction to the implicit offset
   /// intrinsic, or another function which eventually calls the intrinsic,
@@ -65,7 +62,8 @@ class GlobalOffsetPass : public PassInfoMixin<GlobalOffsetPass> {
   /// to have the implicit parameter added to it or be replaced with the
   /// implicit parameter.
   void addImplicitParameterToCallers(Module &M, Value *Callee,
-                                     Function *CalleeWithImplicitParam);
+                                     Function *CalleeWithImplicitParam,
+                                     TargetHelpers::KernelCache &KCache);

   /// For a given function `Func` create a clone and extend its signature to
   /// contain an implicit offset argument.
@@ -89,18 +87,6 @@ class GlobalOffsetPass : public PassInfoMixin<GlobalOffsetPass> {
                               Type *ImplicitArgumentType = nullptr,
                               bool KeepOriginal = false, bool IsKernel = false);

-  /// Create a mapping of kernel entry points to their metadata nodes. While
-  /// iterating over kernels make sure that a given kernel entry point has no
-  /// llvm uses.
-  ///
-  /// \param KernelPayloads A collection of kernel functions present in a
-  /// module `M`.
-  ///
-  /// \returns A map of kernel functions to corresponding metadata nodes.
-  DenseMap<Function *, MDNode *>
-  generateKernelMDNodeMap(Module &M,
-                          SmallVectorImpl<KernelPayload> &KernelPayloads);
-
 private:
   /// Keep track of all cloned offset functions to avoid processing them.
   llvm::SmallPtrSet<Function *, 8> Clones;
@@ -109,14 +95,11 @@ class GlobalOffsetPass : public PassInfoMixin<GlobalOffsetPass> {
   /// Keep track of which non-offset functions have been processed to avoid
   /// processing twice.
   llvm::DenseMap<Function *, Value *> ProcessedFunctions;
-  /// Keep a map of all entry point functions with metadata.
-  llvm::DenseMap<Function *, MDNode *> EntryPointMetadata;
   /// A type of implicit argument added to the kernel signature.
   llvm::Type *KernelImplicitArgumentType = nullptr;
   /// A type used for the alloca holding the values of global offsets.
   llvm::Type *ImplicitOffsetPtrType = nullptr;

-  ArchType AT;
   unsigned TargetAS = 0;
 };

llvm/include/llvm/SYCLLowerIR/LocalAccessorToSharedMemory.h

Lines changed: 0 additions & 10 deletions
@@ -25,10 +25,6 @@ class PassRegistry;
 /// functions.
 class LocalAccessorToSharedMemoryPass
     : public PassInfoMixin<LocalAccessorToSharedMemoryPass> {
-private:
-  using KernelPayload = TargetHelpers::KernelPayload;
-  using ArchType = TargetHelpers::ArchType;
-
 public:
   explicit LocalAccessorToSharedMemoryPass() {}

@@ -49,12 +45,6 @@ class LocalAccessorToSharedMemoryPass
   /// \returns A new function with global symbol accesses.
   Function *processKernel(Module &M, Function *F);

-  /// Update kernel metadata to reflect the change in the signature.
-  ///
-  /// \param A map of original kernels to the modified ones.
-  void postProcessKernels(
-      SmallVectorImpl<std::pair<Function *, KernelPayload>> &NewToOldKernels);
-
 private:
   /// The value for NVVM's ADDRESS_SPACE_SHARED and AMD's LOCAL_ADDRESS happen
   /// to be 3.

llvm/include/llvm/SYCLLowerIR/TargetHelpers.h

Lines changed: 44 additions & 11 deletions
@@ -22,21 +22,54 @@ using namespace llvm;
 namespace llvm {
 namespace TargetHelpers {

-enum class ArchType { Cuda, AMDHSA, Unsupported };
+struct KernelCache {
+  void populateKernels(Module &M);

-struct KernelPayload {
-  KernelPayload(Function *Kernel, MDNode *MD = nullptr);
-  Function *Kernel;
-  MDNode *MD;
-  SmallVector<MDNode *> DependentMDs;
-};
+  bool isKernel(Function &F) const;
+
+  /// Updates cached data with a function intended as a replacement of an
+  /// existing function.
+  void handleReplacedWith(Function &OldF, Function &NewF);
+
+  /// Updates cached data with a new clone of an existing function.
+  /// The KernelOnly parameter updates cached data with only the information
+  /// required to identify the new function as a kernel.
+  void handleNewCloneOf(Function &OldF, Function &NewF, bool KernelOnly);
+
+private:
+  /// Extra data about a kernel function. Only applicable to NVPTX kernels,
+  /// which have associated annotation metadata.
+  struct KernelPayload {
+    explicit KernelPayload() = default;
+    KernelPayload(NamedMDNode *ModuleAnnotationsMD);
+
+    bool hasAnnotations() const { return ModuleAnnotationsMD != nullptr; }

-ArchType getArchType(const Module &M);
+    /// ModuleAnnotationsMD - metadata conntaining the unique global list of
+    /// annotations.
+    NamedMDNode *ModuleAnnotationsMD = nullptr;
+    SmallVector<MDNode *> DependentMDs;
+  };

-std::string getAnnotationString(ArchType AT);
+  /// List of kernels in original Module order
+  SmallVector<Function *, 4> Kernels;
+  /// Map of kernels to extra data. Also serves as a quick kernel query.
+  SmallDenseMap<Function *, KernelPayload> KernelData;
+
+public:
+  using iterator = decltype(Kernels)::iterator;
+  using const_iterator = decltype(Kernels)::const_iterator;
+
+  iterator begin() { return Kernels.begin(); }
+  iterator end() { return Kernels.end(); }
+
+  const_iterator begin() const { return Kernels.begin(); }
+  const_iterator end() const { return Kernels.end(); }
+
+  bool empty() const { return Kernels.empty(); }
+};

-void populateKernels(Module &M, SmallVectorImpl<KernelPayload> &Kernels,
-                     TargetHelpers::ArchType AT);
+bool isSYCLDevice(const Module &M);

 } // end namespace TargetHelpers
 } // end namespace llvm
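
The header above only declares `KernelCache`; its implementation is not part of this excerpt. As a rough illustration of the layout the comments describe (an ordered kernel list plus a map that doubles as a membership query), here is a minimal standalone sketch using standard containers instead of LLVM's. `Fn`, `Payload`, `KernelCacheModel`, and `addKernel` are hypothetical names, and the replacement logic is an assumption about one reasonable behaviour, not the actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Fn { std::string Name; }; // hypothetical stand-in for llvm::Function
struct Payload {};               // hypothetical stand-in for KernelPayload

class KernelCacheModel {
  std::vector<Fn *> Kernels;                          // insertion order
  std::unordered_map<const Fn *, Payload> KernelData; // also the kernel query

public:
  // Registers F as a kernel once; repeated calls are ignored.
  void addKernel(Fn &F) {
    if (KernelData.emplace(&F, Payload{}).second)
      Kernels.push_back(&F);
  }

  bool isKernel(const Fn &F) const { return KernelData.count(&F) != 0; }

  // Assumed behaviour: NewF takes over OldF's payload and list position.
  void handleReplacedWith(Fn &OldF, Fn &NewF) {
    auto It = KernelData.find(&OldF);
    if (It == KernelData.end())
      return; // OldF was not a kernel, nothing to update
    assert(KernelData.count(&NewF) == 0 && "NewF is already tracked");
    Payload P = It->second;
    KernelData.erase(It);
    KernelData.emplace(&NewF, P);
    std::replace(Kernels.begin(), Kernels.end(), &OldF, &NewF);
  }

  std::vector<Fn *>::const_iterator begin() const { return Kernels.begin(); }
  std::vector<Fn *>::const_iterator end() const { return Kernels.end(); }
  bool empty() const { return Kernels.empty(); }
};
```

Keeping the vector alongside the map means iteration follows the order in which kernels were registered, while `isKernel` stays a hash lookup. A pass would iterate the cache with a range-based `for` and call `handleReplacedWith` after swapping a kernel for its rewritten version.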
