Commit 7cb5a82

Merge pull request llvm#610 from AMD-Lightning-Internal/upstream_merge_202502111511
merge main into amd-staging
2 parents eaa9a12 + 3b73d77 commit 7cb5a82

183 files changed

+7926
-1381
lines changed

.github/CODEOWNERS

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -131,7 +131,7 @@
131131
/bolt/ @aaupov @maksfb @rafaelauler @ayermolo @dcci @yota9
132132

133133
# Bazel build system.
134-
/utils/bazel/ @rupprecht @keith
134+
/utils/bazel/ @rupprecht @keith @aaronmondal
135135

136136
# InstallAPI and TextAPI
137137
/llvm/**/TextAPI/ @cyndyishida

clang/bindings/python/clang/cindex.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1410,6 +1410,9 @@ def is_unexposed(self):
14101410
# OpenMP scope directive.
14111411
OMP_SCOPE_DIRECTIVE = 306
14121412

1413+
# OpenMP stripe directive.
1414+
OMP_STRIPE_DIRECTIVE = 310
1415+
14131416
# OpenACC Compute Construct.
14141417
OPEN_ACC_COMPUTE_DIRECTIVE = 320
14151418

clang/docs/OpenMPSupport.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -374,6 +374,8 @@ implementation.
374374
+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
375375
| Loop transformation constructs | :none:`unclaimed` | :none:`unclaimed` | |
376376
+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
377+
| loop stripe transformation | :good:`done` | https://github.com/llvm/llvm-project/pull/119891 |
378+
+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
377379
| work distribute construct | :none:`unclaimed` | :none:`unclaimed` | |
378380
+-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
379381
| task_iteration | :none:`unclaimed` | :none:`unclaimed` | |

clang/docs/ReleaseNotes.rst

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -104,6 +104,10 @@ Non-comprehensive list of changes in this release
104104
New Compiler Flags
105105
------------------
106106

107+
- New option ``-fprofile-continuous`` added to enable continuous profile syncing to file (#GH124353, `docs <https://clang.llvm.org/docs/UsersManual.html#cmdoption-fprofile-continuous>`_).
108+
The feature has `existed <https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program>`_
109+
for a while and this is just a user-facing option.
110+
107111
Deprecated Compiler Flags
108112
-------------------------
109113

@@ -129,6 +133,8 @@ Improvements to Clang's diagnostics
129133
which are supposed to only exist once per program, but may get duplicated when
130134
built into a shared library.
131135
- Fixed a bug where Clang's Analysis did not correctly model the destructor behavior of ``union`` members (#GH119415).
136+
- A statement attribute applied to a ``case`` label no longer suppresses
137+
'bypassing variable initialization' diagnostics (#84072).
132138

133139
Improvements to Clang's time-trace
134140
----------------------------------
@@ -285,6 +291,7 @@ Python Binding Changes
285291
OpenMP Support
286292
--------------
287293
- Added support 'no_openmp_constructs' assumption clause.
294+
- Added support for 'omp stripe' directive.
288295

289296
Improvements
290297
^^^^^^^^^^^^

clang/docs/TypeSanitizer.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ reduce these impacts.
2727
The TypeSanitizer Algorithm
2828
===========================
2929
For each TBAA type-access descriptor, encoded in LLVM IR using TBAA Metadata, the instrumentation
30-
pass generates descriptor tales. Thus there is a unique pointer to each type (and access descriptor).
30+
pass generates descriptor tables. Thus there is a unique pointer to each type (and access descriptor).
3131
These tables are comdat (except for anonymous-namespace types), so the pointer values are unique
3232
across the program.
3333

clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst

Lines changed: 93 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,9 @@ Performance Investigation
55
Multiple factors contribute to the time it takes to analyze a file with Clang Static Analyzer.
66
A translation unit contains multiple entry points, each of which take multiple steps to analyze.
77

8+
Performance analysis using ``-ftime-trace``
9+
===========================================
10+
811
You can add the ``-ftime-trace=file.json`` option to break down the analysis time into individual entry points and steps within each entry point.
912
You can explore the generated JSON file in a Chromium browser using the ``chrome://tracing`` URL,
1013
or using `speedscope <https://speedscope.app>`_.
@@ -19,9 +22,8 @@ Here is an example of a time trace produced with
1922
.. code-block:: bash
2023
:caption: Clang Static Analyzer invocation to generate a time trace of string.c analysis.
2124
22-
clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
23-
-setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
24-
-verify ./clang/test/Analysis/string.c \
25+
clang -cc1 -analyze -verify clang/test/Analysis/string.c \
26+
-analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
2527
-ftime-trace=trace.json -ftime-trace-granularity=1
2628
2729
.. image:: ../images/speedscope.png
@@ -45,3 +47,91 @@ Note: Both Chrome-tracing and speedscope tools might struggle with time traces a
4547
Luckily, in most cases the default max-steps boundary of 225 000 produces the traces of approximately that size
4648
for a single entry point.
4749
You can use ``-analyze-function=get_global_options`` together with ``-ftime-trace`` to narrow down analysis to a specific entry point.
50+
51+
52+
Performance analysis using ``perf``
53+
===================================
54+
55+
`Perf <https://perfwiki.github.io/main/>`_ is a tool for conducting sampling-based profiling.
56+
It's easy to start profiling; there are only two prerequisites.
57+
Build with ``-fno-omit-frame-pointer`` and debug info (``-g``).
58+
You can use release builds, but probably the easiest is to set the ``CMAKE_BUILD_TYPE=RelWithDebInfo``
59+
along with ``CMAKE_CXX_FLAGS="-fno-omit-frame-pointer"`` when configuring ``llvm``.
60+
Here is how to `get started <https://llvm.org/docs/CMake.html#quick-start>`_ if you need help configuring the build.
61+
62+
.. code-block:: bash
63+
:caption: Running the Clang Static Analyzer through ``perf`` to gather samples of the execution.
64+
65+
# -F: Sampling frequency, use `-F max` for maximal frequency
66+
# -g: Enable call-graph recording for both kernel and user space
67+
perf record -F 99 -g -- clang -cc1 -analyze -verify clang/test/Analysis/string.c \
68+
-analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection
69+
70+
Once you have the profile data, you can use it to produce a Flame graph.
71+
A Flame graph is a visual representation of the stack frames of the samples.
72+
Common stack frame prefixes are squashed together, making up a wider bar.
73+
The wider the bar, the more time was spent under that particular stack frame,
74+
giving a sense of how the overall execution time was spent.
75+
76+
Clone the `FlameGraph <https://github.com/brendangregg/FlameGraph>`_ git repository,
77+
as we will use some scripts from there to convert the ``perf`` samples into a Flame graph.
78+
It's also useful to check out Brendan Gregg's (the author of FlameGraph)
79+
`homepage <https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html>`_.
80+
81+
82+
.. code-block:: bash
83+
:caption: Converting the ``perf`` profile into a Flame graph, then opening it in Firefox.
84+
85+
perf script | /path/to/FlameGraph/stackcollapse-perf.pl > perf.folded
86+
/path/to/FlameGraph/flamegraph.pl perf.folded > perf.svg
87+
firefox perf.svg
88+
89+
.. image:: ../images/flamegraph.png
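
The way a Flame graph squashes common stack prefixes into wider bars can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual logic of `stackcollapse-perf.pl` or `flamegraph.pl`; the function names `fold_stacks` and `frame_widths` and the sample stacks are hypothetical.

```python
from collections import Counter


def fold_stacks(samples):
    """Count identical call stacks, joined root-first with ';' (the folded format)."""
    return Counter(";".join(stack) for stack in samples)


def frame_widths(folded):
    """Sum the sample counts under every stack prefix.

    The total for a prefix corresponds to the width of its bar.
    """
    widths = Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            widths[";".join(frames[:depth])] += count
    return widths


# Hypothetical samples: each one is a call stack, outermost frame first.
samples = [
    ["main", "analyze", "visit"],
    ["main", "analyze", "visit"],
    ["main", "analyze", "bind"],
    ["main", "parse"],
]

folded = fold_stacks(samples)
widths = frame_widths(folded)
for prefix, width in sorted(widths.items()):
    print(f"{width:3d} {prefix}")
```

Here `main` is as wide as all four samples, while `main;analyze` covers three of them, which is how the graph shows where the time went.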
90+
91+
92+
Performance analysis using ``uftrace``
93+
======================================
94+
95+
`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
96+
that you can use to focus and drill down into the timeline of your application.
97+
We will use it to generate Chromium trace JSON.
98+
Unlike sampling-based tools such as ``perf``, this approach statically instruments every function, so it should be more precise and thorough.
99+
In contrast to using ``-ftime-trace``, functions don't need to opt-in to be profiled using ``llvm::TimeTraceScope``.
100+
All functions are profiled due to automatic static instrumentation.
101+
102+
There is only one prerequisite to use this tool.
103+
You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``.
104+
This will make it run substantially slower but allows rich instrumentation.
105+
It will also consume many gigabytes of storage for a single trace unless filter flags are used during recording.
106+
107+
.. code-block:: bash
108+
:caption: Recording with ``uftrace``, then dumping the result as a Chrome trace JSON.
109+
110+
uftrace record clang -cc1 -analyze -verify clang/test/Analysis/string.c \
111+
-analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection
112+
uftrace dump --filter=".*::AnalysisConsumer::HandleTranslationUnit" --time-filter=300 --chrome > trace.json
113+
114+
.. image:: ../images/uftrace_detailed.png
115+
116+
In this picture, you can see the functions below the Static Analyzer's entry point that take at least 300 nanoseconds to run, visualized by Chrome's ``about:tracing`` page.
117+
You can also see how deep the call stacks can get because of AST visitors.
118+
119+
Using different filters can reduce the number of functions to record.
120+
For the common options, refer to the ``uftrace`` `documentation <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_.
121+
122+
Similar filters can be applied for dumping too. That way you can reuse the same (detailed)
123+
recording to selectively focus on some special part using a refinement of the filter flags.
124+
Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_,
125+
thus it needs to be of a limited size.
126+
If you do not apply filters when recording, you will collect a large trace, and every dump operation
127+
would need to sift through the much larger recording, which may be annoying if done repeatedly.
128+
129+
If the trace JSON is still too large to load, have a look at the dump as plain text and look for frequent entries that refer to non-interesting parts.
130+
Once you have some of those, add them as ``--hide`` flags to the ``uftrace dump`` call.
131+
To see what functions appear frequently in the trace, use this command:
132+
133+
.. code-block:: bash
134+
135+
cat trace.json | grep -Po '"name":"(.+)"' | sort | uniq -c | sort -nr | head -n 50
136+
137+
``uftrace`` can also dump the report as a Flame graph using ``uftrace dump --flame-graph``.
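
As an alternative to the `grep` pipeline above, the most frequent function names can also be counted by parsing the JSON. This is a minimal sketch, assuming the trace is either a bare JSON array of events or an object with a `traceEvents` array, and that each event carries a `name` field. The helper `most_common_names` and the sample events are hypothetical.

```python
import json
from collections import Counter


def most_common_names(trace_text, limit=50):
    """Return the `limit` most frequent event names in a Chrome trace JSON string."""
    data = json.loads(trace_text)
    # Assumption: events sit under "traceEvents", or the trace is a bare array.
    events = data["traceEvents"] if isinstance(data, dict) else data
    names = Counter(
        event["name"]
        for event in events
        if isinstance(event, dict) and "name" in event
    )
    return names.most_common(limit)


# Hypothetical miniature trace; a real one would be read from trace.json.
sample = json.dumps({"traceEvents": [
    {"ph": "B", "name": "clang::Stmt::children", "ts": 1},
    {"ph": "E", "name": "clang::Stmt::children", "ts": 2},
    {"ph": "B", "name": "llvm::FoldingSetBase::FindNodeOrInsertPos", "ts": 3},
]})

top = most_common_names(sample, limit=2)
for name, count in top:
    print(count, name)
```

Like the `grep` version, this counts begin and end events separately, so the numbers are only useful for ranking candidates for ``--hide``.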

clang/include/clang-c/Index.h

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2158,6 +2158,10 @@ enum CXCursorKind {
21582158
*/
21592159
CXCursor_OMPAssumeDirective = 309,
21602160

2161+
/** OpenMP stripe directive.
2162+
*/
2163+
CXCursor_OMPStripeDirective = 310,
2164+
21612165
/** OpenACC Compute Construct.
21622166
*/
21632167
CXCursor_OpenACCComputeConstruct = 320,

clang/include/clang/AST/ASTContext.h

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1733,6 +1733,47 @@ class ASTContext : public RefCountedBase<ASTContext> {
17331733
unsigned NumPositiveBits, QualType &BestType,
17341734
QualType &BestPromotionType);
17351735

1736+
/// Determine whether the given integral value is representable within
1737+
/// the given type T.
1738+
bool isRepresentableIntegerValue(llvm::APSInt &Value, QualType T);
1739+
1740+
/// Compute NumNegativeBits and NumPositiveBits for an enum based on
1741+
/// the constant values of its enumerators.
1742+
template <typename RangeT>
1743+
bool computeEnumBits(RangeT EnumConstants, unsigned &NumNegativeBits,
1744+
unsigned &NumPositiveBits) {
1745+
NumNegativeBits = 0;
1746+
NumPositiveBits = 0;
1747+
bool MembersRepresentableByInt = true;
1748+
for (auto *Elem : EnumConstants) {
1749+
EnumConstantDecl *ECD = cast_or_null<EnumConstantDecl>(Elem);
1750+
if (!ECD)
1751+
continue; // Already issued a diagnostic.
1752+
1753+
llvm::APSInt InitVal = ECD->getInitVal();
1754+
if (InitVal.isUnsigned() || InitVal.isNonNegative()) {
1755+
// If the enumerator is zero that should still be counted as a positive
1756+
// bit since we need a bit to store the value zero.
1757+
unsigned ActiveBits = InitVal.getActiveBits();
1758+
NumPositiveBits = std::max({NumPositiveBits, ActiveBits, 1u});
1759+
} else {
1760+
NumNegativeBits =
1761+
std::max(NumNegativeBits, (unsigned)InitVal.getSignificantBits());
1762+
}
1763+
1764+
MembersRepresentableByInt &= isRepresentableIntegerValue(InitVal, IntTy);
1765+
}
1766+
1767+
// If we have an empty set of enumerators we still need one bit.
1768+
// From [dcl.enum]p8
1769+
// If the enumerator-list is empty, the values of the enumeration are as if
1770+
// the enumeration had a single enumerator with value 0
1771+
if (!NumPositiveBits && !NumNegativeBits)
1772+
NumPositiveBits = 1;
1773+
1774+
return MembersRepresentableByInt;
1775+
}
1776+
17361777
QualType
17371778
getUnresolvedUsingType(const UnresolvedUsingTypenameDecl *Decl) const;
17381779
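
The bit counting in `computeEnumBits` can be modelled in plain Python to see what it returns for small inputs. This is a rough sketch, not Clang code: it assumes a 32-bit `int`, treats `isRepresentableIntegerValue` as a simple range check, and uses `bit_length()` as a stand-in for `getActiveBits()` and `getSignificantBits()`. The function name `compute_enum_bits` is hypothetical.

```python
def compute_enum_bits(values, int_bits=32):
    """Model of the enum bit computation for a list of enumerator values.

    Returns (num_negative_bits, num_positive_bits, representable_by_int).
    Assumption: `int` is a two's-complement type with `int_bits` bits.
    """
    num_negative_bits = 0
    num_positive_bits = 0
    representable_by_int = True
    int_min = -(1 << (int_bits - 1))
    int_max = (1 << (int_bits - 1)) - 1

    for value in values:
        if value >= 0:
            # Zero still needs one bit, hence the explicit 1.
            num_positive_bits = max(num_positive_bits, value.bit_length(), 1)
        else:
            # Minimal signed width of a negative value: magnitude bits plus sign bit.
            num_negative_bits = max(num_negative_bits, (~value).bit_length() + 1)
        representable_by_int = representable_by_int and int_min <= value <= int_max

    # An empty enumerator list behaves like a single enumerator with value 0.
    if not num_positive_bits and not num_negative_bits:
        num_positive_bits = 1

    return num_negative_bits, num_positive_bits, representable_by_int


print(compute_enum_bits([0, 5]))
print(compute_enum_bits([-1, 5]))
print(compute_enum_bits([1 << 31]))
```

The last call shows the case the return value is meant to flag: an enumerator that needs 32 positive bits does not fit a 32-bit signed `int`.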
