Skip to content

Commit 6a5bb4c

Browse files
[MemProf] Fix handling of recursive edges during func assignment (#129066)
When we need to reclone other callees of a caller node during function assignment due to the creation of a new function clone, we need to skip recursive edges on that caller. We don't want to reclone the callee in that case (which is the caller), which isn't necessary and also isn't correct from a graph update perspective. It resulted in an assertion and in an NDEBUG build caused an infinite loop.
1 parent e8379ea commit 6a5bb4c

File tree

2 files changed

+87
-0
lines changed

2 files changed

+87
-0
lines changed

llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4267,6 +4267,11 @@ bool CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::assignFunctions() {
42674267
// recorded callsite Call.
42684268
if (Callee == Clone || !Callee->hasCall())
42694269
continue;
4270+
// Skip direct recursive calls. We don't need/want to clone the
4271+
// caller node again, and this loop will not behave as expected if
4272+
// we tried.
4273+
if (Callee == CalleeEdge->Caller)
4274+
continue;
42704275
ContextNode *NewClone = moveEdgeToNewCalleeClone(CalleeEdge);
42714276
removeNoneTypeCalleeEdges(NewClone);
42724277
// Moving the edge may have resulted in some none type
Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,82 @@
1+
;; Test context disambiguation for a callgraph containing multiple memprof
2+
;; contexts with recursion, where we need to perform additional cloning
3+
;; during function assignment/cloning to handle the combination of contexts
4+
;; to 2 different allocations. In particular, ensures that the recursive edges
5+
;; are handled correctly during the function assignment cloning, where they
6+
;; were previously causing an assert (and infinite recursion in an NDEBUG
7+
;; compile).
8+
;;
9+
;; This test is a hand modified version of funcassigncloning.ll, where all
10+
;; the calls to new were moved into one function, with several recursive
11+
;; calls for the different contexts. The code as written here is somewhat
12+
;; nonsensical as it would produce infinite recursion, but in a real case
13+
;; the recursive calls would be suitably guarded.
14+
;;
15+
;; For this test we just ensure that it doesn't crash, and gets the expected
16+
;; cloning remarks.
17+
18+
; RUN: opt -passes=memprof-context-disambiguation -supports-hot-cold-new \
19+
; RUN: -memprof-verify-ccg -memprof-verify-nodes \
20+
; RUN: -stats -pass-remarks=memprof-context-disambiguation \
21+
; RUN: %s -S 2>&1 | FileCheck %s --check-prefix=REMARKS
22+
23+
24+
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
25+
target triple = "x86_64-unknown-linux-gnu"
26+
27+
declare ptr @_Znam(i64) #1
28+
29+
define internal void @_Z1DPPcS0_(ptr %0, ptr %1) #3 {
30+
entry:
31+
call void @_Z1DPPcS0_(ptr noundef %0, ptr noundef %1), !callsite !16
32+
call void @_Z1DPPcS0_(ptr noundef %0, ptr noundef %1), !callsite !17
33+
call void @_Z1DPPcS0_(ptr noundef %0, ptr noundef %1), !callsite !18
34+
%call = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #6, !memprof !0, !callsite !7
35+
%call1 = call noalias noundef nonnull ptr @_Znam(i64 noundef 10) #6, !memprof !8, !callsite !15
36+
ret void
37+
}
38+
39+
; uselistorder directives
40+
uselistorder ptr @_Znam, { 1, 0 }
41+
42+
attributes #1 = { "no-trapping-math"="true" }
43+
attributes #3 = { "frame-pointer"="all" }
44+
attributes #6 = { builtin }
45+
46+
!0 = !{!1, !3, !5}
47+
!1 = !{!2, !"cold"}
48+
!2 = !{i64 -3461278137325233666, i64 -7799663586031895603, i64 -7799663586031895603}
49+
!3 = !{!4, !"notcold"}
50+
!4 = !{i64 -3461278137325233666, i64 -3483158674395044949, i64 -3483158674395044949}
51+
!5 = !{!6, !"notcold"}
52+
!6 = !{i64 -3461278137325233666, i64 -2441057035866683071, i64 -2441057035866683071}
53+
!7 = !{i64 -3461278137325233666}
54+
!8 = !{!9, !11, !13}
55+
!9 = !{!10, !"notcold"}
56+
!10 = !{i64 -1415475215210681400, i64 -2441057035866683071, i64 -2441057035866683071}
57+
!11 = !{!12, !"cold"}
58+
!12 = !{i64 -1415475215210681400, i64 -3483158674395044949, i64 -3483158674395044949}
59+
!13 = !{!14, !"notcold"}
60+
!14 = !{i64 -1415475215210681400, i64 -7799663586031895603, i64 -7799663586031895603}
61+
!15 = !{i64 -1415475215210681400}
62+
!16 = !{i64 -2441057035866683071}
63+
!17 = !{i64 -3483158674395044949}
64+
!18 = !{i64 -7799663586031895603}
65+
66+
67+
;; We greedily create a clone that is initially used by the clones of the
68+
;; first call to new. However, we end up with an incompatible set of callers
69+
;; given the second call to new which has clones with a different combination of
70+
;; callers. Eventually, we create 2 more clones, and the first clone becomes dead.
71+
; REMARKS: created clone _Z1DPPcS0_.memprof.1
72+
; REMARKS: created clone _Z1DPPcS0_.memprof.2
73+
; REMARKS: created clone _Z1DPPcS0_.memprof.3
74+
; REMARKS: call in clone _Z1DPPcS0_ assigned to call function clone _Z1DPPcS0_.memprof.2
75+
; REMARKS: call in clone _Z1DPPcS0_.memprof.2 marked with memprof allocation attribute cold
76+
; REMARKS: call in clone _Z1DPPcS0_ assigned to call function clone _Z1DPPcS0_.memprof.3
77+
; REMARKS: call in clone _Z1DPPcS0_.memprof.3 marked with memprof allocation attribute notcold
78+
; REMARKS: call in clone _Z1DPPcS0_ assigned to call function clone _Z1DPPcS0_
79+
; REMARKS: call in clone _Z1DPPcS0_ marked with memprof allocation attribute notcold
80+
; REMARKS: call in clone _Z1DPPcS0_.memprof.2 marked with memprof allocation attribute notcold
81+
; REMARKS: call in clone _Z1DPPcS0_.memprof.3 marked with memprof allocation attribute cold
82+
; REMARKS: call in clone _Z1DPPcS0_ marked with memprof allocation attribute notcold

0 commit comments

Comments
 (0)