Commit 4eac576

[RISCV] Add scheduler definitions for SpacemiT-X60 (#137343)
This patch adds an initial scheduler model for the SpacemiT-X60, including latency for scalar instructions only. The scheduler is based on the documented characteristics of the C908, which the SpacemiT-X60 is believed to be based on, and provides the expected latency for several instructions. I ran a probe to confirm all of these values and to get the latency of instructions not provided by the C908 documentation (e.g., double floating-point instructions).

For load and store instructions, the C908 documentation says the latency is >= 3 for load and 1 for store. I tried a few combinations of values until I got the current values of 5 and 3, which yield the best results.

Although the X60 does appear to support multiple issue for at least some floating point instructions, this model assumes single issue as increasing it reduces the gains below.

This patch gives a geomean improvement of ~4% on SPEC CPU 2017 for both rva22u64 and rva22u64_v, with some benchmarks improving up to 18% (508.namd_r). There were a couple of execution time regressions, but only in noisy benchmarks (523.xalancbmk_r and 510.parest_r).

* rva22u64: https://lnt.lukelau.me/db_default/v4/nts/507?compare_to=405 (compares a55f727 to the baseline 8286b80)
* rva22u64_v: https://lnt.lukelau.me/db_default/v4/nts/474?compare_to=404 (compares a55f727 to the baseline 8286b80)

This initial scheduling model is strongly focused on providing sufficient definitions to provide improved performance for the SpacemiT-X60. Further incremental gains may be possible through a much more detailed microarchitectural analysis, but that is left to future work.

Further scheduling definitions for RVV can be added in a future PR.
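For readers unfamiliar with the "geomean improvement" figure quoted in the commit message, the following is a minimal sketch of how a geometric-mean speedup over a benchmark suite is typically computed. The function name `geomean_speedup` and the timings in the usage lines are hypothetical and chosen only for illustration; they are not taken from the LNT runs linked above.

```python
import math


def geomean_speedup(baseline_times, new_times):
    """Return the geometric mean of per-benchmark speedups (baseline / new).

    A value above 1.0 means the new build is faster on average.
    """
    if not baseline_times or len(baseline_times) != len(new_times):
        raise ValueError("need two non-empty lists of equal length")
    # Average the logs of the ratios, then exponentiate: this is the
    # geometric mean, which treats a 2x gain and a 2x loss symmetrically.
    log_sum = sum(math.log(b / n) for b, n in zip(baseline_times, new_times))
    return math.exp(log_sum / len(baseline_times))


# Hypothetical timings in seconds (NOT measured data): one benchmark gets
# 18% faster, the other is unchanged.
baseline = [100.0, 100.0]
patched = [100.0 / 1.18, 100.0]
print(f"geomean speedup: {geomean_speedup(baseline, patched):.4f}")
```

The geometric mean is the usual choice for summarising ratios because a single large outlier influences it less than it would an arithmetic mean.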
1 parent c7c1283 commit 4eac576

7 files changed

+1458
-27
lines changed

llvm/lib/Target/RISCV/RISCV.td

Lines changed: 1 addition & 0 deletions
@@ -58,6 +58,7 @@ include "RISCVSchedSiFive7.td"
 include "RISCVSchedSiFiveP400.td"
 include "RISCVSchedSiFiveP500.td"
 include "RISCVSchedSiFiveP600.td"
+include "RISCVSchedSpacemitX60.td"
 include "RISCVSchedSyntacoreSCR1.td"
 include "RISCVSchedSyntacoreSCR345.td"
 include "RISCVSchedSyntacoreSCR7.td"

llvm/lib/Target/RISCV/RISCVProcessors.td

Lines changed: 1 addition & 1 deletion
@@ -608,7 +608,7 @@ def XIANGSHAN_KUNMINGHU : RISCVProcessorModel<"xiangshan-kunminghu",
                                               TuneShiftedZExtWFusion]>;

 def SPACEMIT_X60 : RISCVProcessorModel<"spacemit-x60",
-                                       NoSchedModel,
+                                       SpacemitX60Model,
                                        !listconcat(RVA22S64Features,
                                                    [FeatureStdExtV,
                                                     FeatureStdExtSscofpmf,

llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td

Lines changed: 353 additions & 0 deletions
@@ -0,0 +1,353 @@
+//=- RISCVSchedSpacemitX60.td - Spacemit X60 Scheduling Defs -*- tablegen -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
+//
+// Scheduler model for the SpacemiT-X60 processor based on documentation of the
+// C908 and experiments on real hardware (bpi-f3).
+//
+//===----------------------------------------------------------------------===//
+
+def SpacemitX60Model : SchedMachineModel {
+  let IssueWidth = 2; // dual-issue
+  let MicroOpBufferSize = 0; // in-order
+  let LoadLatency = 5; // worst case: >= 3
+  let MispredictPenalty = 9; // nine-stage
+
+  let CompleteModel = 0;
+
+  let UnsupportedFeatures = [HasStdExtZknd, HasStdExtZkne, HasStdExtZknh,
+                             HasStdExtZksed, HasStdExtZksh, HasStdExtZkr];
+}
+
+let SchedModel = SpacemitX60Model in {
+
+//===----------------------------------------------------------------------===//
+// Define processor resources for Spacemit-X60
+
+// Information gathered from the C908 user manual:
+let BufferSize = 0 in {
+  // The LSU supports dual issue for scalar store/load instructions
+  def SMX60_LS : ProcResource<2>;
+
+  // An IEU can decode and issue two instructions at the same time
+  def SMX60_IEUA : ProcResource<1>;
+  def SMX60_IEUB : ProcResource<1>;
+  def SMX60_IEU : ProcResGroup<[SMX60_IEUA, SMX60_IEUB]>;
+
+  // Although the X60 does appear to support multiple issue for at least some
+  // floating point instructions, this model assumes single issue as
+  // increasing it reduces the gains we saw in performance
+  def SMX60_FP : ProcResource<1>;
+}
+
49+
//===----------------------------------------------------------------------===//
50+
51+
// Branching
52+
def : WriteRes<WriteJmp, [SMX60_IEUA]>;
53+
def : WriteRes<WriteJal, [SMX60_IEUA]>;
54+
def : WriteRes<WriteJalr, [SMX60_IEUA]>;
55+
56+
// Integer arithmetic and logic
57+
// Latency of ALU instructions is 1, but add.uw is 2
58+
def : WriteRes<WriteIALU32, [SMX60_IEU]>;
59+
def : WriteRes<WriteIALU, [SMX60_IEU]>;
60+
def : WriteRes<WriteShiftImm32, [SMX60_IEU]>;
61+
def : WriteRes<WriteShiftImm, [SMX60_IEU]>;
62+
def : WriteRes<WriteShiftReg32, [SMX60_IEU]>;
63+
def : WriteRes<WriteShiftReg, [SMX60_IEU]>;
64+
65+
// Integer multiplication
66+
def : WriteRes<WriteIMul32, [SMX60_IEU]> { let Latency = 3; }
67+
68+
// The latency of mul is 5, while in mulh, mulhsu, mulhu is 6
69+
// Worst case latency is used
70+
def : WriteRes<WriteIMul, [SMX60_IEU]> { let Latency = 6; }
71+
72+
// Integer division/remainder
73+
// TODO: Latency set based on C908 datasheet and hasn't been
74+
// confirmed experimentally.
75+
let Latency = 12, ReleaseAtCycles = [12] in {
76+
def : WriteRes<WriteIDiv32, [SMX60_IEUA]>;
77+
def : WriteRes<WriteIRem32, [SMX60_IEUA]>;
78+
}
79+
let Latency = 20, ReleaseAtCycles = [20] in {
80+
def : WriteRes<WriteIDiv, [SMX60_IEUA]>;
81+
def : WriteRes<WriteIRem, [SMX60_IEUA]>;
82+
}
83+
84+
// Bitmanip
85+
def : WriteRes<WriteRotateImm, [SMX60_IEU]>;
86+
def : WriteRes<WriteRotateImm32, [SMX60_IEU]>;
87+
def : WriteRes<WriteRotateReg, [SMX60_IEU]>;
88+
def : WriteRes<WriteRotateReg32, [SMX60_IEU]>;
89+
90+
def : WriteRes<WriteCLZ, [SMX60_IEU]>;
91+
def : WriteRes<WriteCLZ32, [SMX60_IEU]>;
92+
def : WriteRes<WriteCTZ, [SMX60_IEU]>;
93+
def : WriteRes<WriteCTZ32, [SMX60_IEU]>;
94+
95+
let Latency = 2 in {
96+
def : WriteRes<WriteCPOP, [SMX60_IEU]>;
97+
def : WriteRes<WriteCPOP32, [SMX60_IEU]>;
98+
}
99+
100+
def : WriteRes<WriteORCB, [SMX60_IEU]>;
101+
def : WriteRes<WriteIMinMax, [SMX60_IEU]>;
102+
def : WriteRes<WriteREV8, [SMX60_IEU]>;
103+
104+
let Latency = 2 in {
105+
def : WriteRes<WriteSHXADD, [SMX60_IEU]>;
106+
def : WriteRes<WriteSHXADD32, [SMX60_IEU]>;
107+
def : WriteRes<WriteCLMUL, [SMX60_IEU]>;
108+
}
109+
110+
// Single-bit instructions
111+
def : WriteRes<WriteSingleBit, [SMX60_IEU]>;
112+
def : WriteRes<WriteSingleBitImm, [SMX60_IEU]>;
113+
def : WriteRes<WriteBEXT, [SMX60_IEU]>;
114+
def : WriteRes<WriteBEXTI, [SMX60_IEU]>;
115+
+// Memory/Atomic memory
+let Latency = 3 in {
+  def : WriteRes<WriteSTB, [SMX60_LS]>;
+  def : WriteRes<WriteSTH, [SMX60_LS]>;
+  def : WriteRes<WriteSTW, [SMX60_LS]>;
+  def : WriteRes<WriteSTD, [SMX60_LS]>;
+  def : WriteRes<WriteFST16, [SMX60_LS]>;
+  def : WriteRes<WriteFST32, [SMX60_LS]>;
+  def : WriteRes<WriteFST64, [SMX60_LS]>;
+  def : WriteRes<WriteAtomicSTW, [SMX60_LS]>;
+  def : WriteRes<WriteAtomicSTD, [SMX60_LS]>;
+}
+
+let Latency = 5 in {
+  def : WriteRes<WriteLDB, [SMX60_LS]>;
+  def : WriteRes<WriteLDH, [SMX60_LS]>;
+  def : WriteRes<WriteLDW, [SMX60_LS]>;
+  def : WriteRes<WriteLDD, [SMX60_LS]>;
+  def : WriteRes<WriteFLD16, [SMX60_LS]>;
+  def : WriteRes<WriteFLD32, [SMX60_LS]>;
+  def : WriteRes<WriteFLD64, [SMX60_LS]>;
+}
+
+// Atomics
+let Latency = 5 in {
+  def : WriteRes<WriteAtomicLDW, [SMX60_LS]>;
+  def : WriteRes<WriteAtomicLDD, [SMX60_LS]>;
+  def : WriteRes<WriteAtomicW, [SMX60_LS]>;
+  def : WriteRes<WriteAtomicD, [SMX60_LS]>;
+}
+
147+
// Floating point units Half precision
148+
let Latency = 4 in {
149+
def : WriteRes<WriteFAdd16, [SMX60_FP]>;
150+
def : WriteRes<WriteFMul16, [SMX60_FP]>;
151+
def : WriteRes<WriteFSGNJ16, [SMX60_FP]>;
152+
def : WriteRes<WriteFMinMax16, [SMX60_FP]>;
153+
}
154+
def : WriteRes<WriteFMA16, [SMX60_FP]> { let Latency = 5; }
155+
156+
let Latency = 12, ReleaseAtCycles = [12] in {
157+
def : WriteRes<WriteFDiv16, [SMX60_FP]>;
158+
def : WriteRes<WriteFSqrt16, [SMX60_FP]>;
159+
}
160+
161+
// Single precision
162+
let Latency = 4 in {
163+
def : WriteRes<WriteFAdd32, [SMX60_FP]>;
164+
def : WriteRes<WriteFMul32, [SMX60_FP]>;
165+
def : WriteRes<WriteFSGNJ32, [SMX60_FP]>;
166+
def : WriteRes<WriteFMinMax32, [SMX60_FP]>;
167+
}
168+
def : WriteRes<WriteFMA32, [SMX60_FP]> { let Latency = 5; }
169+
170+
let Latency = 15, ReleaseAtCycles = [15] in {
171+
def : WriteRes<WriteFDiv32, [SMX60_FP]>;
172+
def : WriteRes<WriteFSqrt32, [SMX60_FP]>;
173+
}
174+
175+
// Double precision
176+
let Latency = 5 in {
177+
def : WriteRes<WriteFAdd64, [SMX60_FP]>;
178+
def : WriteRes<WriteFMul64, [SMX60_FP]>;
179+
def : WriteRes<WriteFSGNJ64, [SMX60_FP]>;
180+
}
181+
def : WriteRes<WriteFMinMax64, [SMX60_FP]> { let Latency = 4; }
182+
def : WriteRes<WriteFMA64, [SMX60_FP]> { let Latency = 6; }
183+
184+
let Latency = 22, ReleaseAtCycles = [22] in {
185+
def : WriteRes<WriteFDiv64, [SMX60_FP]>;
186+
def : WriteRes<WriteFSqrt64, [SMX60_FP]>;
187+
}
188+
189+
// Conversions
190+
let Latency = 6 in {
191+
def : WriteRes<WriteFCvtF16ToI32, [SMX60_IEU]>;
192+
def : WriteRes<WriteFCvtF32ToI32, [SMX60_IEU]>;
193+
def : WriteRes<WriteFCvtF32ToI64, [SMX60_IEU]>;
194+
def : WriteRes<WriteFCvtF64ToI64, [SMX60_IEU]>;
195+
def : WriteRes<WriteFCvtF64ToI32, [SMX60_IEU]>;
196+
def : WriteRes<WriteFCvtF16ToI64, [SMX60_IEU]>;
197+
}
198+
199+
let Latency = 4 in {
200+
def : WriteRes<WriteFCvtI32ToF16, [SMX60_IEU]>;
201+
def : WriteRes<WriteFCvtI32ToF32, [SMX60_IEU]>;
202+
def : WriteRes<WriteFCvtI32ToF64, [SMX60_IEU]>;
203+
def : WriteRes<WriteFCvtI64ToF16, [SMX60_IEU]>;
204+
def : WriteRes<WriteFCvtI64ToF32, [SMX60_IEU]>;
205+
def : WriteRes<WriteFCvtI64ToF64, [SMX60_IEU]>;
206+
def : WriteRes<WriteFCvtF16ToF32, [SMX60_FP]>;
207+
def : WriteRes<WriteFCvtF16ToF64, [SMX60_FP]>;
208+
def : WriteRes<WriteFCvtF32ToF16, [SMX60_FP]>;
209+
def : WriteRes<WriteFCvtF32ToF64, [SMX60_FP]>;
210+
def : WriteRes<WriteFCvtF64ToF16, [SMX60_FP]>;
211+
def : WriteRes<WriteFCvtF64ToF32, [SMX60_FP]>;
212+
}
213+
214+
let Latency = 6 in {
215+
def : WriteRes<WriteFClass16, [SMX60_FP]>;
216+
def : WriteRes<WriteFClass32, [SMX60_FP]>;
217+
def : WriteRes<WriteFClass64, [SMX60_FP]>;
218+
219+
def : WriteRes<WriteFCmp16, [SMX60_FP]>;
220+
def : WriteRes<WriteFCmp32, [SMX60_FP]>;
221+
def : WriteRes<WriteFCmp64, [SMX60_FP]>;
222+
223+
def : WriteRes<WriteFMovF32ToI32, [SMX60_IEU]>;
224+
def : WriteRes<WriteFMovF16ToI16, [SMX60_IEU]>;
225+
}
226+
227+
let Latency = 4 in {
228+
def : WriteRes<WriteFMovI16ToF16, [SMX60_IEU]>;
229+
def : WriteRes<WriteFMovF64ToI64, [SMX60_IEU]>;
230+
def : WriteRes<WriteFMovI64ToF64, [SMX60_IEU]>;
231+
def : WriteRes<WriteFMovI32ToF32, [SMX60_IEU]>;
232+
}
233+
234+
// Others
235+
def : WriteRes<WriteCSR, [SMX60_IEU]>;
236+
def : WriteRes<WriteNop, [SMX60_IEU]>;
237+
+//===----------------------------------------------------------------------===//
+// Bypass and advance
+def : ReadAdvance<ReadJmp, 0>;
+def : ReadAdvance<ReadJalr, 0>;
+def : ReadAdvance<ReadCSR, 0>;
+def : ReadAdvance<ReadStoreData, 0>;
+def : ReadAdvance<ReadMemBase, 0>;
+def : ReadAdvance<ReadIALU, 0>;
+def : ReadAdvance<ReadIALU32, 0>;
+def : ReadAdvance<ReadShiftImm, 0>;
+def : ReadAdvance<ReadShiftImm32, 0>;
+def : ReadAdvance<ReadShiftReg, 0>;
+def : ReadAdvance<ReadShiftReg32, 0>;
+def : ReadAdvance<ReadIDiv, 0>;
+def : ReadAdvance<ReadIDiv32, 0>;
+def : ReadAdvance<ReadIRem, 0>;
+def : ReadAdvance<ReadIRem32, 0>;
+def : ReadAdvance<ReadIMul, 0>;
+def : ReadAdvance<ReadIMul32, 0>;
+def : ReadAdvance<ReadAtomicWA, 0>;
+def : ReadAdvance<ReadAtomicWD, 0>;
+def : ReadAdvance<ReadAtomicDA, 0>;
+def : ReadAdvance<ReadAtomicDD, 0>;
+def : ReadAdvance<ReadAtomicLDW, 0>;
+def : ReadAdvance<ReadAtomicLDD, 0>;
+def : ReadAdvance<ReadAtomicSTW, 0>;
+def : ReadAdvance<ReadAtomicSTD, 0>;
+def : ReadAdvance<ReadFStoreData, 0>;
+def : ReadAdvance<ReadFMemBase, 0>;
+def : ReadAdvance<ReadFAdd16, 0>;
+def : ReadAdvance<ReadFAdd32, 0>;
+def : ReadAdvance<ReadFAdd64, 0>;
+def : ReadAdvance<ReadFMul16, 0>;
+def : ReadAdvance<ReadFMA16, 0>;
+def : ReadAdvance<ReadFMA16Addend, 0>;
+def : ReadAdvance<ReadFMul32, 0>;
+def : ReadAdvance<ReadFMul64, 0>;
+def : ReadAdvance<ReadFMA32, 0>;
+def : ReadAdvance<ReadFMA32Addend, 0>;
+def : ReadAdvance<ReadFMA64, 0>;
+def : ReadAdvance<ReadFMA64Addend, 0>;
+def : ReadAdvance<ReadFDiv16, 0>;
+def : ReadAdvance<ReadFDiv32, 0>;
+def : ReadAdvance<ReadFDiv64, 0>;
+def : ReadAdvance<ReadFSqrt16, 0>;
+def : ReadAdvance<ReadFSqrt32, 0>;
+def : ReadAdvance<ReadFSqrt64, 0>;
+def : ReadAdvance<ReadFCmp16, 0>;
+def : ReadAdvance<ReadFCmp32, 0>;
+def : ReadAdvance<ReadFCmp64, 0>;
+def : ReadAdvance<ReadFSGNJ16, 0>;
+def : ReadAdvance<ReadFSGNJ32, 0>;
+def : ReadAdvance<ReadFSGNJ64, 0>;
+def : ReadAdvance<ReadFMinMax16, 0>;
+def : ReadAdvance<ReadFMinMax32, 0>;
+def : ReadAdvance<ReadFMinMax64, 0>;
+def : ReadAdvance<ReadFCvtF16ToI32, 0>;
+def : ReadAdvance<ReadFCvtF16ToI64, 0>;
+def : ReadAdvance<ReadFCvtF32ToI32, 0>;
+def : ReadAdvance<ReadFCvtF32ToI64, 0>;
+def : ReadAdvance<ReadFCvtF64ToI32, 0>;
+def : ReadAdvance<ReadFCvtF64ToI64, 0>;
+def : ReadAdvance<ReadFCvtI32ToF16, 0>;
+def : ReadAdvance<ReadFCvtI32ToF32, 0>;
+def : ReadAdvance<ReadFCvtI32ToF64, 0>;
+def : ReadAdvance<ReadFCvtI64ToF16, 0>;
+def : ReadAdvance<ReadFCvtI64ToF32, 0>;
+def : ReadAdvance<ReadFCvtI64ToF64, 0>;
+def : ReadAdvance<ReadFCvtF32ToF64, 0>;
+def : ReadAdvance<ReadFCvtF64ToF32, 0>;
+def : ReadAdvance<ReadFCvtF16ToF32, 0>;
+def : ReadAdvance<ReadFCvtF32ToF16, 0>;
+def : ReadAdvance<ReadFCvtF16ToF64, 0>;
+def : ReadAdvance<ReadFCvtF64ToF16, 0>;
+def : ReadAdvance<ReadFMovF16ToI16, 0>;
+def : ReadAdvance<ReadFMovI16ToF16, 0>;
+def : ReadAdvance<ReadFMovF32ToI32, 0>;
+def : ReadAdvance<ReadFMovI32ToF32, 0>;
+def : ReadAdvance<ReadFMovF64ToI64, 0>;
+def : ReadAdvance<ReadFMovI64ToF64, 0>;
+def : ReadAdvance<ReadFClass16, 0>;
+def : ReadAdvance<ReadFClass32, 0>;
+def : ReadAdvance<ReadFClass64, 0>;
+
+// Bitmanip
+def : ReadAdvance<ReadRotateImm, 0>;
+def : ReadAdvance<ReadRotateImm32, 0>;
+def : ReadAdvance<ReadRotateReg, 0>;
+def : ReadAdvance<ReadRotateReg32, 0>;
+def : ReadAdvance<ReadCLZ, 0>;
+def : ReadAdvance<ReadCLZ32, 0>;
+def : ReadAdvance<ReadCTZ, 0>;
+def : ReadAdvance<ReadCTZ32, 0>;
+def : ReadAdvance<ReadCPOP, 0>;
+def : ReadAdvance<ReadCPOP32, 0>;
+def : ReadAdvance<ReadORCB, 0>;
+def : ReadAdvance<ReadIMinMax, 0>;
+def : ReadAdvance<ReadREV8, 0>;
+def : ReadAdvance<ReadSHXADD, 0>;
+def : ReadAdvance<ReadSHXADD32, 0>;
+def : ReadAdvance<ReadCLMUL, 0>;
+// Single-bit instructions
+def : ReadAdvance<ReadSingleBit, 0>;
+def : ReadAdvance<ReadSingleBitImm, 0>;
+
+//===----------------------------------------------------------------------===//
+// Unsupported extensions
+defm : UnsupportedSchedV;
+defm : UnsupportedSchedXsfvcp;
+defm : UnsupportedSchedZabha;
+defm : UnsupportedSchedZbkb;
+defm : UnsupportedSchedZbkx;
+defm : UnsupportedSchedZfa;
+defm : UnsupportedSchedZvk;
+defm : UnsupportedSchedSFB;
+}
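
The model above sets only `Latency` on most writes but pairs it with `ReleaseAtCycles` for divide and square root. As a rough illustration of why that matters, here is a toy sketch (not LLVM's scheduler, and not part of this commit) that estimates when the last of several independent operations on a single unit finishes. The helper `finish_cycle` is hypothetical; it assumes the unit stays busy for `release_at_cycles` cycles per operation, defaulting to 1 as when `ReleaseAtCycles` is left unset.

```python
def finish_cycle(num_ops, latency, release_at_cycles=1):
    """Cycle at which the last of num_ops independent ops has its result.

    Toy model: one unit, op i issues at i * release_at_cycles (the unit is
    occupied that long), and its result is ready `latency` cycles later.
    """
    if num_ops < 1:
        raise ValueError("num_ops must be at least 1")
    last_issue = (num_ops - 1) * release_at_cycles
    return last_issue + latency


# Values taken from the definitions above:
# WriteFAdd32 has Latency = 4 and no ReleaseAtCycles (pipelined).
print(finish_cycle(4, latency=4))                         # 7
# WriteFDiv32 has Latency = 15 and ReleaseAtCycles = [15] (unit is blocked).
print(finish_cycle(4, latency=15, release_at_cycles=15))  # 60
```

Under these assumptions, four back-to-back single-precision adds overlap in the pipeline, while four divides serialise completely, which is the behaviour the `ReleaseAtCycles` entries ask the scheduler to account for.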
