-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[AArch64] Set the cache line size to 64 for the V2 and V3. #148213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This sets the cache line size to 64 for the Neoverse V2 and V3. I've tested this with loop-interchange: it doesn't result in extra compile-times, but it does enable a lot more interchange. I've set this to V3, have not tested this, but looks like the sensible thing to do, but am happy to remove this.
@llvm/pr-subscribers-backend-aarch64 Author: Sjoerd Meijer (sjoerdmeijer) ChangesThis sets the cache line size to 64 for the Neoverse V2 and V3. I've tested this with loop-interchange: it doesn't result in extra compile-times, but it does enable a lot more interchange. I've also set this for V3, have not tested this, but looks like the sensible thing to do, but am happy to remove this. Full diff: https://github.com/llvm/llvm-project/pull/148213.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index 68ed10570a52f..4ac93526295aa 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -270,6 +270,7 @@ void AArch64Subtarget::initializeProperties(bool HasMinSize) {
break;
case NeoverseV2:
case NeoverseV3:
+ CacheLineSize = 64;
EpilogueVectorizationMinVF = 8;
MaxInterleaveFactor = 4;
ScatterOverhead = 13;
|
Exactly, and that ended up exposing another correctness issue... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't result in extra compile-times, but it does enable a lot more interchange.
Exactly, and that ended up exposing another correctness issue... Sorry, I hadn’t really looked at the details of applied interchanges, so I totally missed it. Just noticed it a moment ago.
For checking correctness of LoopInterchange, it would probably be good to test with interchanging all loops that are considered legal, ignoring the cost model.
It makes perfect sense to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding this change, I think it's fine as long as it follows the specifications of Neoverse V2 and V3 (though I'm not familiar with them). However, I think it would be better to merge this after correctness issues of LoopInterchange are resolved to avoid some troubles.
@kasuga-fj : thanks for the report, shall I look at that miscompilation since you're probably busy with the delinearization? @fhahn : yeah that makes sense. I am going to do some more testing. I might put up a little patch introducing an internal option to ignore the cost-model, I can see how this can be useful for testing and experimentation. |
I've probably got a fix in mind, so I can take care of it next week if that's okay. But if it's urgent, feel free to take it over. |
Sure, go ahead if you already have a way of fixing this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
64 sounds good if this does something useful nowadays. It could probably be the default for all CPUs.
FYI: I have kicked off different test jobs and am running different fuzzers over the weekend. I am testing this with interchange enabled, the cache line set to 64, and have disable the cost model, to test interchange as much as possible as suggested. I have already found one unrelated issue, #133922, but will report back next week on the interchange results. |
I think so too. The default of 0 doesn't make an awful lot of sense. |
This sets the cache line size to 64 for the Neoverse V2 and V3. I've tested this with loop-interchange: it doesn't result in extra compile-times, but it does enable a lot more interchange.
I've also set this for V3, have not tested this, but looks like the sensible thing to do, but am happy to remove this.