
Default aarch64-min-jump-table-entries to 9 #98391

Open — wants to merge 1 commit into main
Conversation

hiraditya
Collaborator

@hiraditya hiraditya commented Jul 10, 2024

The previous experiment in #71166 set the value to 13, but the results weren't significantly better. This patch sets it to 9.

See #98223 for the motivation.

@llvmbot
Collaborator

llvmbot commented Jul 10, 2024

@llvm/pr-subscribers-backend-aarch64

Author: AdityaK (hiraditya)

Changes

The previous experiment in #71166 set the value to 13, but the results weren't significantly better. This patch sets it to 9.


Full diff: https://github.com/llvm/llvm-project/pull/98391.diff

1 file affected:

  • (modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (+1-1)
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index 1fad1d5ca6d7d..47d179712a12c 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -72,7 +72,7 @@ static cl::opt<AArch64PAuth::AuthCheckMethod>
                                cl::values(AUTH_CHECK_METHOD_CL_VALUES_LR));
 
 static cl::opt<unsigned> AArch64MinimumJumpTableEntries(
-    "aarch64-min-jump-table-entries", cl::init(13), cl::Hidden,
+    "aarch64-min-jump-table-entries", cl::init(9), cl::Hidden,
     cl::desc("Set minimum number of entries to use a jump table on AArch64"));
 
 unsigned AArch64Subtarget::getVectorInsertExtractBaseCost() const {

@hiraditya hiraditya requested a review from ilinpv July 10, 2024 21:31
@davemgreen
Collaborator

What do you mean by "weren't significantly better", and do you have a reason to change this to 9?

@hiraditya
Collaborator Author

hiraditya commented Jul 11, 2024

What do you mean by "weren't significantly better", and do you have a reason to change this to 9?

Based on the comments here: #71166 (comment) and #71166 (comment)

13 is too large IMO; it results in a code-size increase, and it seems like the value in #71166 was chosen purely for benchmarking purposes. Is there a way to find a better number here?

The number of branches is on the order of 2*(number of cases); it may look fine for benchmarking purposes, but I don't think it is a good idea to generate so many branches when a jump table would have sufficed.

@hiraditya
Collaborator Author

On a related note, RISC-V chose a threshold of 5: 5973272

@david-arm
Contributor

Have you run this patch several times with perlbench from SPEC2017 to verify the before and after results? At the time I created #71166 I saw an improvement in performance, although I realise perlbench can be noisy. Do you have any SPEC2017 results you can share on this PR? Thanks!

@ilinpv
Contributor

ilinpv commented Jul 11, 2024

@hiraditya it might be worth trying a more flexible solution here to balance size and speed, for example setting aarch64-min-jump-table-entries to a different value when optimizing for size with -Os/-Oz. In addition, it could be set depending on branch profile counters.

@david-arm
Contributor

Just to clarify, I'm not against changing this value if it's beneficial, since #71166 was a bit of an experiment anyway.

It sounds from the comments (rather than the commit message) like you are worried about code size? If so, I can certainly understand that point of view. Like @ilinpv suggested, I believe we already use a lower value for -Oz, and we can extend this to -Os if that helps? Or, if you believe this patch has no effect on performance at -O3, perhaps we could just change the default at all optimisation levels.

@davemgreen
Collaborator

What do you mean by "weren't significantly better", and do you have a reason to change this to 9?

Based on the comments here: #71166 (comment) and #71166 (comment)

13 is too large IMO; it results in a code-size increase, and it seems like the value in #71166 was chosen purely for benchmarking purposes. Is there a way to find a better number here?

The number of branches is on the order of 2*(number of cases); it may look fine for benchmarking purposes, but I don't think it is a good idea to generate so many branches when a jump table would have sufficed.

I'm not particularly married to the current value. I'm not sure there is a single optimal value, and the one we have is kind of odd, but if we are changing it then I would hope we had a better reason. We used to have a lower value; GCC has always used a value closer to what we have now. The real answer likely depends on many factors, such as the predictability of each of the branches or the relative capability of the branch predictor in the CPU.

When I try this patch on the llvm test suite plus 3 versions of SPEC, the geomean performance is 0.1% worse. The code size doesn't change (at -O3; it might be different at other opt levels). Those results were run very quickly though, so they might be noisier than we would like, and this patch in general will lead to changes that are often noisy due to the nature of branches being differently aligned. If we ran it a week earlier or a week later, the performance might be totally different.

@hiraditya
Collaborator Author

hiraditya commented Jul 11, 2024

Like @ilinpv suggested I believe we already use a lower value for -Oz and can extend this to -Os if that helps?

As I don't have a way to run SPEC at the moment, I'm fine if we make this change only for -Os. I'll create a separate issue for adding profile info to the codegen of switch statements.
