Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for microarchitecture levels #31

Merged
merged 5 commits into from
May 20, 2021

Conversation

pikacic
Copy link
Contributor

@pikacic pikacic commented May 11, 2021

This introduces definitions for the microarchitecture levels:

  • x86-64-v2 (basically an alias to nehalem with generic tuning)
  • x86-64-v3 (alias to haswell with generic tuning)
  • x86-64-v4 (alias to skylake_avx512 with generic tuning)

I also implemented equivalent compiler flags for gcc < 11.1 and clang < 12.0.

It's my first PR pm archspec, so I have some doubts about how to tune the compiler version ranges and the features list.

Resolves #30

@boegel
Copy link
Member

boegel commented May 11, 2021

Thanks a lot for looking info this @pikacic !

I would really like it if we could use more meaningful names like x86_64-avx, x86_64-avx2, etc., although I understand this doesn't come from thin air... Can we do both though, x86_64-v2 and x86_64-avx which are equivalent?

@tgamblin
Copy link
Member

@boegel we don't currently have aliases, and the names are pretty much set by gcc and clang -- it's in the ABI spec now. I think we can add aliases later if we grow a feature. I want to do that so that so that rome, milan, genoa eventually map to zen2, zen3, zen4, etc.

cpu/microarchitectures.json Outdated Show resolved Hide resolved
"flags": "-march={name} -mtune=generic"
},
{
"versions": "4.6:10.3",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is the correct way of specifying gcc < 11.1? Should I use the latest known 10.x version or a dummy 10.99?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just :11.0 should work. The ranges are (for better or for worse) inclusive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, although GCC 11.0 does not exist (it looks like they have been skipping the x.0 tag recently).
What about Clang?

@pikacic
Copy link
Contributor Author

pikacic commented May 11, 2021

@boegel, I agree with you, but, as @tgamblin said, it's not my choice.

About the aliasing, at the current state we cannot detect to be on a x86-64-v3 or on a haswell processor has they have strictly identical features lists, so I do not know how archspec detection is going to behave.

OTOH x86-64-v3 is a bit less than haswell... and a bit more, in the sense that some of the features in a haswell CPU are not listed (see my comment on cx16 and lahf_lm). I'd be glad if you could suggest how to proceed.

@tgamblin
Copy link
Member

I think we should break the tie by detecting the actual architecture. In some cases the feature lists are tweaked so that we can detect on cloud nodes and VMs, where certain features are disabled. But we still know with very strong likelihood that, e.g., the chip is a haswell.

x86-64-v3 is virtual -- i.e. there isn't actually a microarchitecture called that, and I think users will want detection to return the actual chip name. We can still say that x86-64-v3 is compatible with haswell (but shouldn't do vice versa due to the instructions you mention).

@tgamblin
Copy link
Member

I think the above actually works out pretty well in the implementation -- arch spec will take the "highest" architecture in the DAG that has the features of the host. If haswell inherits from x86-64-v3, it'll always be preferred in detection.

Copy link
Member

@tgamblin tgamblin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pikacic: This looks great to me, modulo the version question you asked. Can you address that?

@alalazo can you give this a look? If it's ok with you I think it's good to go.

"flags": "-march={name} -mtune=generic"
},
{
"versions": "4.6:10.3",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just :11.0 should work. The ranges are (for better or for worse) inclusive.

@boegel
Copy link
Member

boegel commented May 12, 2021

@boegel we don't currently have aliases, and the names are pretty much set by gcc and clang -- it's in the ABI spec now. I think we can add aliases later if we grow a feature. I want to do that so that so that rome, milan, genoa eventually map to zen2, zen3, zen4, etc.

OK, x86_64-avx (or just avx) as an alias for a virtual microarchitecture makes sense.

@tgamblin
Copy link
Member

tgamblin commented May 12, 2021

Makes sense, x86_64-avx

I generally prefer dashes to _ and almost asked for it in this PR, but we made a conscious decision to use _ in arch spec so that the names would fit unambiguously in hyphenated target triplet strings (e.g. x86_64_avx-linux-gnu)

}
},
"nehalem": {
"from": ["x86_64_v2"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if:

Suggested change
"from": ["x86_64_v2"],
"from": ["core2", "x86_64_v2"],

is a better modeling choice (and similarly for the other virtual versions below). Need to give some more thought at it.

Copy link
Member

@alalazo alalazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, this looks great! I am just marking a request for change to be sure:

  • We go through the comment on inheritance I left above
  • We test these changes in a branch of the archspec library before merging them

@tgamblin
Copy link
Member

@alalazo is there more left to do here? It would be good to have more tests, but if this works with archspec right now I'm tempted to say we should merge it. Can you try it out?

@alalazo
Copy link
Member

alalazo commented May 18, 2021

@tgamblin Will do by tomorrow. Do we have any preference on this #31 (comment) ?

Basically right now we have:

nehalem
  |
core2
  |
[ ... ]

This PR makes it like:

nehalem
  |
x86-64-v2 (virtual)
  |
core2
  |
[ ... ]

I think it would be better to branch in like:

nehalem
  |       \
core2   x86-64-v2 (virtual)
  |
[ ... ]

@pikacic
Copy link
Contributor Author

pikacic commented May 18, 2021

@alalazo, something like

nehalem
  |       \
core2   x86-64-v2 (virtual)
  |
[ ... ]

hides the fact that x86-64-v2 can run core2 binaries. I would prefer

nehalem
  |     \
  |      x86-64-v2 (virtual)
  |     /
core2   
  |
[ ... ]

@alalazo
Copy link
Member

alalazo commented May 19, 2021

@pikacic @tgamblin I'm reading the documents more closely and my understanding is that the modeling here is currently incorrect, since it adds properties to what is in the definitions of the virtual levels. For instance in this PR we have x86_64_v3 inherit from ivybridge but that seems wrong since in the virtual levels I don't see mentioned, for instance, AES or FSGSBASE or RDRAND etc. that ivybridge has according to GCC docs.

As per #31 (comment) I think a better modeling would be to:

  1. Have all the virtual levels inherit from each other (x86_64 -> x86_64_v2 -> etc.)
  2. Keep the virtual levels separated from "real" uarchs (i.e. no branches from a real uarch to a virtual level)
  3. Branch-in as suggested on the first uarch that supports that virtual level

I suggest point 2 specifically since, by reading the documents, it doesn't seem the virtual levels are only for Intel so it would be good to keep them separated from the vendors - since they could also branch into AMD procs?

hides the fact that x86-64-v2 can run core2 binaries. I would prefer

That would be true for core2 but not for other uarchs, if my understanding is correct. So I would not special case core2 for the reason explained above.

Let me know if I missed or misunderstood any point.

@pikacic
Copy link
Contributor Author

pikacic commented May 19, 2021

@alalazo, I think now I understand a bit better the archspec architecture and I agree completely with your suggestion.
Give me a few minutes to implement it.

@pikacic
Copy link
Contributor Author

pikacic commented May 19, 2021

It took me a bit longer than expected because I tweaked a bit the flags to match a bit better the instructions enabled by the uarch levels.

With the latest change (if I understand correctly how this all works) the x86-64-vX architectures will not be matched when detecting the CPU uarch (somehow as before). The DAG is such that when a binary is built for x86-64-v3 we can run it on haswell (or more recent) CPUs.

I also declared the compatibility of AMD CPUs with the new microarchitecture levels (deduced from GCC docs).

@alalazo, @tgamblin, can you check, please?

@alalazo
Copy link
Member

alalazo commented May 19, 2021

I'll have a look at this asap and try it in archspec. One minor request for the layout of the JSON file: can we have the definitions for the virtual levels directly below the x86_64 definition?

@pikacic
Copy link
Contributor Author

pikacic commented May 19, 2021

I couldn't decide if to have them grouped at the beginning or at the end, so I kept them interleaved.
I'll move them right away... and let me know if you want me to squash the commits.

@alalazo
Copy link
Member

alalazo commented May 19, 2021

I'll move them right away... and let me know if you want me to squash the commits.

No worries. We'll squash merge the PR in the end 🙂

@alalazo
Copy link
Member

alalazo commented May 19, 2021

Thanks @pikacic I'll double check the entries and add a draft PR in archspec. I'll get back here as soon as I have news.

@alalazo
Copy link
Member

alalazo commented May 20, 2021

@pikacic I think zen can derive from x86_64_v3 directly instead of x86_64. Can you please double check?

@pikacic
Copy link
Contributor Author

pikacic commented May 20, 2021

@alalazo, you are right... I missed that zen was not compatible with excavator.

I also changed the compatibility list of bulldozer, as ["x86_64", "x86_64_v2"] was not particularly useful.

@alalazo
Copy link
Member

alalazo commented May 20, 2021

@pikacic Thanks! This looks great to me!

@alalazo
Copy link
Member

alalazo commented May 20, 2021

For reference, once this is merged I'll update archspec/archspec#51

@tgamblin
Copy link
Member

LGTM!

@tgamblin
Copy link
Member

Thanks @pikacic!

@alalazo alalazo merged commit de43aab into archspec:master May 20, 2021
@pikacic pikacic deleted the add-microarch-levels branch May 20, 2021 15:03
alalazo pushed a commit that referenced this pull request Oct 22, 2021
Introduced definitions for the x86-64 microarchitecture levels:

- x86-64-v2 (basically an alias to nehalem with generic tuning)
- x86-64-v3 (alias to haswell with generic tuning)
- x86-64-v4 (alias to skylake_avx512 with generic tuning)

Added compiler flags also for gcc < 11.1 and clang < 12.0.
alalazo pushed a commit that referenced this pull request Oct 22, 2021
Introduced definitions for the x86-64 microarchitecture levels:

- x86-64-v2 (basically an alias to nehalem with generic tuning)
- x86-64-v3 (alias to haswell with generic tuning)
- x86-64-v4 (alias to skylake_avx512 with generic tuning)

Added compiler flags also for gcc < 11.1 and clang < 12.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support fo psABI x86-64 micro-architecture levels
5 participants