Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ABI change in __extendhfsf2 and __truncsfhf2 on X86 #56854

Closed
kparzysz-quic opened this issue Aug 1, 2022 · 21 comments
Closed

ABI change in __extendhfsf2 and __truncsfhf2 on X86 #56854

kparzysz-quic opened this issue Aug 1, 2022 · 21 comments
Labels
ABI Application Binary Interface backend:X86

Comments

@kparzysz-quic
Copy link

See also: https://discourse.llvm.org/t/how-to-build-compiler-rt-for-new-x86-half-float-abi/63366/15

Support for _Float16 was added in 15.0.0 (based on target capabilities). This had the unfortunate consequence that it changed the way calls to __truncsfhf2 and __extendhfsf2 are handled: in prior version of LLVM, or in the absence of support for _Float16, the half-precision floating point value was passed as uint16_t. In the x86 calling convention that type is passed in RDI, whereas _Float16 values are passed in XMM0. This introduces binary incompatibility between LLVM 15 and prior versions.

@kparzysz-quic kparzysz-quic added this to the LLVM 15.0.0 Release milestone Aug 1, 2022
@EugeneZelenko EugeneZelenko added backend:X86 ABI Application Binary Interface and removed new issue labels Aug 1, 2022
@llvmbot
Copy link
Collaborator

llvmbot commented Aug 1, 2022

@llvm/issue-subscribers-backend-x86

@tru
Copy link
Collaborator

tru commented Aug 1, 2022

This seems like it would be a blocker. Who worked on Float16 and can chime in?

@phoebewang
Copy link
Contributor

@kparzysz-quic Which platform you are using? AFAIK, only Darwin uses __truncsfhf2 and __extendhfsf2 before 15.0. Other platforms use __gnu_h2f_ieee and __gnu_f2h_ieee. https://godbolt.org/z/ccbbET9d1

@phoebewang
Copy link
Contributor

This seems like it would be a blocker. Who worked on Float16 and can chime in?

I worked on that. I don't think this is a blocker. The change of the ABI is expected. GCC 12 uses the same ABI as well. https://godbolt.org/z/bc4xx5zoa

@kparzysz-quic
Copy link
Author

To recap: I work on a project where LLVM libraries are used as code generator in another application.

  1. The application has the LLVM libraries linked into it, but it uses external toolchains to link them.
  2. If the application had LLVM 15+ with _Float16 support in it, the code generator would generate calls to __truncsfhf2 and __extendhfsf2. These calls would assume floating-point ABI.
  3. If the external toolchain is clang14 or earlier, the included compiler-rt will have __truncsfhf2 and __extendhfsf2 that assume integer ABI. The code will link, but will give incorrect output.

@phoebewang
Copy link
Contributor

I see your point, just wondering if it is an inherent problem for compiler-rt if codegen doing ABI break change.
OTOH, the external toolchain also might be GCC 12. For such case, we still have to generate calls to these function with floating-point ABI.

@kparzysz-quic
Copy link
Author

I guess a flag (cl::opt, not driver flag) that would force the x86 backend to always use the __gnu_(h2f|f2h)_ieee functions would help. Would it be possible/reasonable to add it?

@phoebewang
Copy link
Contributor

It's possible to do it in compiler, but there's still problem in compiler-rt. Because __gnu_(h2f|f2h)_ieee is just an alias of __truncsfhf2/__extendhfsf2: https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/truncsfhf2.c#L19
It means we also need to detach them and make __gnu_(h2f|f2h)_ieee always use uint16_t. But we can't simply do that because ARM/AArch64 have already using both for almost 2 years. See D91732, D91733. It means we need to make sure all the current user scenario combined with targets (like ARM/AArch64) x platforms (Linux/Darwin/Windows) won't be affected by such a change too.

Thinking it again. I doubt if it really a reasonable user scenario in your case. We should always use the same compiler-rt with the same compiler version. E.g., D126953 introduced a libcall __truncsfbf2 lowering in compiler and its implementation in compiler-rt. You cannot use an older version compiler-rt for it.
Another example D107245.
So you cannot assume it always work when using old compiler-rt with newer compiler, either we may get compile fail or unexpected result. That said, the problem is inherent somehow, thus we don't necessarily to solve this problem.

@kparzysz-quic
Copy link
Author

I think the best thing ultimately (for TVM) would be to just emit the conversion code directly into the module...

@phoebewang
Copy link
Contributor

The best way is keep the compiler-rt the same version as compiler.
If it is not possible, a workaround is to avoid lowering to libcalls. The native F16C instructions are avaliable in Ivy Bridge and latter processors. For these targets, we just need to use option -march=ivybridge(or up) or -mf16c in Clang front-end and/or -mcpu=ivybridge or "target-features"="+f16c" in the backend.

@tru
Copy link
Collaborator

tru commented Aug 3, 2022

Just so I am following along correctly - is it something we want to try to fix before 15 - or can I remove this as a release blocker?

@phoebewang
Copy link
Contributor

Yes. I think so. Maybe we can close it @kparzysz-quic ?

@kparzysz-quic
Copy link
Author

It doesn't seem like this is something that can be fixed, but maybe Saleem (@compnerd) could chime in before we close this.

@compnerd
Copy link
Member

compnerd commented Aug 3, 2022

The compiler-rt builtins are supposed to match the GCC ABI. We are matching the GCC ABI so this seems like the correct behaviour. Overall, the builtins are supposed to have a pretty stable ABI, so it is rather unfortunate that this broke. Replacing the implementation for __gnu_{h2f,f2h}_ieee with an unaliased version is reasonable as long as we continue to conform to the ABI.

I am left wondering how bad of a problem would this be in practice. I haven't given it too much thought, but perhaps we can get away with something like providing both variants and an alias, associated with a comdat symbol and have the compiler select between the two if there is FP16 usage without FP16 hardware support. This would be a pretty hefty penalty for the compiler (and would potentially leak some impact into link times as well). I am rather reluctant to go down that path.

@kparzysz-quic
Copy link
Author

The worst outcome is that user code compiles and links, but gives a wrong answer. Something like what we do for ABI breaking checks (i.e. introducing version symbols and forcing a linking error) would be nice, but given that we want to be able to link with GCC's runtime, this doesn't seem like a solution either.

I don't want to impose any restrictions on compiler-rt that would hinder LLVM or clang, so maybe we're stuck with the current situation. I just wanted to make sure that we don't leave any option unexplored.

@compnerd
Copy link
Member

compnerd commented Aug 3, 2022

I think that build time checks would be able to alleviate that concern (à la autotools). It should be possible to craft a simple test that helps identify which variant of the ABI is in use and then either abort or change the code generation.

@kparzysz-quic
Copy link
Author

That may be the way out of this.

Anyway, we should document this change in the release notes, if it's not there yet.

Thanks everyone for your inputs.

@phoebewang
Copy link
Contributor

we should document this change in the release notes, if it's not there yet.

There's a very basic note there: https://github.com/llvm/llvm-project/blob/release/15.x/llvm/docs/ReleaseNotes.rst#changes-to-the-x86-backend
It's good if we can highlight the compiler-rt issue, but I don't know how to change it once branched. @tstellar

@tru
Copy link
Collaborator

tru commented Aug 4, 2022

@phoebewang you can do the change directly on the release/15.x branch and we will take care of the rest.

@phoebewang
Copy link
Contributor

Yeah, I can put it in my fork repo and then cherry-pick. Thanks @tru.

@phoebewang
Copy link
Contributor

I put a patch D131172.

andrewrk added a commit to ziglang/zig that referenced this issue Aug 4, 2022
phoebewang added a commit that referenced this issue Aug 10, 2022
#56854 shows a backwards compatibility problem when builtins of compiler-rt don't follow ABI. We need to prevent to fall into the trap again for BF16.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D131147
bobsayshilol added a commit to bobsayshilol/tinygrad that referenced this issue Jul 31, 2023
compiler-rt 15 has a breaking ABI change affecting how half precision
floating point values are passed between functions
(llvm/llvm-project#56854).

For reasons I don't fully understand, different versions of clang
exhibit the issue on different OSes. For example on Ubuntu 22.04
clang-15 is affected and clang-16 is OK, but on Fedora clang-16 is
affected. clang-16 is also the default on Fedora meaning that all the
half precision tests fail with random numbers because they're reading
garbage.

There's a one-liner to test for this issue in the LLVM project, so make
use of that to tell if the CLANG backend will produce correct results.
bobsayshilol added a commit to bobsayshilol/tinygrad that referenced this issue Jul 31, 2023
…e can

Taking a hint from llvm/llvm-project#56854 (comment)
we can make use of `-march=native` flag to replace the library call
with a native instruction on Ivy Bridge and above. This doesn't fix the
bug in compiler-rt, but it means that we can support CLANG on more
hardware.

Note that the usual issues of `-march=native` shouldn't affect us since
we only compile these at runtime and we're building on the same machine
that we're running on. I've not measured if this improves the speed,
only that the tests now pass.
bobsayshilol added a commit to bobsayshilol/tinygrad that referenced this issue Jul 31, 2023
compiler-rt 15 has a breaking ABI change affecting how half precision
floating point values are passed between functions
(llvm/llvm-project#56854).

For reasons I don't fully understand, different versions of clang
exhibit the issue on different OSes. For example on Ubuntu 22.04
clang-15 is affected and clang-16 is OK, but on Fedora clang-16 is
affected. clang-16 is also the default on Fedora meaning that all the
half precision tests fail with random numbers because they're reading
garbage.

There's a one-liner to test for this issue in the LLVM project, so make
use of that to tell if the CLANG backend will produce correct results.
bobsayshilol added a commit to bobsayshilol/tinygrad that referenced this issue Jul 31, 2023
…e can

Taking a hint from llvm/llvm-project#56854 (comment)
we can make use of `-march=native` flag to replace the library call
with a native instruction on Ivy Bridge and above. This doesn't fix the
bug in compiler-rt, but it means that we can support CLANG on more
hardware.

Note that the usual issues of `-march=native` shouldn't affect us since
we only compile these at runtime and we're building on the same machine
that we're running on. I've not measured if this improves the speed,
only that the tests now pass.
heroxbd added a commit to heroxbd/gentoo that referenced this issue Nov 19, 2023
heroxbd added a commit to heroxbd/gentoo that referenced this issue Nov 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ABI Application Binary Interface backend:X86
Projects
Archived in project
Development

No branches or pull requests

6 participants