Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc emits calls to __sync_ when using load/store with +forced-atomics #101300

Open
Lokathor opened this issue Sep 2, 2022 · 11 comments
Open

rustc emits calls to __sync_ when using load/store with +forced-atomics #101300

Lokathor opened this issue Sep 2, 2022 · 11 comments
Labels
A-atomic Area: atomics, barriers, and sync primitives A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Lokathor
Copy link
Contributor

Lokathor commented Sep 2, 2022

I went to start a new branch for the gba crate and things were going well until LLVM demanded that I add a bunch of definitions for atomic intrinsics:

  = note: arm-none-eabi-ld: D:\dev\gba\target\thumbv4t-none-eabi\debug\deps\libgba-e42d0bb0e1acbf2b.rlib(gba-e42d0bb0e1acbf2b.8ymripsqk83bpm.rcgu.o): in function `core::sync::atomic::atomic_store':
          C:\Users\Daniel\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\sync/atomic.rs:2948: undefined reference to `__sync_lock_test_and_set_4'
          arm-none-eabi-ld: D:\dev\gba\target\thumbv4t-none-eabi\debug\examples\hello-4b3a62c592de993e.5c84v82b0il7lif5.rcgu.o: in function `core::sync::atomic::atomic_store':
          C:\Users\Daniel\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\sync/atomic.rs:2948: undefined reference to `__sync_lock_test_and_set_2'
          arm-none-eabi-ld: D:\dev\gba\target\thumbv4t-none-eabi\debug\examples\hello-4b3a62c592de993e.5c84v82b0il7lif5.rcgu.o: in function `core::sync::atomic::atomic_load':
          C:\Users\Daniel\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\sync/atomic.rs:2963: undefined reference to `__sync_val_compare_and_swap_2'

These are all part of LLVM's sync function set, and they are probably caused by the atomic accesses I'm using:

Now the GBA is using the ARMv4T architecture, which does not have atomic instructions at the CPU level, but the +forced-atomic feature was recently added to many target definitions for older targets. My understanding of how this was supposed to work is that an aligned atomic load/store will just use the normal load/store instruction, and then the "advanced" atomic actions like fetch_add would call a function. However, it seems to be the cast that (using the thumbv4t-none-eabi target) LLVM still wants to call functions for plain load/store.

@thomcc
Copy link
Member

thomcc commented Sep 3, 2022

Related to (possibly dupe of) #99668.

@nikic nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Sep 3, 2022
@nikic
Copy link
Contributor

nikic commented Sep 3, 2022

Yeah, this shouldn't be happening. Based on the symbol names, the atomic load or store seems to get lowered to atomic CAS at some point.

@nikic
Copy link
Contributor

nikic commented Sep 3, 2022

Indeed: https://llvm.godbolt.org/z/TvjP1on6f

@thomcc
Copy link
Member

thomcc commented Sep 3, 2022

There's no __sync equivalent to plain load/store, so I believe this is to force going through libcalls on targets where you'd need to take a global lock around all access to support RMWs (which I guess LLVM assumes we might want to support).

@thomcc
Copy link
Member

thomcc commented Sep 3, 2022

This is a lot like the current situation for compiler_fence -- on some embedded targets llvm seems to consider __sync to be a good fallback: https://github.com/llvm/llvm-project/blob/9905dae5e18cd55ee6bb8678c95fed940ded1ef9/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp#L3937-L3939

@nikic
Copy link
Contributor

nikic commented Sep 3, 2022

Ah, I see the problem. These are seq_cst atomic load/stores, but apparently thumbv4t has no dmb instruction, so it's not possible to lower atomic load/store with stronger than monotonic ordering natively. Thus we get libcalls even with the +atomics-32 feature.

This does look correct to me, and it also looks like this is not new LLVM 15 behavior either. So to clarify, did this code (using these atomic orderings) work previously?

I think this means that we do need to declare atomics as completely unsupported for thumbv4t targets.

@Lokathor
Copy link
Contributor Author

Lokathor commented Sep 3, 2022

Ah, I should be more clear: This code is all new code, and I have never before even attempted to use Atomic* types on this target.

However, I was told by both Ralf and m-ou-se that plain loads are stores are "effectively atomic".

So when the Forced Atomic stuff was put in I went to try it out, and it did not work as expected. What I expected, given that it's an ARMv4T, is that all atomic orderings will lower to plain ldr and str. There's no need for dmb because there's only one CPU and it's not capable of executing operations out of order. The only thing that needs to respect the atomic orderings is LLVM itself when it's optimizing, other than that the CPU will only perform operations in exactly their order in the assembly.

What I am really after is a way for the main program (just one thread) to communicate with the interrupt handler.

  • In the past I've used volatile accesses for this
  • Doing that was called unsound in the presence of optimizations re-ordering standard accesses before or after volatile accesses
  • Accordingly, I tried this atomic stuff, and it's not working right.

So, if you think that everything here is working as far as you expect, then I might have to tell Ralf and Mo-us-e that there's no "effective atomics" after all and they need to go back to the drawing board.

@Lokathor
Copy link
Contributor Author

Lokathor commented Sep 3, 2022

On advice thomcc I tried other orderings:

  • for load/Acquire and store/Release it still calls functions.
  • for all ops Relaxed it did not call functions.

@Lokathor
Copy link
Contributor Author

Lokathor commented Sep 3, 2022

  • Using load(Acquire)/store(Release) would be correct for this code, but still emits the library calls.
  • Relaxed doesn't emit library calls, but might be reordered incorrectly (non-atomic access can move past the atomic changes).
  • Adding calls to compiler_fence to prevent that should be fine here, except that hits #62256 and becomes a library call to __sync_synchronize, which means that a 2 (load) or 3 (store) cycle operation gets 6 cycles of overhead added just to jump to an empty function and immediately return.

So yeah, atomics are completely broken on this target at the moment.

In the mean time it can be worked around with inline_asm, but then i either have to make every access a macro call (to get good register allocation) or wrap it in a method that gets inlined but then every access will always be fighting for r0.

@taiki-e
Copy link
Member

taiki-e commented Sep 4, 2022

  • Relaxed doesn't emit library calls

FWIW, (when you use core::sync::atomic's atomics) whether this works or not actually seems to depend on compiler optimizations. (In cases where Atomic*::{load,store} cannot be inlined enough, libcalls will still be generated: taiki-e/portable-atomic#36 (comment))

@Lokathor
Copy link
Contributor Author

Lokathor commented Sep 4, 2022

Ah yes, i hadn't considered. For this platform, I always use opt-level = 3 for all builds.

@workingjubilee workingjubilee added A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. A-atomic Area: atomics, barriers, and sync primitives labels Mar 4, 2023
@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-atomic Area: atomics, barriers, and sync primitives A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants