rustc emits calls to `__sync_` when using load/store with `+forced-atomics` #101300

Lokathor · 2022-09-02T00:42:03Z

I went to start a new branch for the gba crate and things were going well until LLVM demanded that I add a bunch of definitions for atomic intrinsics:

  = note: arm-none-eabi-ld: D:\dev\gba\target\thumbv4t-none-eabi\debug\deps\libgba-e42d0bb0e1acbf2b.rlib(gba-e42d0bb0e1acbf2b.8ymripsqk83bpm.rcgu.o): in function `core::sync::atomic::atomic_store':
          C:\Users\Daniel\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\sync/atomic.rs:2948: undefined reference to `__sync_lock_test_and_set_4'
          arm-none-eabi-ld: D:\dev\gba\target\thumbv4t-none-eabi\debug\examples\hello-4b3a62c592de993e.5c84v82b0il7lif5.rcgu.o: in function `core::sync::atomic::atomic_store':
          C:\Users\Daniel\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\sync/atomic.rs:2948: undefined reference to `__sync_lock_test_and_set_2'
          arm-none-eabi-ld: D:\dev\gba\target\thumbv4t-none-eabi\debug\examples\hello-4b3a62c592de993e.5c84v82b0il7lif5.rcgu.o: in function `core::sync::atomic::atomic_load':
          C:\Users\Daniel\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\sync/atomic.rs:2963: undefined reference to `__sync_val_compare_and_swap_2'

These are all part of LLVM's sync function set, and they are probably caused by the atomic accesses I'm using:

Now the GBA is using the ARMv4T architecture, which does not have atomic instructions at the CPU level, but the +forced-atomic feature was recently added to many target definitions for older targets. My understanding of how this was supposed to work is that an aligned atomic load/store will just use the normal load/store instruction, and then the "advanced" atomic actions like fetch_add would call a function. However, it seems to be the cast that (using the thumbv4t-none-eabi target) LLVM still wants to call functions for plain load/store.

The text was updated successfully, but these errors were encountered:

thomcc · 2022-09-03T20:20:10Z

Related to (possibly dupe of) #99668.

nikic · 2022-09-03T20:27:49Z

Yeah, this shouldn't be happening. Based on the symbol names, the atomic load or store seems to get lowered to atomic CAS at some point.

nikic · 2022-09-03T20:31:12Z

Indeed: https://llvm.godbolt.org/z/TvjP1on6f

thomcc · 2022-09-03T20:31:19Z

There's no __sync equivalent to plain load/store, so I believe this is to force going through libcalls on targets where you'd need to take a global lock around all access to support RMWs (which I guess LLVM assumes we might want to support).

thomcc · 2022-09-03T20:38:56Z

This is a lot like the current situation for compiler_fence -- on some embedded targets llvm seems to consider __sync to be a good fallback: https://github.com/llvm/llvm-project/blob/9905dae5e18cd55ee6bb8678c95fed940ded1ef9/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp#L3937-L3939

nikic · 2022-09-03T21:19:54Z

Ah, I see the problem. These are seq_cst atomic load/stores, but apparently thumbv4t has no dmb instruction, so it's not possible to lower atomic load/store with stronger than monotonic ordering natively. Thus we get libcalls even with the +atomics-32 feature.

This does look correct to me, and it also looks like this is not new LLVM 15 behavior either. So to clarify, did this code (using these atomic orderings) work previously?

I think this means that we do need to declare atomics as completely unsupported for thumbv4t targets.

Lokathor · 2022-09-03T22:02:29Z

Ah, I should be more clear: This code is all new code, and I have never before even attempted to use Atomic* types on this target.

However, I was told by both Ralf and m-ou-se that plain loads are stores are "effectively atomic".

So when the Forced Atomic stuff was put in I went to try it out, and it did not work as expected. What I expected, given that it's an ARMv4T, is that all atomic orderings will lower to plain ldr and str. There's no need for dmb because there's only one CPU and it's not capable of executing operations out of order. The only thing that needs to respect the atomic orderings is LLVM itself when it's optimizing, other than that the CPU will only perform operations in exactly their order in the assembly.

What I am really after is a way for the main program (just one thread) to communicate with the interrupt handler.

In the past I've used volatile accesses for this
Doing that was called unsound in the presence of optimizations re-ordering standard accesses before or after volatile accesses
Accordingly, I tried this atomic stuff, and it's not working right.

So, if you think that everything here is working as far as you expect, then I might have to tell Ralf and Mo-us-e that there's no "effective atomics" after all and they need to go back to the drawing board.

Lokathor · 2022-09-03T22:15:04Z

On advice thomcc I tried other orderings:

for load/Acquire and store/Release it still calls functions.
for all ops Relaxed it did not call functions.

Lokathor · 2022-09-03T22:31:38Z

Using load(Acquire)/store(Release) would be correct for this code, but still emits the library calls.
Relaxed doesn't emit library calls, but might be reordered incorrectly (non-atomic access can move past the atomic changes).
Adding calls to compiler_fence to prevent that should be fine here, except that hits #62256 and becomes a library call to __sync_synchronize, which means that a 2 (load) or 3 (store) cycle operation gets 6 cycles of overhead added just to jump to an empty function and immediately return.

So yeah, atomics are completely broken on this target at the moment.

In the mean time it can be worked around with inline_asm, but then i either have to make every access a macro call (to get good register allocation) or wrap it in a method that gets inlined but then every access will always be fighting for r0.

taiki-e · 2022-09-04T04:48:07Z

Relaxed doesn't emit library calls

FWIW, (when you use core::sync::atomic's atomics) whether this works or not actually seems to depend on compiler optimizations. (In cases where Atomic*::{load,store} cannot be inlined enough, libcalls will still be generated: taiki-e/portable-atomic#36 (comment))

Lokathor · 2022-09-04T05:17:13Z

Ah yes, i hadn't considered. For this platform, I always use opt-level = 3 for all builds.

nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Sep 3, 2022

Lokathor mentioned this issue Sep 3, 2022

How should we expose atomic load/store on targets that don't support full atomics #99668

Open

Lokathor mentioned this issue Sep 4, 2022

Optimize pre-v6 ARM load/store on single-core systems taiki-e/portable-atomic#36

Merged

workingjubilee added A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. A-atomic Area: atomics, barriers, and sync primitives labels Mar 4, 2023

Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rustc emits calls to `__sync_` when using load/store with `+forced-atomics` #101300

rustc emits calls to `__sync_` when using load/store with `+forced-atomics` #101300

Lokathor commented Sep 2, 2022 •

edited

Loading

thomcc commented Sep 3, 2022

nikic commented Sep 3, 2022

nikic commented Sep 3, 2022

thomcc commented Sep 3, 2022

thomcc commented Sep 3, 2022

nikic commented Sep 3, 2022

Lokathor commented Sep 3, 2022

Lokathor commented Sep 3, 2022

Lokathor commented Sep 3, 2022

taiki-e commented Sep 4, 2022 •

edited

Loading

Lokathor commented Sep 4, 2022

rustc emits calls to __sync_ when using load/store with +forced-atomics #101300

rustc emits calls to __sync_ when using load/store with +forced-atomics #101300

Comments

Lokathor commented Sep 2, 2022 • edited Loading

thomcc commented Sep 3, 2022

nikic commented Sep 3, 2022

nikic commented Sep 3, 2022

thomcc commented Sep 3, 2022

thomcc commented Sep 3, 2022

nikic commented Sep 3, 2022

Lokathor commented Sep 3, 2022

Lokathor commented Sep 3, 2022

Lokathor commented Sep 3, 2022

taiki-e commented Sep 4, 2022 • edited Loading

Lokathor commented Sep 4, 2022

rustc emits calls to `__sync_` when using load/store with `+forced-atomics` #101300

rustc emits calls to `__sync_` when using load/store with `+forced-atomics` #101300

Lokathor commented Sep 2, 2022 •

edited

Loading

taiki-e commented Sep 4, 2022 •

edited

Loading