Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize pre-v6 ARM load/store on single-core systems #36

Merged
merged 1 commit into from
Sep 4, 2022
Merged

Conversation

taiki-e
Copy link
Owner

@taiki-e taiki-e commented Sep 4, 2022

Since pre-v61 ARM has no Data Memory Barrier and cannot implement atomic in a way compatible with v6 and later multicore CPUs without OS helpers, LLVM generates libcalls for atomic operations that require synchronization, such as Acquire/Release and SeqCst.

Currently, (when portable_atomic_unsafe_assume_single_core cfg is used) we consider this as "atomic load/store are not available on pre-v6 ARM" and always disable interrupts. However, relaxed load+compiler fence can do this more efficiently since we know it is single-core.

// On pre-v6 ARM, we cannot use core::sync::atomic here because they call the
// `__sync_*` builtins for non-relaxed loads and stores.

// On some platforms, atomic load/store can be implemented in a more efficient
// way than disabling interrupts.
//
// Note: On single-core systems, it is okay to use critical session-based
// CAS together with atomic load/store. The load/store will not be
// called while interrupts are disabled, and since the load/store is
// atomic, it is not affected by interrupts even if interrupts are enabled.

FYI @Lokathor

Footnotes

  1. ARMv6 except for ARMv6-M also does not have the DMB instruction, but there is a special instruction equivalent to DMB.

@Lokathor
Copy link

Lokathor commented Sep 4, 2022

Yeah this basically sounds like what i was fighting with earlier today.

rust-lang/rust#101300

@taiki-e
Copy link
Owner Author

taiki-e commented Sep 4, 2022

Hmm, in release mode it is fine, but in debug mode it fails to compile due to the compiler cannot remove dead code: https://github.com/taiki-e/portable-atomic/runs/8173770404?check_suite_focus=true

I should use inline asm for load/store as well like #18.

@taiki-e taiki-e force-pushed the armv4t branch 2 times, most recently from 77f1bf4 to a958639 Compare September 4, 2022 04:17
@taiki-e taiki-e added the O-arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state label Sep 4, 2022
@taiki-e
Copy link
Owner Author

taiki-e commented Sep 4, 2022

bors r+

bors bot added a commit that referenced this pull request Sep 4, 2022
36: Optimize pre-v6 ARM load/store on single-core systems r=taiki-e a=taiki-e

Since pre-v6[^v6] ARM has no Data Memory Barrier and cannot implement atomic in a way compatible with v6 and later multicore CPUs without OS helpers, LLVM generates libcalls for atomic operations that require synchronization, such as Acquire/Release and SeqCst.

Currently, (when `portable_atomic_unsafe_assume_single_core` cfg is used) we consider this as "atomic load/store are not available on pre-v6 ARM" and always disable interrupts. However, relaxed load+compiler fence can do this more efficiently since we know it is single-core.

https://github.com/taiki-e/portable-atomic/blob/e6e1500f440e6a79d87f4471924c5e6613594b51/src/imp/interrupt/mod.rs#L34-L35

https://github.com/taiki-e/portable-atomic/blob/e6e1500f440e6a79d87f4471924c5e6613594b51/src/imp/interrupt/mod.rs#L23-L29

FYI `@Lokathor` 

[^v6]: ARMv6 except for ARMv6-M also does not have the DMB instruction, but there is [a special instruction equivalent to DMB](https://developer.arm.com/documentation/ddi0360/e/control-coprocessor-cp15/register-descriptions/c7--cache-operations-register?lang=en).

Co-authored-by: Taiki Endo <te316e89@gmail.com>
@bors
Copy link
Contributor

bors bot commented Sep 4, 2022

Build failed:

@taiki-e
Copy link
Owner Author

taiki-e commented Sep 4, 2022

bors retry

bors bot added a commit that referenced this pull request Sep 4, 2022
36: Optimize pre-v6 ARM load/store on single-core systems r=taiki-e a=taiki-e

Since pre-v6[^v6] ARM has no Data Memory Barrier and cannot implement atomic in a way compatible with v6 and later multicore CPUs without OS helpers, LLVM generates libcalls for atomic operations that require synchronization, such as Acquire/Release and SeqCst.

Currently, (when `portable_atomic_unsafe_assume_single_core` cfg is used) we consider this as "atomic load/store are not available on pre-v6 ARM" and always disable interrupts. However, relaxed load+compiler fence can do this more efficiently since we know it is single-core.

https://github.com/taiki-e/portable-atomic/blob/e6e1500f440e6a79d87f4471924c5e6613594b51/src/imp/interrupt/mod.rs#L34-L35

https://github.com/taiki-e/portable-atomic/blob/e6e1500f440e6a79d87f4471924c5e6613594b51/src/imp/interrupt/mod.rs#L23-L29

FYI `@Lokathor` 

[^v6]: ARMv6 except for ARMv6-M also does not have the DMB instruction, but there is [a special instruction equivalent to DMB](https://developer.arm.com/documentation/ddi0360/e/control-coprocessor-cp15/register-descriptions/c7--cache-operations-register?lang=en).

Co-authored-by: Taiki Endo <te316e89@gmail.com>
@bors
Copy link
Contributor

bors bot commented Sep 4, 2022

Build failed:

@taiki-e
Copy link
Owner Author

taiki-e commented Sep 4, 2022

bors retry

@bors
Copy link
Contributor

bors bot commented Sep 4, 2022

@bors bors bot merged commit 97cfd3a into main Sep 4, 2022
@bors bors bot deleted the armv4t branch September 4, 2022 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants