128-bit atomic implementations on 64-bit architectures

(See the atomic64 module for 64-bit atomic implementations on 32-bit architectures.)

128-bit atomic instructions

Here is the table of targets that support 128-bit atomics and the instructions used:

| target_arch | load | store | CAS | RMW | note |
| ----------- | ---- | ----- | --- | --- | ---- |
| x86_64 | cmpxchg16b or vmovdqa | cmpxchg16b or vmovdqa | cmpxchg16b | cmpxchg16b | cmpxchg16b target feature required. vmovdqa requires Intel, AMD, or Zhaoxin CPU with AVX. Both compile-time and run-time detection are supported for cmpxchg16b. vmovdqa is currently run-time detection only. Requires rustc 1.59+ |
| aarch64/arm64ec | ldxp/stxp or casp or ldp/ldiapp | ldxp/stxp or casp or stp/stilp/swpp | ldxp/stxp or casp | ldxp/stxp or casp/swpp/ldclrp/ldsetp | casp requires lse target feature, ldp/stp requires lse2 target feature, ldiapp/stilp requires lse2 and rcpc3 target features, swpp/ldclrp/ldsetp requires lse128 target feature. Both compile-time and run-time detection are supported. Requires rustc 1.59+ (aarch64) / nightly (arm64ec) |
| riscv64 | amocas.q | amocas.q | amocas.q | amocas.q | Experimental because LLVM marks the corresponding target feature as experimental. Requires experimental-zacas target feature. Both compile-time and run-time detection are supported (run-time detection is currently disabled by default). Requires rustc 1.82+ (LLVM 19+) |
| powerpc64 | lq | stq | lqarx/stqcx. | lqarx/stqcx. | Requires target-cpu pwr8+ (powerpc64le is pwr8 by default). Both compile-time and run-time detection are supported (run-time detection is currently disabled by default). Requires nightly |
| s390x | lpq | stpq | cdsg | cdsg | Requires nightly |

On compiler versions or platforms where these are not supported, the fallback implementation is used.

See the module-level comments in aarch64.rs for more details on the instructions used on AArch64.
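The table lists a CAS instruction (cmpxchg16b) as one way to perform a load on x86_64. The idea behind a CAS-based load can be sketched in safe Rust as follows. This is only an illustration, not the code in x86_64.rs: `AtomicU64` stands in for a 128-bit atomic (the standard library has no stable 128-bit atomic type), and `load_via_cas` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Reads the current value using only compare-and-swap.
/// `AtomicU64` is a stand-in for a 128-bit atomic in this sketch.
fn load_via_cas(a: &AtomicU64) -> u64 {
    // Compare against an arbitrary value (0) and "replace" it with the same value.
    // - If the current value is 0, the exchange succeeds, stores 0 again,
    //   and returns Ok(0).
    // - Otherwise the exchange fails and Err carries the current value.
    // Either way the returned value is what the atomic held.
    match a.compare_exchange(0, 0, Ordering::SeqCst, Ordering::SeqCst) {
        Ok(v) | Err(v) => v,
    }
}
```

Calling `load_via_cas(&AtomicU64::new(42))` returns 42 and leaves the stored value unchanged. Note that a CAS-based load is still a write access from the hardware's point of view, which is one reason a plain load instruction is preferable where one is available.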

Comparison with core::intrinsics::atomic_* (core::sync::atomic::Atomic{I,U}128)

This directory has target-specific implementations with inline assembly (x86_64.rs, aarch64.rs, riscv64.rs, powerpc64.rs, s390x.rs) and an implementation without inline assembly (intrinsics.rs). The latter currently always needs nightly compilers and is only used for Miri and ThreadSanitizer, which do not support inline assembly.

Implementations with inline assembly generate assembly almost equivalent to that produced by core::intrinsics::atomic_* (used in core::sync::atomic::Atomic{I,U}128) for many operations, but for some operations they may generate more or less efficient code. For example:

  • On x86_64 and AArch64, the implementation with inline assembly contains additional optimizations (e.g., #16, #126) and is much faster for some operations.
  • On AArch64, the implementation with inline assembly supports outline-atomics on more operating systems, and may be faster in environments where outline-atomics can improve performance.
  • On powerpc64, LLVM does not support generating some 128-bit atomic operations (see intrinsics.rs module-level comments), and we use a CAS loop to implement them, so the implementation with inline assembly may be faster for those operations.
  • In implementations without inline assembly, the compiler may reuse condition flags that have changed as a result of the operation, or use immediate values instead of registers, depending on the situation.
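The CAS loop mentioned in the powerpc64 item follows a general pattern: read the current value, compute the new one, and retry the compare-and-swap until no other thread has interfered. A minimal sketch of that pattern is shown here. It is not taken from intrinsics.rs: `AtomicU64` stands in for a 128-bit atomic, the operation (maximum) is chosen arbitrarily for illustration, and `fetch_max_via_cas` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Emulates a read-modify-write operation (here: maximum) with a CAS loop.
/// Returns the value that was stored before the update.
fn fetch_max_via_cas(a: &AtomicU64, val: u64) -> u64 {
    // Start from a snapshot of the current value.
    let mut prev = a.load(Ordering::Relaxed);
    loop {
        let next = prev.max(val);
        // The weak variant may fail spuriously, which is fine inside a retry loop.
        match a.compare_exchange_weak(prev, next, Ordering::SeqCst, Ordering::Relaxed) {
            // The stored value was still `prev`, so `next` is now stored.
            Ok(_) => return prev,
            // Another thread changed the value (or the CAS failed spuriously);
            // retry with the value actually observed.
            Err(actual) => prev = actual,
        }
    }
}
```

Each iteration costs at least one load and one CAS, and the loop can repeat under contention, which is why a dedicated instruction sequence may be faster than this pattern.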

As 128-bit atomics-related APIs stabilize in the standard library, we plan to update the implementations with inline assembly to get the benefits of both.

Run-time CPU feature detection

See the detect module's readme for run-time CPU feature detection.
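As a rough illustration of how compile-time and run-time detection can be combined for cmpxchg16b on x86_64, consider the sketch below. It is not the detect module's implementation: it assumes `std` is available and uses the standard library's `is_x86_feature_detected!` macro as a stand-in, and `has_cmpxchg16b` and `runtime_detect` are hypothetical names.

```rust
/// Returns true if cmpxchg16b is known to be usable on this machine.
/// On targets other than x86_64 this sketch always returns false.
fn has_cmpxchg16b() -> bool {
    // Compile-time detection: the feature was enabled when building
    // (e.g. via `-C target-feature=+cmpxchg16b`), so no run-time check is needed.
    if cfg!(all(target_arch = "x86_64", target_feature = "cmpxchg16b")) {
        return true;
    }
    // Otherwise ask the CPU at run time.
    runtime_detect()
}

#[cfg(target_arch = "x86_64")]
fn runtime_detect() -> bool {
    // Standard library macro; it queries CPUID and caches the result internally.
    std::is_x86_feature_detected!("cmpxchg16b")
}

#[cfg(not(target_arch = "x86_64"))]
fn runtime_detect() -> bool {
    // No run-time detection in this sketch for other architectures.
    false
}
```

A caller would use the native instruction when `has_cmpxchg16b()` returns true and take the fallback path otherwise.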