Deciding When to Employ _mm_sfence, _mm_lfence, or _mm_mfence
When writing multi-threaded code, you need to control memory ordering. x86 processors have a strongly ordered memory model, but C and C++ follow a weaker one, so it is often unclear when the intrinsics _mm_sfence, _mm_lfence, and _mm_mfence are actually needed.
Understanding Memory Ordering
On x86, acquire/release semantics only require preventing compile-time reordering, because the hardware already gives ordinary loads acquire semantics and ordinary stores release semantics. A compiler barrier does exactly that: it forces the compiler to keep memory operations in source order without emitting any extra instructions. In GNU C/C++, asm("" ::: "memory") is such a barrier; its only cost is that it blocks some compiler optimizations around it.
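A minimal GNU C sketch of this idea, assuming an x86 target, GCC or Clang, and hypothetical publish/consume names: the empty asm statement emits no instruction but stops the compiler from moving memory accesses across it, and at run time x86 keeps the two stores (and the two loads) in order. Under the C11/C++11 memory model this is still a data race on payload, so treat it as an illustration of what the hardware provides rather than portable code.

```c
/* GNU C compiler barrier: no instruction is emitted, but the "memory"
   clobber forbids the compiler from reordering memory accesses across it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int payload;          /* data being published (hypothetical) */
volatile int ready;   /* flag; volatile so every access is really emitted */

/* Writer: x86 keeps these two stores in program order at run time,
   so only compile-time reordering has to be prevented. */
void publish(int value) {
    payload = value;
    COMPILER_BARRIER();
    ready = 1;
}

/* Reader: spin until the flag is set, then read the data. */
int consume(void) {
    while (!ready)
        ;
    COMPILER_BARRIER();
    return payload;
}
```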
Alternatively, C++11 std::atomic expresses the same thing portably: shared_var.store(tmp, std::memory_order_release) guarantees that any thread whose acquire load of shared_var sees tmp also sees every store made before it, and on x86 it compiles to a plain store. _mm_mfence is mainly useful if you're implementing your own version of C11/C++11 std::atomic: a sequentially consistent store needs mfence (or a locked instruction such as xchg) after it, so that later loads cannot read values before the preceding store becomes globally visible.
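The C11 <stdatomic.h> counterpart of that call, as a minimal sketch with hypothetical producer/consumer names: the release store publishes data, and the matching acquire load on the reader side guarantees the reader sees the value written before the flag. On x86, GCC and Clang typically compile both the release store and the acquire load to plain mov instructions, with no fence.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int data;                  /* plain (non-atomic) payload */
static atomic_bool ready = false; /* publication flag */

void producer(int value) {
    data = value;
    /* Release: every earlier write is visible to a thread whose
       acquire load of ready returns true. */
    atomic_store_explicit(&ready, true, memory_order_release);
}

int consumer(void) {
    /* Acquire: later reads cannot be performed before this load. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    return data;
}
```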
Examining the Roles of Each Intrinsic
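_mm_sfence: Orders stores with respect to other stores. Ordinary x86 stores are already kept in program order, so sfence only matters for weakly ordered stores: non-temporal (NT) stores such as _mm_stream_si32 (movnti) and the other movnt* instructions, or stores to write-combining (WC) memory. Use it after a batch of NT stores and before the store that tells other threads the data is ready.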
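A sketch of the NT-store pattern, assuming an x86-64 target (SSE2 is baseline there) and hypothetical buffer and flag names. The release store alone would not be enough here, because release ordering on x86 relies on ordinary stores staying in order, which NT stores do not; the sfence restores that ordering before the flag is published.

```c
#include <immintrin.h>   /* _mm_stream_si32 (SSE2), _mm_sfence (SSE) */
#include <stdatomic.h>
#include <stddef.h>

#define N 1024                  /* hypothetical buffer size */
static int buffer[N];           /* filled with non-temporal stores */
static atomic_int buffer_ready; /* publication flag for readers */

void fill_and_publish(int value) {
    for (size_t i = 0; i < N; ++i)
        _mm_stream_si32(&buffer[i], value);  /* NT store: weakly ordered, avoids cache pollution */
    /* All NT stores above become globally visible before any later
       store, in particular before the flag below. */
    _mm_sfence();
    atomic_store_explicit(&buffer_ready, 1, memory_order_release);
}
```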
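_mm_lfence: Orders loads with respect to other loads, which ordinary x86 loads already guarantee, so it is almost never needed for memory ordering (the exception is movntdqa loads from WC memory). In practice it is used for its execution-ordering side effect: it does not complete until all earlier instructions have completed locally, and later instructions do not start until it completes. That makes it useful around rdtsc and as a speculation barrier for Spectre mitigations.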
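The usual timing idiom, sketched for GCC/Clang on x86 (MSVC provides the same intrinsics through <intrin.h>); fenced_rdtsc and sum_to are hypothetical helper names. The first lfence keeps rdtsc from executing before earlier instructions have completed, and the second keeps later instructions from starting before the timestamp is read. lfence does not wait for earlier stores to become globally visible. The execution-ordering guarantee is documented by Intel; on AMD CPUs it holds when lfence is configured as dispatch-serializing, so treat it as an assumption there.

```c
#include <stdint.h>
#include <immintrin.h>   /* _mm_lfence */
#include <x86intrin.h>   /* __rdtsc (GCC/Clang) */

/* Hypothetical helper: read the time-stamp counter at a well-defined point. */
static inline uint64_t fenced_rdtsc(void) {
    _mm_lfence();             /* earlier instructions complete locally first */
    uint64_t t = __rdtsc();
    _mm_lfence();             /* later instructions wait until rdtsc has executed */
    return t;
}

/* Hypothetical workload to time. */
static uint64_t sum_to(uint64_t n) {
    volatile uint64_t acc = 0;   /* volatile so the loop is not optimized away */
    for (uint64_t i = 0; i < n; ++i)
        acc += i;
    return acc;
}
```

Typical use is start = fenced_rdtsc(); work; end = fenced_rdtsc(); with end - start giving reference cycles, not core clock cycles.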
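_mm_mfence: A full barrier. Besides ordering loads and stores among themselves, it provides StoreLoad ordering: later loads cannot execute until earlier stores have become globally visible, meaning the store buffer has drained. StoreLoad is the one reordering x86 allows for ordinary memory operations, so mfence (or a locked instruction) is what sequentially consistent stores need.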
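A fragment of a Dekker-style handshake shows the StoreLoad case mfence exists for, sketched in C11 with hypothetical try_enter/leave names: each thread stores its own flag, then loads the other thread's flag. Without the fence, both loads could execute while both stores are still in their store buffers, so both threads would read 0. The atomic_signal_fence calls are compiler-only barriers added so the compiler cannot reorder the relaxed accesses around the intrinsic; portable code would instead use atomic_thread_fence(memory_order_seq_cst), which compilers typically lower to mfence or a locked instruction on x86. This is not a complete lock.

```c
#include <immintrin.h>   /* _mm_mfence */
#include <stdatomic.h>

static atomic_int flag[2];   /* flag[i] == 1: thread i wants to enter */

/* Hypothetical helper: returns 1 if thread `self` (0 or 1) may proceed,
   0 if the other thread has also announced itself. */
int try_enter(int self) {
    atomic_store_explicit(&flag[self], 1, memory_order_relaxed);
    atomic_signal_fence(memory_order_seq_cst);  /* compiler-only barrier */
    _mm_mfence();  /* StoreLoad: our store is globally visible before the load below */
    atomic_signal_fence(memory_order_seq_cst);  /* compiler-only barrier */
    return atomic_load_explicit(&flag[1 - self], memory_order_relaxed) == 0;
}

/* Clear this thread's flag; a failed try_enter should also do this before retrying. */
void leave(int self) {
    atomic_store_explicit(&flag[self], 0, memory_order_release);
}
```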
Cautionary Notes Regarding Performance
Fences do not make stores visible to other cores any sooner. They only stall later operations in the current thread until earlier ones have completed (for mfence, until the store buffer has drained), so a fence the memory model does not require is pure overhead.
Conclusion
For general use, C++11 std::atomic or C11 <stdatomic.h> are robust, portable ways to control memory ordering. _mm_sfence is useful after NT stores, and _mm_mfence when implementing your own std::atomic-style sequentially consistent operations; in both cases, weigh the performance cost carefully.