When Should You Use _mm_sfence, _mm_lfence, or _mm_mfence?

Patricia Arquette

Release: 2024-11-18 00:05:02


Deciding When to Employ _mm_sfence, _mm_lfence, or _mm_mfence

When writing multi-threaded code, you need to control memory ordering. x86 processors have a strongly-ordered memory model, while C and C++ define a more relaxed one. That mismatch causes confusion about when the intrinsics _mm_sfence, _mm_lfence, and _mm_mfence are actually needed.

Understanding Memory Ordering

On x86, acquire/release semantics only require preventing compile-time reordering, which is what a compiler barrier does: the hardware already keeps regular loads and stores sufficiently ordered. A compiler barrier orders operations in the abstract machine without emitting any extra asm instructions, so it costs little at run time. In GNU C/C++, asm("" ::: "memory") acts as such a compiler barrier.
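A compiler-only barrier can be sketched as follows. This is a minimal illustration assuming GCC or Clang (the asm syntax is a GNU extension) and an x86 target; the names payload, ready, compiler_barrier, publish, try_consume, and publish_and_consume are hypothetical. Sharing plain variables between threads is a data race in ISO C++, so this shows only what the barrier does, not a recommended pattern.

```cpp
#include <cassert>

// Hypothetical shared data, for illustration only. In ISO C++ sharing plain
// (non-atomic) variables between threads is a data race; prefer std::atomic.
int payload = 0;
volatile int ready = 0;

// GNU C/C++ compiler barrier: emits no instructions, but the "memory" clobber
// stops the compiler from moving memory accesses across this point.
inline void compiler_barrier() {
    asm volatile("" ::: "memory");
}

// Write the payload, then the flag. On x86 the hardware keeps the two regular
// stores in program order, so only compile-time reordering must be blocked.
inline void publish(int value) {
    payload = value;
    compiler_barrier();
    ready = 1;
}

// Read the flag first, then the payload.
inline bool try_consume(int& out) {
    if (!ready) return false;
    compiler_barrier();
    out = payload;
    return true;
}

// Single-threaded smoke check: publish a value and read it back.
inline int publish_and_consume(int value) {
    publish(value);
    int out = 0;
    const bool ok = try_consume(out);
    assert(ok);  // the flag was set just above
    return out;
}
```

On a weakly-ordered ISA this would not be enough, which is one reason to prefer std::atomic in portable code.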

Alternatively, C++11 std::atomic handles this for you: shared_var.store(tmp, std::memory_order_release) guarantees that all earlier writes are visible to any thread that observes the store with an acquire load. _mm_mfence is mainly of interest if you're implementing your own version of C11/C++11 std::atomic, where mfence is one way to get sequential consistency: it prevents later loads from reading their values until preceding stores have become globally visible.
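The release store mentioned above pairs with an acquire load on the reading side. The following is a minimal sketch of that handoff; payload, ready, producer, consumer, and run_handoff are hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                  // plain data, published through the flag
std::atomic<bool> ready{false};   // synchronization flag

// Producer: the release store keeps the payload write ordered before the flag.
void producer(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Consumer: the acquire load pairs with the release store, so once the flag
// reads true the payload write is guaranteed to be visible.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one consumer and one producer thread; returns what the consumer saw.
int run_handoff(int value) {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int received = -1;
    std::thread reader([&] { received = consumer(); });
    std::thread writer([&] { producer(value); });
    writer.join();
    reader.join();
    assert(ready.load(std::memory_order_relaxed));  // writer has finished
    return received;
}
```

On x86, compilers typically turn both the release store and the acquire load into ordinary mov instructions, with no fence instruction at all.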

Examining the Roles of Each Intrinsic

_mm_sfence:

Needed after NT (non-temporal) stores, before storing a flag that other threads rely on.
Restores release/acquire synchronization when using NT stores, which are weakly ordered, unlike regular stores.
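The NT-store case can be sketched like this: a minimal example assuming an x86 target with SSE2 and a compiler that provides <immintrin.h>. The buffer, the flag, and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <immintrin.h>  // _mm_stream_si32, _mm_sfence (x86 only)

// Hypothetical buffer and flag, for illustration only.
constexpr std::size_t kCount = 1024;
alignas(64) int buffer[kCount];
std::atomic<bool> buffer_ready{false};

// Fill the buffer with non-temporal stores, then publish it.
void fill_and_publish(int value) {
    for (std::size_t i = 0; i < kCount; ++i) {
        _mm_stream_si32(&buffer[i], value);  // weakly-ordered NT store
    }
    // NT stores are not ordered against later regular stores, so a store
    // fence is needed before the flag store that publishes the data.
    _mm_sfence();
    buffer_ready.store(true, std::memory_order_release);
}

// Reader side: acquire load of the flag, then regular loads of the data.
bool all_equal(int value) {
    if (!buffer_ready.load(std::memory_order_acquire)) return false;
    for (std::size_t i = 0; i < kCount; ++i) {
        if (buffer[i] != value) return false;
    }
    return true;
}

// Single-threaded smoke check of the publish path.
bool publish_roundtrip(int value) {
    fill_and_publish(value);
    const bool ok = all_equal(value);
    assert(ok);
    return ok;
}
```

Without the _mm_sfence, another thread could in principle observe the flag before some of the NT stores have become visible.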

_mm_lfence:

Rarely useful as a load fence, because loads are only weakly ordered when they read from WC (Write-Combining) memory regions.
Can serve as an execution barrier on some processors: later instructions do not execute until lfence has completed.
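One common use of the execution-barrier behaviour is to bracket timestamp reads so that out-of-order execution does not blur a measurement. The sketch below assumes GCC or Clang on x86 (for <x86intrin.h> and __rdtsc) and a processor on which lfence serializes instruction execution; Timed and timed_sum are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <x86intrin.h>  // __rdtsc, _mm_lfence (GCC/Clang, x86 only)

// Hypothetical result type: elapsed TSC ticks plus the computed sum.
struct Timed {
    std::uint64_t ticks;
    std::uint64_t sum;
};

// Sum 0..n-1 and measure the loop in TSC ticks. The lfence around each
// rdtsc keeps the timestamp read from executing before earlier instructions
// have finished, and keeps later instructions from starting ahead of it.
Timed timed_sum(std::uint64_t n) {
    assert(n < (1ull << 32));  // keeps the sum well within 64 bits

    _mm_lfence();
    const std::uint64_t start = __rdtsc();
    _mm_lfence();

    volatile std::uint64_t sum = 0;  // volatile so the loop is not removed
    for (std::uint64_t i = 0; i < n; ++i) {
        sum = sum + i;
    }

    _mm_lfence();
    const std::uint64_t end = __rdtsc();
    _mm_lfence();

    return {end - start, sum};
}
```

The tick count varies from run to run and between machines, so only the sum is a stable result.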

_mm_mfence:

Potentially useful for implementing your own version of std::atomic, where mfence provides sequential consistency.
Note: mfence can be slower than locked atomic-RMW operations, which are also full barriers on x86.
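A hand-rolled sequentially-consistent store built on mfence might look like the sketch below. It assumes an x86 target and relies on x86 behaviour rather than on guarantees of the C++ memory model; seq_cst_store_mfence, x, y, and count_sb_violations are hypothetical names. The store-buffering check runs two threads that each store to one variable and then load the other; with a full barrier between the store and the load, both loads returning 0 should not occur.

```cpp
#include <atomic>
#include <cassert>
#include <immintrin.h>  // _mm_mfence (x86 only)
#include <thread>

// Hypothetical hand-rolled seq_cst store: an ordinary x86 store followed by
// mfence, so later loads wait until the store is globally visible.
inline void seq_cst_store_mfence(std::atomic<int>& var, int value) {
    var.store(value, std::memory_order_release);          // plain mov on x86
    std::atomic_signal_fence(std::memory_order_seq_cst);  // compiler barrier
    _mm_mfence();                                         // StoreLoad barrier
    std::atomic_signal_fence(std::memory_order_seq_cst);  // compiler barrier
}

std::atomic<int> x{0};
std::atomic<int> y{0};

// Store-buffering litmus test: counts runs in which both threads read 0,
// an outcome that sequential consistency forbids.
int count_sb_violations(int runs) {
    assert(runs > 0);
    int violations = 0;
    for (int i = 0; i < runs; ++i) {
        x.store(0);
        y.store(0);
        int r1 = -1;
        int r2 = -1;
        std::thread a([&] {
            seq_cst_store_mfence(x, 1);
            r1 = y.load(std::memory_order_acquire);
        });
        std::thread b([&] {
            seq_cst_store_mfence(y, 1);
            r2 = x.load(std::memory_order_acquire);
        });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) ++violations;
    }
    return violations;
}
```

In ordinary code, var.store(value) with the default std::memory_order_seq_cst gives the same guarantee and lets the compiler pick the instruction sequence.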

Cautionary Notes Regarding Performance

Fences do not make stores visible to other threads any sooner. They only make later operations in the current thread wait until preceding operations have completed.

Conclusion

For general use cases, C++11 std::atomic or C11 stdatomic.h offer robust and user-friendly ways to control memory ordering. In scenarios involving NT stores or custom implementations of std::atomic, _mm_sfence and _mm_mfence may prove valuable, but weigh their performance cost carefully.
