Atomic memory operations do not have to be performed on RAM itself. They can complete entirely within cache, as long as every observer sees them as atomic. Because the caches are kept coherent between cores, and DMA takes part in that coherence, even DMA reads and writes respect this atomicity.
For aligned loads and stores of up to 64 bits, atomicity comes "for free": the data paths between cores, memory, and I/O buses such as PCIe carry the whole value in a single transfer. The CPU hardware can therefore guarantee atomicity without extra logic and without blocking other requests.
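As a rough illustration of the aligned case, the sketch below uses C++ `std::atomic<std::uint64_t>`; that choice, and the helper names `count_torn_reads` and `u64_is_lock_free`, are assumptions of this example rather than anything the text prescribes. A writer thread alternates between two bit patterns while the caller reads concurrently. With relaxed ordering, compilers targeting x86-64 typically emit ordinary `mov` instructions for these accesses, and the reader should never see a mixture of the two patterns.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical helper: reports whether 64-bit atomics need no lock here.
inline bool u64_is_lock_free() {
    std::atomic<std::uint64_t> probe{0};
    return probe.is_lock_free();
}

// Hypothetical helper: a writer alternates between two 64-bit patterns
// while this thread reads. Returns how many reads saw neither pattern,
// i.e. a torn value. For an atomic 64-bit object this should be zero.
inline std::uint64_t count_torn_reads(int iterations) {
    constexpr std::uint64_t kA = 0x0000000000000000ULL;
    constexpr std::uint64_t kB = 0xFFFFFFFFFFFFFFFFULL;
    std::atomic<std::uint64_t> shared{kA};
    std::atomic<bool> stop{false};

    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i) {
            // Relaxed ordering: atomicity only, no extra ordering.
            shared.store((i & 1) ? kA : kB, std::memory_order_relaxed);
        }
        stop.store(true, std::memory_order_release);
    });

    std::uint64_t torn = 0;
    while (!stop.load(std::memory_order_acquire)) {
        const std::uint64_t v = shared.load(std::memory_order_relaxed);
        if (v != kA && v != kB) {
            ++torn;  // observed half of one pattern and half of the other
        }
    }
    writer.join();
    return torn;
}
```

Calling `count_torn_reads(200000)` is expected to return 0. Such a run cannot prove atomicity, but a non-zero result would disprove it.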
Committing a store to the L1 data cache atomically is sufficient, because any other core or DMA access will then observe the change as a single event. The commit can happen some time after the store instruction executes: with out-of-order execution, a store waits in the store buffer until it can be written to cache.
Quite apart from its performance cost, an access that crosses a cache line boundary may not be atomic. On x86, aligned accesses of up to 8 bytes are guaranteed atomic. This implies that cache lines (typically 64 bytes) must move between caches without tearing inside any aligned 8-byte chunk, even where the data paths are narrower than a full line; it does not make atomicity of a whole line something software can rely on. Accesses wider than the guaranteed size need a lock to protect them against concurrent access.
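The boundary rule can be made concrete with a little address arithmetic. This is a minimal sketch: the function names are hypothetical, and the 64-byte line size is an assumption that matches the "typically 64B" figure above, not a value queried from the CPU.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size in bytes; real code should query the platform.
constexpr std::size_t kAssumedLineSize = 64;

// True if the byte range [addr, addr + size) spans two or more
// boundary-sized blocks, i.e. its first and last byte fall in
// different blocks.
constexpr bool crosses_boundary(std::uintptr_t addr, std::size_t size,
                                std::size_t boundary) {
    return size != 0 &&
           (addr / boundary) != ((addr + size - 1) / boundary);
}

// True if the access straddles a cache line boundary.
constexpr bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
    return crosses_boundary(addr, size, kAssumedLineSize);
}

// True if the address is a multiple of the access size.
constexpr bool is_naturally_aligned(std::uintptr_t addr, std::size_t size) {
    return size != 0 && addr % size == 0;
}
```

For example, an 8-byte access at offset 56 stays inside one 64-byte line and is naturally aligned, while the same access at offset 60 covers bytes 60 to 67 and so straddles two lines.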
Atomic read-modify-write operations are harder. For the duration of the operation, the core must hold the cache line in an exclusively owned state (Modified, in MESI terms) and delay responding to other cores' requests for it, so that nothing can read or modify the line between the load and the store. A locked operation on an unaligned operand that spans two cache lines cannot be handled this way and may need a bus lock so that other cores still observe it as atomic.
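To see a read-modify-write from the software side, here is a small sketch in C++; the helper name `parallel_increment` is hypothetical. Several threads increment one shared counter with `fetch_add`, which mainstream compilers typically lower to a `lock`-prefixed instruction on x86. Because each increment is a single atomic operation, no updates should be lost.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: num_threads threads each add 1 to a shared
// counter per_thread times. Returns the final counter value.
inline std::uint64_t parallel_increment(int num_threads, int per_thread) {
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(num_threads));

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Atomic read-modify-write: load, add, and store happen
                // as one indivisible step.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // join also makes the workers' updates visible here
    }
    return counter.load(std::memory_order_relaxed);
}
```

With 4 threads and 10000 increments each, the result is 40000. Replacing the `std::atomic` with a plain integer would make the increment a separate load and store, which is a data race and can lose updates.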