Efficient Double/int64 Conversions with SSE/AVX
While SSE2 provides instructions for converting vectors between single-precision floats and 32-bit integers, there are no equivalent instructions for converting between double-precision floats and 64-bit integers. AVX and AVX2 do not add them either; dedicated instructions only arrive with AVX-512 (the AVX-512DQ extension).
Fallback Techniques
In the absence of dedicated instructions, there are several approaches to simulate these conversions:
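- For values in a limited range ([0, 2^52) for uint64_t, [-2^51, 2^51] for int64_t, excluding infinity and NaN), adding a magic constant and then applying a bitwise XOR (or an integer subtract in the signed case) converts double to uint64_t or int64_t in just two instructions.
- Reversing these steps performs the inverse conversions.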
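The two-instruction conversions described in the list above can be sketched with SSE2 intrinsics roughly as follows. This is a minimal sketch, not a definitive implementation: the function names and the `MAGIC_U64`/`MAGIC_I64` macros are illustrative, the valid input ranges are assumptions stated in the comments, and the signed variant uses an integer subtract instead of XOR so that the endpoint 2^51 is handled.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* 2^52 as a double; bit pattern 0x4330000000000000 (illustrative name). */
#define MAGIC_U64 4503599627370496.0
/* 2^52 + 2^51 as a double; bit pattern 0x4338000000000000 (illustrative name). */
#define MAGIC_I64 6755399441055744.0

/* double -> uint64_t; assumes inputs in [0, 2^52), no NaN or infinity. */
static inline __m128i double_to_uint64(__m128d x) {
    const __m128d magic = _mm_set1_pd(MAGIC_U64);
    x = _mm_add_pd(x, magic); /* shift into [2^52, 2^53) */
    return _mm_xor_si128(_mm_castpd_si128(x), _mm_castpd_si128(magic));
}

/* double -> int64_t; assumes inputs in [-2^51, 2^51], no NaN or infinity. */
static inline __m128i double_to_int64(__m128d x) {
    const __m128d magic = _mm_set1_pd(MAGIC_I64);
    x = _mm_add_pd(x, magic); /* shift into [2^52, 2^53] */
    return _mm_sub_epi64(_mm_castpd_si128(x), _mm_castpd_si128(magic));
}

/* uint64_t -> double; assumes inputs in [0, 2^52). */
static inline __m128d uint64_to_double(__m128i x) {
    const __m128d magic = _mm_set1_pd(MAGIC_U64);
    x = _mm_or_si128(x, _mm_castpd_si128(magic)); /* bits of 2^52 + x */
    return _mm_sub_pd(_mm_castsi128_pd(x), magic);
}

/* int64_t -> double; assumes inputs in [-2^51, 2^51]. */
static inline __m128d int64_to_double(__m128i x) {
    const __m128d magic = _mm_set1_pd(MAGIC_I64);
    x = _mm_add_epi64(x, _mm_castpd_si128(magic)); /* bits of magic + x */
    return _mm_sub_pd(_mm_castsi128_pd(x), magic);
}
```

Each function processes two lanes at a time. For example, passing `_mm_set_pd(-7.0, 42.0)` to `double_to_int64` should yield the 64-bit integers 42 (low lane) and -7 (high lane). Inputs outside the stated ranges produce meaningless results rather than an error.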
Full Range Conversions:
Implementation Details
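The trick for the truncated-range conversions relies on the fact that double-precision values in the range [2^52, 2^53) have a unit in the last place of exactly 1, so the 52 stored mantissa bits hold the offset from 2^52 as a plain integer. Adding the magic constant 2^52 moves an input from [0, 2^52) into that range, rounding away any fraction, and a bitwise XOR with the constant's bit pattern then clears the exponent bits, leaving the integer representation.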
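To make the bit layout concrete, here is a scalar model of the same idea in plain C. It is only an illustration under the assumption that the input lies in [0, 2^52); the helper names `bits_of` and `scalar_double_to_uint64` are hypothetical and not part of any library.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as a 64-bit integer (no numeric conversion). */
static uint64_t bits_of(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Scalar model of the truncated-range double -> uint64_t trick.
 * Assumes 0 <= x < 2^52 and that x is neither NaN nor infinity. */
static uint64_t scalar_double_to_uint64(double x) {
    const double magic = 4503599627370496.0; /* 2^52, bits 0x4330000000000000 */
    double shifted = x + magic; /* now in [2^52, 2^53): one mantissa step == 1 */
    /* XOR removes the sign/exponent bits shared with the constant,
     * leaving only the 52 mantissa bits, which equal the integer value. */
    return bits_of(shifted) ^ bits_of(magic);
}
```

For instance, 2^52 + 5 has the bit pattern 0x4330000000000005, so XOR with 0x4330000000000000 leaves 5.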
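The full-range conversions must also handle the upper bits and the sign extension, which do not fit in the 52-bit mantissa. They split each 64-bit integer into halves, embed each half in the mantissa of its own magic constant, and recombine the halves with floating-point arithmetic in which the constants cancel exactly. Only the final addition rounds, which enables accurate reconstruction of the double-precision result.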
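One way to realize a full-range uint64_t to double conversion along these lines, using only SSE2, is sketched below. This is a hedged illustration rather than the article's exact code: the function name is made up, and the split into 32-bit halves with the constants 2^84 and 2^52 is an assumption about the intended technique.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* uint64_t -> double over the full 64-bit range (illustrative name). */
static inline __m128d uint64_to_double_full(__m128i x) {
    const __m128d two52 = _mm_set1_pd(4503599627370496.0);           /* 2^52 */
    const __m128d two84 = _mm_set1_pd(19342813113834066795298816.0); /* 2^84 */
    const __m128d two84_two52 =
        _mm_set1_pd(19342813118337666422669312.0);                   /* 2^84 + 2^52 */
    const __m128i low_mask = _mm_set1_epi64x(0x00000000FFFFFFFFLL);

    /* High 32 bits in the mantissa of 2^84: value is 2^84 + hi * 2^32, exact. */
    __m128i hi = _mm_or_si128(_mm_srli_epi64(x, 32), _mm_castpd_si128(two84));
    /* Low 32 bits in the mantissa of 2^52: value is 2^52 + lo, exact. */
    __m128i lo = _mm_or_si128(_mm_and_si128(x, low_mask), _mm_castpd_si128(two52));
    /* (2^84 + hi * 2^32) - (2^84 + 2^52) = hi * 2^32 - 2^52, still exact. */
    __m128d f = _mm_sub_pd(_mm_castsi128_pd(hi), two84_two52);
    /* The 2^52 terms cancel; this add is the only step that rounds. */
    return _mm_add_pd(f, _mm_castsi128_pd(lo));
}
```

Because every intermediate step is exact, the result is rounded once, by the final add, in whatever rounding mode is active. A signed variant would need a different treatment of the high half to account for sign extension, which is not shown here.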
Rounding Behavior
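The truncated-range double to uint64_t conversion rounds according to the current rounding mode, which is usually round-to-nearest-even. The double to int64_t version does the same for every mode except round towards zero, where it effectively rounds towards negative infinity: a negative input such as -0.5 is added to a positive constant, so truncating the sum moves the result down to -1 rather than to 0. The full-range conversions round correctly in all modes.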
Availability
The presented techniques provide a workaround for the lack of direct int64_t and double conversions in SSE/AVX. These methods can be particularly useful in optimizing code where these conversions are required, providing a balance between efficiency and accuracy.