How to Efficiently Implement log2(__m256d) in AVX2 without Intel's Compiler Dependencies?

Patricia Arquette

Released: 2024-12-15 12:03:10

Efficient Implementation of log2(__m256d) in AVX2

Intel's __m256d _mm256_log2_pd(__m256d a) is not a hardware instruction but a function from Intel's Short Vector Math Library (SVML), so it is not portable to compilers that do not ship that library, and it reportedly performs worse on AMD processors. This article outlines an alternative implementation that is efficient and portable across compilers.

Strategies for log2 Approximation

The starting point is the identity log2(a * b) = log2(a) + log2(b). A floating-point number a is stored as 2^exponent * mantissa, so log2(a) = exponent + log2(mantissa). Because the mantissa is confined to the narrow range [1.0, 2.0), a polynomial fitted to that range is enough to approximate log2(mantissa).
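The exponent/mantissa split described above can be illustrated in scalar C++ before moving to vectors. This is a minimal sketch, assuming a positive, finite, normal IEEE-754 double; the names Decomposed and decompose are hypothetical and exist only for this example.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type for this sketch.
struct Decomposed {
    int exponent;     // unbiased power of two
    double mantissa;  // in [1.0, 2.0)
};

// Split a positive, finite, normal double into exponent and mantissa
// using only integer bit operations. Zero, denormals, negative values,
// infinities, and NaN are deliberately not handled here.
inline Decomposed decompose(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined type pun

    // Bits 52..62 hold the exponent, biased by 1023.
    int exponent = static_cast<int>((bits >> 52) & 0x7FF) - 1023;

    // Keep the 52 fraction bits and install the exponent field of 1.0.
    std::uint64_t mbits =
        (bits & 0x000FFFFFFFFFFFFFULL) | 0x3FF0000000000000ULL;
    double mantissa;
    std::memcpy(&mantissa, &mbits, sizeof mantissa);

    return {exponent, mantissa};
}
```

For example, 10.0 is stored as 2^3 * 1.25, so log2(10.0) = 3 + log2(1.25). The same masks and shifts carry over to 64-bit integer lanes in an AVX2 register.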

Polynomial Approximation

Taylor series expansions are commonly used as starting points for coefficients, but minimax fitting is recommended to minimize the maximum error over the target range. For higher precision on inputs close to 1.0, use mantissa - 1.0 as the polynomial input: since log2(1.0) = 0, the polynomial then needs no constant term.

Accuracy Considerations

The desired accuracy level will influence implementation choices. Higher accuracy typically comes at the cost of speed due to additional computational steps. Agner Fog's Vector Class Library (VCL) provides highly accurate functions but employs complex techniques that may not be essential for all applications.

VCL Algorithm for log2

VCL's log2 function involves the following steps:

Extracting the exponent bits and converting them to a floating-point value.
Adjusting the mantissa to [0.5, 1.0) or (0.5, 1.0], then subtracting 1.0.
Applying a polynomial approximation to log(x) that is accurate around x = 1.0, using either a ratio of two 5th-order polynomials (double) or a single polynomial (float).
Computing exponent + polynomial_approx_log(mantissa) to obtain the final result.

Steps to Improve Accuracy and Speed

To enhance accuracy:

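Consider a more accurate polynomial approximation, such as a higher-order or minimax-fitted polynomial.
Feed mantissa - 1.0 to the polynomial rather than the raw mantissa. The subtraction itself is exact for a mantissa in this range, and it preserves precision for inputs near 1.0, where the result is close to zero.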

To optimize speed:

Use truncated polynomial approximations with fewer terms.
Employ vectorized instructions to process multiple values simultaneously.
Eliminate unnecessary checks for special cases (e.g., underflow, overflow, denormal) if input values are known to be finite and positive.
