Loading 8 Chars from Memory into an __m256 Variable as Packed Single Precision Floats
To optimize a Gaussian blur algorithm, you want to replace a float buffer with an __m256 variable. That means loading 8 chars (bytes) from memory and converting them to 8 packed single-precision floats. This question asks which instructions do that most efficiently.
Instructions for AVX2 Architecture:
    ; rsi = new_image
    VPMOVZXBD ymm0, [rsi]   ; zero-extend 8 bytes to 8 dwords (use VPMOVSXBD to sign-extend)
    VCVTDQ2PS ymm0, ymm0    ; convert packed int32 to packed float
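The same AVX2 sequence can be written with C intrinsics. This is a minimal sketch assuming GCC or Clang on x86; the function name u8x8_to_f32_avx2 is hypothetical, and the final store exists only so the result can be inspected. Whether the compiler folds the separate 8-byte load into VPMOVZXBD's memory operand depends on the compiler and optimization level.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: convert 8 unsigned bytes to 8 floats with AVX2.
 * target("avx2") lets this compile without -mavx2; callers must still
 * check at run time that the CPU supports AVX2. */
__attribute__((target("avx2")))
void u8x8_to_f32_avx2(const uint8_t *src, float *dst)
{
    /* Load exactly 8 bytes into the low half of an xmm register (movq). */
    __m128i bytes = _mm_loadl_epi64((const __m128i *)src);
    /* Zero-extend 8 bytes to 8 x int32 (vpmovzxbd ymm, xmm).
     * For signed chars, _mm256_cvtepi8_epi32 (vpmovsxbd) is the equivalent. */
    __m256i ints = _mm256_cvtepu8_epi32(bytes);
    /* Convert packed int32 to packed float (vcvtdq2ps). */
    __m256 v = _mm256_cvtepi32_ps(ints);
    /* In a blur loop, v would stay in a register; it is stored here
     * only so the converted values can be checked. */
    _mm256_storeu_ps(dst, v);
}
```

Every value from 0 to 255 is exactly representable as a float, so the conversion is lossless.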
Additional Strategies:
Instructions for AVX1 Architecture:
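AVX1 has no 256-bit form of VPMOVZXBD (that requires AVX2), so zero-extend each 4-byte half into an xmm register, combine the halves, then convert: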
    VPMOVZXBD   xmm0, [rsi]            ; bytes 0..3 -> 4 dwords
    VPMOVZXBD   xmm1, [rsi+4]          ; bytes 4..7 -> 4 dwords
    VINSERTF128 ymm0, ymm0, xmm1, 1    ; put the second result into the high 128 bits of ymm0
    VCVTDQ2PS   ymm0, ymm0             ; convert to packed float
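A possible intrinsics translation of the AVX1 sequence, as a sketch assuming GCC or Clang on x86. The function name u8x8_to_f32_avx1 is hypothetical. Each 4-byte half is read with memcpy (a well-defined unaligned load) so that, at typical optimization levels, the compiler may fold it into VPMOVZXBD xmm, dword [mem] as in the assembly; that folding is not guaranteed.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert 8 unsigned bytes to 8 floats using only AVX1
 * (plus the SSE4.1 instructions that AVX implies). */
__attribute__((target("avx")))
void u8x8_to_f32_avx1(const uint8_t *src, float *dst)
{
    int32_t a, b;
    memcpy(&a, src, sizeof a);      /* bytes 0..3 */
    memcpy(&b, src + 4, sizeof b);  /* bytes 4..7 */
    /* Widen each half to 4 x int32 (VEX-encoded pmovzxbd xmm). */
    __m128i lo = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(a));
    __m128i hi = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(b));
    /* lo goes in the low 128 bits, hi in the high 128 bits (vinsertf128). */
    __m256i ints = _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);
    /* 256-bit vcvtdq2ps is already available in AVX1. */
    __m256 v = _mm256_cvtepi32_ps(ints);
    _mm256_storeu_ps(dst, v);  /* store only to inspect the result */
}
```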
Intrinsics Considerations:
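The VPMOVZXBD/VPMOVSXBD intrinsics (_mm256_cvtepu8_epi32, _mm256_cvtepi8_epi32) only accept a __m128i, so the 8-byte load has to be written as a separate intrinsic. Whether it ends up folded into the instruction's memory operand is up to the compiler. Below is a sketch of one alternative way to express that load, applied to the sign-extending (VPMOVSXBD) variant for signed char data. It assumes x86-64 (for _mm_cvtsi64_si128) and GCC or Clang; the function name s8x8_to_f32_avx2 is hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert 8 signed bytes to 8 floats with AVX2. */
__attribute__((target("avx2")))
void s8x8_to_f32_avx2(const int8_t *src, float *dst)
{
    int64_t raw;
    memcpy(&raw, src, sizeof raw);               /* 8-byte load, no aliasing issues */
    __m128i bytes = _mm_cvtsi64_si128(raw);      /* movq xmm, r64 (x86-64 only) */
    __m256i ints = _mm256_cvtepi8_epi32(bytes);  /* sign-extend: vpmovsxbd */
    __m256 v = _mm256_cvtepi32_ps(ints);         /* vcvtdq2ps */
    _mm256_storeu_ps(dst, v);                    /* store only to inspect */
}
```

Checking the generated assembly (for example with -O2 -S) is the reliable way to see whether a given compiler merged the load into VPMOVSXBD.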