How to implement atoi using SIMD?
Problem:
I would like to try writing an atoi implementation using SIMD instructions, to be included in RapidJSON. The algorithm I came up with is as follows:
Is my algorithm correct? Is there a better way? Is there a reference implementation of atoi using any SIMD instruction set?
Answer:
The algorithm is correct and complete. It handles both int and uint: int from MIN_INT=-2147483648 to MAX_INT=2147483647, and uint from MIN_UINT=0 to MAX_UINT=4294967295.
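One way to check the signed range quoted above against any SIMD routine is to compare it with a plain scalar baseline at the boundary values. The following is only a minimal sketch: the helper name `parse_int32_ref` is hypothetical and is not part of the answer's code, and it covers the int case only (an optional '-' followed by decimal digits).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar baseline (not the answer's implementation).
 * Parses an optional '-' followed by decimal digits. Returns false on
 * empty input, on any non-digit, or when the value is outside int32. */
static bool parse_int32_ref(const char *s, int32_t *out)
{
    bool neg = (*s == '-');
    if (neg) s++;
    if (*s == '\0') return false;            /* no digits at all */

    /* Accumulate the magnitude in 64 bits so the range check stays simple. */
    int64_t limit = neg ? 2147483648LL : 2147483647LL;
    int64_t acc = 0;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9') return false;
        acc = acc * 10 + (*s - '0');
        if (acc > limit) return false;       /* outside the int32 range */
    }
    *out = (int32_t)(neg ? -acc : acc);
    return true;
}
```

Feeding both this baseline and the routine under test the strings "2147483647", "-2147483648", "2147483648" and "-2147483649" exercises both ends of the range.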
A reference implementation is provided, written for the GNU Assembler in Intel syntax.
The properties of this code are as follows:
The approach of the algorithm is as follows:
The last step adds these four DWORDs together with two PHADDD operations, each emulated by a PSHUFD + PADDD pair, i.e. 2*(PSHUFD+PADDD).
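The final reduction of four DWORDs can be sketched in C with SSE2 intrinsics. This is an illustration of the PSHUFD + PADDD pattern, not the answer's assembly; the function name `hsum_epi32_sse2` and the example lane values are assumptions made for this sketch.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: PSHUFD, PADDD */
#include <stdint.h>

/* Sum the four 32-bit lanes of v with two PSHUFD + PADDD pairs,
 * avoiding PHADDD (which requires SSSE3). */
static uint32_t hsum_epi32_sse2(__m128i v)
{
    /* Swap the two 64-bit halves: lanes [0,1,2,3] become [2,3,0,1]. */
    __m128i halves = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    __m128i sum2 = _mm_add_epi32(v, halves);       /* lane0 = v0+v2, lane1 = v1+v3 */

    /* Swap adjacent lanes: [0,1,2,3] become [1,0,3,2]. */
    __m128i pairs = _mm_shuffle_epi32(sum2, _MM_SHUFFLE(2, 3, 0, 1));
    __m128i sum4 = _mm_add_epi32(sum2, pairs);     /* every lane = v0+v1+v2+v3 */

    return (uint32_t)_mm_cvtsi128_si32(sum4);      /* read lane 0 */
}
```

For instance, if the four lanes held the hypothetical partial sums 2000000000, 140000000, 7480000 and 3647, the function would return 2147483647. The additions wrap modulo 2^32, so the same code serves the unsigned case as long as the true total fits in 32 bits.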
The result of Intel-IACA Throughput Analysis for Haswell 32-bit:
Block Throughput: 16.10 Cycles Throughput Bottleneck: InterIteration
N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
F - Macro Fusion with the previous instruction occurred
* - instruction micro-ops not bound to a port
^ - Micro Fusion happened
@ - SSE instruction followed an AVX256 instruction, dozens of cycles penalty is expected
! - instruction not supported, was not accounted in Analysis
| Num Of | Ports pressure in cycles | |
| 0* | | | | | | | | | | xor eax, eax
| 0* | | | | | | | | | | xor ecx, ecx
| 0* | | | | | | | | | | xor edx, edx
| 1 | | 0.1 | | | | | 0.9 |