The Intel Architecture Code Analyzer (IACA) is a static analysis tool for evaluating how code is scheduled on Intel CPUs. It operates in three modes: throughput, latency, and trace.
Capabilities and Applications:
Usage:
Instructions for IACA usage vary depending on your programming language.
C/C++:
Include the necessary IACA header (iacaMarks.h) and place start and end markers around your target loop:
/* C or C++ Usage */
while(cond){
    IACA_START
    /* Innermost Loop Body */
    /* ... */
}
IACA_END
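As a fuller illustration of the marker placement above, here is a minimal sketch of a small kernel wrapped in the markers. The function name `scaled_add` and the `USE_IACA` build switch are assumptions made for this example, not part of IACA. When `USE_IACA` is not defined, the markers expand to nothing, so the same source also builds as an ordinary program.

```c
#include <stddef.h>

/* Assumption: USE_IACA is a build switch invented for this sketch. Define it
 * only when compiling the binary that IACA will inspect, with iacaMarks.h on
 * the include path. Otherwise the markers are defined as no-ops. */
#ifdef USE_IACA
#include "iacaMarks.h"
#else
#define IACA_START
#define IACA_END
#endif

/* Hypothetical kernel: out[i] = a[i] + k * b[i] for i in [0, n). */
void scaled_add(float *out, const float *a, const float *b, float k, size_t n)
{
    size_t i = 0;
    while (i < n) {
        IACA_START                  /* start marker at the top of the loop body */
        out[i] = a[i] + k * b[i];   /* innermost loop body */
        i++;
    }
    IACA_END                        /* end marker immediately after the loop */
}
```

A binary built with the markers enabled is meant to be fed to IACA rather than benchmarked, since the markers add instructions that are not part of the real loop.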
Assembly (x86):
Insert the specified magic byte patterns to designate markers manually:
; NASM Usage
mov ebx, 111          ; Start marker bytes
db 0x64, 0x67, 0x90   ; Start marker bytes

.innermostlooplabel:
    ; Loop body
    ; ...
    jne .innermostlooplabel ; Conditional Branch Backwards to Top of Loop

mov ebx, 222          ; End marker bytes
db 0x64, 0x67, 0x90   ; End marker bytes
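The same byte patterns can also be emitted from C through inline assembly. The sketch below assumes GCC or Clang extended asm on x86-64. `MARK_START`, `MARK_END` and `sum_marked` are hypothetical names for this illustration and are not the definitions found in iacaMarks.h, which should be preferred in real use. On other compilers or architectures the macros expand to nothing.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the markers (not copied from iacaMarks.h).
 * Each one loads the marker value into ebx and then emits the bytes
 * 0x64, 0x67, 0x90, matching the NASM listing. ebx is declared as
 * clobbered so the compiler does not keep a live value in it. */
#if defined(__GNUC__) && defined(__x86_64__)
#define MARK_START() \
    __asm__ __volatile__("movl $111, %%ebx\n\t.byte 0x64, 0x67, 0x90" \
                         : : : "ebx", "memory")
#define MARK_END() \
    __asm__ __volatile__("movl $222, %%ebx\n\t.byte 0x64, 0x67, 0x90" \
                         : : : "ebx", "memory")
#else
#define MARK_START() ((void)0)
#define MARK_END()   ((void)0)
#endif

/* Hypothetical loop to analyze: sums n integers. */
long sum_marked(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        MARK_START();       /* start marker inside the innermost loop */
        total += v[i];
    }
    MARK_END();             /* end marker right after the loop */
    return total;
}
```

The compiler is free to reorder or unroll code around the markers, so it is worth checking the generated assembly before trusting the analysis.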
Command-Line Invocation:
Invoke IACA from the command line with appropriate parameters, such as:
iaca.sh -64 -arch HSW -graph insndeps.dot foo
This analyzes the 64-bit binary foo as it would execute on a Haswell CPU, producing an analysis report and writing an instruction-dependency graph to insndeps.dot that can be viewed with Graphviz.
Output Interpretation:
The output report describes how the target code is scheduled and where its bottlenecks lie. For instance, consider the following assembly snippet:
.L2:
    vmovaps         ymm1, [rdi+rax]       ;L2
    vfmadd231ps     ymm1, ymm2, [rsi+rax] ;L2
    vmovaps         [rdx+rax], ymm1       ; S1
    add             rax, 32               ; ADD
    jne             .L2                   ; JMP
By inserting markers around this code and analyzing it, IACA may report (abridged):
Throughput Analysis Report
--------------------------
Block Throughput: 1.55 Cycles       Throughput Bottleneck: FrontEnd, PORT2_AGU, PORT3_AGU

[Port Pressure Breakdown] | Instruction
--------------------------|-----------------
| | vmovaps ymm1, ymmword ptr [rdi+rax*1]
| 0.5 CP | | 1.5 CP | vfmadd231ps ymm1, ymm2, ymmword ptr [rsi+rax*1]
| 1.5 CP | vmovaps ymmword ptr [rdx+rax*1], ymm1
| 1 CP | add rax, 0x20
| 0 CP | jnz 0xffffffffffffffec
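From this output, IACA identifies the Haswell frontend and the address-generation units (AGUs) on Ports 2 and 3 as the bottlenecks. Encoding the store instruction so that it can be processed by Port 7 could relieve the AGU pressure; on Haswell, Port 7 only handles stores that use a simple, non-indexed addressing mode, which [rdx+rax] is not.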
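To put the reported block throughput in perspective, it can be converted into per-element figures. Each iteration of the loop advances rax by 32 bytes, which is eight single-precision floats, and performs one fused multiply-add across those eight lanes, or 16 floating-point operations if the multiply and the add are counted separately. The helper names below are hypothetical and the sketch treats IACA's estimate as a best-case, steady-state figure.

```c
/* Cycles spent per element, given IACA's block throughput
 * (cycles per loop iteration) and the elements handled per iteration. */
double cycles_per_element(double block_throughput, unsigned elems_per_iter)
{
    return block_throughput / (double)elems_per_iter;
}

/* Estimated floating-point operations per cycle for the analyzed loop. */
double flops_per_cycle(double block_throughput, unsigned flops_per_iter)
{
    return (double)flops_per_iter / block_throughput;
}
```

With the reported 1.55 cycles per iteration, `cycles_per_element(1.55, 8)` gives roughly 0.194 cycles per float and `flops_per_cycle(1.55, 16)` gives roughly 10.3 operations per cycle. Measured performance can be lower, for example when the data does not fit in cache.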
Limitations:
IACA has some limitations: