Opcodes are the intermediate language that PHP scripts are compiled into, much like Java's bytecode or .NET's MSIL. For example, suppose you write the following PHP code:
<?php
echo "Hello World";
$a = 1 + 1;
echo $a;
To execute this code, PHP (more precisely, Zend, PHP's language engine) goes through the following four steps:
1. Scanning (Lexing): convert the PHP code into language fragments (Tokens)
2. Parsing: convert the Tokens into simple, meaningful expressions
3. Compilation: compile the expressions into Opcodes
4. Execution: execute the Opcodes one at a time, in order, which carries out what the PHP script does
Note: opcode caches such as APC let PHP keep the compiled Opcodes. A new request then does not have to repeat the first three steps, which can greatly speed up PHP execution.
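The caching idea can be made concrete with a minimal PHP sketch, under stated assumptions. ToyOpcodeCache is a hypothetical class and is neither APC's API nor its implementation. It keys entries on an md5 of the source, although real caches may key entries differently. As a stand-in for the first three steps, it only runs token_get_all().

```php
<?php
// Hypothetical toy cache illustrating the idea behind opcode caches like APC:
// do the expensive front-end work once per distinct source, then reuse it.
final class ToyOpcodeCache
{
    private $entries = [];   // cache key => stored "compiled" result
    public $compilations = 0; // how many times the front end actually ran

    public function get(string $source): array
    {
        $key = md5($source); // assumption: keyed on content; real caches may differ
        if (!isset($this->entries[$key])) {
            $this->compilations++;
            // Stand-in for lexing + parsing + compiling: this sketch only lexes.
            $this->entries[$key] = token_get_all($source);
        }
        return $this->entries[$key];
    }
}

$cache = new ToyOpcodeCache();
$source = "<?php\necho \"Hello World\";\n\$a = 1 + 1;\necho \$a;";
for ($request = 0; $request < 3; $request++) {
    $cache->get($source); // only the first "request" pays the compile cost
}
echo $cache->compilations, "\n"; // prints 1
```

Three simulated requests for the same source trigger a single compilation. That saving is what an opcode cache buys on every request after the first.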
So what is Lexing? Anyone who has studied compiler theory will be familiar with lexical analysis. A Lex file is the rule table that lexical analysis is based on. Zend/zend_language_scanner.c, generated from Zend/zend_language_scanner.l (the Lex file), performs lexical analysis on the input PHP code and breaks it into "words" one by one. Since PHP 4.2, PHP has provided a function called token_get_all, which scans a piece of PHP code into Tokens.
If you use this function to process the PHP code we mentioned at the beginning, you will get the following results:
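(Token IDs are shown by name. The numeric values differ between PHP versions: in the version the original dump came from, T_ECHO was 316, T_LNUMBER 305 and T_WHITESPACE 370. The line-number element is omitted.)
Array
(
    [0]  => Array ( [0] => T_OPEN_TAG                 [1] => "<?php\n" )
    [1]  => Array ( [0] => T_ECHO                     [1] => "echo" )
    [2]  => Array ( [0] => T_WHITESPACE               [1] => " " )
    [3]  => Array ( [0] => T_CONSTANT_ENCAPSED_STRING [1] => "\"Hello World\"" )
    [4]  => ;
    [5]  => Array ( [0] => T_WHITESPACE               [1] => "\n" )
    [6]  => Array ( [0] => T_VARIABLE                 [1] => "$a" )
    [7]  => Array ( [0] => T_WHITESPACE               [1] => " " )
    [8]  => =
    [9]  => Array ( [0] => T_WHITESPACE               [1] => " " )
    [10] => Array ( [0] => T_LNUMBER                  [1] => "1" )
    [11] => Array ( [0] => T_WHITESPACE               [1] => " " )
    [12] => +
    [13] => Array ( [0] => T_WHITESPACE               [1] => " " )
    [14] => Array ( [0] => T_LNUMBER                  [1] => "1" )
    [15] => ;
    [16] => Array ( [0] => T_WHITESPACE               [1] => "\n" )
    [17] => Array ( [0] => T_ECHO                     [1] => "echo" )
    [18] => Array ( [0] => T_WHITESPACE               [1] => " " )
    [19] => Array ( [0] => T_VARIABLE                 [1] => "$a" )
    [20] => ;
)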
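Analyzing this result, we can see that every character of the source code appears, in order, in exactly one token. Single-character tokens such as ;, = and + are returned as plain strings. Everything else is turned into an array, including the open tag, keywords like echo, whitespace, variables, numbers and string literals. The array's first two elements are the Token ID (Zend's internal code for that token, e.g. T_ECHO or T_STRING) and the original text from the source. Since PHP 5.2.2, a third element holds the line number.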
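The following short PHP sketch produces a listing like the one above with the real token_get_all() and token_name() functions. It assumes the tokenizer extension is available, which default PHP builds include. It prints token names rather than raw numbers, because the numeric IDs depend on the PHP version.

```php
<?php
// Tokenize the sample script and list every token in order.
$source = "<?php\necho \"Hello World\";\n\$a = 1 + 1;\necho \$a;";

$tokens = token_get_all($source);

foreach ($tokens as $i => $token) {
    if (is_array($token)) {
        // [0] numeric ID (version-dependent), [1] source text, [2] line number
        $name = token_name($token[0]);
        $text = $token[1];
    } else {
        // Single-character tokens such as ';', '=' and '+' are plain strings.
        $name = '(single char)';
        $text = $token;
    }
    // addcslashes() makes newlines visible as \n in the output.
    printf("[%2d] %-28s %s\n", $i, $name, addcslashes($text, "\n"));
}
```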
Next comes the Parsing stage. Parsing first discards the redundant whitespace from the Tokens array, then turns the remaining Tokens into simple expressions, one by one:
1. echo a constant string
2. add two numbers together
3. store the result of the prior expression to a variable
4. echo a variable
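The parsing stage can be roughly illustrated with two hypothetical helpers, toy_statements() and toy_expressions(). This is not how Zend parses; its real parser is generated from Zend/zend_language_parser.y. The sketch only understands the statement shapes in our sample: it drops whitespace, splits at each ';' and names the resulting expressions.

```php
<?php
// Hypothetical helper: discard whitespace and split tokens into statements.
function toy_statements(string $source): array
{
    $statements = [];
    $current = [];
    foreach (token_get_all($source) as $token) {
        // Normalize every token to [name, text]; single chars use the char as name.
        $pair = is_array($token) ? [token_name($token[0]), $token[1]] : [$token, $token];
        if ($pair[0] === 'T_WHITESPACE' || $pair[0] === 'T_OPEN_TAG') {
            continue; // carries no meaning for the parser
        }
        if ($pair[0] === ';') {
            $statements[] = $current; // a ';' ends the statement
            $current = [];
        } else {
            $current[] = $pair;
        }
    }
    return $statements;
}

// Hypothetical helper: describe each statement as the simple expressions above.
function toy_expressions(array $statements): array
{
    $expressions = [];
    foreach ($statements as $statement) {
        $names = array_column($statement, 0);
        if ($names[0] === 'T_ECHO') {
            $expressions[] = $names[1] === 'T_VARIABLE' ? 'echo a variable' : 'echo a constant string';
        } elseif ($names[0] === 'T_VARIABLE' && in_array('+', $names, true)) {
            // `$a = 1 + 1` yields two expressions: the addition, then the store.
            $expressions[] = 'add two numbers together';
            $expressions[] = 'store the result of the prior expression to a variable';
        }
    }
    return $expressions;
}

$source = "<?php\necho \"Hello World\";\n\$a = 1 + 1;\necho \$a;";
print_r(toy_expressions(toy_statements($source)));
```

Note how the assignment statement expands into two expressions. That matches items 2 and 3 of the list above.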
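Then comes the Compilation stage, which compiles these expressions into an op_array. An op_array is a sequence of opcodes, each of which (a zend_op in the Zend source) consists of the following 5 parts:
1. Opcode: a numeric identifier specifying the type of operation, e.g. add or echo
2. Result: where the result of the Opcode is stored
3. Operand 1 (op1): the first operand passed to the Opcode
4. Operand 2 (op2): the second operand
5. Extended value (extended_value): an integer used to distinguish overloaded operators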
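One way to picture these five parts from PHP is a small value class. ToyOp is a hypothetical name used for illustration only. In Zend the fields live in a C struct, and PHP code cannot construct real opcodes.

```php
<?php
// Hypothetical PHP model of the five parts of one compiled opcode.
final class ToyOp
{
    public $opcode;        // the operation, e.g. 'ZEND_ADD'
    public $result;        // where the result is stored, e.g. '~0' (null if not shown)
    public $op1;           // first operand
    public $op2;           // second operand (null when unused)
    public $extendedValue; // extra integer, e.g. to tell overloaded operators apart

    public function __construct($opcode, $result, $op1, $op2 = null, $extendedValue = 0)
    {
        $this->opcode = $opcode;
        $this->result = $result;
        $this->op1 = $op1;
        $this->op2 = $op2;
        $this->extendedValue = $extendedValue;
    }

    public function __toString(): string
    {
        // Print the opcode followed by whichever of result/op1/op2 are present.
        $parts = array_filter(
            [$this->opcode, $this->result, $this->op1, $this->op2],
            function ($part) { return $part !== null; }
        );
        return implode(' ', $parts);
    }
}

// The sample script modeled as a list of ToyOp values.
$opArray = [
    new ToyOp('ZEND_ECHO', null, "'Hello World'"),
    new ToyOp('ZEND_ADD', '~0', '1', '1'),
    new ToyOp('ZEND_ASSIGN', null, '!0', '~0'),
    new ToyOp('ZEND_ECHO', null, '!0'),
];

foreach ($opArray as $op) {
    echo $op, "\n";
}
```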
For example, our PHP code will be compiled into:
ZEND_ECHO 'Hello World'
ZEND_ADD ~0 1 1
ZEND_ASSIGN !0 ~0
ZEND_ECHO !0
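To see where numbers like ~0 and !0 could come from, here is a toy compiler, toy_compile(). It is hypothetical and handles only the two statement shapes in our sample. It allocates a new temporary for each intermediate result and gives each variable a slot on its first appearance. It is a sketch of the idea, not Zend's compiler.

```php
<?php
// Hypothetical toy compiler for `echo <string or $var>;` and `$var = <int> + <int>;`.
function toy_compile(string $source): array
{
    $ops = [];
    $cvSlots = []; // variable name => compiled-variable slot number
    $nextTmp = 0;  // next free temporary number
    $cv = function (string $name) use (&$cvSlots): string {
        if (!isset($cvSlots[$name])) {
            $cvSlots[$name] = count($cvSlots); // first reference allocates a slot
        }
        return '!' . $cvSlots[$name];
    };

    $stmt = [];
    foreach (token_get_all($source) as $token) {
        $pair = is_array($token) ? [token_name($token[0]), $token[1]] : [$token, $token];
        if ($pair[0] === 'T_WHITESPACE' || $pair[0] === 'T_OPEN_TAG') {
            continue;
        }
        if ($pair[0] !== ';') {
            $stmt[] = $pair; // keep collecting until the statement ends
            continue;
        }
        if ($stmt[0][0] === 'T_ECHO') {
            $arg = $stmt[1];
            $ops[] = $arg[0] === 'T_VARIABLE'
                ? 'ZEND_ECHO ' . $cv($arg[1])
                : "ZEND_ECHO '" . substr($arg[1], 1, -1) . "'"; // strip source quotes
        } elseif ($stmt[0][0] === 'T_VARIABLE' && $stmt[1][0] === '=' && $stmt[3][0] === '+') {
            $tmp = '~' . $nextTmp++; // the sum lives in a fresh temporary
            $ops[] = "ZEND_ADD $tmp {$stmt[2][1]} {$stmt[4][1]}";
            $ops[] = 'ZEND_ASSIGN ' . $cv($stmt[0][1]) . " $tmp";
        }
        $stmt = [];
    }
    return $ops;
}

$source = "<?php\necho \"Hello World\";\n\$a = 1 + 1;\necho \$a;";
echo implode("\n", toy_compile($source)), "\n";
```

Running it on the sample reproduces the four-line listing above. The variable $a receives slot 0 the first time it is seen, and the second reference reuses that slot.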
Haha, you may be asking: where did our $a go?
To answer that, we need to look at operands. Each operand consists of the following two parts:
a) op_type: IS_CONST, IS_TMP_VAR, IS_VAR, IS_UNUSED, or IS_CV
b) u: a union that, depending on op_type, holds either the operand's value (for a const) or its lvalue (for a var)
The variable-style types each work differently:
IS_TMP_VAR, as the name suggests, is a temporary variable that holds the result of one opcode for use by a later opcode. The u of this type of operand stores a handle (an integer) pointing into the variable table. Operands of this type are usually written with a leading ~. For example, ~0 denotes the unnamed temporary variable number 0 in the variable table.
IS_VAR is a variable in the ordinary sense. Operands of this type are written with a leading $.
IS_CV (compiled variable) is a caching mechanism the compiler has used since ZE2.1/PHP 5.1. Such an operand stores the address of the variable it refers to. The first time a variable is referenced, it is cached as a CV, and later references no longer need to look it up in the active symbol table. CV operands are written with a leading !.
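The operand types can be illustrated with a toy executor. toy_fetch() and toy_execute() are hypothetical helpers, and each operand is modeled as an [op_type, u] pair. IS_CONST carries the value itself, while IS_TMP_VAR and IS_CV carry a slot number. IS_VAR and IS_UNUSED are left out because the sample does not need them.

```php
<?php
// Hypothetical operand resolution: read an operand according to its op_type.
function toy_fetch(array $operand, array $tmps, array $cvs)
{
    list($type, $u) = $operand;
    switch ($type) {
        case 'IS_CONST':   return $u;        // value stored inline
        case 'IS_TMP_VAR': return $tmps[$u]; // ~N: temporary slot N
        case 'IS_CV':      return $cvs[$u];  // !N: compiled-variable slot N
    }
    throw new InvalidArgumentException("unsupported op_type $type");
}

// Hypothetical executor: run the ops in order and collect echoed output.
function toy_execute(array $ops): string
{
    $tmps = [];
    $cvs = [];
    $output = '';
    foreach ($ops as $op) {
        list($opcode, $result, $op1, $op2) = $op;
        switch ($opcode) {
            case 'ZEND_ECHO':
                $output .= toy_fetch($op1, $tmps, $cvs);
                break;
            case 'ZEND_ADD':
                // Store the sum in the temporary named by the result operand.
                $tmps[$result[1]] = toy_fetch($op1, $tmps, $cvs) + toy_fetch($op2, $tmps, $cvs);
                break;
            case 'ZEND_ASSIGN':
                // Copy op2's value into the CV slot named by op1.
                $cvs[$op1[1]] = toy_fetch($op2, $tmps, $cvs);
                break;
        }
    }
    return $output;
}

// The listing above, with every operand spelled out as [op_type, u].
$ops = [
    ['ZEND_ECHO',   null,              ['IS_CONST', 'Hello World'], null],
    ['ZEND_ADD',    ['IS_TMP_VAR', 0], ['IS_CONST', 1],             ['IS_CONST', 1]],
    ['ZEND_ASSIGN', null,              ['IS_CV', 0],                ['IS_TMP_VAR', 0]],
    ['ZEND_ECHO',   null,              ['IS_CV', 0],                null],
];
echo toy_execute($ops), "\n"; // prints Hello World2
```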
So our $a has become !0: it is a compiled variable (CV) occupying CV slot 0.
Author: laruence (http://www.laruence.com/)