Understand what a PHP7 virtual machine is
Most of the content of this article is translated from Getting into the Zend Execution engine (PHP 5), with some adjustments made. The original text is based on PHP 5, and this article is based on PHP 7.
PHP: An interpreted language
PHP is called a scripting language or an interpreted language. Why? The PHP language is not directly compiled into machine instructions, but is compiled into a form of intermediate code. Obviously it cannot be executed directly on the CPU. Therefore, the execution of PHP needs to be on a process-level virtual machine (see Process virtual machines in Virtual machine, hereinafter referred to as virtual machine).
PHP, like other interpreted languages, is in effect a cross-platform program designed to execute abstract instructions. PHP is mainly used to solve problems related to web development.
Programs written in languages such as Java, Python, C#, Ruby, Pascal, Lua, Perl and JavaScript typically run on a virtual machine. A virtual machine can use JIT compilation to turn some of its instructions into machine instructions and improve performance. Brother Niao (Laruence) is already working on adding JIT support to PHP.
Advantages of using interpreted languages:
The code is simple to write and can be developed quickly
Automatic memory management
Abstract data types, high program portability
Disadvantages:
Unable to directly manage memory and use process resources
Slower than languages compiled to machine instructions: the same task usually takes more CPU cycles (JIT tries to close the gap, but can never eliminate it completely)
So much is abstracted away that, when something goes wrong in a program, many programmers find it hard to explain the root cause
The last shortcoming is the reason why the author wrote this article. The author feels that programmers should understand some underlying things.
The author hopes this article explains to readers how PHP operates. What it says about the PHP virtual machine also applies to other interpreted languages. The biggest differences between virtual machine implementations are usually whether they use JIT, whether they execute virtual machine instructions in parallel (generally through multi-threading, which PHP does not use), and which memory management/garbage collection algorithm they use.
The Zend virtual machine is divided into two parts:
Compilation: Convert PHP code into virtual machine instructions (OPCode)
Execution: Execute the generated virtual machine instructions
This article will not involve the compilation part, but mainly focuses on the execution engine of the Zend virtual machine. The execution engine of the PHP7 version has been partially restructured, making the execution stack of PHP code simpler and clearer, and its performance has also been improved.
This article uses PHP 7.0.7 as an example.
OPCode
Wikipedia’s explanation of OPCode:
Opcodes can also be found in so-called byte codes and other representations intended for a software interpreter rather than a hardware device. These software based instruction sets often employ slightly higher-level data types and operations than most hardware counterparts, but nevertheless are constructed along similar lines.
OPCode and ByteCode are conceptually different.
My personal understanding: OPCode serves as an instruction to indicate what to do, while ByteCode consists of a sequence of OPCode/data to indicate what to do. Taking an addition as an example, OPCode tells the execution engine to add parameter 1 and parameter 2, while ByteCode tells the execution engine to add 45 and 56.
Reference: Difference between Opcode and Bytecode and Difference between: Opcode, byte code, mnemonics, machine code and assembly
In PHP, the source file Zend/zend_vm_opcodes.h lists all supported OPCodes. The name of each OPCode usually describes what it does, for example:
ZEND_ADD: performs an addition on two operands
ZEND_NEW: creates an object
ZEND_FETCH_DIM_R: reads the value at a given dimension of the operand; for example, executing the statement echo $foo[0] requires fetching the value at index 0 of the array $foo
An OPCode is represented by the zend_op structure.
Every OPCode is executed in the same way: each OPCode has a corresponding C function. When that function runs, it may use zero, one or two operands (op1, op2), it stores its outcome in result, and it may keep some additional information in extended_value.
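To make the roles of op1, op2, result and extended_value concrete, here is a minimal sketch in C. It is not the real zend_op layout: toy_op, toy_handler, toy_add_handler and toy_run_add are hypothetical names, and operands are reduced to indexes into an array of integer slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for zend_op; the real structure differs. */
typedef struct toy_op toy_op;
typedef void (*toy_handler)(const toy_op *op, int64_t *slots);

struct toy_op {
    toy_handler handler;      /* C function that executes this instruction */
    uint32_t op1;             /* slot index of the first operand */
    uint32_t op2;             /* slot index of the second operand */
    uint32_t result;          /* slot index that receives the result */
    uint32_t extended_value;  /* extra per-instruction information (unused here) */
};

/* Handler for a toy "add": reads op1 and op2, writes result. */
static void toy_add_handler(const toy_op *op, int64_t *slots) {
    slots[op->result] = slots[op->op1] + slots[op->op2];
}

/* Build one instruction over three slots and execute it through its handler. */
static int64_t toy_run_add(int64_t a, int64_t b) {
    int64_t slots[3] = { a, b, 0 };
    toy_op op = { toy_add_handler, 0, 1, 2, 0 };
    assert(op.handler != NULL);
    op.handler(&op, slots);
    return slots[2];
}
```

Calling toy_run_add(45, 56) runs the single instruction and yields the sum of the two operand slots. The point of the sketch is only that the instruction carries a pointer to its own C function together with the locations of its operands and result.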
Now look at how ZEND_ADD is defined in the source file Zend/zend_vm_def.h. The definition is not actually valid C code; it is better read as a code template. A brief reading of the template: the number 1 in its header is the value of ZEND_ADD as #defined in Zend/zend_vm_opcodes.h. ZEND_ADD takes two operands. If both are of type IS_LONG, it calls fast_long_add_function (which implements the addition internally in assembly). If both are IS_DOUBLE, or one is IS_DOUBLE and the other is IS_LONG, it performs a double addition directly. If either operand is neither IS_LONG nor IS_DOUBLE, it calls add_function (for example, when adding two arrays). Finally, it checks for an exception and moves on to the next OPCode.
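The fast-path/slow-path structure of the ZEND_ADD template can be sketched in plain C as follows. This is an illustration under assumptions, not the engine's code: toy_val, toy_type and toy_add are hypothetical names, the slow path is reduced to a return code, and integer overflow is handled with a portable check that produces a double (PHP turns an overflowing integer addition into a float).

```c
#include <assert.h>
#include <limits.h>

typedef enum { TOY_LONG, TOY_DOUBLE, TOY_OTHER } toy_type;

typedef struct {
    toy_type type;
    union { long lval; double dval; } v;
} toy_val;

/* Convert a numeric value to double; the caller guarantees it is numeric. */
static double toy_as_double(const toy_val *x) {
    assert(x->type != TOY_OTHER);
    return x->type == TOY_LONG ? (double)x->v.lval : x->v.dval;
}

/* Returns 0 when a fast path handled the addition, -1 when the generic
 * slow path (add_function in the text) would be needed. */
static int toy_add(toy_val *res, const toy_val *a, const toy_val *b) {
    if (a->type == TOY_LONG && b->type == TOY_LONG) {
        long x = a->v.lval, y = b->v.lval;
        /* Portable overflow check; on overflow the result becomes a double. */
        if ((y > 0 && x > LONG_MAX - y) || (y < 0 && x < LONG_MIN - y)) {
            res->type = TOY_DOUBLE;
            res->v.dval = (double)x + (double)y;
        } else {
            res->type = TOY_LONG;
            res->v.lval = x + y;
        }
        return 0;
    }
    if (a->type != TOY_OTHER && b->type != TOY_OTHER) {
        /* double + double, or a long mixed with a double */
        res->type = TOY_DOUBLE;
        res->v.dval = toy_as_double(a) + toy_as_double(b);
        return 0;
    }
    return -1; /* e.g. array + array: left to the generic routine */
}
```

The ordering matters: the cheapest and most common case (two integers) is tested first, and the general routine is only reached when neither numeric fast path applies.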
The contents of Zend/zend_vm_def.h are therefore code templates for the OPCodes. A comment at the top of that file explains that zend_vm_execute.h and zend_vm_opcodes.h (and, in fact, zend_vm_opcodes.c as well) are generated from the templates in Zend/zend_vm_def.h.
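Operand types
Each OPCode uses at most two operands, op1 and op2. Each operand acts as a "formal parameter" of the OPCode. For example, the ZEND_ASSIGN OPCode assigns the value of op2 to the PHP variable represented by op1, and its result is not used.
The type of an operand (which is distinct from the type of a PHP variable) determines its meaning and how it is used:
IS_CV: Compiled Variable; the operand is a PHP variable
IS_TMP_VAR: a temporary internal PHP variable used by the virtual machine that cannot be reused across OPCodes (I am not sure about the reuse part and have not looked into it)
IS_VAR: an internal PHP variable used by the virtual machine that can be reused across OPCodes (again, I am not sure about the reuse part and have not looked into it)
IS_CONST: a constant value
IS_UNUSED: the operand carries no meaning and is ignored
Operand types matter for performance optimization and memory management. When an OPCode's handler needs to read or write an operand, it does so in a way that depends on the operand's type.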
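Addition illustrates operand types: in an expression such as $a + 1, for example, op1 is a PHP variable (IS_CV), op2 is a constant (IS_CONST), and the result is stored in a temporary variable (IS_TMP_VAR).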
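OPCode Handler
We already know that each OPCode handler receives at most two operands and reads or writes their values according to their types. If the handler determined the type with a switch before every read or write, performance would suffer badly, because there would be too many branches to evaluate (see "Why is it good to avoid instruction branching where possible?").
Keep in mind that OPCode handlers are called thousands upon thousands of times while PHP executes, so checking the types of op1 and op2 inside the handler is bad for performance.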
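The costly approach described above, a single handler that inspects operand types at run time, might look like the following sketch. The names toy_kind, toy_frame, toy_operand, toy_fetch and toy_add_generic are hypothetical, and values are plain long integers rather than zvals.

```c
#include <assert.h>

/* Hypothetical operand kinds, loosely mirroring IS_CONST / IS_TMP_VAR / IS_CV. */
typedef enum { TOY_CONST, TOY_TMP_VAR, TOY_CV } toy_kind;

typedef struct {
    const long *literals;  /* values known at compile time */
    long *temps;           /* temporaries owned by the VM */
    long *cvs;             /* named script variables */
} toy_frame;

typedef struct { toy_kind kind; unsigned idx; } toy_operand;

/* Generic fetch: the operand kind is inspected on every single call. */
static long toy_fetch(const toy_frame *f, toy_operand op) {
    switch (op.kind) {
    case TOY_CONST:   return f->literals[op.idx];
    case TOY_TMP_VAR: return f->temps[op.idx];
    case TOY_CV:      return f->cvs[op.idx];
    }
    assert(!"unknown operand kind");
    return 0;
}

/* Unspecialized add handler: two run-time branches before the real work. */
static long toy_add_generic(const toy_frame *f, toy_operand op1, toy_operand op2) {
    return toy_fetch(f, op1) + toy_fetch(f, op2);
}
```

Every call to toy_add_generic pays for two switch dispatches even though the operand kinds of a given instruction never change after compilation, which is exactly the overhead the text warns about.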
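Look again at the ZEND_ADD code template. Its header declares that ZEND_ADD accepts operands of type CONST, TMPVAR or CV for both op1 and op2.
As mentioned earlier, the C code in zend_vm_execute.h and zend_vm_opcodes.h is generated from the templates in Zend/zend_vm_def.h. Looking at zend_vm_execute.h, you can see the handler (a C function) for each OPCode, and most OPCodes have several handlers. Take ZEND_ADD: op1 and op2 can each be one of 3 types, so 9 handlers are generated in total. Each handler is named according to the convention ZEND_{OPCODE-NAME}_SPEC_{OP1-TYPE}_{OP2-TYPE}_HANDLER(). The operand types are known at compile time, so the handler for each compiled OPCode is fixed at that point.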
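The idea of generating one handler per operand-type combination and choosing it once at compile time can be sketched with a C macro. This is an analogy, not the output of zend_vm_gen.php: the TOY_ names are hypothetical, only two operand kinds (CONST and CV) are modeled, so four handlers are produced instead of nine.

```c
#include <assert.h>

typedef struct {
    const long *literals;  /* compile-time constants */
    long *cvs;             /* script variables */
} toy_frame;

/* One tiny accessor per operand kind; no run-time kind check inside. */
static long toy_get_const(const toy_frame *f, unsigned i) { return f->literals[i]; }
static long toy_get_cv(const toy_frame *f, unsigned i)    { return f->cvs[i]; }

typedef long (*toy_add_fn)(const toy_frame *f, unsigned i1, unsigned i2);

/* Stamp out one handler per (op1 kind, op2 kind) pair, as a generator would. */
#define TOY_DEFINE_ADD(OP1, OP2, GET1, GET2)                              \
    static long TOY_ADD_SPEC_##OP1##_##OP2##_HANDLER(const toy_frame *f,  \
                                                     unsigned i1,         \
                                                     unsigned i2)         \
    {                                                                     \
        return GET1(f, i1) + GET2(f, i2);                                 \
    }

TOY_DEFINE_ADD(CONST, CONST, toy_get_const, toy_get_const)
TOY_DEFINE_ADD(CONST, CV,    toy_get_const, toy_get_cv)
TOY_DEFINE_ADD(CV,    CONST, toy_get_cv,    toy_get_const)
TOY_DEFINE_ADD(CV,    CV,    toy_get_cv,    toy_get_cv)

enum { TOY_K_CONST = 0, TOY_K_CV = 1 };

/* "Compile time": operand kinds are known, so the handler is chosen once. */
static toy_add_fn toy_pick_add(int kind1, int kind2) {
    static const toy_add_fn table[2][2] = {
        { TOY_ADD_SPEC_CONST_CONST_HANDLER, TOY_ADD_SPEC_CONST_CV_HANDLER },
        { TOY_ADD_SPEC_CV_CONST_HANDLER,    TOY_ADD_SPEC_CV_CV_HANDLER },
    };
    assert(kind1 >= 0 && kind1 < 2 && kind2 >= 0 && kind2 < 2);
    return table[kind1][kind2];
}
```

Once toy_pick_add has returned a function pointer, every later execution of that instruction calls the specialized handler directly, with no type test on the hot path.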
So how do these handlers differ from one another? The biggest difference is the way they fetch their operands.
OPArray
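An OPArray is an array containing many OPCodes that are executed in sequence (the original article illustrates this with a figure).
An OPArray is represented by the structure _zend_op_array.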
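In PHP, every user-defined function, every PHP script and every argument passed to eval() is compiled into an OPArray.
An OPArray holds a lot of static information that helps the execution engine run PHP code more efficiently. Some of the important pieces are:
The file name of the current script, and the start and end line numbers of the PHP code the OPArray corresponds to
The /** doc comment
refcount, a reference count (OPArrays can be shared)
Jump information for try-catch-finally
Jump information for break-continue
The names of all PHP variables in the current scope
Static variables used in the function
literals: values known at compile time, such as the string "foo" or the integer 42
Runtime cache slots, where the engine caches things it will need later during execution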
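Take a simple script as an example and look at the contents of some of its OPArray's members. The more information the OPArray contains, that is, the more known information is computed at compile time and stored in the OPArray, the more efficiently the execution engine can run. Every literal has already been compiled into a zval and stored in the literals array. (The array holds one more integer, 1, than the script itself uses: it is there for the ZEND_RETURN OPCode, because the OPArray of a PHP file returns 1 by default, whereas the OPArray of a function returns null by default.) The names of the PHP variables used by the OPArray are compiled into zend_string values and stored in the vars array, and the compiled OPCodes are stored in the opcodes array.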
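As a rough picture of how literals, vars and opcodes sit side by side, here is a small C sketch of a precompiled script. It is a simplification under assumptions: toy_op_array and its fields are hypothetical, literals are plain integers rather than zvals, and the script is imagined to be equivalent to the single statement $a = 42;.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_ASSIGN, TOY_RETURN } toy_opcode;
typedef struct { toy_opcode opcode; unsigned op1; unsigned op2; } toy_op;

/* Hypothetical, heavily reduced counterpart of _zend_op_array. */
typedef struct {
    const char *filename;
    unsigned line_start, line_end;
    const char *const *vars;  int last_var;      /* variable names */
    const long *literals;     int last_literal;  /* compile-time values */
    const toy_op *opcodes;    unsigned last;     /* instructions */
} toy_op_array;

/* "Compiled" form of a one-line script equivalent to: $a = 42; */
static const char *const toy_vars[] = { "a" };
static const long toy_literals[] = { 42, 1 };  /* 1 is the implicit return value */
static const toy_op toy_opcodes[] = {
    { TOY_ASSIGN, 0, 0 },  /* vars[0] = literals[0] */
    { TOY_RETURN, 1, 0 },  /* return literals[1] */
};
static const toy_op_array toy_script = {
    "example.php", 1, 1, toy_vars, 1, toy_literals, 2, toy_opcodes, 2
};

/* Look up a variable slot by name; returns -1 if the name is unknown. */
static int toy_find_var(const toy_op_array *oa, const char *name) {
    assert(oa != NULL && name != NULL);
    for (int i = 0; i < oa->last_var; i++) {
        if (strcmp(oa->vars[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```

Everything in toy_script is fixed before execution starts, so at run time the instructions only need to index into arrays that already exist.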
Executing OPCodes
OPCodes are executed by a while loop.
How does execution move on to the next OPCode? Every OPCode handler invokes a macro that advances the current opline by skip (usually 1), so that opline points to the next OPCode. opline is a global variable that points to the OPCode currently being executed.
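The loop-plus-macro mechanism can be sketched as follows. This is a toy model, not the engine's code: toy_vm, toy_execute and TOY_NEXT_OPCODE are hypothetical names, opline is kept in a struct rather than a global, and a handler signals the end of execution through its return value.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_op toy_op;
typedef struct toy_vm toy_vm;
typedef int (*toy_handler)(toy_vm *vm);  /* returns 0 to continue, 1 to stop */

struct toy_op { toy_handler handler; long operand; };
struct toy_vm { const toy_op *opline; long acc; };

/* Rough analogue of the "next opcode" macro: advance opline, keep looping. */
#define TOY_NEXT_OPCODE(vm, skip) ((vm)->opline += (skip), 0)

static int toy_add_handler(toy_vm *vm) {
    vm->acc += vm->opline->operand;
    return TOY_NEXT_OPCODE(vm, 1);
}

static int toy_return_handler(toy_vm *vm) {
    (void)vm;
    return 1;  /* leave the loop */
}

/* The dispatch loop: call the current instruction's handler until one stops. */
static long toy_execute(const toy_op *ops) {
    toy_vm vm = { ops, 0 };
    assert(ops != NULL);
    while (1) {
        if (vm.opline->handler(&vm) != 0) {
            return vm.acc;
        }
    }
}
```

A program made of two toy_add_handler instructions followed by toy_return_handler accumulates both operands and then leaves the loop. The loop itself knows nothing about individual instructions, which is what makes this call-based dispatch simple.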
Some extras
Compiler optimization
Zend/zend_vm_execute.h contains some odd-looking code. You may be puzzled by constructs such as if (IS_CONST == IS_UNUSED) and #if 0 || (IS_CONST != IS_UNUSED). They come from the corresponding template code: when php zend_vm_gen.php generates zend_vm_execute.h, it replaces OP1_TYPE with the actual type of op1, which produces code like if (IS_CONST == IS_UNUSED). The C compiler optimizes such code away.
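Customizing the generation of the Zend execution engine
zend_vm_gen.php accepts the argument --without-specializer. With it, only one handler is generated for each OPCode, and that handler checks the operand types before reading or writing the operands.
Another argument is --with-vm-kind=CALL|SWITCH|GOTO; CALL is the default.
As mentioned above, the execution engine runs OPCodes in a while loop: each OPCode increments opline (by 1 in the usual case) and then returns to the while loop, which executes the next OPCode, until ZEND_RETURN is reached.
With the GOTO execution strategy, the goto does not name a label directly; it uses a special form of goto known as Labels as Values.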
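For readers who have not seen Labels as Values before, the following sketch shows the technique in isolation. It relies on the GCC/Clang extension (&&label and goto *pointer) and will not compile with compilers that lack it; toy_opcode, toy_op and toy_execute_goto are hypothetical names, not engine code.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_ADD = 0, TOY_RETURN = 1 } toy_opcode;
typedef struct { toy_opcode opcode; long operand; } toy_op;

/* Requires the GCC/Clang "labels as values" extension (&&label, goto *ptr). */
static long toy_execute_goto(const toy_op *ops) {
    /* One label address per opcode, indexed by the opcode's numeric value. */
    static void *const labels[] = { &&do_add, &&do_return };
    const toy_op *opline = ops;
    long acc = 0;

    assert(ops != NULL);
    goto *labels[opline->opcode];

do_add:
    acc += opline->operand;
    opline++;
    goto *labels[opline->opcode];  /* jump straight to the next handler */

do_return:
    return acc;
}
```

Compared with a loop that calls a function per instruction, each handler here jumps directly to the next one, so there is no return to a central loop between instructions.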
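Jumps in the execution engine
When a PHP script contains an if statement, how does execution jump to the right OPCode and continue from there? Take a simple if statement that compares $a with 9. When $a != 9, JMPZ makes execution jump to the 5th OPCode; otherwise, JMP makes execution jump to the 6th OPCode. In both cases the jump simply assigns the address of the target OPCode to the current opline.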
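A jump is nothing more than an assignment to opline, which the following toy interpreter tries to show. The instruction set (TOY_IS_EQUAL, TOY_JMPZ, TOY_JMP, TOY_SET, TOY_RETURN) and the program layout are hypothetical and use zero-based indexes; they do not reproduce the OPCodes PHP actually emits.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_IS_EQUAL, TOY_JMPZ, TOY_JMP, TOY_SET, TOY_RETURN } toy_opcode;
typedef struct { toy_opcode opcode; long arg; } toy_op;

/* Runs a toy program against the input value a and returns its result. */
static long toy_run(const toy_op *ops, long a) {
    const toy_op *opline = ops;
    long cond = 0, result = 0;

    assert(ops != NULL);
    for (;;) {
        switch (opline->opcode) {
        case TOY_IS_EQUAL: cond = (a == opline->arg); opline++; break;
        case TOY_JMPZ:     /* jump only when the condition is false */
            if (cond == 0) opline = ops + opline->arg; else opline++;
            break;
        case TOY_JMP:      opline = ops + opline->arg; break;  /* always jump */
        case TOY_SET:      result = opline->arg; opline++; break;
        case TOY_RETURN:   return result;
        }
    }
}

/* Roughly: if ($a == 9) { result = 1; } else { result = 2; } */
static const toy_op toy_if_else[] = {
    { TOY_IS_EQUAL, 9 },  /* 0: cond = ($a == 9) */
    { TOY_JMPZ, 4 },      /* 1: condition false -> go to the else branch */
    { TOY_SET, 1 },       /* 2: then branch */
    { TOY_JMP, 5 },       /* 3: skip over the else branch */
    { TOY_SET, 2 },       /* 4: else branch */
    { TOY_RETURN, 0 },    /* 5 */
};
```

Running toy_if_else with a equal to 9 takes the then branch, and any other value takes the else branch. Both TOY_JMPZ and TOY_JMP do their work by overwriting opline with the address of the target instruction.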
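Some performance tips
This section shows how to optimize PHP code by looking at the OPCodes it generates.
echo a concatenation
Consider sample code that echoes the concatenation of $a and $b. In its OPArray, the values of $a and $b are joined by ZEND_CONCAT and stored in a temporary variable, ~4, which is then output by echo.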
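The CONCAT operation has to allocate a temporary block of memory and copy data into it, and that memory has to be freed again once echo has produced its output. Rewriting the code so that the values are echoed without first being concatenated (for example, echo $a, $b;) removes the CONCAT from the OPArray.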
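define() and const
PHP 5.3 introduced the const keyword.
In short:
define() is a function call
const is a keyword; it does not produce a function call and is much more lightweight than define()
However, const comes with some restrictions:
A constant defined with the const keyword must be declared at the top-level scope, which means const cannot be used to define constants inside functions, loops or if statements
The operand of const must be of type IS_CONST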
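Dynamic function calls
Avoid calling functions through dynamic function names where possible.
In the OPCodes generated for a direct call, NOP means "do nothing": it merely moves the current opline on to the next OPCode. The compiler emits this instruction for historical reasons. (Why has it still not been removed in PHP 7?)
When the function is called through a dynamic name, the difference lies in INIT_FCALL versus INIT_DYNAMIC_CALL. Comparing the source code of the two handlers makes it clear that INIT_FCALL is far more lightweight than INIT_DYNAMIC_CALL.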
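The cost difference can be illustrated by analogy in C: a call whose target is known ahead of time versus a call whose target must be found by name at run time. This is only an analogy under assumptions; the toy_ names are hypothetical, the function table is a linear array rather than a hash table, and the real INIT_DYNAMIC_CALL handler additionally has to cope with several kinds of callable values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long (*toy_fn)(long);

static long toy_double(long x) { return 2 * x; }
static long toy_negate(long x) { return -x; }

typedef struct { const char *name; toy_fn fn; } toy_entry;

static const toy_entry toy_function_table[] = {
    { "toy_double", toy_double },
    { "toy_negate", toy_negate },
};

/* Dynamic call: the name is only known at run time, so it is looked up. */
static toy_fn toy_lookup(const char *name) {
    size_t n = sizeof toy_function_table / sizeof toy_function_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(toy_function_table[i].name, name) == 0) {
            return toy_function_table[i].fn;
        }
    }
    return NULL;
}

static long toy_call_dynamic(const char *name, long arg) {
    toy_fn fn = toy_lookup(name);
    assert(fn != NULL);  /* a real engine would raise an error instead */
    return fn(arg);
}

/* Static call: the target was resolved ahead of time, no lookup needed. */
static long toy_call_static(long arg) {
    return toy_double(arg);
}
```

Both paths end up in the same function, but the dynamic one pays for string comparisons and a validity check on every call.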
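Late binding of classes
In short: if class A extends class B, class B should preferably be defined before class A.
When the parent class is defined first, the generated OPCodes show that the execution engine has nothing to do at run time for these declarations. Defining a class is fairly expensive work, such as resolving the inheritance relationship and adding the parent's methods and properties, but the compiler has already done that heavy lifting.
If class A is defined before class B, for example Foo is declared as extending Bar before Bar is defined, the compiler knows nothing about Bar when it reads the definition of Foo. It therefore emits OPCodes that postpone the definition of the class until execution time. In some other dynamically typed languages, this might produce an error such as: Parse error : class not found.
Besides classes, interfaces and traits suffer from the same late-binding performance cost.
To locate PHP performance problems, you would normally start by profiling with xhprof or xdebug; cases where you need to inspect OPCodes to find a performance problem are fairly rare.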
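Summary
Hopefully this article has given you a rough idea of how the PHP virtual machine works. The execution of specific opcodes and the context switching involved in function calls involve many details that do not fit in this article; they are covered in another article, "PHP 7 中函数调用的实现" (The implementation of function calls in PHP 7).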