Background
HHVM is a high-performance PHP virtual machine developed by Facebook. It is claimed to be 9 times faster than the official implementation. I was curious, so I took some time to study it and wrote up this article. I hope it can answer two questions:
What would you do?
Before discussing how HHVM is implemented, let's put ourselves in Facebook's position: suppose you have a website written in PHP that runs into performance problems, and analysis shows that a large share of the resources is consumed in PHP. How would you optimize PHP performance? There are several options, for example:
Option 1 is almost unfeasible. Ten years ago, Joel used the example of Netscape to warn that a rewrite throws away years of accumulated experience, and this is especially true for a product with business logic as complex as Facebook's. There is simply too much PHP code: it is said to be 20 million lines (quoted from [PHP on the Metal with HHVM]). The cost of rewriting is probably greater than the cost of writing a virtual machine, and for a team of thousands of people, learning a new language from scratch is unacceptable.

Option 2 is the safest solution and allows gradual migration. In fact, Facebook has been working hard on this front and has developed RPC solutions such as Thrift. The other main language used inside Facebook is C++. You can see this in the early Thrift code, where the implementations for other languages were very crude and could not be used in production. It is said that the PHP:C++ ratio at Facebook has shifted from 9:1 to 7:3. With Andrei Alexandrescu on board, C++ is becoming more and more popular at Facebook, but this only solves part of the problem. After all, C++ development costs much more than PHP development, so it is not suitable for code that changes frequently, and too many RPC calls seriously hurt performance.

Option 3 looks good but is difficult to apply in practice. Generally speaking, there is no single dominant performance bottleneck; the cost is mostly the result of gradual accumulation. In addition, PHP extensions are expensive to develop, so this approach is generally only used for common base libraries that rarely change, and it cannot solve many problems.

So none of the first three options solves the problem well, and Facebook really has no choice but to optimize PHP itself.

Faster PHP
Since we want to optimize PHP, how do we go about it? In my opinion, there are several methods:
Optimization at the PHP language level is the simplest and most feasible. Of course Facebook has thought of it, and has developed performance analysis tools like XHProf, which are very helpful for locating performance bottlenecks. However, XHProf still did not solve Facebook's problem, so let's keep looking.

Next is option 2. Simply put, Zend's execution process can be divided into two parts: compiling PHP into opcodes and executing the opcodes, so optimizing Zend can be approached from these two sides.

Optimizing opcodes is a common practice. It avoids repeated parsing of PHP, and some static compilation optimizations can be applied as well, as Zend Optimizer Plus does. However, because PHP is so dynamic, this kind of optimization is limited; an optimistic estimate is that it can only improve performance by about 20%. Another idea is to change the opcode architecture itself, for example to a register-based design, but that requires too much rework and the gain would not be dramatic (maybe 30%?), so the return on investment is low.

The other approach is to optimize opcode execution. First, a brief word on how Zend executes opcodes. After Zend's interpreter reads an opcode, it calls a different function depending on the opcode (some are actually handled by a switch, but I have simplified the description for convenience), and that function then performs the language-level operations (if you are interested, you can read the book "In-depth Understanding of the PHP Core"). So there is no complex encapsulation or indirection in Zend; as an interpreter, it already does very well.

If you want to improve Zend's execution performance, you need to understand how programs execute at a low level. For example, function calls have overhead, which can be reduced through inline threading.
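To make the dispatch loop described above more concrete, here is a minimal C++ sketch of an interpreter that reads an opcode and calls a matching handler. Everything in it (the `Op` enum, `Instr`, `VM`, the `doPush`/`doAdd`/`doMul` handlers and `run`) is a hypothetical toy invented for illustration; it is not Zend's instruction set or code, only the general shape of "switch on the opcode, call a function".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mini instruction set for illustration; these are NOT Zend's real opcodes.
enum class Op : uint8_t { Push, Add, Mul, Halt };

struct Instr {
  Op op;
  int64_t arg;  // only used by Push
};

struct VM {
  std::vector<int64_t> stack;
};

// One small handler per opcode, mirroring "call a different function per opcode".
void doPush(VM& vm, int64_t arg) { vm.stack.push_back(arg); }

void doAdd(VM& vm) {
  assert(vm.stack.size() >= 2);
  int64_t rhs = vm.stack.back();
  vm.stack.pop_back();
  vm.stack.back() += rhs;
}

void doMul(VM& vm) {
  assert(vm.stack.size() >= 2);
  int64_t rhs = vm.stack.back();
  vm.stack.pop_back();
  vm.stack.back() *= rhs;
}

// The dispatch loop: read an opcode, pick the handler, repeat.
// Every iteration pays for one branch plus one function call.
int64_t run(const std::vector<Instr>& code) {
  VM vm;
  for (const Instr& in : code) {
    switch (in.op) {
      case Op::Push: doPush(vm, in.arg); break;
      case Op::Add:  doAdd(vm); break;
      case Op::Mul:  doMul(vm); break;
      case Op::Halt:
        assert(!vm.stack.empty());
        return vm.stack.back();
    }
  }
  assert(!vm.stack.empty());  // program ended without an explicit Halt
  return vm.stack.back();
}
```

For example, running the program Push 1, Push 2, Add, Push 4, Mul, Halt evaluates (1 + 2) * 4 and returns 12. The per-instruction switch and call in `run` are exactly the kind of overhead that the techniques discussed here try to remove.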
Inline threading works somewhat like the inline keyword in C, except that it expands the relevant functions at runtime and then executes them in sequence (this is just an analogy; the actual implementation differs). It also avoids the waste caused by CPU branch mispredictions. Alternatively, the interpreter can be implemented in assembly, as JavaScriptCore and LuaJIT do; for details, I recommend reading Mike's explanation.

But both of these approaches are too expensive to retrofit, even harder than a rewrite, especially when backward compatibility must be preserved, as you will see when I describe PHP's characteristics later.

Developing a high-performance virtual machine is not simple. It took the JVM more than 10 years to reach its current performance. So can one of these high-performance virtual machines be used directly to speed up PHP? This is the idea behind option 3.

In fact, people tried this long ago, for example Quercus and IBM's P8. Quercus has hardly any users, and P8 is dead. Facebook also investigated this approach, and there were even unreliable rumors about it, but Facebook actually gave up on it in 2011.

Option 3 looks good, but the actual results are not ideal. According to many experts (such as Mike), a VM is always optimized for one particular language, and other languages implemented on top of it hit many bottlenecks, such as dynamic method calls, which Dart's documentation discusses. It is also said that the performance of Quercus is not much different from Zend+APC (from [The HipHop Compiler for PHP]), so there is little point.

However, OpenJDK has also been working on this in recent years. The recent Graal project looks quite good, and some languages have achieved significant results on it, but I have not had time to study Graal yet, so I cannot judge it here.

The next one is option 4, which is exactly what HPHPc (the predecessor of HHVM) does.
HPHPc converts PHP code into C++ and then compiles it into a native binary, so it can be considered an AOT (ahead-of-time) approach. For the technical details of the code conversion, see the paper The HipHop Compiler for PHP. The following screenshot from the paper gives an overview: ![]() The biggest advantage of this approach is that it is simple to implement (compared to a VM), and it can perform many compile-time optimizations (because compilation happens offline, it does not matter if it is slow), for example, the above example will

In addition to HPHPc, there are two similar projects: Roadsend and phc. phc converts PHP into C and then compiles it. The following is an example of converting <div class="blockcode">
<div id="code_LjU"><ol>
<li>static php_fcall_info fgc_info;</li>
<li>php_fcall_info_init ("file_get_contents", &fgc_info);</li>
<li>php_hash_find (LOCAL_ST, "f", 5863275, &fgc_info.params);</li>
<li>php_call_function (&fgc_info) ;</li>
</ol></div>
</div> Speaking of phc, its author once lamented on his blog that he had gone to Facebook two years earlier to demonstrate phc and talk with the engineers there. Facebook's own compiler then became popular as soon as it was released, while he had been working for 4 years and remained unknown, and now his future looked bleak...

Roadsend is no longer maintained either. For a dynamic language like PHP, this approach has many limitations. Since files cannot be included dynamically, Facebook compiled all the files together, and the binary deployed at release time reached 1 GB, which was becoming increasingly unacceptable.

There is also a project called PHP QB. I did not look at it for lack of time, but I think it might be something similar.

So there is only one way left: write a faster PHP virtual machine and follow this dark road to the end. Maybe, like me, you thought it was outrageous when you first heard that Facebook was going to build a virtual machine, but if you analyze it carefully, you will find that this is actually the only way.

Faster virtual machines
Why is HHVM faster? News reports all mention JIT as the key technology, but it is far from that simple. JIT is not a magic wand that improves performance with a wave, and JIT compilation itself takes time; for simple programs it may be slower than an interpreter. The most extreme example is that the interpreter of LuaJIT 2 is slightly faster than V8's JIT, so there are no absolutes; it is more about the handling of details. The development history of HHVM is a history of continuous optimization. You can see from the picture below how it surpassed HPHPc little by little: ![]() It is worth mentioning that ART, the new virtual machine in Android 4.4, uses an AOT approach (remember? HPHPc, mentioned earlier, does the same), and the result is twice as fast as the previous Dalvik, which used JIT, so JIT is not necessarily faster than AOT.
Therefore, this project was very risky. Without strong nerves and perseverance, it could easily have been abandoned halfway. Google once tried to use JIT to improve Python's performance, but ultimately failed. For Google, Python does not really cause performance problems (well, Google did once write its crawler in Python [see In The Plex], but that was back in 1996).

Compared with Google, Facebook obviously has greater motivation and determination. PHP is Facebook's most important language. Let's take a look at the experts Facebook has put on this project (not a complete list):
Although the list does not include top virtual machine experts like Lars Bak and Mike Pall, if these experts work together, writing a virtual machine should not be a big problem. So what challenges do they face? Let's discuss them one by one.

What is the specification?
The first problem you face when writing your own PHP virtual machine is that PHP has no language specification, and the syntax is incompatible between many versions (even between minor versions such as 5.2.1 and 5.2.3). So how is the PHP language defined? Let's take a look at a statement from IEEE:
So the only way is to study Zend's implementation directly. Fortunately, this painful work had already been done once for HPHPc, and HHVM can reuse it, so this problem is not too big.

Language or extension?
Implementing the PHP language is not as simple as implementing a virtual machine. PHP also includes a wide range of extensions that are tightly integrated with the language; Zend works tirelessly to implement every function you might use. If you analyze the PHP source, you will find that it has more than 800,000 lines of C code, excluding blank lines and comments. And guess how much of that is the Zend engine? Just under 100,000 lines.

This is not a bad thing for developers, but it is painful for engine implementers. Compare it with Java: to write a Java virtual machine, you only need to implement bytecode interpretation and some basic JNI calls, because most of Java's built-in libraries are implemented in Java. So, leaving performance optimization aside, implementing a PHP virtual machine takes far more work than implementing a JVM. For example, someone implemented a JVM, Doppio, in 8,000 lines of TypeScript.

HHVM's solution to this problem is very simple: implement only what Facebook uses, and reuse what was already written for HPHPc, so the problem is not big.

Implementing the interpreter
The next step is the interpreter. After parsing PHP, HHVM generates a bytecode of its own design, which is stored in The main body of the interpreter is implemented in bytecode.cpp. For methods such as <code class="c++"><div class="blockcode">
<div id="code_oM7"><ol>
<li>if (c2.m_type == KindOfInt64) return o(c1.m_data.num, c2.m_data.num);</li>
<li>if (c2.m_type == KindOfDouble) return o(c1.m_data.num, c2.m_data.dbl);</li>
</ol></div>
</div> Thanks to the interpreter, HHVM supports PHP syntax much better than HPHPc and can in theory be fully compatible with official PHP. But with this alone, performance would not be much better than Zend. Because variable types cannot be determined in advance, conditional checks like the ones above are needed, and such code is hard for modern CPUs to execute efficiently. Another problem is that all data is boxed, so every read has to go through an indirect access such as m_data.num or m_data.dbl.

Someone experimented with LLVM in 2008, and the result was 21 times slower than the original... In 2010, IBM's Japan research lab developed P9 based on their JVM code. Its performance is 2.5 to 9.5 times that of official PHP; see their paper Evaluation of a just-in-time compiler retrofitted for PHP.
In 2011, Andrei Homescu built a PHP JIT on top of RPython and wrote the paper HappyJIT: a tracing JIT compiler for PHP, but the test results were mixed and not ideal.

So what exactly is JIT? How do you implement one?

Dynamic languages almost always have an eval function that takes a string and executes it. A JIT does something similar, except that what it stitches together is not strings but machine code for the target platform, which it then executes. But how do you do that in C? You can refer to this introductory example written by Eli. Here is a piece of code from the article: <div class="blockcode">
<div id="code_JSG"><ol>
<li>unsigned char code[] = {</li>
<li>  0x48, 0x89, 0xf8,       // mov %rdi, %rax</li>
<li>  0x48, 0x83, 0xc0, 0x04, // add $4, %rax</li>
<li>  0xc3                    // ret</li>
<li>};</li>
<li>memcpy(m, code, sizeof(code));</li>
</ol></div>
</div> However, it is easy to make mistakes when writing machine code by hand, so it is best to use a helper library such as Mozilla's Nanojit or LuaJIT's DynASM. HHVM uses neither; it implements its own assembler that only supports x64 (it is also experimenting with VIXL to generate 64-bit ARM code), and makes the code executable through mprotect.

But why is JIT code faster? Think about it: code written in C++ is also compiled into machine code in the end. If you simply hand-translate the same code into machine code, how is it different from what GCC generates? Although we mentioned some optimization techniques based on how CPUs work earlier, the more important optimization in a JIT is generating specialized instructions based on types, which greatly reduces the number of instructions and conditional checks. The following picture from TraceMonkey makes a very intuitive comparison; we will see a concrete HHVM example later: ![]() HHVM first runs code through the interpreter, so when does it switch to the JIT? There are 2 common JIT trigger conditions:
As to which of the two is better, there is a post on Lambda the Ultimate that drew discussion from various experts, notably Mike Pall (LuaJIT author), Andreas Gal (Mozilla VP) and Brendan Eich (Mozilla CTO). They each shared many of their own views, and I recommend reading the thread, so I will not show off my limited knowledge here. The two differ not only in compilation scope but also in many details, such as the handling of local variables, which will not be discussed here.

But HHVM uses neither of these two methods. Instead, it created its own unit called a tracelet, which is delimited by type. See the picture below: ![]() You can see that it divides a function into 3 parts. The upper 2 parts are used to handle two different situations where

Of course, achieving a high-performance JIT takes many experiments and optimizations. For example, HHVM initially placed newly added tracelets at the front, which would swap the positions of A and C in the picture above. Later the team tried placing them at the back, and performance improved by 14%, because tests showed that this makes it more likely to hit a matching type early.

The JIT first converts HHBC to SSA (hhbc-translator.cpp), then optimizes the SSA (for example with copy propagation), and finally generates native machine code. On x64, for example, this is implemented by translator-x64.cpp.

Let's use a simple example to see what the machine code finally generated by HHVM looks like, such as the following PHP function: <div class="blockcode">
<div id="code_B9S">
<ol>
<li><?php</li>
<li>function a($b){</li>
<li>  echo $b + 2;</li>
<li>}</li>
</ol></div>
</div>
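As a rough C++ picture of what type specialization means for a function like a, here is a minimal sketch that contrasts a generic path (check the type tag on every operation, box the result again) with a path specialized for integers (one guard up front, then a raw add). The names `Kind`, `TypedValue`, `makeInt`, `makeDouble`, `addTwoGeneric` and `addTwoIntOnly` are hypothetical stand-ins invented for this illustration; the `m_type` and `m_data` field names are borrowed from the interpreter snippet earlier, but HHVM's real definitions differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; HHVM's real TypedValue differs.
enum class Kind : uint8_t { Int64, Double };

struct TypedValue {
  union { int64_t num; double dbl; } m_data;  // boxed payload
  Kind m_type;                                // type tag checked at runtime
};

TypedValue makeInt(int64_t n) {
  TypedValue v;
  v.m_data.num = n;
  v.m_type = Kind::Int64;
  return v;
}

TypedValue makeDouble(double d) {
  TypedValue v;
  v.m_data.dbl = d;
  v.m_type = Kind::Double;
  return v;
}

// Generic, interpreter-style path: branch on the tag for every operation
// and box the result again.
TypedValue addTwoGeneric(const TypedValue& b) {
  if (b.m_type == Kind::Int64) return makeInt(b.m_data.num + 2);
  assert(b.m_type == Kind::Double);
  return makeDouble(b.m_data.dbl + 2.0);
}

// Specialized path: a single guard, then an unboxed integer add.
// Returns false when the guard fails, so the caller can fall back
// to the generic path (or to code specialized for another type).
bool addTwoIntOnly(const TypedValue& b, int64_t& out) {
  if (b.m_type != Kind::Int64) return false;  // guard on the type tag
  out = b.m_data.num + 2;                     // raw add, no boxing
  return true;
}
```

With an integer argument of 1, `addTwoIntOnly` passes its guard and produces 3; with a double argument the guard fails and the caller would take another path. The design point is that once the guard has passed, the rest of the code no longer needs any type checks.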
This is what function a looks like after compilation: <div class="blockcode">
<div id="code_ZLy"><ol>
<li>mov rcx,0x7200000</li>
<li>mov rdi,rbp</li>
<li>mov rsi,rbx</li>
<li>mov rdx,0x20</li>
<li>call 0x2651dfb <HPHP::Transl::traceCallback(HPHP::ActRec*, HPHP::TypedValue*, long, void*)></li>
<li>cmp BYTE PTR [rbp-0x8],0xa</li>
<li>jne 0xae00306</li>
<li>; The previous step checks whether the parameter is valid</li>
<li></li>
<li>mov rcx,QWORD PTR [rbp-0x10] ; %rcx now holds the argument, which is 1 in this call</li>
<li>mov edi,0x2 ; Set %edi (the lower 32 bits of %rdi) to 2</li>
<li>add rdi,rcx ; Add %rcx</li>
<li>call 0x2131f1b <HPHP::print_int(long)> ; Call print_int; its first parameter %rdi is now 3</li>
<li></li>
<li>; The rest is not discussed here</li>
<li>mov BYTE PTR [rbp+0x28],0x8</li>
<li>lea rbx,[rbp+0x20]</li>
<li>test BYTE PTR [r12],0xff</li>
<li>jne 0xae0032a</li>
<li>push QWORD PTR [rbp+0x8]</li>
<li>mov rbp,QWORD PTR [rbp+0x0]</li>
<li>mov rdi,rbp</li>
<li>mov rsi,rbx</li>
<li>mov rdx,QWORD PTR [rsp]</li>
<li>call 0x236b70e <HPHP::JIT::traceRet(HPHP::ActRec*, HPHP::TypedValue*, void*)></li>
<li>ret</li>
</ol></div>
</div> And the implementation of the HPHP::print_int function is like this:
<code class="c++ language-c++" data-lang="c++"><div class="blockcode">
<div id="code_K6f"><ol>
<li>void print_int(int64_t i) {</li>
<li> char buf[256];</li>
<li> snprintf(buf, 256, "%" PRId64, i);</li>
<li> echo(buf);</li>
<li> TRACE(1, "t-x64 output(int): %" PRId64 "\n", i);</li>
<li>}</li>
</ol></div>
</div> You can see that the code compiled by HHVM directly uses <div class="blockcode">
<div id="code_K70"><ol><li>-v Eval.JitWarmupRequests=0</li></ol></div>
</div> <div class="blockcode">
<div id="code_biL"><ol>
<li><?hh</li>
<li>class Point2 {</li>
<li>  public float $x, $y;</li>
<li>  function __construct(float $x, float $y) {</li>
<li>    $this->x = $x;</li>
<li>    $this->y = $y;</li>
<li>  }</li>
<li>}</li>
<li>// Source: https://raw.github.com/strangeloop/StrangeLoop2013/master/slides/sessions/Adams-TakingPHPSeriously.pdf</li>
</ol></div>
</div> Note that
Added in January 2014: HHVM adoption inside our company is currently going very well. I recommend that everyone give it a try in 2014, especially now that its compatibility test pass rate has reached 98.58% and the cost of code changes has dropped further.