PHP is a simple yet powerful language with many features well suited to the Web. Taking a practical problem as the starting point, and following the earlier exploration of how weakly typed variables work, this article continues the in-depth tour of the PHP kernel.
Recently, while chatting with a reader, I was asked a very strange question: after adding a reference to one operation, performance became about 10,000 times worse. To my mind, references are very error-prone, and PHP references in particular hide many traps. Because I have studied this part of the PHP source code before, I can explain clearly what is going on. I hope that after reading this article you will fully understand the issue. If you have questions, or topics you would like covered, leave me a message.
Let’s look at a piece of code first:
class RefferTest
{
    private $data;
    private $testKey;

    function __construct()
    {
        $key = "hello";
        $this->data[$key] = range(0, 10000);
        $this->testKey = $key;
    }

    // Takes a reference to the array element before counting it.
    function reffer($key)
    {
        $reffer = &$this->data[$key];
        return count($reffer);
    }

    // Counts the array element directly, with no reference.
    function noreffer($key)
    {
        return count($this->data[$key]);
    }

    function test()
    {
        $t1 = microtime(true);
        for ($i = 0; $i < 5000; $i++) {
            $this->reffer($this->testKey);
        }
        $t2 = microtime(true) - $t1;
        var_dump("reffer: " . round($t2, 4));

        $t1 = microtime(true);
        for ($i = 0; $i < 5000; $i++) {
            $this->noreffer($this->testKey);
        }
        $t2 = microtime(true) - $t1;
        var_dump("noreffer: " . round($t2, 4));
    }
}

$test = new RefferTest();
$test->test();
If, after reading this code, you can already tell that reffer and noreffer differ in performance by a factor of 10,000, there is no need to read on; this post is aimed at those new to PHP. Run the code and you will see the gap for yourself: reffer really is about 10,000 times slower. The code in which that reader hit the problem was of course more complicated; I simplified it to isolate the issue. You may have spotted the problem already, but it is still worth analyzing why it happens, so that you do not make the same mistake in your own PHP code.
To reduce copying, PHP uses a copy-on-write mechanism. It is a very common technique and you have probably heard of it; GCC's STL string implementation, for example, has used it. Assigning a string does not really copy it; the copy is made only when the string is modified. Let's take the simplest example first:
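$a = str_repeat("0", 10000);   // a string of 10,000 zeros
$b = $a;
$a[0] = "1";                   // any write to $a will do; this offset and value are illustrative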
$a is a very large string. If $b = $a copied it immediately, that would cost a lot of memory and CPU time, which is wasteful: if the code that follows never modifies $a or $b, the copy is not needed at all. Here, however, $a is modified afterwards, and at that point a copy must be made, otherwise $b would change too. So the question is: how does PHP know that modifying $a requires a copy? There has to be some kind of marker, and that marker is the reference count. Reference counting is also what PHP uses for memory management.
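The effect described above can be observed from PHP itself. The following is a minimal sketch, not a precise measurement: it uses memory_get_usage() around the assignment and around the first write, and it assumes a 1,000,000-byte string (larger than the article's 10,000) so that the copy stands out from allocator noise. The variable names are illustrative.

```php
<?php
// Assumption: 1,000,000 bytes instead of the article's 10,000.
$size = 1000000;

$a = str_repeat("0", $size);
$m0 = memory_get_usage();

$b = $a;                  // plain assignment: the value is shared, not copied
$m1 = memory_get_usage();

$a[0] = "1";              // first write: the string has to be duplicated now
$m2 = memory_get_usage();

$growthOnAssign = $m1 - $m0;
$growthOnWrite  = $m2 - $m1;

echo "growth on assignment: ", $growthOnAssign, " bytes\n";
echo "growth on write:      ", $growthOnWrite, " bytes\n";
echo "first char of a / b:  ", $a[0], " / ", $b[0], "\n";
```

The memory growth should show up at the write, not at the assignment, and $b should still hold the unmodified string.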
For the first statement, the basic process is as follows:
1: Create a variable (a value container) that holds the string of 10,000 zeros.
2: Create a variable symbol a that refers to this variable. Note that a variable symbol and a variable are not the same thing; they are stored separately.
Seen from the C side, PHP does roughly the following:
char *varname = "a";
size_t varname_len = strlen(varname);
/* the key length passed to the hash table includes the terminating NUL */
zend_hash_add(EG(active_symbol_table), varname, varname_len + 1, &var, sizeof(zval*), NULL);
active_symbol_table is PHP's symbol table for the current scope. It is a hash table, and every variable accessible at that point is stored in it. var points to the zval that stores the string of 10,000 zeros. The zval structure looks like this:
/* the union is declared first so that the struct below can use it */
typedef union _zvalue_value {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;
    zend_object_value obj;
} zvalue_value;

typedef struct _zval_struct {
    zvalue_value value;
    zend_uint refcount;
    zend_uchar type;
    zend_uchar is_ref;
} zval;
zvalue_value is a union that can hold a long, a double, a string, a hash table (a PHP array), or an object; in other words, every PHP type. A zval simply wraps a zvalue_value and adds three fields: the type, the reference flag is_ref, and the reference count refcount. This is what an ordinary PHP variable looks like. If you build larger things in PHP, you will notice that memory usage is high. The reason is that a PHP variable is no longer a variable in the traditional C sense; it carries a lot of extra information.
That completes the first statement. The second statement is very simple: it creates a new variable symbol b and adds it to active_symbol_table, but it does not create a new variable. It only increments refcount, and the assignment is done. The resulting layout, as the original diagram illustrates, is two symbols, a and b, pointing at one zval with refcount = 2.
First of all, note that a and b are just symbols. Each is a key in active_symbol_table, and each has a pointer to a zval. At the C level, therefore, a and b are completely equivalent. This gives us the first law of PHP variables:
The first law of PHP variables: if two variables point to the same zval, the two variables are indistinguishable. That is, any operation on a is symmetric with respect to b. Symmetric here means like you and your image in a mirror: mirrored, not one and the same. For example, if a is assigned a new value, a gets a copy. Likewise, if b is assigned a new value, the same thing happens and b gets a copy. In short, a and b behave the same way.
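The symmetry can be checked from userland. The sketch below is only an illustration with made-up values: it performs the same write once through $a and once through $b after a plain assignment, and in both cases the other variable keeps the old value.

```php
<?php
// Two symbols sharing one value after a plain assignment.
$a = range(1, 3);
$b = $a;

// Write through a: a gets its own copy, b keeps the old value.
$a[] = 4;
$afterWriteToA = array('a' => $a, 'b' => $b);

// Reset, then do the mirror-image write through b.
$a = range(1, 3);
$b = $a;
$b[] = 4;
$afterWriteToB = array('a' => $a, 'b' => $b);

// b is untouched by the write to a, and a is untouched by the write to b.
var_dump($afterWriteToA['b'] === array(1, 2, 3));
var_dump($afterWriteToB['a'] === array(1, 2, 3));
```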
In the third statement a write occurs, and PHP checks whether refcount is greater than 1, that is, whether at least two symbols share the zval. If so, it copies the zval, points the written symbol at the copy, and decrements the refcount of the original zval. That is all there is to copy-on-write. You probably feel that this is all very familiar and that you understand it.
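The bookkeeping in these three statements can be imitated with a small toy model. This is a sketch under stated assumptions, not the engine's implementation: ToyZval, toy_assign and toy_write are hypothetical names, and the symbol table is just a PHP array mapping names to ToyZval objects.

```php
<?php
// Toy model only: ToyZval and the toy_* helpers are hypothetical names,
// not part of the Zend Engine.
class ToyZval
{
    public $value;
    public $refcount = 1;

    public function __construct($value)
    {
        $this->value = $value;
    }
}

// "$to = $from": add a second symbol, bump the counter, copy nothing.
function toy_assign(array &$symbols, $to, $from)
{
    $symbols[$to] = $symbols[$from];   // same object, so same "zval"
    $symbols[$to]->refcount++;
}

// "$name[$offset] = $char": separate first if the zval is shared.
function toy_write(array &$symbols, $name, $offset, $char)
{
    $zval = $symbols[$name];
    if ($zval->refcount > 1) {
        $zval->refcount--;                  // this symbol leaves the shared zval
        $zval = new ToyZval($zval->value);  // private copy with refcount 1
        $symbols[$name] = $zval;
    }
    $s = $zval->value;
    $s[$offset] = $char;
    $zval->value = $s;
}

$symbols = array('a' => new ToyZval(str_repeat("0", 10)));
toy_assign($symbols, 'b', 'a');
$sharedBeforeWrite = ($symbols['a'] === $symbols['b']);

toy_write($symbols, 'a', 0, "1");

echo "shared before write: ", var_export($sharedBeforeWrite, true), "\n";
echo "a = ", $symbols['a']->value, " (refcount ", $symbols['a']->refcount, ")\n";
echo "b = ", $symbols['b']->value, " (refcount ", $symbols['b']->refcount, ")\n";
```

The model uses a 10-character string only to keep the printed output short; the separation rule is the same for any size.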
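However, PHP is not as simple as copy-on-write alone; there are also references. Introducing references complicates matters, because marking a variable as a reference means precisely that a write must not copy: the write modifies the original variable. In the terms of the philosophy we used to study at school, the two form a contradiction. They are opposed yet unified, and each has its use. As the saying goes, whatever exists is reasonable.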
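Now let's examine this contradiction. We only consider combinations of two operations; longer combinations work the same way. There are two cases: assignment first and reference second, or reference first and assignment second. We will discuss them in turn, starting with assignment first, reference second.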
$a = 1;      // illustrative value; any value works here
$b = $a;
$c = &$a;
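$b = $a is a copy-on-write assignment, while $c = &$a is a reference assignment. Suppose that a single zval could serve all three symbols with no copy at all, so that a, b, and c all point to the same zval: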
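By our first law of PHP variables, operations on a, b, and c would then be symmetric. But quite obviously a write to b must produce a copy, while a write to a must not, so the behavior differs and the first law is violated. To remove the contradiction, the variables have to be separated, and the rule for separation is: whoever creates the contradiction makes the copy. Clearly it is the third statement, $c = &$a;, that creates it. Internally, the reference assignment therefore copies, with the result the original figure depicts: a and c share a new zval with is_ref = 1 and refcount = 2, while b keeps the original zval with refcount = 1.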
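The observable outcome of this case can be checked with a few lines of PHP. This sketch only demonstrates the semantics (who follows whom after a write); it does not show where the engine performs the copy, and the string values are made up for illustration.

```php
<?php
// Assignment first, reference second.
$a = "original";
$b = $a;      // copy-on-write sharing with a
$c = &$a;     // reference: c and a must now behave as one variable

$c = "changed through c";

var_dump($a); // follows c, because a and c are bound by reference
var_dump($b); // keeps the old value, because b was separated from a
```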
That was the case of assignment first, reference second. The other case is reference first, assignment second:
$a = 1;      // illustrative value; any value works here
$b = &$a;
$c = $a;
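By the first law of PHP variables, a, b, and c must be separated for the law to hold. Clearly b and a belong together: operations on them are symmetric, so they can point to the same zval. c behaves differently from a and b, because changing c requires a copy. If you have followed this far, you should also understand why the two count calls in the code at the start differ so much: in reffer(), $reffer = &$this->data[$key] marks the array's zval as a reference, and passing it by value to count() is an assignment of the $c = $a kind, so the whole array is copied on every call. When I discussed this with that reader, he finally said that PHP was badly designed: surely $c could skip the copy and only be copied once c is written. It seems that truly understanding something is hard; think carefully about the first law. Suppose no separation took place and c pointed to the same zval. Then c would behave exactly like a and b, with is_ref = 1, so a write to c would not copy and would change a and b as well. The final internal state, which the original figure depicts, is: a and b share one zval with is_ref = 1 and refcount = 2, while c has its own copy with is_ref = 0 and refcount = 1.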
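The second case can be checked the same way. The sketch below demonstrates only the visible semantics; by_value_length is a hypothetical helper standing in for any function that takes its argument by value, which under the model described here is the same kind of assignment as $c = $a. Timings of such calls depend on the PHP version and are not measured here.

```php
<?php
// Reference first, assignment second.
$a = range(0, 4);
$b = &$a;     // a and b are bound to one value
$c = $a;      // c must not follow later writes to a or b

$b[0] = 'via b';   // visible through a as well
$c[1] = 'via c';   // visible only through c

// Hypothetical helper: the parameter receives its argument by value,
// just like the assignment $c = $a above.
function by_value_length($arr)
{
    return count($arr);
}

$n = by_value_length($b);

var_dump($a[0], $c[0]); // a sees the write through b, c does not
var_dump($a[1], $c[1]); // the write to c stays local to c
var_dump($n);
```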
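I used to get references mixed up too. Now you can use the first law to analyze every case. I will write more articles analyzing the PHP kernel; if you want to dig deeper into a particular aspect of PHP, leave me a message.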
One last addition, which is another hidden pitfall:
function count_bigarray()
{
    global $bigarray;
    return count($bigarray);
}
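There is no explicit reference here, but a hidden one: for the global statement, PHP automatically creates a local variable that references the global variable $bigarray. If you call count on it here, it will be very slow. It is better to access the variable directly through the $GLOBALS array.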
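A minimal sketch of the suggested alternative follows. count_bigarray_direct is a hypothetical name for the rewritten function, and the array contents are made up for the example; the sketch shows only that both variants return the same result, not how their timings compare.

```php
<?php
// Define the global through $GLOBALS so the example does not depend on
// the scope in which this file is included.
$GLOBALS['bigarray'] = range(0, 10000);

// Variant from the text: `global` binds a local name to the global by reference.
function count_bigarray()
{
    global $bigarray;
    return count($bigarray);
}

// Suggested alternative (hypothetical name): read through the $GLOBALS array.
function count_bigarray_direct()
{
    return count($GLOBALS['bigarray']);
}

var_dump(count_bigarray() === count_bigarray_direct());
```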
The next article in this series, the second part of this in-depth look at the PHP kernel, explores SAPI. Stay tuned.