Variables are containers for data, and we deal with them constantly, whether they hold a number, an array, a string, an object, or something else. Variables are therefore an indispensable foundation of any language. This is the first article in a series that explores variables in the PHP kernel. It introduces the basics of zval: its structure, how to inspect it with xdebug, how reference counting works, and how shared zvals are separated on write.
This article was written in a hurry, so it will inevitably contain errors. Please point them out.
zval is one of the most important data structures in PHP (the other is the hash table). It holds the value and type information of a PHP variable. It is a struct with the following basic layout:
struct _zval_struct {
    zvalue_value value;       /* value */
    zend_uint refcount__gc;   /* variable ref count */
    zend_uchar type;          /* active type */
    zend_uchar is_ref__gc;    /* if it is a ref variable */
};
typedef struct _zval_struct zval;
Among them:
1. zvalue_value value
The actual value of the variable. Its type, zvalue_value, is a union:
typedef union _zvalue_value {
    long lval;                 /* long value */
    double dval;               /* double value */
    struct {                   /* string */
        char *val;
        int len;
    } str;
    HashTable *ht;             /* hash table value, used for array */
    zend_object_value obj;     /* object */
} zvalue_value;
2. zend_uint refcount__gc
This field is a counter that records how many variables (more precisely, symbols) point to this zval. All symbols are stored in a symbol table, and different scopes use different symbol tables (we will discuss this later). When a variable is created, its refcount is 1. A typical assignment such as $a = $b increases the zval's refcount by 1, and unset decreases it by 1. Before PHP 5.3, GC was implemented with reference counting alone: when the refcount of a zval dropped to 0, the Zend engine concluded that no variable pointed to the zval any more and released the memory it occupied. Sometimes, however, things are not that simple. As we will see later, plain reference counting cannot collect zvals that are part of a circular reference, even after the variable pointing to the zval has been unset, and the result is a memory leak.
3. zend_uchar type
This field indicates the actual type of the variable. From our first steps with PHP we know that its variables come in four scalar types (bool, int, float, string), two compound types (array, object) and two special types (resource and NULL). Inside the Zend engine these types correspond to the following macros (defined in phpsrc/Zend/zend.h):
#define IS_NULL 0
#define IS_LONG 1
#define IS_DOUBLE 2
#define IS_BOOL 3
#define IS_ARRAY 4
#define IS_OBJECT 5
#define IS_STRING 6
#define IS_RESOURCE 7
#define IS_CONSTANT 8
#define IS_CONSTANT_ARRAY 9
#define IS_CALLABLE 10
4. is_ref__gc
This field marks whether the variable is a reference. It is 0 for ordinary variables and 1 for reference variables. It affects how zvals are shared and separated, which we will discuss later.
As the names suggest, refcount__gc and is_ref__gc are two very important fields used by PHP's GC mechanism. Their values can be inspected with debugging tools such as xdebug.
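To make the relationship between the type field and the value union more concrete, here is a minimal C sketch. It is not the real Zend code: sketch_value, sketch_zval and sketch_describe are hypothetical names, the union keeps only three members, and the only thing taken from the source above is the field layout and the IS_* values. It also assumes that a bool is stored in lval.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Type tags with the same values as the macros listed above. */
#define IS_NULL   0
#define IS_LONG   1
#define IS_DOUBLE 2
#define IS_BOOL   3
#define IS_STRING 6

/* Simplified stand-in for zvalue_value: arrays and objects are left out. */
typedef union {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
} sketch_value;

/* Simplified stand-in for zval, with the same four fields. */
typedef struct {
    sketch_value value;
    unsigned int refcount__gc;
    unsigned char type;
    unsigned char is_ref__gc;
} sketch_zval;

/* Hypothetical helper: the type tag decides which union member is valid. */
int sketch_describe(const sketch_zval *z, char *buf, size_t size)
{
    assert(z != NULL && buf != NULL);
    switch (z->type) {
    case IS_NULL:
        return snprintf(buf, size, "NULL");
    case IS_LONG:
        return snprintf(buf, size, "long(%ld)", z->value.lval);
    case IS_BOOL:
        /* Assumption of this sketch: a bool lives in lval. */
        return snprintf(buf, size, "bool(%s)", z->value.lval ? "true" : "false");
    case IS_DOUBLE:
        return snprintf(buf, size, "double(%g)", z->value.dval);
    case IS_STRING:
        return snprintf(buf, size, "string(%d) \"%.*s\"",
                        z->value.str.len, z->value.str.len, z->value.str.val);
    default:
        return snprintf(buf, size, "unhandled type %d", (int)z->type);
    }
}
```

Reading a union member that does not match the type tag would give meaningless data, which is why every access in the sketch goes through the switch on type.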
xdebug is an open source profiling and debugging tool for PHP. For everyday debugging, common helpers such as var_dump, echo, print and debug_backtrace are usually enough, but for more complex debugging and for performance testing xdebug is a very good helper (other tools such as Xhprof are also excellent).
Basic environment of this article:
The basic process of installing xdebug is (actually compiling an extension from source code):
1. Download the source code package.
The download address is: http://www.xdebug.org/docs/install
The version downloaded in this article is: xdebug-2.6.tar.gz
2. Unzip
tar xvzf xdebug-2.6.tar.gz
3. Run phpize in the xdebug directory
4. Run ./configure
5. Run make && make install
This will generate the xdebug.so extension file (zend_extension), located in xdebug/modules
6. Load xdebug extension in php.ini
zend_extension=your-xdebug-path/xdebug.so
7. Add xdebug configuration
xdebug.profiler_enable = on
xdebug.default_enable = on
xdebug.trace_output_dir = "/tmp/xdebug"
xdebug.trace_output_name = trace.%c.%p
xdebug.profiler_output_dir = "/tmp/xdebug"
xdebug.profiler_output_name = "cachegrind.out.%s"
The meaning of each configuration item will not be introduced in detail here. For details, please see: http://www.xdebug.org/docs/all
Xdebug should now show up among PHP's loaded extensions (check with php -m or phpinfo()).
Now, in your script, you can print zval information with xdebug_debug_zval:
<?php
$a = array('test');
$a[] = &$a;
xdebug_debug_zval('a');
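(Note: this part mainly draws on http://derickrethans.nl/collecting-garbage-phps-take-on-variables.html. The author, Derick Rethans, is an excellent PHP kernel expert who has given many talks around the world, all with PDFs available for download. His talks are listed at http://derickrethans.nl/talks.html, and many of them are well worth studying in depth.)
As mentioned earlier, PHP uses the zval structure to store variables. Here we continue to look at zval in more detail.
1. Creating a variable creates a zval.
$str = "test zval";
xdebug_debug_zval('str');
Output: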
str: (refcount=1, is_ref=0)='test zval'
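When $str = "test zval"; creates the variable, a new symbol (str) is inserted into the symbol table of the current scope. Because this is an ordinary variable, a zval container with refcount=1 and is_ref=0 is created. In other words, the symbol str points to a single zval that holds 'test zval'.
2. Assigning a variable to another variable increases the refcount of the zval.
$str = "test zval";
$str2 = $str;
xdebug_debug_zval('str');
xdebug_debug_zval('str2');
Output:
str: (refcount=2, is_ref=0)='test zval'
str2: (refcount=2, is_ref=0)='test zval'
We can also see that the two symbols str and str2 have identical zval structures. This is an optimization in PHP: because str and str2 are both ordinary variables, they point to the same zval, and no separate zval is allocated for str2. This saves some memory. At this point both str and str2 point to one zval whose refcount is 2.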
3. Unsetting a variable decreases the refcount of the zval.
$str = "test zval";
$str3 = $str2 = $str;
xdebug_debug_zval('str');
unset($str2, $str3);
xdebug_debug_zval('str');
Result:
str: (refcount=3, is_ref=0)='test zval'
str: (refcount=1, is_ref=0)='test zval'
Because unset($str2, $str3) removes str2 and str3 from the symbol table, only str points to the zval after the unset.
If we now execute unset($str), the refcount of the zval drops to 0 and the zval is cleaned from memory. This is of course the ideal case.
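The three steps above (create, assign, unset) can be modelled in a few lines of C. This is only a sketch under simplifying assumptions: sketch_zval holds nothing but a string, and sketch_new, sketch_assign, sketch_unset and the counter sketch_live_zvals are hypothetical names that do not exist in the Zend engine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified zval that can only hold a string. */
typedef struct {
    char *str;
    unsigned int refcount;   /* how many symbols point to this zval */
    unsigned char is_ref;
} sketch_zval;

/* Number of zvals that are allocated and not yet freed. */
int sketch_live_zvals = 0;

/* Models $str = "..."; a fresh zval with refcount=1 and is_ref=0. */
sketch_zval *sketch_new(const char *s)
{
    sketch_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->str = malloc(strlen(s) + 1);
    assert(z->str != NULL);
    strcpy(z->str, s);
    z->refcount = 1;
    z->is_ref = 0;
    sketch_live_zvals++;
    return z;
}

/* Models $str2 = $str; the new symbol shares the zval. */
sketch_zval *sketch_assign(sketch_zval *z)
{
    z->refcount++;
    return z;
}

/* Models unset($x); drops one symbol and frees the zval at refcount 0.
 * Returns the refcount that remains. */
unsigned int sketch_unset(sketch_zval *z)
{
    assert(z->refcount > 0);
    if (--z->refcount == 0) {
        free(z->str);
        free(z);
        sketch_live_zvals--;
        return 0;
    }
    return z->refcount;
}
```

Calling sketch_new once, sketch_assign twice and sketch_unset three times walks through the same refcount values as the example above: 3 after the assignments, 1 after unsetting str2 and str3, and 0 (with the memory freed) after unsetting str.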
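But things do not always go this smoothly.
Unlike ordinary variables such as scalars, compound variables such as arrays and objects get a zval container for each of their items when their zval is created. For example:
$ar = array( 'id' => 38, 'name' => 'shine' );
xdebug_debug_zval('ar');
The zval structure printed is:
ar: (refcount=1, is_ref=0)=array (
    'id' => (refcount=1, is_ref=0)=38,
    'name' => (refcount=1, is_ref=0)='shine'
)
That is, the symbol ar points to the zval of the array, which in turn holds one zval per element.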
As we can see, three zval containers are created while $ar is built: one for the array and one for each element. For each zval, the refcount is increased and decreased by the same rules as for ordinary variables. For example, let us add another element to the array and assign it the value of $ar['name']:
$ar = array( 'id' => 38, 'name' => 'shine' );
$ar['test'] = $ar['name'];
xdebug_debug_zval('ar');
The printed zval is then:
ar: (refcount=1, is_ref=0)=array (
    'id' => (refcount=1, is_ref=0)=38,
    'name' => (refcount=2, is_ref=0)='shine',
    'test' => (refcount=2, is_ref=0)='shine'
)
Just as with ordinary variables, the two symbols name and test now point to the same zval.
Likewise, removing an element from the array deletes the corresponding symbol from the symbol table and decreases the refcount of the corresponding zval. And again, if the refcount of a zval drops to 0, the zval is removed from memory:
$ar = array( 'id' => 38, 'name' => 'shine' );
$ar['test'] = $ar['name'];
unset($ar['test'], $ar['name']);
xdebug_debug_zval('ar');
Output:
ar: (refcount=1, is_ref=0)=array ('id' => (refcount=1, is_ref=0)=38)
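Once references are involved, things get a little more complicated. For example, adding a reference to the array itself as an element:
$a = array('one');
$a[] = &$a;
xdebug_debug_zval('a');
Output:
a: (refcount=2, is_ref=1)=array (
    0 => (refcount=1, is_ref=0)='one',
    1 => (refcount=2, is_ref=1)=...
)
In the output above, ... indicates a pointer back to the original array, so this is a circular reference: both the symbol a and element 1 of the array point to the same zval.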
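Now we unset $a. This deletes the corresponding symbol from the symbol table and decreases the refcount of the zval by 1 (it was 2 before). In other words, the zval should now have this structure:
(refcount=1, is_ref=1)=array (
    0 => (refcount=1, is_ref=0)='one',
    1 => (refcount=1, is_ref=1)=...
)
That is, no symbol points to the array zval any more, but its own element 1 still does.
This is where the trouble starts!
After the unset, no variable points to the zval, yet the zval cannot be cleaned up by the GC (meaning the pure reference counting GC used before PHP 5.3), because the refcounts of these zvals are all greater than 0. These zvals therefore stay in memory until the request ends (see the SAPI life cycle). Until then the memory they occupy cannot be reused and is simply wasted. In other words, memory that cannot be released is a memory leak.
If such a leak happened only once or a handful of times, it would not matter much, but thousands of leaks are a serious problem. This is especially true in long-running scripts (for example daemons that keep running in the background without interruption): because the memory cannot be reclaimed, the system eventually runs out of memory.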
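The leak described above can be reproduced with a small C model of pure reference counting. Again this is only a sketch: the array is reduced to a single element slot, and sketch_new_array, sketch_release and sketch_cycle_demo are hypothetical names. At the end the demo breaks the cycle by hand so that the C program itself does not leak. A PHP script has no such handle once $a is unset.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sketch_zval sketch_zval;
struct sketch_zval {
    unsigned int refcount;
    unsigned char is_ref;
    sketch_zval *element;   /* stand-in for the slot filled by $a[] = &$a */
};

/* Number of zvals that are allocated and not yet freed. */
int sketch_live_zvals = 0;

sketch_zval *sketch_new_array(void)
{
    sketch_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->refcount = 1;
    z->is_ref = 0;
    z->element = NULL;
    sketch_live_zvals++;
    return z;
}

/* Drop one pointer to z. The zval is freed only when refcount reaches 0. */
void sketch_release(sketch_zval *z)
{
    sketch_zval *child;
    assert(z->refcount > 0);
    if (--z->refcount > 0) {
        return;
    }
    child = (z->element != z) ? z->element : NULL;
    free(z);
    sketch_live_zvals--;
    if (child != NULL) {
        sketch_release(child);
    }
}

/* Models: $a = array('one'); $a[] = &$a; unset($a);
 * Returns the refcount left after the unset and stores the number of
 * zvals still allocated at that moment in *live_after_unset. */
unsigned int sketch_cycle_demo(int *live_after_unset)
{
    unsigned int left;
    sketch_zval *a = sketch_new_array();   /* refcount=1 */

    a->element = a;                        /* the array now points to itself */
    a->refcount++;                         /* refcount=2 */
    a->is_ref = 1;

    sketch_release(a);                     /* unset($a): refcount=1, not freed */
    left = a->refcount;
    *live_after_unset = sketch_live_zvals;

    /* Cleanup that only this sketch can do: break the cycle manually. */
    a->element = NULL;
    sketch_release(a);
    return left;
}
```

After the modelled unset, the refcount is still 1 and the zval is still allocated, although no symbol can reach it. A collector that looks only at the counter never frees it.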
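As described earlier, during a variable assignment such as $b = $a, PHP does not allocate a separate zval for each of $a and $b. To save space, the two symbols share one zval.
So the question is: when one of the variables changes, how is the shared zval handled?
Consider this code:
$a = "a simple test";
$b = $a;
echo "before write:".PHP_EOL;
xdebug_debug_zval('a');
xdebug_debug_zval('b');
$b = "thss";
echo "after write:".PHP_EOL;
xdebug_debug_zval('a');
xdebug_debug_zval('b');
The printed result is:
before write:
a: (refcount=2, is_ref=0)='a simple test'
b: (refcount=2, is_ref=0)='a simple test'
after write:
a: (refcount=1, is_ref=0)='a simple test'
b: (refcount=1, is_ref=0)='thss'
Initially, a and b in the symbol table point to the same zval (to save memory). Then $b changes. Zend checks whether the refcount of the zval that b points to is 1. If it is 1, only one symbol points to the zval, and the zval is modified directly. Otherwise the zval is shared and must be separated, so that each variable can change without affecting the other. This mechanism is called COW (copy on write). In many scenarios COW is a fairly efficient strategy.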
What about reference variables?
$a = 'test';
$b = &$a;
echo "before change:".PHP_EOL;
xdebug_debug_zval('a');
xdebug_debug_zval('b');
$b = 12;
echo "after change:".PHP_EOL;
xdebug_debug_zval('a');
xdebug_debug_zval('b');
unset($b);
echo "after unset:".PHP_EOL;
xdebug_debug_zval('a');
xdebug_debug_zval('b');
Output:
before change:
a: (refcount=2, is_ref=1)='test'
b: (refcount=2, is_ref=1)='test'
after change:
a: (refcount=2, is_ref=1)=12
b: (refcount=2, is_ref=1)=12
after unset:
a: (refcount=1, is_ref=0)=12
As we can see, after the value of $b is changed, Zend checks the zval's is_ref to see whether it is a reference variable. If it is, the zval is modified directly. Otherwise the zval separation just described has to be performed. Because $a and $b are references to the same zval, changing the shared zval also changes the value of $a. After unset($b), the variable $b has been removed from the symbol table.
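The write rules described above (modify in place when the zval is a reference or has a single owner, separate otherwise) can be sketched in C as follows. This is a simplified model, not the Zend implementation: values are strings only (so 12 becomes "12"), and sketch_new, sketch_share, sketch_write and sketch_release are hypothetical names. Resetting is_ref when the refcount falls back to 1 is included only to match the "after unset" output shown above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *str;
    unsigned int refcount;
    unsigned char is_ref;
} sketch_zval;

static char *sketch_copy(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    assert(copy != NULL);
    strcpy(copy, s);
    return copy;
}

sketch_zval *sketch_new(const char *s)
{
    sketch_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->str = sketch_copy(s);
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* Models unset($x): drop one symbol, free the zval at refcount 0. */
void sketch_release(sketch_zval *z)
{
    assert(z->refcount > 0);
    if (--z->refcount == 0) {
        free(z->str);
        free(z);
    } else if (z->refcount == 1) {
        z->is_ref = 0;   /* a single remaining symbol is no longer a reference */
    }
}

/* Models $b = $a (by_ref = 0) or $b = &$a (by_ref = 1).
 * Assumes z is not already shared by value when by_ref is 1. */
sketch_zval *sketch_share(sketch_zval *z, int by_ref)
{
    z->refcount++;
    if (by_ref) {
        z->is_ref = 1;
    }
    return z;
}

/* Models $x = "new value" for the symbol slot *sym. */
void sketch_write(sketch_zval **sym, const char *s)
{
    sketch_zval *z = *sym;
    if (!z->is_ref && z->refcount > 1) {
        /* Shared by value: separate, the other symbols keep the old zval. */
        z->refcount--;
        *sym = sketch_new(s);
        return;
    }
    /* Reference, or only one owner: change the zval in place. */
    free(z->str);
    z->str = sketch_copy(s);
}
```

With by_ref = 0, a write through b leaves a untouched and both zvals end with refcount 1. With by_ref = 1, the same write is visible through a because both symbols still point to one zval.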
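This also illustrates a point: unset does not destroy a zval. It only deletes the corresponding symbol from the symbol table. With this in mind, many earlier questions about references become understandable (in the next part we will explore PHP references in depth).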