Let's take a look at a detailed explanation of the internal implementation of PHP7 variables. I hope the article can help you understand the difference between PHP 7 variables and the old version. The details are as follows.
To understand this article, you should have some understanding of the implementation of variables in PHP5. The focus of this article is to explain the changes in zval in PHP7.
Because there is a lot of detail to cover, this article is divided into two parts. The first part describes how the implementation of zval (zend value) differs between PHP5 and PHP7, and how references are implemented. The second part analyzes the details of individual types (strings, objects).
zval
The zval structure in PHP5 is defined as follows:
typedef struct _zval_struct {
zvalue_value value;
zend_uint refcount__gc;
zend_uchar type;
zend_uchar is_ref__gc;
} zval;
As above, zval contains a value, a type and two fields with __gc suffix. value is a union used to store different types of values:
typedef union _zvalue_value {
long lval; // used for bool type, integer type and resource type
double dval; // used for floating point types
struct { // used for strings
char *val;
int len;
} str;
HashTable *ht; // used for arrays
zend_object_value obj; // used for object
zend_ast *ast; // Used for constant expressions (only available in PHP5.6)
} zvalue_value;
A characteristic of C unions is that only one member is active at a time and the memory allocated matches the member requiring the most memory (memory alignment is also taken into account). All members are stored in the same location in memory, with different values stored as needed. When you need lval, it stores a signed integer, and when you need dval, it stores a double-precision floating point number.
It should be pointed out that the data type currently stored in the union will be recorded in the type field, marked with an integer:
#define IS_NULL 0 /* Doesn't use value */
#define IS_LONG 1 /* Uses lval */
#define IS_DOUBLE 2 /* Uses dval */
#define IS_BOOL 3 /* Uses lval with values 0 and 1 */
#define IS_ARRAY 4 /* Uses ht */
#define IS_OBJECT 5 /* Uses obj */
#define IS_STRING 6 /* Uses str */
#define IS_RESOURCE 7 /* Uses lval, which is the resource ID */
/* Special types used for late-binding of constants */
#define IS_CONSTANT 8
#define IS_CONSTANT_AST 9
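The interplay between the value union and the type tag can be sketched in a few lines of C. This is only an illustration: the demo_* names and the reduced set of tags are hypothetical and are not part of the PHP source.

```c
#include <assert.h>

/* Hypothetical, reduced set of type tags (not the real PHP5 constants). */
enum demo_type { DEMO_NULL, DEMO_LONG, DEMO_DOUBLE };

/* All members share the same storage; only one is meaningful at a time. */
typedef union {
    long   lval;
    double dval;
} demo_value;

typedef struct {
    demo_value    value;
    unsigned char type;   /* records which union member is active */
} demo_zval;

/* Read the value as a double, dispatching on the type tag. */
static double demo_to_double(const demo_zval *zv)
{
    assert(zv != 0);
    switch (zv->type) {
    case DEMO_LONG:   return (double) zv->value.lval;
    case DEMO_DOUBLE: return zv->value.dval;
    default:          return 0.0;   /* DEMO_NULL carries no value */
    }
}
```

Reading a union member other than the one recorded in the tag would reinterpret the stored bytes, which is why every access goes through the tag first.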
Reference Counting in PHP5
In PHP5, zvals are (with a few exceptions) individually allocated on the heap, and PHP needs to know which zvals are still in use and which can be released. This is done with reference counting: the refcount__gc member of the zval stores how many times the zval is referenced. For example, in the statement $a = $b = 42, the value 42 is referenced by two variables, so its reference count is 2. When the reference count drops to 0, the value is no longer used and its memory can be released.
Note that the reference count mentioned here has nothing to do with references in PHP code (using &); it is the number of places that use the value. Where both concepts appear together, "PHP reference" and "reference" are used to tell them apart. PHP references are ignored for now.
A concept closely related to reference counting is "copy-on-write": a zval is shared between several users only as long as nobody changes it. As soon as one of them wants to modify the zval, it first has to be copied ("separated"), and the modification is then applied to the copy.
Here is an example of "copy-on-write" and zval destruction:
$a = 42; // $a -> zval_1(type=IS_LONG, value=42, refcount=1)
$b = $a; // $a, $b -> zval_1(type=IS_LONG, value=42, refcount=2)
$c = $b; // $a, $b, $c -> zval_1(type=IS_LONG, value=42, refcount=3)
// The following line causes a zval separation
$a += 1; // $b, $c -> zval_1(type=IS_LONG, value=42, refcount=2)
// $a -> zval_2(type=IS_LONG, value=43, refcount=1)
unset($b); // $c -> zval_1(type=IS_LONG, value=42, refcount=1)
// $a -> zval_2(type=IS_LONG, value=43, refcount=1)
unset($c); // zval_1 is destroyed because its refcount is 0
// $a -> zval_2(type=IS_LONG, value=43, refcount=1)
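The same sequence of sharing, separating and releasing can be written out as a small C sketch. The demo_* type and functions are hypothetical helpers invented for this illustration; the real engine works with macros on zval pointers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical heap-allocated value that carries its own reference count. */
typedef struct {
    long     value;
    unsigned refcount;
} demo_val;

static demo_val *demo_new(long value)
{
    demo_val *v = malloc(sizeof *v);
    assert(v != NULL);
    v->value = value;
    v->refcount = 1;
    return v;
}

/* "$b = $a": share the same value and bump the count. */
static demo_val *demo_share(demo_val *v)
{
    v->refcount++;
    return v;
}

/* Drop one user; free the value once nobody uses it. Returns 1 if freed. */
static int demo_release(demo_val *v)
{
    if (--v->refcount == 0) {
        free(v);
        return 1;
    }
    return 0;
}

/* Copy-on-write: before writing through *slot, make sure it is not shared. */
static void demo_separate(demo_val **slot)
{
    if ((*slot)->refcount > 1) {
        demo_val *copy = demo_new((*slot)->value);
        (*slot)->refcount--;   /* the old value loses one user */
        *slot = copy;
    }
}
```

Calling demo_separate(&a) before a->value += 1 leaves the other holders of the old value untouched, which is the separation step in the listing above.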
Reference counting has one fatal problem: it cannot detect and release circular references. To solve this, PHP additionally uses a cycle collector. Whenever a zval's refcount is decremented and the zval might be part of a cycle, it is written to the "root buffer". When the buffer is full, potential cycles are marked and collected.
To support the cycle collector, the zval structure that is actually used looks like this:
typedef struct _zval_gc_info {
zval z;
union {
gc_root_buffer *buffered;
struct _zval_gc_info *next;
} u;
} zval_gc_info;
The zval_gc_info structure embeds a normal zval and adds two pointers. Both belong to the same union u, so only one of them is in use at any time. The buffered pointer records where the zval is referenced from the root buffer, so that the entry can be removed from the buffer if the zval is destroyed before the cycle collector runs. next is used when the collector destroys values; the details are not covered here.
Motivation for change
At this point, whichever way you look at it, the design of zval is very inefficient. Take an integer as an example: the value itself needs only 8 bytes. Even allowing for the type tag and memory alignment, 16 bytes should be enough.
In practice, however, a zval on a 64-bit system takes 24 bytes, the zval_gc_info wrapper brings this to 32 bytes, and allocating it separately on the heap adds another 16 bytes of allocation overhead, for a total of 48 bytes. On top of that, allocating and releasing zvals are expensive operations, so this is worth optimizing.
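To get a feeling for the bookkeeping that surrounds an 8-byte integer, the PHP5 layout can be approximated in plain C and measured with sizeof. The demo5_* types below are simplified stand-ins with assumed field types, and the exact numbers depend on the platform and compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the PHP5 structures shown above; the field
   types are approximations chosen for illustration only. */
typedef union {
    long   lval;
    double dval;
    struct { char *val; int len; } str;
    void  *ht;
    struct { unsigned int handle; const void *handlers; } obj;
} demo5_value;

typedef struct {
    demo5_value   value;
    unsigned int  refcount__gc;
    unsigned char type;
    unsigned char is_ref__gc;
} demo5_zval;

typedef struct {
    demo5_zval z;
    union { void *buffered; void *next; } u;
} demo5_zval_gc_info;

/* Bytes of bookkeeping that surround the integer payload itself. */
static size_t demo5_overhead(void)
{
    assert(sizeof(demo5_zval_gc_info) > sizeof(long));
    return sizeof(demo5_zval_gc_info) - sizeof(long);
}
```

Printing sizeof(demo5_value), sizeof(demo5_zval) and sizeof(demo5_zval_gc_info) on a typical 64-bit Linux build is a quick way to see how padding inflates each layer; the allocator overhead comes on top and is not visible to sizeof.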
Think about it this way: Does an integer really need to store reference counting, recycling information and allocate memory separately on the heap? The answer is of course not, this is not a good way to handle it at all.
Here is a summary of the main problems with zval implementation in PHP5:
zval always allocates memory from the heap alone;
zval always stores reference counting and recycling information, even for integer data that may not require such information;
Using objects or resources leads to double reference counting (the reason is explained in the next section);
Some accesses involve more indirection than necessary. For example, accessing an object stored in a variable requires following four pointers (the pointer chain has length four). This is also discussed in the next section;
Storing the count in the zval means that values can only be shared between zvals. It is not possible, for example, to share a string between a zval and a hashtable key (unless the hashtable key is also a zval).
zval in PHP7
The zval has a new implementation in PHP7. The most fundamental change is that zvals are no longer individually allocated on the heap and no longer store a reference count themselves. Instead, complex values (such as strings, arrays and objects) store their own reference count. This has the following benefits:
Simple data types do not require separate memory allocation or counting;
There will be no more double counting. In an object, only the count stored in the object itself is valid;
Since the count is now stored in the value itself, it can be shared with data in non-zval structures, such as between zval and hashtable key;
The number of pointers required for indirect access is reduced.
Let’s look at the current definition of the zval structure (now in the zend_types.h file):
struct _zval_struct {
zend_value value; /* value */
union {
struct {
ZEND_ENDIAN_LOHI_4(
zend_uchar type,
zend_uchar type_flags,
zend_uchar const_flags,
zend_uchar reserved) /* call info for EX(This) */
} v;
uint32_t type_info;
} u1;
union {
uint32_t var_flags;
uint32_t next; /* hash collision chain */
uint32_t cache_slot; /* literal cache slot */
uint32_t lineno; /* line number (for ast nodes) */
uint32_t num_args; /* arguments number for EX(This) */
uint32_t fe_pos; /* foreach position */
uint32_t fe_iter_idx; /* foreach iterator index */
} u2;
};
The first element of the structure has not changed much, it is still a value union. The second member is a union consisting of an integer representing type information and a structure containing four character variables (you can ignore the ZEND_ENDIAN_LOHI_4 macro, which is only used to solve cross-platform endianness issues). The more important parts of this substructure are type (similar to before) and type_flags, which will be explained next.
There is one small catch here: value occupies 8 bytes, but because of memory alignment, adding even a single byte makes the structure grow to 16 bytes (using one more byte costs another 8 bytes). We obviously do not need 8 bytes just to store a type, so a second union named u2 is placed after u1. It is unused by default, but can hold 4 bytes of data when the surrounding code needs it. This union serves different purposes in different contexts.
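The alignment argument can be checked with a reduced model of the layout. The demo7_* types and the DEMO_IS_LONG tag are hypothetical simplifications for this sketch; the real definition has more union members and uses the ZEND_ENDIAN_LOHI_4 macro.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_IS_LONG 4   /* hypothetical tag value used only in this sketch */

typedef union {
    int64_t lval;
    double  dval;
    void   *ptr;
} demo7_value;

typedef struct {
    demo7_value value;                 /* 8 bytes */
    union {
        struct { uint8_t type, type_flags, const_flags, reserved; } v;
        uint32_t type_info;
    } u1;                              /* 4 bytes */
    union {
        uint32_t next;
        uint32_t cache_slot;
    } u2;                              /* 4 bytes that would otherwise be padding */
} demo7_zval;

/* Store an integer directly in the zval; no separate allocation is needed. */
static void demo7_set_long(demo7_zval *zv, int64_t v)
{
    assert(zv != 0);
    zv->value.lval = v;
    zv->u1.type_info = 0;
    zv->u1.v.type = DEMO_IS_LONG;
}
```

On common platforms sizeof(demo7_zval) is 16 with or without u2, which is why the extra union comes for free.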
The structure of value in PHP7 is defined as follows:
typedef union _zend_value {
zend_long lval; /* long value */
double dval; /* double value */
zend_refcounted *counted;
zend_string *str;
zend_array *arr;
zend_object *obj;
zend_resource *res;
zend_reference *ref;
zend_ast_ref *ast;
zval *zv;
void *ptr;
zend_class_entry *ce;
zend_function *func;
struct {
uint32_t w1;
uint32_t w2;
} ww;
} zend_value;
The first thing to note is that the value union now requires 8 bytes of memory instead of 16. It directly stores only integers (lval) and floating point numbers (dval); everything else is a pointer. A pointer also occupies 8 bytes, as does the ww struct at the bottom, which consists of two 4-byte unsigned integers. All of the pointer types above, except the special ones (zv, ptr, ce and func), share the same header (zend_refcounted), which stores the reference count:
typedef struct _zend_refcounted_h {
uint32_t refcount; /* reference counter 32-bit */
union {
struct {
ZEND_ENDIAN_LOHI_3(
zend_uchar type,
zend_uchar flags, /* used for strings & objects */
uint16_t gc_info) /* keeps GC root number (or 0) and color */
} v;
uint32_t type_info;
} u;
} zend_refcounted_h;
Naturally, this struct contains a field that stores the reference count. In addition it holds type, flags and gc_info. type duplicates the type stored in the zval, so that the GC can distinguish different refcounted structures without storing a zval. flags is used for different purposes by different data types, which is discussed in the next section.
gc_info plays the same role as buffered in PHP5, but instead of a pointer into the root buffer it holds an index. Because the root buffer has a fixed size (10000 elements), a 16-bit (2-byte) number is sufficient in place of a 64-bit (8-byte) pointer. gc_info also contains the "color" of the node, which is used to mark nodes during collection.
zval memory management
As mentioned above, zvals are no longer individually allocated on the heap. But obviously they have to be stored somewhere, so where? In fact, most of the time they still live on the heap (so the point is not the heap itself, but separate allocation); they are just embedded in other data structures. For example, a hashtable bucket now contains a zval field directly instead of a pointer. Likewise, the compiled variables of a function and the properties of an object are stored as zval arrays in one contiguous block of memory, rather than as zval pointers scattered all over the place. What used to be a zval * is now a zval.
Previously, when a zval was used in a new place, the zval * was copied and the reference count incremented. Now the contents of the zval are copied (ignoring u2) and, if the value is refcounted, the reference count of the structure it points to is incremented.
So how does PHP know whether a value is refcounted? This cannot be determined from the type alone, because some types (such as strings or arrays) are not always refcounted. Instead, a bit in the zval's type_info records whether the value is refcounted. The following flag bits are defined:
#define IS_TYPE_CONSTANT (1<<0) /* special */
#define IS_TYPE_IMMUTABLE (1<<1) /* special */
#define IS_TYPE_REFCOUNTED (1<<2)
#define IS_TYPE_COLLECTABLE (1<<3)
#define IS_TYPE_COPYABLE (1<<4)
#define IS_TYPE_SYMBOLTABLE (1<<5) /* special */
Note: In the official version of 7.0.0, the above macro definition notes that these macros are for use by zval.u1.v.type_flags. This should be a bug in the annotation, since this above field is of zend_uchar type.
The three main properties that type_info can carry are "refcounted", "collectable" and "copyable". Refcounting has already been covered above. "Collectable" marks whether the value can take part in a cycle. For example, strings are (usually) refcounted, but you cannot create a circular reference with a string.
"Copyable" indicates whether an identical, independent value has to be created when a "duplication" is performed. A duplication is a deep copy: duplicating a zval that points to an array does not merely increase the array's reference count, it creates a new array with the same contents. For some types (such as objects and resources), however, even a duplication only increases the reference count; these types are not copyable. This matches the existing semantics of objects and resources (in PHP7 just as in PHP5).
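How the refcounted flag steers a plain zval copy can be sketched as follows. The flag value, the demo_* types and demo_zval_copy are hypothetical and only model the idea; the engine does this with macros on the real structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for IS_TYPE_REFCOUNTED. */
#define DEMO_TYPE_REFCOUNTED (1u << 2)

/* Common header of every refcounted structure in this sketch. */
typedef struct { uint32_t refcount; } demo_refcounted;

typedef struct {
    union { int64_t lval; demo_refcounted *counted; } value;
    uint8_t type;
    uint8_t type_flags;
} demo_zval;

/* Copy src into dst. The count is only touched when the flag says that
   the value points to a refcounted structure. */
static void demo_zval_copy(demo_zval *dst, const demo_zval *src)
{
    assert(dst != 0 && src != 0);
    *dst = *src;
    if (src->type_flags & DEMO_TYPE_REFCOUNTED) {
        src->value.counted->refcount++;
    }
}
```

Copying an integer is then just a 16-byte struct assignment, while copying an array zval additionally bumps the count stored in the array itself.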
The table below shows which flags are used by which types (an x marks a flag that is set). "Simple types" are types such as integers or booleans that do not use a pointer to a separate structure. The table also has an "immutable" column, which is used to mark immutable arrays; these are covered in the next section.
Interned strings have not been mentioned before. They are strings such as function names and variable names, which are not refcounted and are deduplicated.
                | refcounted | collectable | copyable | immutable
----------------+------------+-------------+----------+----------
simple types    |            |             |          |
string          |     x      |             |    x     |
interned string |            |             |          |
array           |     x      |      x      |    x     |
immutable array |            |             |          |     x
object          |     x      |      x      |          |
resource        |     x      |             |          |
reference       |     x      |             |          |
A few examples make it easier to see how zval memory management works.
Here is how integers behave, as a simplified version of the PHP5 example above:
$a = 42; // $a = zval_1(type=IS_LONG, value=42)
$b = $a; // $a = zval_1(type=IS_LONG, value=42)
// $b = zval_2(type=IS_LONG, value=42)
$a += 1; // $a = zval_1(type=IS_LONG, value=43)
// $b = zval_2(type=IS_LONG, value=42)
unset($a); // $a = zval_1(type=IS_UNDEF)
// $b = zval_2(type=IS_LONG, value=42)
Here is a more complex example that involves an array:
$a = []; // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])
$b = $a; // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=2, value=[])
// $b = zval_2(type=IS_ARRAY) ---^
// zval separation happens here
$a[] = 1; // $a = zval_1(type=IS_ARRAY) -> zend_array_2(refcount=1, value=[1])
// $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])
unset($a); // $a = zval_1(type=IS_UNDEF), zend_array_2 is destroyed
// $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])
Types
Let’s take a look at what types PHP7 supports (type tags used by zval):
/* regular data types */
#define IS_UNDEF 0
#define IS_NULL 1
#define IS_FALSE 2
#define IS_TRUE 3
#define IS_LONG 4
#define IS_DOUBLE 5
#define IS_STRING 6
#define IS_ARRAY 7
#define IS_OBJECT 8
#define IS_RESOURCE 9
#define IS_REFERENCE 10
/* constant expressions */
#define IS_CONSTANT 11
/* internal types */
#define IS_INDIRECT 15
#define IS_PTR 17
This list is similar to the one used in PHP5, but with a few additions:
IS_UNDEF is used in places where a zval pointer was previously NULL (it does not conflict with IS_NULL). In the examples above, for instance, a variable that has been unset is marked IS_UNDEF;
IS_BOOL has been split into IS_FALSE and IS_TRUE. The boolean value is now encoded directly in the type, which optimizes type checks. This change is transparent to userland, where there is still only a single "boolean" type;
PHP references are no longer marked with is_ref, but with the IS_REFERENCE type. This will also be discussed in the next part;
IS_INDIRECT and IS_PTR are special internal flags.
The real list also contains two fake types, which are ignored here.
The IS_LONG type holds a zend_long value rather than a native C long. The reason is that long is only 32 bits wide on 64-bit Windows (LLP64), so PHP5 could only use 32-bit integers on Windows. PHP7 lets you use 64-bit integers on 64-bit operating systems, including Windows.
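The difference between a fixed-width integer and the native long can be shown with a tiny C sketch. demo_long is a hypothetical stand-in for zend_long on a 64-bit build; the real typedef is selected by the build configuration and is not reproduced here.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for zend_long on a 64-bit build: a fixed-width
   type instead of C's long, whose width differs between platforms. */
typedef int64_t demo_long;

/* Returns 1 if v also fits into the platform's native long, 0 otherwise.
   On LP64 systems this always holds; on LLP64 (64-bit Windows) it fails
   for values outside the 32-bit range. */
static int demo_fits_native_long(demo_long v)
{
    assert(sizeof(demo_long) >= sizeof(long));
    return v >= (demo_long) LONG_MIN && v <= (demo_long) LONG_MAX;
}
```

Values for which demo_fits_native_long returns 0 are exactly the ones that a long-based integer type could not represent on that platform.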
The content of zend_refcounted will be discussed in the next section. Let’s look at the implementation of PHP references.
References
PHP7 handles PHP references (the & operator) in a completely different way from PHP5, and this change was the source of a large number of bugs during the development of PHP7. Let's start with how PHP references are implemented in PHP5.
Normally, the copy-on-write principle means that a zval has to be separated before it is modified, so that only the value of one particular PHP variable changes. This is what by-value semantics mean.
This rule does not apply to PHP references. If a PHP variable is a PHP reference, you want several PHP variables to point to the same value. The is_ref flag in PHP5 indicates whether a zval is a PHP reference and therefore whether it has to be separated before modification. For example:
$a = []; // $a -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
$b =& $a; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[])
$b[] = 1; // $a = $b = zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[1])
//Because the value of is_ref is 1, PHP will not separate zval
But a big problem with this design is that it cannot share the same value between a PHP reference variable and a PHP non-reference variable. For example, the following situation:
$a = []; // $a -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
$b = $a; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
$c = $b; // $a, $b, $c -> zval_1(type=IS_ARRAY, refcount=3, is_ref=0) -> HashTable_1(value=[])
$d =& $c; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
// $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[])
// $d is a reference to $c, but not to $a and $b, so the zval has to be copied here
// We now have two zvals, one with is_ref=0 and one with is_ref=1
$d[] = 1; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
// $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[1])
// Because there are two separate zvals, $d[] = 1 does not modify $a and $b
This behavior also causes using references in PHP to be slower than using normal values. For example, the following example:
$array = range(0, 1000000);
$ref =& $array;
var_dump(count($array)); // separation happens here
Because count() only accepts calls by value, but $array is a PHP reference, count() will actually make a complete copy of the array before execution. If $array wasn't a reference, this wouldn't happen.
Now let's look at the implementation of PHP references in PHP7. Because zval no longer allocates memory separately, it is no longer possible to use the same implementation as in PHP5. So an IS_REFERENCE type was added, and zend_reference was specifically used to store reference values:
struct _zend_reference {
zend_refcounted gc;
zval val;
};
Essentially, a zend_reference is just a refcounted zval. All variables that belong to the same reference store a zval of type IS_REFERENCE that points to the same zend_reference. val behaves like any other zval; in particular, it can share the complex value it points to, so an array can be shared between reference and value variables.
Let's look at examples again, this time with semantics in PHP7. For the sake of simplicity and clarity, zval is no longer written separately here, only the structures they point to are shown:
$a = []; // $a -> zend_array_1(refcount=1, value=[])
$b =& $a; // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[])
In the example above, a zend_reference is created by the reference assignment. Note that its reference count is 2 (because two variables use this PHP reference), while the reference count of the value itself is 1 (because only the zend_reference points to it). Now consider a mix of references and non-references:
$a = []; // $a -> zend_array_1(refcount=1, value=[])
$b = $a; // $a, $b -> zend_array_1(refcount=2, value=[])
$c = $b; // $a, $b, $c -> zend_array_1(refcount=3, value=[])
$d =& $c; // $a, $b -> zend_array_1(refcount=3, value=[])
// $c, $d -> zend_reference_1(refcount=2) ---^
// Note that all variables share the same zend_array, even though some are PHP references and some are not
$d[] = 1; // $a, $b -> zend_array_1(refcount=2, value=[])
// $c, $d -> zend_reference_1(refcount=2) -> zend_array_2(refcount=1, value=[1])
// Only now, when the assignment happens, is the zend_array duplicated
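The way a zend_reference sits between the variables and a shared array can be modelled in a few lines of C. All demo_* names are hypothetical, and the caller provides the storage for the reference to keep the sketch free of allocation code.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t refcount; } demo_array;        /* stand-in for zend_array */

typedef struct {
    uint32_t    refcount;                                /* variables using the reference */
    demo_array *val;                                     /* the wrapped value */
} demo_reference;                                        /* stand-in for zend_reference */

/* A variable slot holds either an array directly or a reference to one. */
typedef struct {
    int is_reference;
    union { demo_array *arr; demo_reference *ref; } value;
} demo_var;

/* "$d =& $c": wrap the array held by c in ref and point both slots at it. */
static void demo_make_ref(demo_var *c, demo_var *d, demo_reference *ref)
{
    assert(!c->is_reference);
    ref->refcount = 2;          /* used by $c and $d */
    ref->val = c->value.arr;    /* the reference takes over $c's use of the array */
    c->is_reference = 1;
    d->is_reference = 1;
    c->value.ref = ref;
    d->value.ref = ref;
}
```

The array's own count does not change in demo_make_ref: the use previously held by $c is now held by the reference, so other plain variables keep sharing the same array until one side writes to it.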
The first part talks about the most basic implementation and changes of variables in PHP5 and PHP7. Let me repeat here, the main change is that zval no longer allocates memory separately and does not store its own reference count. Simple types such as integers and floats are stored directly in zvals. Complex types point to an independent structure through a pointer.
Complex zval data values have a common header whose structure is defined by zend_refcounted:
struct _zend_refcounted {
uint32_t refcount;
union {
struct {
ZEND_ENDIAN_LOHI_3(
zend_uchar type,
zend_uchar flags,
uint16_t gc_info)
} v;
uint32_t type_info;
} u;
};
This header stores the reference count (refcount), the type of the value (type), cycle collection information (gc_info) and a slot for type-specific flags (flags).
Next, the implementation of each complex type is analyzed separately and compared with the PHP5 implementation. References are also complex types, but they were covered in the previous part and are not repeated here. Resources are not discussed either (the author does not find them interesting enough).
String
PHP7 defines a new structure zend_string for storing string variables:
struct _zend_string {
zend_refcounted gc;
zend_ulong h; /* hash value */
size_t len;
char val[1];
};
In addition to the refcounted header, a string contains a hash cache h, the string length len and the string value val. The hash cache avoids recomputing the hash every time the string is used as a hashtable key; it is initialized on first use.
If you are not very familiar with C, the definition of val may look strange: it is declared as a char array with a single element, yet the strings we want to store are certainly longer than one character. This uses a technique known as the "struct hack": the array is declared with only one element, but when the zend_string is created, enough memory is allocated to hold the whole string. The complete string can then still be accessed through val.
Strictly speaking this is an unconventional technique, because we read and write past the end of a single-character array, but C compilers leave such code alone. C99 explicitly supports this in the form of "flexible array members", but thanks to our good friend Microsoft, C99 cannot be relied on across platforms (so the hack exists to work around missing flexible array support on Windows).
The new string structure is more convenient than a plain C string. First, the length is stored directly, so it does not have to be computed on every use. Second, the string has its own refcounted header, so it can be shared in different places without going through a zval. One place where this matters a lot is sharing hashtable keys.
The new string type also has one major drawback: while it is easy to get a C string out of a zend_string (just use str->val), going the other way is not. To turn a C string into a zend_string, you first have to allocate the zend_string and then copy the characters into it. This is not very convenient in practice.
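The single-allocation layout and the cost of converting from a C string can be seen in a short sketch. Note the swap: it uses a C99 flexible array member (val[]) rather than the val[1] struct hack described above, because that form is well defined in standard C. demo_string and demo_string_init are hypothetical names, not engine APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified counterpart of zend_string. */
typedef struct {
    uint32_t      refcount;
    unsigned long h;       /* cached hash, 0 = not computed yet */
    size_t        len;
    char          val[];   /* flexible array member: characters follow the header */
} demo_string;

/* Allocate header and characters as one block and copy the C string in.
   The copy is the conversion cost mentioned in the text. */
static demo_string *demo_string_init(const char *s, size_t len)
{
    demo_string *str = malloc(offsetof(demo_string, val) + len + 1);
    assert(str != NULL);
    str->refcount = 1;
    str->h = 0;
    str->len = len;
    memcpy(str->val, s, len);
    str->val[len] = '\0';
    return str;
}
```

Going from demo_string to a C string is just str->val, whereas going the other way always needs the allocation and memcpy shown here. The caller releases the block with free once the last user is gone.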
Strings also have some unique flags (stored in GC flags):
#define IS_STR_PERSISTENT (1<<0) /* allocated using malloc */
#define IS_STR_INTERNED (1<<1) /* interned string */
#define IS_STR_PERMANENT (1<<2) /* interned string surviving request boundary */
Persistent strings are allocated with the normal system allocator rather than the Zend memory manager (ZMM), so they can live longer than a single request. Marking this kind of allocation with a flag lets zvals use persistent strings directly. PHP5 did not do this; the string was copied into the ZMM before use.
Interned strings are a bit special. They live until the end of the request, so they do not need reference counting. Interned strings are also deduplicated: before a new interned string is created, PHP first checks whether one with the same content already exists. All immutable strings in PHP source code are interned (including string literals, variable names and function names). A permanent string is an interned string that was created before the request started. Ordinary interned strings are destroyed when the request ends, whereas permanent strings stay alive.
If opcache is used, interned strings are stored in shared memory (SHM) and shared among all PHP processes. In this case the notion of a permanent string becomes irrelevant, because interned strings are never destroyed.
Array
Because the previous article has talked about the new array implementation, I will not describe it in detail here. Although some recent changes have made the previous description less accurate, the basic concepts are still the same.
What I want to cover here is an array-related concept that the previous article did not mention: immutable arrays. They are essentially the array counterpart of interned strings: they are not refcounted and live until the end of the request (and possibly longer).
For some reasons of memory management convenience, immutable arrays will only be used when opcache is enabled. Let’s take a look at actual usage examples, starting with the following script:
for ($i = 0; $i < 1000000; ++$i) {
$array[] = ['foo'];
}
var_dump(memory_get_usage());
With opcache enabled, the code above uses 32MB of memory. Without it, every element of $array gets its own copy of ['foo'], and memory usage rises to 390MB. A full copy is made instead of a refcount increment because literal operands of the Zend virtual machine are not refcounted, which avoids corrupting shared memory. Hopefully this memory blow-up without opcache can be improved in the future.
Objects in PHP5
Before understanding the object implementation in PHP7, let’s first take a look at PHP5 and see what efficiency issues there are. zval in PHP5 will store a zend_object_value structure, which is defined as follows:
typedef struct _zend_object_value {
zend_object_handle handle;
const zend_object_handlers *handlers;
} zend_object_value;
handle is the unique ID of the object and is used to look up the object data. handlers is a pointer to a virtual function table that implements the various behaviors of the object. Normal PHP objects all share the same handler table, but objects created by PHP extensions can use custom handlers to change how they behave (for example, to overload operators).
The object handle is used as an index into the "object store". The object store itself is an array of buckets, where a bucket is defined as follows:
typedef struct _zend_object_store_bucket {
zend_bool destructor_called;
zend_bool valid;
zend_uchar apply_count;
union _store_bucket {
struct _store_object {
void *object;
zend_objects_store_dtor_t dtor;
zend_objects_free_object_storage_t free_storage;
zend_objects_store_clone_t clone;
const zend_object_handlers *handlers;
zend_uint refcount;
gc_root_buffer *buffered;
} obj;
struct {
int next;
} free_list;
} bucket;
} zend_object_store_bucket;
This structure contains quite a lot. The first three members are plain metadata (whether the object's destructor has been called, whether the bucket is in use, and how many times the object has been visited recursively). The union that follows distinguishes between a bucket that is in use and one that is on the free list. The most important part of the structure is the struct _store_object sub-structure:
Its first member, object, is a pointer to the actual object (that is, where the object is ultimately stored). The object is not embedded directly in the object store bucket because objects do not have a fixed size. The object pointer is followed by three handlers that manage destruction, freeing and cloning of the object. Note that in PHP, destroying and freeing an object are separate steps, and the former may be skipped in some cases ("unclean shutdown"). The clone handler is practically never used. Because these storage handlers are not part of the normal object handlers (for whatever reason), they are duplicated in every object instead of being shared.
These object store handlers are followed by a pointer to the ordinary object handlers. They are stored here because objects are sometimes destroyed without a known zval (and these operations are usually performed on a zval).
The bucket also contains a refcount field, which is somewhat odd in PHP5 given that the zval already stores a reference count. Why is an extra count needed? Usually "copying" a zval just increases its reference count, but occasionally a hard copy happens, where a new zval is created that holds the same zend_object_value. In this case two different zvals use the same object store bucket, so the bucket itself needs to be reference counted as well. This "double refcounting" is an inherent problem of the PHP5 implementation. The buffered pointer into the GC root buffer is duplicated in the bucket for the same reason.
Now look at the structure of the actual object pointed to by the pointer in the object storage. Usually the user-level object is defined as follows:
typedef struct _zend_object {
zend_class_entry *ce;
HashTable *properties;
zval **properties_table;
HashTable *guards;
} zend_object;
The zend_class_entry pointer points to the class that the object is an instance of. The next two members are two different ways of storing object properties. Dynamic properties (those added at runtime rather than declared in the class) all live in the properties hashtable, which simply maps property names to values.
For declared properties there is an optimization: each property is assigned an index at compile time, and the property itself is stored at that index in properties_table. The mapping from property names to indexes is stored in a hashtable in the class entry. This avoids the memory overhead of a hashtable for every single object, and the index of a property is cached in several places at runtime.
The guards hashtable is used to implement recursion protection for magic methods such as __get, which is not discussed further here.
Apart from the double refcounting mentioned above, another problem with this implementation is that even a minimal object with a single property takes 136 bytes of memory (not counting the zvals). There is also a lot of indirection: to fetch a property from an object zval, you first have to fetch the object store bucket, then the zend_object, then the property table, and only then the zval the table slot points to. That is at least four levels of indirection (and in practice it can be no fewer than seven).
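The four hops described above can be spelled out with a reduced model of the PHP5 structures. The demo_* types are hypothetical and keep only the one field needed for each hop.

```c
#include <assert.h>

typedef struct { long lval; } demo_zval;                   /* property value */
typedef struct { demo_zval **properties_table; } demo_object;
typedef struct { void *object; } demo_bucket;              /* object store slot */
typedef struct { unsigned int handle; } demo_object_zval;  /* what the variable holds */

/* Read property idx the PHP5 way: handle -> bucket -> object -> table -> zval. */
static long demo_read_prop(const demo_object_zval *zv,
                           const demo_bucket *store, unsigned int idx)
{
    const demo_bucket *bucket = &store[zv->handle];   /* 1: index the object store */
    const demo_object *obj = bucket->object;          /* 2: bucket -> object */
    demo_zval **table = obj->properties_table;        /* 3: object -> property table */
    demo_zval *prop = table[idx];                     /* 4: table slot -> zval */
    assert(prop != 0);
    return prop->lval;
}
```

Each numbered line is a dependent memory load, so none of them can start before the previous one has finished; this is the cost that embedding structures into each other removes.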
Objects in PHP7
The implementation of PHP7 attempts to solve the above problems, including removing double reference counting, reducing memory usage and indirect access. The new zend_object structure is as follows:
struct _zend_object {
zend_refcounted gc;
uint32_t handle;
zend_class_entry *ce;
const zend_object_handlers *handlers;
HashTable *properties;
zval properties_table[1];
};
You can see that this structure is now almost all that is left of an object: zend_object_value has been replaced by a direct pointer to the object, and the object storage, although not completely removed, is far less important than before.
In addition to the usual zend_refcounted header of PHP7, the handle and the object handlers are now placed in zend_object itself. The properties_table also uses the C struct hack, so that zend_object and the property table are allocated as a single block of memory. And of course, the property table now embeds the zvals directly instead of holding pointers to them.
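The struct hack mentioned here can be sketched as follows. This is a minimal illustration under assumed names (sketch_object, sketch_object_new and the demo function are hypothetical), not the allocation code PHP actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { long lval; } sketch_zval;

typedef struct {
    uint32_t refcount;
    uint32_t num_props;
    sketch_zval properties_table[1];   /* struct hack: really num_props entries */
} sketch_object;

/* Allocate the object header and all property slots in one block. */
sketch_object *sketch_object_new(uint32_t num_props)
{
    /* One slot is already counted in sizeof(sketch_object). */
    size_t extra = num_props > 1 ? (size_t)(num_props - 1) * sizeof(sketch_zval) : 0;
    sketch_object *obj = malloc(sizeof(sketch_object) + extra);
    if (obj == NULL) {
        return NULL;
    }
    obj->refcount = 1;
    obj->num_props = num_props;
    for (uint32_t i = 0; i < num_props; i++) {
        obj->properties_table[i].lval = 0;
    }
    return obj;
}

/* Fills three embedded slots and returns the last one. */
int sketch_demo_struct_hack(void)
{
    sketch_object *obj = sketch_object_new(3);
    if (obj == NULL) {
        return -1;
    }
    for (uint32_t i = 0; i < obj->num_props; i++) {
        obj->properties_table[i].lval = 10 * (long)(i + 1);
    }
    long last = obj->properties_table[obj->num_props - 1].lval;
    free(obj);
    return (int)last;
}
```

Because the slots live inside the same allocation as the header, reading a property needs no extra pointer dereference. In C99 and later, a flexible array member (declared with empty brackets) is the standard way to express the same layout.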
The guards table is no longer present directly in the object structure. If it is needed, i.e. when the object uses magic methods such as __get, it is stored in a properties_table slot; if no magic methods are used, the guards table is omitted entirely.
The three handlers dtor, free_storage and clone were previously stored in the object storage bucket. They are now stored directly in the handlers table, whose structure is defined as follows:
struct _zend_object_handlers {
/* offset of real object header (usually zero) */
int offset;
/* general object functions */
zend_object_free_obj_t free_obj;
zend_object_dtor_obj_t dtor_obj;
zend_object_clone_obj_t clone_obj;
/* individual object functions */
// ... rest is about the same in PHP 5
};
The first member of the handlers table is offset, which is obviously not a handler. This offset is needed in the current implementation because internal objects always embed the standard zend_object but usually also need to add members of their own. PHP5 solved this by adding the extra members after the standard object:
struct custom_object {
zend_object std;
uint32_t something;
// ...
};
This way you can simply cast a zend_object* to a struct custom_object*. This is the common way of implementing structure inheritance in C. However, this approach is a problem in PHP7: because zend_object uses the struct hack to store the property table, the PHP properties stored at the end of zend_object would overwrite the internal members added after it. Therefore, PHP7 places the additional members in front of the standard object structure instead:
struct custom_object {
uint32_t something;
// ...
zend_object std;
};
However, this also means that a simple cast between zend_object* and struct custom_object* is no longer possible, because the two pointers are separated by an offset. This offset is what is stored in the first member of the object handlers table, and it can be determined at compile time with the offsetof() macro.
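A minimal sketch of this conversion, assuming a stand-in sketch_object in place of the real zend_object. The helper custom_from_obj and the macro CUSTOM_OBJECT_OFFSET are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t handle; } sketch_object;   /* stand-in for zend_object */

typedef struct {
    uint32_t something;   /* custom members come first */
    sketch_object std;    /* the standard object goes last */
} custom_object;

/* The value that would be stored in the offset member of the handlers table. */
#define CUSTOM_OBJECT_OFFSET offsetof(custom_object, std)

/* Step back from the embedded standard object to the enclosing structure. */
static custom_object *custom_from_obj(sketch_object *obj)
{
    return (custom_object *)((char *)obj - CUSTOM_OBJECT_OFFSET);
}

/* Returns 1 if the round trip recovers the original structure. */
int sketch_demo_offset(void)
{
    custom_object c = { 7, { 1 } };
    sketch_object *obj = &c.std;   /* what the engine would hold */
    return custom_from_obj(obj) == &c && custom_from_obj(obj)->something == 7;
}
```

Going the other way needs no arithmetic at all: taking the address of the std member yields the standard object pointer.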
You may wonder: since the zend_object pointer is now stored directly (in zend_value) and the object no longer has to be looked up in the object storage, why does the PHP7 object header still retain the handle field?
This is because object storage still exists, albeit greatly simplified, so retaining handles is still necessary. Now it's just an array of pointers to objects. When an object is created, a pointer is inserted into the object's storage and its index is saved in the handle. When the object is released, the index is removed.
So why is the object storage still needed? Because during request shutdown there comes a point after which it is no longer safe to execute user code. To avoid this situation, PHP runs the destructors of all objects at an earlier point during shutdown and prevents them from running later, and for that it needs a list of all active objects.
And handle is also very useful for debugging, it gives each object a unique ID, so it is easy to distinguish whether two objects are the same or just have the same content. Although HHVM has no concept of object storage, it also stores object handles.
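The simplified object storage can be pictured as a plain array of object pointers. The following is a rough sketch under stated assumptions: the fixed capacity, the linear scan for a free slot and all sketch_ names are choices made for this example, not how PHP implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STORE_CAPACITY 8

typedef struct { uint32_t handle; int destructed; } sketch_object;

typedef struct {
    sketch_object *slots[STORE_CAPACITY];   /* NULL marks a free slot */
} sketch_store;

/* Insert obj into the first free slot and save the index as its handle. */
int sketch_store_put(sketch_store *store, sketch_object *obj)
{
    for (uint32_t i = 0; i < STORE_CAPACITY; i++) {
        if (store->slots[i] == NULL) {
            store->slots[i] = obj;
            obj->handle = i;
            return 0;
        }
    }
    return -1;   /* store is full */
}

/* Remove the object's slot when the object is released. */
void sketch_store_del(sketch_store *store, sketch_object *obj)
{
    store->slots[obj->handle] = NULL;
}

/* Early during shutdown: visit every live object exactly once. */
int sketch_store_call_destructors(sketch_store *store)
{
    int called = 0;
    for (uint32_t i = 0; i < STORE_CAPACITY; i++) {
        sketch_object *obj = store->slots[i];
        if (obj != NULL && !obj->destructed) {
            obj->destructed = 1;   /* stands in for running the destructor */
            called++;
        }
    }
    return called;
}

int sketch_demo_store(void)
{
    sketch_store store = { { NULL } };
    sketch_object a = { 0, 0 }, b = { 0, 0 }, c = { 0, 0 };
    sketch_store_put(&store, &a);   /* gets handle 0 */
    sketch_store_put(&store, &b);   /* gets handle 1 */
    sketch_store_del(&store, &a);   /* slot 0 becomes free again */
    sketch_store_put(&store, &c);   /* reuses slot 0 */
    int called = sketch_store_call_destructors(&store);
    return called * 100 + (int)b.handle * 10 + (int)c.handle;
}
```

The demo returns 210: two live objects are visited at shutdown, b keeps handle 1, and c reuses the freed handle 0.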
Compared with PHP5, the current implementation has only one reference count (the zval itself no longer has one), and memory usage is much lower: 40 bytes for the base object plus 16 bytes per declared property, and this already includes the property's zval. Indirection has also been significantly reduced, because the intermediate structures have either been removed or embedded directly, so reading a property now takes only one level of indirection instead of four.
Indirect zval
So far we have basically mentioned all the normal zval types, but there are also a pair of special types used in certain situations, one of which is the newly added IS_INDIRECT in PHP7.
An indirect zval means that its real value is stored elsewhere. Note that this differs from the IS_REFERENCE type: an indirect zval points directly to another zval, instead of pointing to a structure that embeds the zval, as zend_reference does.
In order to understand when this situation occurs, let's take a look at the implementation of variables in PHP (actually the same is true for the storage of object properties).
All variables known at compile time are assigned an index, and their values are stored at the corresponding position in the compiled variables (CV) table. But PHP also allows you to reference variables dynamically, either through variable variables or, in the global scope, through $GLOBALS. Whenever this happens, PHP creates a symbol table for the script or function, which maps variable names to their values.
But the question is: how can both forms of access be supported at the same time? We need index-based lookups in the CV table for normal variable accesses, and name-based lookups in the symbol table for dynamic ones. In PHP5, the CV table used double pointers zval**. Usually these pointers pointed into an intermediate table of zval* pointers, and each zval* ultimately pointed to the actual zval:
+------ CV_ptr_ptr[0]
| +---- CV_ptr_ptr[1]
| | +-- CV_ptr_ptr[2]
| | |
| | +-> CV_ptr[0] --> some zval
| +---> CV_ptr[1] --> some zval
+-----> CV_ptr[2] --> some zval
Once the symbol table is required, the intermediate table of zval* pointers is no longer used, and the zval** pointers are updated to point into the hashtable buckets instead. Assume there are three variables $a, $b and $c. The following is a simple diagram:
CV_ptr_ptr[0] --> SymbolTable["a"].pDataPtr --> some zval
CV_ptr_ptr[1] --> SymbolTable["b"].pDataPtr --> some zval
CV_ptr_ptr[2] --> SymbolTable["c"].pDataPtr --> some zval
But this approach is no longer possible in PHP7, because a pointer to a hashtable bucket becomes invalid when the hashtable is resized. So PHP7 uses the opposite strategy: for variables stored in the CV table, the symbol table stores an INDIRECT entry that points to the CV slot. The CV table is not reallocated during the lifetime of the symbol table, so there is no problem with invalidated pointers.
So if you have a function with $a, $b, and $c in the CV table, and a dynamically allocated variable $d, the structure of the symbol table would look something like this:
SymbolTable["a"].value = INDIRECT --> CV[0] = LONG 42
SymbolTable["b"].value = INDIRECT --> CV[1] = DOUBLE 42.0
SymbolTable["c"].value = INDIRECT --> CV[2] = STRING --> zend_string("42")
SymbolTable["d"].value = ARRAY --> zend_array([4, 2])
An indirect zval can also point to a zval of type IS_UNDEF, in which case it is treated as if the hashtable did not contain the associated key. So when unset($a) marks CV[0] as UNDEF, the symbol table is treated as no longer having an entry with the key a.
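A lookup that follows INDIRECT entries and treats UNDEF as a missing key could look roughly like this. The types and names below are simplified, hypothetical stand-ins, and a linear array replaces the real hashtable to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { T_UNDEF, T_LONG, T_INDIRECT } sketch_type;

typedef struct sketch_zval {
    sketch_type type;
    long lval;                    /* used when type == T_LONG */
    struct sketch_zval *target;   /* used when type == T_INDIRECT */
} sketch_zval;

typedef struct {
    const char *name;
    sketch_zval value;
} sketch_symbol;

/* Find a variable by name. INDIRECT entries are followed into the CV table,
 * and a slot of type UNDEF is reported as "key not present" (NULL). */
sketch_zval *sketch_symtable_find(sketch_symbol *table, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) != 0) {
            continue;
        }
        sketch_zval *zv = &table[i].value;
        if (zv->type == T_INDIRECT) {
            zv = zv->target;
        }
        return zv->type == T_UNDEF ? NULL : zv;
    }
    return NULL;
}

/* Returns 1 if $a is found before unset() and reported missing afterwards. */
int sketch_demo_indirect(void)
{
    sketch_zval cv[1] = { { T_LONG, 42, NULL } };                     /* CV[0] holds $a */
    sketch_symbol table[1] = { { "a", { T_INDIRECT, 0, &cv[0] } } };
    sketch_zval *found = sketch_symtable_find(table, 1, "a");
    int before = found != NULL && found->lval == 42;
    cv[0].type = T_UNDEF;                                             /* unset($a) */
    int after = sketch_symtable_find(table, 1, "a") == NULL;
    return before && after;
}
```

Note that unset() only touches the CV slot here; the symbol table entry itself stays in place and simply resolves to nothing.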
Constants and AST
There are two special types that need to be mentioned, IS_CONSTANT and IS_CONSTANT_AST, which exist in both PHP5 and PHP7. To understand them, let’s first look at the following examples:
function test($a = ANSWER,
$b = ANSWER * ANSWER) {
return $a + $b;
}
define('ANSWER', 42);
var_dump(test()); // int(42 + 42 * 42)
The default values of the two parameters of the test() function are composed of constant ANSWER, but the value of the constant is not yet defined when the function is declared. The exact value of a constant is only known when defined via define().
For this reason, default values of parameters and properties, constants, and everything else that accepts a "static expression" support "delayed binding": the constant is only resolved the first time the value is used.
Constants (or class constants) that require this "delayed binding" are where zvals of type IS_CONSTANT are most often used. If the value is an expression, a zval of type IS_CONSTANT_AST is used instead, which points to the abstract syntax tree (AST) of the expression.
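To make the idea of delayed binding concrete, here is a small sketch that stores an expression such as ANSWER * ANSWER as a tree and only resolves the constant when the tree is evaluated. The node layout, the constant table and all sketch_ names are assumptions made for this example; they do not mirror the real zend_ast structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { N_LONG, N_CONSTANT, N_ADD, N_MUL } sketch_kind;

typedef struct sketch_node {
    sketch_kind kind;
    long lval;                               /* used by N_LONG */
    const char *name;                        /* used by N_CONSTANT */
    const struct sketch_node *left, *right;  /* used by N_ADD and N_MUL */
} sketch_node;

typedef struct { const char *name; long value; } sketch_constant;

/* Evaluate the tree. Returns 0 on success, -1 if a constant is undefined. */
int sketch_eval(const sketch_node *n, const sketch_constant *consts, size_t count, long *out)
{
    long l, r;
    switch (n->kind) {
    case N_LONG:
        *out = n->lval;
        return 0;
    case N_CONSTANT:
        for (size_t i = 0; i < count; i++) {
            if (strcmp(consts[i].name, n->name) == 0) {
                *out = consts[i].value;
                return 0;
            }
        }
        return -1;
    case N_ADD:
    case N_MUL:
        if (sketch_eval(n->left, consts, count, &l) != 0) return -1;
        if (sketch_eval(n->right, consts, count, &r) != 0) return -1;
        *out = n->kind == N_ADD ? l + r : l * r;
        return 0;
    }
    return -1;
}

long sketch_demo_late_binding(void)
{
    /* The default value ANSWER * ANSWER, built before ANSWER is defined. */
    sketch_node answer = { N_CONSTANT, 0, "ANSWER", NULL, NULL };
    sketch_node product = { N_MUL, 0, NULL, &answer, &answer };
    long value = 0;
    if (sketch_eval(&product, NULL, 0, &value) == 0) {
        return -1;                                     /* must fail: not defined yet */
    }
    sketch_constant consts[1] = { { "ANSWER", 42 } }; /* define('ANSWER', 42) */
    if (sketch_eval(&product, consts, 1, &value) != 0) {
        return -1;
    }
    return value;
}
```

The tree is built once, but its value depends on the constant table passed in at evaluation time, which is why the same default value fails before define() and yields 1764 afterwards.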
This concludes our analysis of variable implementation in PHP7. I may write two articles later to introduce some virtual machine optimization, new naming conventions, and optimization of some compiler infrastructure (these are the author's original words).