PHP is a weakly typed language, which inevitably requires seamless, transparent implicit type conversion. Internally, PHP uses a zval to hold a value of any type. The zval structure looks like this (using PHP 5.2 as an example):
struct _zval_struct {
    /* Variable information */
    zvalue_value value;   /* value */
    zend_uint refcount;
    zend_uchar type;      /* active type */
    zend_uchar is_ref;
};
In the above structure, the value itself is actually stored in the zvalue_value union:
typedef union _zvalue_value {
    long lval;            /* long value */
    double dval;          /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;        /* hash table value */
    zend_object_value obj;
} zvalue_value;
For today's topic we only care about two of its members, lval and dval. Note that the width of long lval is not fixed: depending on the compiler and the OS word size, it may be 32 or 64 bits. double dval (double precision), on the other hand, is defined by IEEE 754 and is always 64 bits wide.
Keep this in mind, because it is what makes some PHP code platform-dependent. Unless stated otherwise, the discussion below assumes a 64-bit long.
I won't reproduce the IEEE 754 floating-point format here; look it up if you are interested. The key point is that a double stores 52 mantissa bits; counting the hidden leading 1 bit, that gives 53 significant bits in total.
This raises an interesting question. Take the following C code as an example (assuming a 64-bit long):
long a = x;
assert(a == (long)(double)a);
For which range of values of a is the assertion above guaranteed to hold? (The answer comes later in the article.)
Now back to the topic. Before executing a script, PHP has to read and parse it, and part of that process is turning the literals in the script into zvals. Take the following script:
<?php
$a = 9223372036854775807; // maximum 64-bit signed integer
$b = 9223372036854775808; // maximum + 1
var_dump($a);
var_dump($b);
Output:
int(9223372036854775807)
float(9.22337203685E+18)
In other words, during lexical analysis PHP checks whether a literal exceeds the range that long can represent on the current system. If it does not, the value is stored in lval and the zval's type is IS_LONG; otherwise it is stored in dval and the type is IS_DOUBLE.
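The effect can be observed from a script. This is only a minimal sketch, assuming PHP 7 or later (for the return type declaration); the helper name literalTypes and its array keys are made up for illustration. gettype() reports "integer" for an IS_LONG zval and "double" for a floating-point one.

```php
<?php
// Hypothetical helper: reports which internal type PHP chose for each literal.
function literalTypes(): array
{
    return [
        // Always fits in a long, by definition.
        'int_max'      => gettype(PHP_INT_MAX),
        // 2^63: too large for a 32-bit or a 64-bit long, so it becomes a float.
        'beyond_64bit' => gettype(9223372036854775808),
        // 2^31: an integer where long is 64 bits, a float where it is 32 bits.
        'beyond_32bit' => gettype(2147483648),
    ];
}

print_r(literalTypes());
```

The last entry is the platform-dependent one: the same literal ends up with a different type depending on the width of long.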
We must be careful with any value beyond the largest integer, because it may lose precision:
<?php
$a = 9223372036854775807;
$b = 9223372036854775808;
var_dump($a === ($b - 1));
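The output is false: $b is stored as a float, so $b - 1 is a float as well (and it rounds back to 2^63), and a float is never identical (===) to the integer $a.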
Returning to the point made at the beginning: PHP integers may be 32 or 64 bits wide, so code that runs correctly on a 64-bit system may silently lose precision through an implicit type conversion and misbehave on a 32-bit system.
So we must be wary of this threshold. Fortunately, PHP already defines it as a constant:
<?php echo PHP_INT_MAX; ?>
Of course, to be on the safe side, we should use strings to store large integers, and use mathematical function libraries such as bcmath to perform calculations.
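As a rough illustration of the string-based approach, here is a small sketch. It assumes the bcmath extension is loaded and PHP 7 or later; the wrapper name addLargeIntegers is hypothetical.

```php
<?php
// Assumes the bcmath extension is loaded; operands and results are decimal strings.
function addLargeIntegers(string $a, string $b): string
{
    return bcadd($a, $b, 0); // scale 0: no digits after the decimal point
}

echo addLargeIntegers('9223372036854775807', '1'), "\n";        // 9223372036854775808
var_dump(bccomp('9223372036854775808', '9223372036854775807')); // int(1)
```

Because the values never pass through lval or dval, the result is exact regardless of the width of long.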
There is one more setting that often causes confusion: the precision directive in php.ini. It determines how many significant digits PHP prints when it outputs a float value. It only affects how the value is displayed, not the value that is stored.
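The following sketch shows the same float rendered under two precision settings. It assumes PHP 7 or later and uses a string cast, which follows the precision setting; the helper name formatWithPrecision is made up for this example.

```php
<?php
// Hypothetical helper: converts a float to a string under a temporary precision.
function formatWithPrecision(float $value, int $digits): string
{
    $previous = ini_set('precision', (string) $digits);
    $text = (string) $value;               // string conversion honours precision
    if ($previous !== false) {
        ini_set('precision', $previous);   // restore the earlier setting
    }
    return $text;
}

echo formatWithPrecision(0.1 + 0.2, 14), "\n"; // 0.3
echo formatWithPrecision(0.1 + 0.2, 17), "\n"; // 0.30000000000000004
```

The stored double is identical in both calls; only the number of digits shown differs.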
Finally, back to the question raised above: what is the largest long value that is guaranteed to survive the conversion to a double and back to long without loss of precision?
Take the integer 5 as an example. Its binary representation is 101. Shifting the binary point two places to the left gives 1.01; dropping the implicit leading 1 leaves the mantissa 01, so 5 is stored in a double as:
0 /* sign bit */ 10000000001 /* exponent bits */ 0100000000000000000000000000000000000000000000000000 /* mantissa bits: 01 followed by 50 zeros */
The binary representation of 5 is stored in the mantissa part without any loss. In this case, there will be no loss of precision when converting from double back to long.
We know that a double stores 52 mantissa bits, which together with the implicit leading 1 gives 53 bits of precision. It follows that if the absolute value of a long integer is no greater than:
2^53 - 1 == 9007199254740991; // keep in mind that we are assuming a 64-bit long
then the integer loses no precision in a long -> double -> long conversion.
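The same round trip can be tried from PHP. This is a sketch under the assumption of a 64-bit build (PHP_INT_SIZE === 8) and PHP 7 or later; the function name survivesFloatRoundTrip is hypothetical.

```php
<?php
// Assumes a 64-bit build (PHP_INT_SIZE === 8) so the literals below are integers.
function survivesFloatRoundTrip(int $n): bool
{
    // int -> float -> int, then compare with the original value.
    return (int) (float) $n === $n;
}

var_dump(survivesFloatRoundTrip(9007199254740991)); // 2^53 - 1: bool(true)
var_dump(survivesFloatRoundTrip(9007199254740993)); // 2^53 + 1: bool(false)
```

2^53 + 1 needs 54 significant bits, so the conversion to float rounds it to 2^53 and the original value cannot be recovered.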
There is one more point about floating-point numbers, which answers the following common question:
<?php
$f = 0.58;
var_dump(intval($f * 100)); // why does this output 57?
?>
Why is the output 57? Is it a PHP bug?
I believe many readers have wondered about this, because people ask me similar questions all the time, not to mention the reports that keep showing up on bugs.php.net...
To understand the reason, we first need to know how floating-point numbers are represented (IEEE 754):
A floating-point number, taking the 64-bit (double precision) format as an example, consists of 1 sign bit (E), 11 exponent bits (Q) and 52 mantissa bits (M), 64 bits in total.
Sign bit: the highest bit holds the sign; 0 means positive and 1 means negative.
Exponent bits: the power of 2 by which the value is scaled, stored in biased (offset) form; for a double the bias is 1023.
Mantissa: the significant binary digits after the binary point; the leading 1 is implicit and not stored.
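To look at the three fields of a real value, something like the following sketch can be used. It relies on pack() with the 'E' format code (big-endian double), which as far as I know requires PHP 7.0.15 / 7.1.1 or later; the helper name decomposeDouble is made up for illustration.

```php
<?php
// Hypothetical helper: splits a double into its sign, exponent and mantissa bits.
function decomposeDouble(float $value): array
{
    $bits = '';
    // pack('E', ...) yields the 8 bytes of the double, most significant byte first.
    foreach (str_split(pack('E', $value)) as $byte) {
        $bits .= str_pad(decbin(ord($byte)), 8, '0', STR_PAD_LEFT);
    }
    return [
        'sign'     => $bits[0],            // 1 bit
        'exponent' => substr($bits, 1, 11), // 11 bits, biased by 1023
        'mantissa' => substr($bits, 12),    // 52 bits, implicit leading 1 omitted
    ];
}

print_r(decomposeDouble(5.0));
```

For 5.0 the exponent field is 10000000001 (1025, that is 2 + 1023) and the mantissa starts with 01, matching 1.01 x 2^2.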
The key point here is how decimal fractions are represented in binary. I won't go into the details of the conversion; you can look it up on Baidu. What we need to understand is that in binary, 0.58 is an infinitely long value (the bit patterns below omit the implicit 1).
The mantissa of 0.58 in binary, cut off at 52 bits, is: 0010100011110101110000101000111101011100001010001111
The mantissa of 0.57 in binary, cut off at 52 bits, is: 0010001111010111000010100011110101110000101000111101
Computed from just these 52 bits, the values actually stored are:
0.58 -> 0.57999999999999996
0.57 -> 0.56999999999999995
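These stored values can be made visible by asking for more decimal digits than PHP shows by default. A minimal sketch, assuming PHP 7 or later; the helper name storedDecimal is hypothetical.

```php
<?php
// Hypothetical helper: prints more decimal digits than the default precision shows.
function storedDecimal(float $value, int $digits = 17): string
{
    return sprintf('%.' . $digits . 'f', $value);
}

echo storedDecimal(0.58), "\n"; // 0.57999999999999996
echo storedDecimal(0.57), "\n"; // 0.56999999999999995
```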
As for how the floating-point multiplication 0.58 * 100 is actually carried out, we won't go through it in detail; those who are interested can read up on it (Floating point). A rough mental calculation is enough here: 0.58 * 100 = 57.999999999...
intval() then discards the fractional part, so the result is naturally 57.
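One common way to get the expected 58 is to round before converting to an integer. This is just a sketch of that idea, assuming PHP 7 or later; the helper name toHundredths is made up for this example.

```php
<?php
// Hypothetical helper: converts a decimal amount to whole hundredths.
function toHundredths(float $value): int
{
    // round() picks the nearest whole number before the cast truncates.
    return (int) round($value * 100);
}

var_dump(intval(0.58 * 100)); // int(57): the product is slightly below 58
var_dump(toHundredths(0.58)); // int(58)
```

Rounding does not make the float exact; it only chooses the nearest integer instead of always cutting toward zero.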
So the crux of the problem is: "a decimal that looks finite to you is actually infinite in the computer's binary representation".
So stop thinking of this as a PHP bug; this is simply how floating-point numbers work.