PHP (since PHP 3.05) provides a pair of functions for serializing and deserializing objects: serialize and unserialize. However, the PHP manual only describes how to use these two functions. It says nothing about the format of the serialized output.
This makes it troublesome to implement the PHP serialization format in other languages. I have collected several PHP serialization implementations written in other languages, but none of them is complete, and they fail when serializing or deserializing more complex objects.
So I decided to write this document, a detailed explanation of the PHP serialization format. It gives me a more complete reference when writing PHP serialization libraries in other languages.
The content of this article was obtained by writing test programs and by reading the PHP source code, so I cannot guarantee that all of it is 100% correct. I will do my best to make sure what I have written is correct, and I will clearly point out the parts I am unsure about. I hope readers can supplement and improve them.
PHP's serialized output is a simple text format. However, it is sensitive to letter case and whitespace (spaces, carriage returns, line feeds, etc.), and its strings are counted in bytes (8-bit characters). It is therefore more accurate to say that PHP serializes data into a byte stream.
Consequently, some languages store strings as Unicode rather than as bytes. An implementation in such a language should not hold the serialized content in a string. It should hold it in a byte stream object or byte array; otherwise errors will occur when exchanging data with PHP.
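The following Python function is a minimal sketch of why the byte view matters. It reads a single string token of the form s:<length>:"<data>"; directly from a bytes object. The name read_php_string is hypothetical. The sketch assumes the length field counts bytes, as the byte-stream description above implies, and it leaves decoding the payload (for example as UTF-8) to the caller.

```python
from typing import Tuple


def read_php_string(data: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Read one s:<len>:"<payload>"; token starting at data[pos].

    Returns the raw payload bytes and the position just past the token.
    Hypothetical helper: the length field is treated as a byte count.
    """
    if data[pos:pos + 2] != b"s:":
        raise ValueError("not a PHP string token")
    colon = data.index(b":", pos + 2)
    length = int(data[pos + 2:colon])          # declared length, in bytes
    if data[colon + 1:colon + 2] != b'"':
        raise ValueError("expected opening quote")
    start = colon + 2
    end = start + length                       # slice by bytes, not characters
    if data[end:end + 2] != b'";':
        raise ValueError("declared length does not match the payload")
    return data[start:end], end + 2


payload, next_pos = read_php_string(b's:6:"h\xc3\xa9llo";')
print(payload, next_pos)
```

Consider b's:6:"h\xc3\xa9llo";', the UTF-8 encoding of "héllo". The declared length 6 matches the six bytes of the payload. A decoded Python str has only five characters, so slicing it with the declared length would swallow the closing quote.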
PHP uses different letters to mark different data types. The article Using Serialized PHP with Yahoo! Web Services, published on the Yahoo developer website, lists all the type letters and their meanings:
a - array
b - boolean
d - double
i - integer
o - common object
r - reference
s - string
C - custom object
O - class
N - null
R - pointer reference
U - unicode string
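The table above can be turned into a lookup keyed on the first byte of a serialized value. The names PHP_TYPE_TAGS and tag_of in the sketch below are hypothetical. The sketch also illustrates the case sensitivity mentioned earlier: o and O are different markers.

```python
# Type markers from the list above, keyed by the first byte of a value.
# Markers are case-sensitive: b"o" and b"O" mean different things.
PHP_TYPE_TAGS = {
    b"a": "array",
    b"b": "boolean",
    b"d": "double",
    b"i": "integer",
    b"o": "common object",
    b"r": "reference",
    b"s": "string",
    b"C": "custom object",
    b"O": "class",
    b"N": "null",
    b"R": "pointer reference",
    b"U": "unicode string",
}


def tag_of(data: bytes) -> str:
    """Return the type name for a serialized value (hypothetical helper)."""
    marker = data[:1]                # first byte; b"" if data is empty
    if marker not in PHP_TYPE_TAGS:
        raise ValueError(f"unknown type marker: {marker!r}")
    return PHP_TYPE_TAGS[marker]


print(tag_of(b"i:42;"))
print(tag_of(b'O:8:"stdClass":0:{}'))
```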
N represents NULL, while b, d, i and s represent the four scalar types (boolean, double, integer and string). Existing PHP serialization implementations in other languages generally handle serialization and deserialization of these types. Some of them, however, implement s (string) incorrectly.
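Below is a minimal Python sketch for serializing the scalar types; the function name serialize_scalar is hypothetical. It rests on three assumptions:

- Strings are exchanged as UTF-8.
- Doubles are formatted with Python's repr(). PHP's unserialize should accept this for ordinary values, but the text may differ from what a particular PHP version emits, since PHP's own output depends on its serialize_precision setting.
- Infinities and NaN are written as INF, -INF and NAN.

The string branch counts bytes, which is where byte-unaware implementations go wrong.

```python
import math


def serialize_scalar(value) -> bytes:
    """Serialize None, bool, int, float or str as PHP's N, b, i, d or s."""
    if value is None:
        return b"N;"
    # Test bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return b"b:1;" if value else b"b:0;"
    if isinstance(value, int):
        # Assumption: the value fits in PHP's integer range (not checked here).
        return b"i:%d;" % value
    if isinstance(value, float):
        if math.isnan(value):
            return b"d:NAN;"
        if math.isinf(value):
            return b"d:INF;" if value > 0 else b"d:-INF;"
        # repr() gives the shortest text that round-trips, e.g. "0.1".
        return b"d:%s;" % repr(value).encode("ascii")
    if isinstance(value, str):
        raw = value.encode("utf-8")  # assumption: strings are exchanged as UTF-8
        # The length prefix counts bytes, not characters.
        return b's:%d:"%s";' % (len(raw), raw)
    raise TypeError(f"not a scalar: {type(value).__name__}")


for v in (None, True, 42, 0.5, "héllo"):
    print(serialize_scalar(v))
```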
a and O are the most commonly used composite types. Most implementations in other languages handle a well. For O, however, they implement only the PHP 4 object serialization format and do not support the extended object serialization format of PHP 5.
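The composite formats can be sketched in the same way.

- An array becomes a:<count>:{...}, where each key (an integer or a string) is followed by its value.
- An object in the basic form becomes O:<name length>:"<class name>":<property count>:{...}, with property names serialized as strings.

PhpObject is a hypothetical stand-in for a PHP object and models only public properties; the PHP 5 extensions are not covered. The sketch also assumes UTF-8 text and treats a Python list as an array with keys 0, 1, 2 and so on.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class PhpObject:
    """Hypothetical stand-in for a PHP object with public properties only."""
    class_name: str
    props: Dict[str, Any] = field(default_factory=dict)


def _string(text: str) -> bytes:
    raw = text.encode("utf-8")                 # assumption: UTF-8 text
    return b's:%d:"%s";' % (len(raw), raw)


def serialize_value(value: Any) -> bytes:
    """Serialize arrays (dict/list), PhpObject and a few scalars."""
    if value is None:
        return b"N;"
    if isinstance(value, bool):                # before int: bool subclasses int
        return b"b:1;" if value else b"b:0;"
    if isinstance(value, int):
        return b"i:%d;" % value
    if isinstance(value, str):
        return _string(value)
    if isinstance(value, (list, dict)):
        # A list is treated as a PHP array with keys 0, 1, 2, ...
        pairs = list(enumerate(value)) if isinstance(value, list) else list(value.items())
        body = b""
        for key, item in pairs:
            if isinstance(key, bool) or not isinstance(key, (int, str)):
                raise TypeError("PHP array keys must be int or str")
            body += serialize_value(key) + serialize_value(item)
        return b"a:%d:{%s}" % (len(pairs), body)
    if isinstance(value, PhpObject):
        name = value.class_name.encode("utf-8")
        body = b"".join(_string(k) + serialize_value(v) for k, v in value.props.items())
        return b'O:%d:"%s":%d:{%s}' % (len(name), name, len(value.props), body)
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(serialize_value({"a": 1, 0: "x"}))
print(serialize_value(PhpObject("stdClass")))
```

Values nest naturally because serialize_value calls itself for every key and element. Note that the element count in the a and O headers is the number of key/value pairs, not the number of bytes.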
r and R represent an object reference and a pointer reference, respectively. Both are quite useful, because data carrying these two markers is produced when serializing more complex arrays and objects. I will explain these two markers in detail later. So far I have not found any implementation in another language that supports them.
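Here is a minimal Python unserializer sketch (php_unserialize is a hypothetical name) showing one way r and R can be resolved. It handles N, b, i, d, s, a, r and R. It assumes the slot numbering that recent PHP versions appear to use:

- Every value except array keys and R tokens occupies a numbered slot.
- Slots are counted from 1, in the order the values start, so a container is numbered before its contents.
- r:n; and R:n; point back to slot n.

Python has no PHP-style references, so R here simply yields the same object. Strings are kept as raw bytes.

```python
from typing import Any, List


class _Unserializer:
    """Hypothetical minimal parser for N, b, i, d, s, a, r and R."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0
        self.slots: List[Any] = []   # slot n (1-based) is self.slots[n - 1]

    def _read_until(self, stop: bytes) -> bytes:
        # Return the bytes up to `stop` and move past the delimiter.
        end = self.data.index(stop, self.pos)
        chunk = self.data[self.pos:end]
        self.pos = end + 1
        return chunk

    def parse(self, is_key: bool = False) -> Any:
        tag = self.data[self.pos:self.pos + 1]
        self.pos += 2                          # skip the marker and ':' (or ';' for N)
        if tag == b"N":
            value = None
        elif tag == b"b":
            value = self._read_until(b";") == b"1"
        elif tag == b"i":
            value = int(self._read_until(b";"))
        elif tag == b"d":
            value = float(self._read_until(b";").decode("ascii"))  # also INF, -INF, NAN
        elif tag == b"s":
            length = int(self._read_until(b":"))
            start = self.pos + 1               # skip the opening quote
            value = self.data[start:start + length]  # raw bytes; length counts bytes
            self.pos = start + length + 2      # skip the closing '";'
        elif tag in (b"r", b"R"):
            target = self.slots[int(self._read_until(b";")) - 1]
            if tag == b"R":
                return target                  # R does not take a slot of its own
            value = target                     # r does (appended below)
        elif tag == b"a":
            count = int(self._read_until(b":"))
            self.pos += 1                      # skip '{'
            value = {}
            if not is_key:
                self.slots.append(value)       # a container is numbered before its contents
            for _ in range(count):
                key = self.parse(is_key=True)
                value[key] = self.parse()
            self.pos += 1                      # skip '}'
            return value
        else:
            raise ValueError(f"unsupported marker {tag!r} at offset {self.pos - 2}")
        if not is_key:
            self.slots.append(value)           # array keys never get a slot
        return value


def php_unserialize(data: bytes) -> Any:
    return _Unserializer(data).parse()


result = php_unserialize(b"a:2:{i:0;a:0:{}i:1;R:2;}")
print(result[0] is result[1])   # True
```

Consider parsing a:2:{i:0;i:1;i:1;R:2;}. It gives {0: 1, 1: 1}, because slot 1 is the outer array and slot 2 is the value stored under key 0. Keys are parsed with is_key=True so that they never shift the numbering.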
C was introduced in PHP 5 and represents a custom object serialization method. Other languages rarely need it because it is seldom used, but I will still explain it in detail later.
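The payload of a C value is opaque to the format: it is whatever the class's own serialize() method returned, via the Serializable interface in PHP 5. Reading it therefore mostly means honoring the two length fields. The sketch below assumes the layout C:<name length>:"<class name>":<data length>:{<data>}, and the name read_custom_object is hypothetical.

```python
from typing import Tuple


def read_custom_object(data: bytes, pos: int = 0) -> Tuple[bytes, bytes, int]:
    """Read C:<name len>:"<name>":<data len>:{<data>} starting at data[pos].

    Returns (class name, opaque payload, position after the closing brace).
    Hypothetical helper; the payload is not interpreted.
    """
    if data[pos:pos + 2] != b"C:":
        raise ValueError("not a custom-object token")
    colon = data.index(b":", pos + 2)
    name_len = int(data[pos + 2:colon])
    if data[colon + 1:colon + 2] != b'"':
        raise ValueError("expected opening quote before class name")
    name_start = colon + 2
    name_end = name_start + name_len
    if data[name_end:name_end + 2] != b'":':
        raise ValueError("class name length mismatch")
    colon = data.index(b":", name_end + 2)
    body_len = int(data[name_end + 2:colon])
    if data[colon + 1:colon + 2] != b"{":
        raise ValueError("expected '{' before payload")
    body_start = colon + 2
    body_end = body_start + body_len               # payload length is in bytes
    if data[body_end:body_end + 1] != b"}":
        raise ValueError("payload length mismatch")
    return data[name_start:name_end], data[body_start:body_end], body_end + 1


print(read_custom_object(b'C:3:"Foo":5:{hello}'))
```

The reader relies on the length field rather than on searching for a matching brace. As a result, braces or semicolons inside the payload cannot confuse it.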
U was introduced in PHP 6 and represents a Unicode-encoded string. PHP 6 can store strings as Unicode, so it adds this string type to the serialization format. However, neither PHP 5 nor PHP 4 supports this type, and those two versions are currently the mainstream. An implementation in another language should therefore not use this type for serialization, although it can implement deserialization for it. I will also explain its format later.
Finally there is o, the only type marker I have not yet figured out. It was introduced in PHP 3 to serialize objects and was replaced by O in PHP 4. In the PHP 3 source code, o is serialized and deserialized in essentially the same way as the array type a. The serialization code of PHP 4, PHP 5 and PHP 6 never produces it. Their deserializers still handle it, but I have not yet worked out what that handling does. Therefore I will not discuss it further for now.