In game projects we often run into this requirement: loading large, fixed-format, read-only data, such as the skill and item tables used in battle. This data can be fairly big; in our case there are tens of thousands of complex records, about 20 MB as serialized plain text. I first tried putting the array directly into a PHP file, but require-ing that file turned out to be very expensive, on the order of dozens of milliseconds, with heavy IO, because tens of megabytes have to be loaded into memory every time. I also looked at SQLite; it is reliable enough, but working with it through its function API is awkward. So I decided to write an extension myself, and the tinkering began.
My original idea was to call zend_execute_script directly in MINIT to load a PHP file and get back a zval to stash in a global variable. After thinking it through I realized this was pure fantasy: during MINIT the PHP VM has not been initialized yet, so you cannot call zend_execute_script at all, and the function does not return a zval anyway; to get at the result you would have to fish it out of EG (the executor globals), which is a real pain.
So I changed tack and went with serialize/unserialize. It turns out php_var_unserialize can indeed be called during the MINIT stage. The plan: call it to get a zval, store that zval in a module global, and return it from the get() function. Once it was written, testing showed that any call dumped core. After going back to the documentation and thinking it over, I found the cause: at request shutdown (around PHP_RSHUTDOWN_FUNCTION) PHP reclaims every allocation that did not come from pemalloc, so data that was perfectly fine during MINIT had already been freed by the time a request touched it.
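For reference, that first attempt looked roughly like the sketch below. It is written against the PHP 5-era internals API; the extension name bigdata, the global bigdata_zval, and the data file path are placeholders, and error handling is abbreviated.

```c
#include <stdio.h>
#include <stdlib.h>
#include "php.h"
#include "ext/standard/php_var.h"

static zval *bigdata_zval;          /* module global holding the unserialized array */

PHP_MINIT_FUNCTION(bigdata)
{
    php_unserialize_data_t var_hash;
    const unsigned char *p;
    char *buf;
    long len;

    FILE *fp = fopen("/path/to/data.ser", "rb");   /* hypothetical path */
    if (!fp) {
        return FAILURE;
    }
    fseek(fp, 0, SEEK_END);
    len = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    buf = malloc(len);
    fread(buf, 1, len, fp);
    fclose(fp);

    p = (const unsigned char *)buf;
    ALLOC_INIT_ZVAL(bigdata_zval);
    PHP_VAR_UNSERIALIZE_INIT(var_hash);
    if (!php_var_unserialize(&bigdata_zval, &p,
                             (const unsigned char *)buf + len,
                             &var_hash TSRMLS_CC)) {
        PHP_VAR_UNSERIALIZE_DESTROY(var_hash);
        free(buf);
        zval_ptr_dtor(&bigdata_zval);
        return FAILURE;
    }
    PHP_VAR_UNSERIALIZE_DESTROY(var_hash);
    free(buf);

    /* Everything inside bigdata_zval was emalloc()'d by the unserializer,
     * so the request allocator reclaims it at request shutdown -- which is
     * exactly why later accesses dumped core. */
    return SUCCESS;
}
```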
Back to the documentation, which shows that PHP provides pemalloc and friends for persistent allocation. So the next iteration allocated the HashTable in the module global with pemalloc and initialized it as persistent (thankfully PHP's HashTable, which also backs code and the VM internals, supports this). The catch is that php_var_unserialize only hands you a zval, and you have no control over whether its contents are persistent. The only way out is zend_hash_copy. After writing that, testing still dumped core, and I could not see why. Over lunch it suddenly occurred to me that it might be a shallow-copy problem: zend_hash_copy accepts a copy constructor, and I had not set one. After adding a deep-copy function and testing again, it worked, and it was very pleasant to use.
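Continuing the sketch, a persistent deep-copy constructor for zend_hash_copy might look roughly like this under the PHP 5-era API. It is only an illustration: it handles strings, nested arrays and plain scalars, skips objects and resources entirely, and persist_zval / persistent_ht are made-up names.

```c
#include <string.h>
#include "php.h"

/* Replaces the bucket's zval* with a pemalloc()'d deep copy. */
static void persist_zval(void *p)
{
    zval **zv_ptr = (zval **)p;
    zval  *old    = *zv_ptr;
    zval  *copy   = (zval *)pemalloc(sizeof(zval), 1);

    *copy = *old;                       /* copy type and raw value */
    Z_SET_REFCOUNT_P(copy, 1);
    Z_UNSET_ISREF_P(copy);

    switch (Z_TYPE_P(old)) {
    case IS_STRING:                     /* duplicate the string bytes persistently */
        Z_STRVAL_P(copy) = (char *)pemalloc(Z_STRLEN_P(old) + 1, 1);
        memcpy(Z_STRVAL_P(copy), Z_STRVAL_P(old), Z_STRLEN_P(old) + 1);
        break;
    case IS_ARRAY: {                    /* recurse into nested tables */
        HashTable *ht = (HashTable *)pemalloc(sizeof(HashTable), 1);
        zend_hash_init(ht, zend_hash_num_elements(Z_ARRVAL_P(old)),
                       NULL, NULL, 1 /* persistent */);
        zend_hash_copy(ht, Z_ARRVAL_P(old), persist_zval, NULL, sizeof(zval *));
        Z_ARRVAL_P(copy) = ht;
        break;
    }
    default:                            /* IS_LONG, IS_DOUBLE, IS_BOOL, IS_NULL */
        break;
    }
    *zv_ptr = copy;                     /* the bucket now points at the persistent copy */
}

/* In MINIT, after php_var_unserialize() has produced bigdata_zval:
 *
 *   persistent_ht = pemalloc(sizeof(HashTable), 1);
 *   zend_hash_init(persistent_ht, zend_hash_num_elements(Z_ARRVAL_P(bigdata_zval)),
 *                  NULL, NULL, 1);
 *   zend_hash_copy(persistent_ht, Z_ARRVAL_P(bigdata_zval), persist_zval,
 *                  NULL, sizeof(zval *));
 */
```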
The next round of testing showed that memory usage was intolerable: loading the 20 MB data file took about 100 MB of memory. With 100 php-cgi processes that is an extra 10 GB, which is simply unbearable. So I figured shared memory could solve it; all that matters is that this data can be read. The php-cgi master process does the MINIT work, and the child processes only need to read the data. The annoying part is that PHP offers no interface for managing your own memory, so every piece has to be written by hand, one function at a time.
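For the shared-memory part itself, a segment can be reserved with plain System V calls during MINIT; the forked php-cgi children then inherit the attachment. A minimal sketch, with the segment size and function names assumed:

```c
#include <sys/ipc.h>
#include <sys/shm.h>

#define BIGDATA_SHM_SIZE  (128 * 1024 * 1024)   /* assumed upper bound for the data */

static void *shm_base = NULL;

/* Called once from MINIT in the php-cgi master; forked children inherit
 * the attachment at the same address. */
static int bigdata_shm_attach(void)
{
    int shmid = shmget(IPC_PRIVATE, BIGDATA_SHM_SIZE, IPC_CREAT | 0600);
    if (shmid == -1) {
        return -1;
    }
    shm_base = shmat(shmid, NULL, 0);   /* let the kernel pick the address */
    if (shm_base == (void *)-1) {
        shm_base = NULL;
        return -1;
    }
    shmctl(shmid, IPC_RMID, NULL);      /* removed once the last user detaches */
    return 0;
}
```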
Looking more closely at PHP's HashTable implementation, it is quite involved, and the realloc path is the crux; there was no way I could write a full memory manager. For now I only implemented a simple sequential allocator on top of the shared memory, handing out space in order. Fortunately the resize path is not needed here at all, because the goal is just to copy the zval produced by php_var_unserialize into shared memory, and the total size is already known. An update function is not needed either, since it is always a brand-new copy. Once it was finally finished, it worked, and memory usage did indeed drop.
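The sequential allocator really can be this small, since nothing is ever freed or resized: just hand out aligned chunks from the front of the segment until it runs out. A sketch, with illustrative names:

```c
#include <stddef.h>

static char  *shm_cursor;   /* next free byte in the shared segment */
static size_t shm_left;     /* bytes still available */

static void shm_pool_init(void *base, size_t size)
{
    shm_cursor = (char *)base;
    shm_left   = size;
}

/* Hand out 8-byte-aligned chunks sequentially; there is no free(), no
 * realloc() and no per-entry bookkeeping, because the data is written
 * exactly once and never modified afterwards. */
static void *shm_alloc(size_t size)
{
    size_t aligned = (size + 7) & ~(size_t)7;
    void  *p;

    if (aligned > shm_left) {
        return NULL;        /* pool exhausted */
    }
    p = shm_cursor;
    shm_cursor += aligned;
    shm_left   -= aligned;
    return p;
}
```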
Next came a stress test, and suddenly the core dumps were back, which was simply unbearable. Why? The core file showed that the refcount of the HashTable inside had dropped to 0. More testing showed that a single worker was fine; it only crashed with multiple php-cgi workers under heavy load. Then it clicked: the refcount gets modified on access, and if multiple processes modify it concurrently it is bound to get corrupted. What to do? Locking it is not an option.
Thinking it over later, it occurred to me that if, every time this zval is returned, I reset the refcount of the top-level zval to a value greater than the number of php-cgi processes, then even if concurrent updates scramble it, it can never get anywhere near 0. I made that change, tested again, and it has indeed been solid.
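In code, the workaround boils down to re-pinning the refcount inside the get() function before the zval is handed back. Again a sketch with PHP 5-era macros; bigdata_shared_zval and BIGDATA_REFCOUNT_FLOOR are placeholder names, and the return path is simplified compared to a real extension.

```c
#define BIGDATA_REFCOUNT_FLOOR 100000   /* far larger than the number of workers */

static zval *bigdata_shared_zval;       /* set up in MINIT to point into the shared segment */

PHP_FUNCTION(bigdata_get)
{
    zval *shared = bigdata_shared_zval;

    if (shared == NULL) {
        RETURN_NULL();
    }

    /* Workers bump this counter up and down without any locking, so
     * individual updates can be lost; re-pinning it to a huge value on
     * every call guarantees it never wanders down to zero. */
    Z_SET_REFCOUNT_P(shared, BIGDATA_REFCOUNT_FLOOR);

    RETURN_ZVAL(shared, 1, 0);   /* return a copy of the top-level zval, no dtor */
}
```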
At this point the problem is basically solved. One issue remains: restarting php-cgi can still dump core, because variables that are still in use get forcibly zeroed. The proper way to use shared memory is for one process to write and another to read. In my application, however, the shared memory is accessed through absolute addresses, so it cannot be written in one place and read in another, unless the second argument of shmat is set to a fixed address. That in turn requires a thorough understanding of the process's address-space layout and knowing which range will never be used. It should be feasible, though, because php-cgi has a memory limit, so there ought to be a region that php-cgi never touches while running. The details will have to be worked out next.
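The fixed-address variant would amount to passing the chosen address as the second argument of shmat in every process, so the absolute pointers baked into the data stay valid everywhere. A sketch, where the address is purely an assumption and would have to be picked after studying php-cgi's actual address-space layout and memory limit:

```c
#include <sys/ipc.h>
#include <sys/shm.h>

/* Assumed-free range: it must be chosen so that no php-cgi mapping
 * (heap, stack, libraries) ever lands there. */
#define BIGDATA_FIXED_ADDR  ((void *)0x500000000000UL)

static void *bigdata_attach_fixed(int shmid)
{
    /* SHM_RND rounds the address down to an SHMLBA boundary if needed. */
    void *base = shmat(shmid, BIGDATA_FIXED_ADDR, SHM_RND);
    if (base == (void *)-1) {
        return NULL;    /* the chosen range is already in use in this process */
    }
    return base;        /* same base in every process, so absolute pointers stay valid */
}
```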
Author: Wu Xinyun