Recently I have been studying how PHP reads files with a very large number of lines (on the order of millions). With efficiency in mind, I ran a few simple experiments. Here is a summary.
Article 1. Efficiency issues of the file() function.
The file() function is quite inefficient. For a regular file, for example one with a single record per line, try to avoid file().
Instead, read the file with file_get_contents() and split it with explode(). In my tests this was about one third faster.
For example:
The file format is as follows:
11111\n
22222\n
33333\n
44444\n
55555\n
...\n
nnnnnnnnnnn\n
Reading it with file($file) takes a long time.
Using explode("\n", file_get_contents($file)); instead is much faster.
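The two ways of reading can be sketched side by side as follows. This is only a minimal illustration: the temporary file and its five sample lines are assumptions made up for the example, and the flags passed to file() and the rtrim() call are additions so that both approaches return the same array (without them, file() keeps the trailing "\n" on each entry and explode() leaves an empty last element).

```php
<?php
// Build a small sample file in the format above (hypothetical path and data).
$file = tempnam(sys_get_temp_dir(), 'lines_');
file_put_contents($file, "11111\n22222\n33333\n44444\n55555\n");

// Approach 1: file(), with flags so each entry has no trailing "\n".
$viaFile = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// Approach 2: read the whole file as one string, then split on "\n".
// rtrim() drops the final newline so explode() yields no empty last element.
$viaExplode = explode("\n", rtrim(file_get_contents($file), "\n"));

unlink($file);

// Both arrays hold the same five strings.
var_dump($viaFile === $viaExplode);
```

To compare the speed on a real file, each approach could be wrapped in a pair of microtime(true) calls. The actual difference will depend on the PHP version and the file size.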
Article 2. How to traverse arrays.
Once the data has been read into an array, the next step is to traverse it.
What I need is to determine whether a value exists in the array, for example whether 44444 is in it. The first thing that comes to mind is in_array().
However, experiments showed that in_array() is very slow here, since it scans the array element by element. Borrowing from other people's code, I flipped the array so that the original values become the keys and every value is 1. Then a simple check such as ($arr[index] == 1) is enough, because a key lookup does not scan the whole array. Sure enough, this is much faster.
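A minimal sketch of the two lookups, using made-up sample data. It assumes array_fill_keys() as one way to build the "all values are 1" table, and it uses isset() rather than comparing with 1 so that a missing key does not raise a warning. Neither choice is necessarily what the original code did.

```php
<?php
// Sample data standing in for the lines read from the file.
$lines = ['11111', '22222', '33333', '44444', '55555'];

// Linear search: in_array() scans the elements one by one.
$foundLinear = in_array('44444', $lines, true);

// Build a lookup table: original values become keys, every value is 1.
$lookup = array_fill_keys($lines, 1);

// Key lookup: isset() checks the key directly without scanning.
$foundByKey = isset($lookup['44444']);
$missing    = isset($lookup['99999']);

var_dump($foundLinear, $foundByKey, $missing);
```

Building the lookup table costs one pass over the data, so it pays off when many values are checked against the same array.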
When traversing, if the array is very large and only part of its data is actually needed, it is best to extract that part into a separate array and traverse that instead. This improves efficiency considerably.
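One possible reading of this advice is sketched below, assuming the needed part is a contiguous range that array_slice() can copy out. The array contents and the offsets are hypothetical.

```php
<?php
// Hypothetical large array; only the values 1000..1999 are needed here.
$all = range(1, 100000);

// Copy just the needed window into a smaller array before looping.
$needed = array_slice($all, 999, 1000);

// Traverse only the extracted part.
$sum = 0;
foreach ($needed as $value) {
    $sum += $value;
}

echo $sum, PHP_EOL;
```

If the needed entries are not contiguous, array_filter() or array_intersect_key() could play the same role.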
Article 3. Storage of arrays.
The computed data needs to be saved to a file. I considered three methods: writing it directly into a PHP file, serializing it, and storing it as a JSON string.
The first method
Write the data directly into a file saved as PHP.
Then require it when needed.
The second method is to serialize() the variable and write it to a file with file_put_contents(). To use it, unserialize() it.
The third method is similar to the second, except the data is written as a JSON string.
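The three methods can be sketched as follows. The sample data and temporary file names are hypothetical, and using var_export() to generate the PHP file is an assumption about how the first method might be implemented.

```php
<?php
// Hypothetical data to persist; the temp files are for illustration only.
$data = ['a' => 1, 'b' => 2, 'c' => 3];

$phpFile  = tempnam(sys_get_temp_dir(), 'php_');
$serFile  = tempnam(sys_get_temp_dir(), 'ser_');
$jsonFile = tempnam(sys_get_temp_dir(), 'json_');

// Method 1: write a PHP file that returns the array, then require it.
file_put_contents($phpFile, '<?php return ' . var_export($data, true) . ';');
$fromPhp = require $phpFile;

// Method 2: serialize() to a file, unserialize() when loading.
file_put_contents($serFile, serialize($data));
$fromSer = unserialize(file_get_contents($serFile));

// Method 3: json_encode() to a file, json_decode() into an associative array.
file_put_contents($jsonFile, json_encode($data));
$fromJson = json_decode(file_get_contents($jsonFile), true);

unlink($phpFile);
unlink($serFile);
unlink($jsonFile);

// Each method restores the original array.
var_dump($fromPhp === $data, $fromSer === $data, $fromJson === $data);
```

Timing each load step with microtime(true) on realistic data would be the way to compare the three on a given setup.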
Testing showed that the second method is the most efficient and the third is a close second, with almost the same efficiency. The first method is the slowest. This is very different from what I expected, which was really surprising.