I have been working on something lately: using PHP to read files with a very large number of lines (on the order of millions). Since efficiency matters at that scale, I ran some simple experiments. The findings are summarized below.
The first issue is the efficiency of the file() function.
The file() function is very slow. If the file is regular, such as one record per line, try not to use file().
Instead, use file_get_contents() and then split the result with explode(). In my tests this was roughly a third faster.
For example:
The file format is as follows (each line is terminated by \n):

11111
22222
33333
44444
55555
.....
Reading it with file($file) takes a long time.
The following is much faster: explode("\n", file_get_contents($file));
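To make the comparison concrete, here is a minimal sketch. It uses a hypothetical temporary file standing in for the real data file; the file name and contents are illustrative only.

```php
<?php
// Hypothetical temp file standing in for the real multi-million-line file.
$file = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($file, "11111\n22222\n33333\n44444\n55555");

// Slow on very large files:
// $lines = file($file, FILE_IGNORE_NEW_LINES);

// Faster: read the whole file once, then split it once.
$lines = explode("\n", file_get_contents($file));

unlink($file);
```

Both approaches yield an array of lines; the difference is purely in how much work PHP does per line.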
The second issue is how to traverse the array.
Once the data has been read into an array, the next step is traversal.
What I need is to check whether a value exists in the array, for example whether 44444 is present. The first thing that comes to mind is in_array().
However, after experimenting, I found in_array() to be very slow. So, borrowing an idea from other people's code, I flipped the array so that the original values become keys (e.g. setting every value to 1). Then a simple key check such as isset($arr[$value]) (or $arr[$value] == 1 if all values were set to 1) does the job. Sure enough, this is much faster.
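The flipping trick above can be sketched with PHP's built-in array_flip(); the sample values are hypothetical. Note that array_flip() stores the original index as the value, so isset() is the safe check rather than comparing against 1.

```php
<?php
// Hypothetical list of values loaded from the file.
$values = ['11111', '22222', '33333', '44444', '55555'];

// Slow when repeated many times on a large array (linear scan each call):
// in_array('44444', $values);

// Flip once: values become keys, so each lookup is a constant-time
// hash probe instead of a linear scan.
$lookup = array_flip($values);

// array_flip() stores the original index as the value, so test the key
// with isset() rather than comparing the value against 1.
$found    = isset($lookup['44444']); // present
$notFound = isset($lookup['99999']); // absent
```

The one-time cost of the flip pays off as soon as you need more than a handful of membership tests.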
While traversing, if the array is very large and you only need part of its data, it is best to extract the needed part into a separate array first and traverse that instead. This improves efficiency considerably.
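A minimal sketch of that extraction idea, using array_slice() on a hypothetical large array (the sizes here are arbitrary):

```php
<?php
// Hypothetical large array; suppose only the first 100 entries are needed.
$big = range(1, 100000);

// Extract the needed slice once, instead of scanning everything per pass.
$subset = array_slice($big, 0, 100);

$sum = 0;
foreach ($subset as $v) {
    $sum += $v; // the loop now touches 100 items instead of 100,000
}
```

For keyed data, array_intersect_key() plays the same role as array_slice() does here.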
The third issue is the storage of arrays.
I need to save the computed data in a file, and I considered three methods: write it directly as a PHP file, serialize it, or encode it as a JSON string.
The first method: write the data out as PHP source code, then require the file when needed.
The second method: serialize() the variable and write it with file_put_contents(); when using it, unserialize() the contents.
The third method is similar to the second, except the data is written as a JSON string.
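The three methods can be sketched side by side. The file names and sample data below are hypothetical; var_export() is one common way to realize the "write as a PHP file" approach.

```php
<?php
$data = ['a' => 1, 'b' => 2, 'c' => 3];
$dir  = sys_get_temp_dir();

// Method 1: dump the array as PHP source, load it back with require.
file_put_contents("$dir/cache.php", '<?php return ' . var_export($data, true) . ';');
$fromPhp = require "$dir/cache.php";

// Method 2: serialize() to write, unserialize() to read.
file_put_contents("$dir/cache.ser", serialize($data));
$fromSer = unserialize(file_get_contents("$dir/cache.ser"));

// Method 3: same idea, but as a JSON string.
file_put_contents("$dir/cache.json", json_encode($data));
$fromJson = json_decode(file_get_contents("$dir/cache.json"), true);
```

All three round-trip the same array; they differ only in format and in how fast PHP can parse each one back.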
After testing, I found that the second method (serialize) is the fastest, the third (JSON) a close second, and the first (PHP file) the slowest. That is quite different from what I expected, and it really surprised me.
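If you want to reproduce the comparison yourself, a rough micro-benchmark sketch for one of the methods looks like this. The data size is arbitrary and absolute numbers depend heavily on disk, data shape, and PHP version, so treat the output as indicative only.

```php
<?php
// Rough timing sketch for the serialize round trip; hypothetical data.
$data = array_fill(0, 100000, 'x');
$file = sys_get_temp_dir() . '/bench.ser';

$t = microtime(true);
file_put_contents($file, serialize($data));
$loaded  = unserialize(file_get_contents($file));
$elapsed = microtime(true) - $t;

printf("serialize round trip: %.4f s\n", $elapsed);
unlink($file);
```

Swapping in the var_export/require or json_encode/json_decode pair from the previous sketch gives the other two timings.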