How to use php arrays to sort millions of data
In my daily work, I often receive notifications to send group messages to website members through in-site letters, mobile phone text messages, and emails. The user list is usually provided by other colleagues, and there will inevitably be duplication. In order to avoid repeated sending , so I need to deduplicate the user list they provide before sending information. Next, I will use the uid list to talk about how I use the php array to deduplicate.
If you get a uid list with more than one million rows, the format is as follows:
10001000
10001001
10001002
............
10001000
............
10001111
In fact, it is easy to use the characteristics of PHP arrays to eliminate duplicates. Let’s first take a look at the definition of PHP arrays: The array in PHP is actually an ordered mapping. A map is a type that associates values to keys. This type is optimized in many ways, so it can be treated as a real array, or a list (vector), a hash table (an implementation of a map), a dictionary, a set, a stack, a queue, and many more possibilities. The value of an array element can also be another array. Tree structures and multidimensional arrays are also allowed.
In PHP arrays, keys are also called indexes and are unique. We can use this feature to perform deduplication. The sample code is as follows:
<?php //定义一个数组,用于存放排重后的结果 $result = array(); //读取uid列表文件 $fp = fopen('test.txt', 'r'); while(!feof($fp)) { $uid = fgets($fp); $uid = trim($uid); $uid = trim($uid, "r"); $uid = trim($uid, "n"); if($uid == '') { continue; } //以uid为key去看该值是否存在 if(empty($result[$uid])) { $result[$uid] = 1; } } fclose($fp); //将结果保存到文件 $content = ''; foreach($result as $k => $v) { $content .= $k."n"; } $fp = fopen('result.txt', 'w'); fwrite($fp, $content); fclose($fp); ?> Copy after login |
Also, this method can also be used to deduplicate two files. If you have two uid list files, the format is the same as the uid list above. The sample program is as follows:
<p><table cellspacing="0" cellpadding="6" width="95%" align="center" border="0" style="border-right: #0099cc 1px solid; table-layout: fixed; border-top: #0099cc 1px solid; border-left: #0099cc 1px solid; border-bottom: #0099cc 1px solid"><tbody><tr><td bgcolor="#ddedfb" style="word-wrap: break-word"><pre class="code"> <?php //定义数组,用于存放排重后的结果 $result = array(); //读取第一个uid列表文件,放入$result_1 $fp = fopen('test_1.txt', 'r'); while(!feof($fp)) { $uid = fgets($fp); $uid = trim($uid); $uid = trim($uid, "r"); $uid = trim($uid, "n"); if($uid == '') { continue; } //以uid为key写入$result,如有重复就会覆盖 $result[$uid] = 1; } fclose($fp); //读取第二个uid列表文件,并进行排重操作 $fp = fopen('test_2.txt', 'r'); while(!feof($fp)) { $uid = fgets($fp); $uid = trim($uid); $uid = trim($uid, "r"); $uid = trim($uid, "n"); if($uid == '') { continue; } //以uid为key去看该值是否存在 if(empty($result[$uid])) { $result[$uid] = 1; } } fclose($fp); //$result里保存的就排重以后的结果,可以输出到文件,代码省略 ?>
If you think about it carefully, it is not difficult to find that using this feature of arrays can solve more problems in our work.