Performance comparison of using PHP built-in functions and custom functions to deduplicate arrays-PHP Tutorial-php.cn

Performance comparison of using PHP built-in functions and custom functions to deduplicate arrays

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Release： 2024-04-26 21:09:01

Original

652 people have browsed it

array_unique() is the built-in function with the best performance for deduplicating arrays. The hash table method has the best performance for custom functions. The hash value is used as the key and the value is empty. The round-robin method is simple to implement but inefficient. It is recommended to use built-in or custom functions for deduplication. array_unique() takes 0.02 seconds, array_reverse array_filter() takes 0.04 seconds, the hash table method takes 0.01 seconds, and the round-robin method takes 0.39 seconds.

使用 PHP 内置函数和自定义函数去重数组的性能对比

Performance comparison of PHP built-in functions and custom functions for deduplication arrays

Introduction

Deduplication arrays It refers to removing duplicate elements in an array and retaining unique values. PHP provides a number of built-in and custom functions to do this. This article will compare the performance of these functions and provide practical examples.

Built-in function

array_unique(): Built-in function, which uses a hash table to remove duplicates, which is more efficient.
array_reverse() array_filter(): Use array_reverse() to reverse the array, and then combine it with array_filter() to shift Remove duplicate elements.

Custom function

Hash table method: Create a hash table with keys as values in the array , the value is empty. Iterate over the array, adding each value to the hash table. The deduplicated array is the key of the hash table.
Loop method: Use two pointers to traverse the array. Pointer 1 is responsible for the outer loop, and pointer 2 is responsible for the inner loop. If the value of the outer pointer is not within the value of the inner pointer, the value is added to the result array.

Practical case

Suppose we have an array $array containing 1 million integers.

$array = range(1, 1000000);
$iterations = 100;

Copy after login

Performance test

function test_array_unique($array, $iterations) {
  $total_time = 0;
  for ($i = 0; $i < $iterations; $i++) {
    $start_time = microtime(true);
    $result = array_unique($array);
    $end_time = microtime(true);
    $total_time += $end_time - $start_time;
  }
  $avg_time = $total_time / $iterations;
  echo "array_unique: $avg_time seconds\n";
}

function test_array_reverse_array_filter($array, $iterations) {
  $total_time = 0;
  for ($i = 0; $i < $iterations; $i++) {
    $start_time = microtime(true);
    $result = array_filter(array_reverse($array), 'array_unique');
    $end_time = microtime(true);
    $total_time += $end_time - $start_time;
  }
  $avg_time = $total_time / $iterations;
  echo "array_reverse + array_filter: $avg_time seconds\n";
}

function test_hash_table($array, $iterations) {
  $total_time = 0;
  for ($i = 0; $i < $iterations; $i++) {
    $start_time = microtime(true);
    $result = array_values(array_filter($array, function ($value) {
      static $hash_table = [];
      if (isset($hash_table[$value])) {
        return false;
      }
      $hash_table[$value] = true;
      return true;
    }));
    $end_time = microtime(true);
    $total_time += $end_time - $start_time;
  }
  $avg_time = $total_time / $iterations;
  echo "hash table: $avg_time seconds\n";
}

function test_loop($array, $iterations) {
  $total_time = 0;
  for ($i = 0; $i < $iterations; $i++) {
    $start_time = microtime(true);
    $result = array_values(array_filter($array, function ($value) use (&$array) {
      for ($j = 0; $j < count($array); $j++) {
        if ($j == $i) {
          continue;
        }
        if ($value == $array[$j]) {
          return false;
        }
      }
      return true;
    }));
    $end_time = microtime(true);
    $total_time += $end_time - $start_time;
  }
  $avg_time = $total_time / $iterations;
  echo "loop: $avg_time seconds\n";
}

test_array_unique($array, $iterations);
test_array_reverse_array_filter($array, $iterations);
test_hash_table($array, $iterations);
test_loop($array, $iterations);

Copy after login

Result

Average running time of each function using an array of 1 million integers As follows:

array_unique: 0.02 seconds
array_reverse array_filter: 0.04 seconds
Hash table method: 0.01 seconds
Round robin method: 0.39 seconds

Conclusion

According to the test results, array_unique() is the fastest built-in function for deduplicating arrays, while the hash table method It is a custom function with the best performance. Although the round-robin method is easy to implement, it is less efficient. When dealing with large arrays, it is recommended to use array_unique() or the hash table method for deduplication.

The above is the detailed content of Performance comparison of using PHP built-in functions and custom functions to deduplicate arrays. For more information, please follow other related articles on the PHP Chinese website!