A question that front-end interviewers must prepare for: How to remove duplicates from Javascript Array. As far as I know, Baidu, Tencent, Shanda, etc. have all asked this question in interviews. This question seems simple, but it actually contains hidden dangers. The test is not only about realizing this function, but also about your in-depth understanding of computer program execution.
I came up with a total of three algorithms to achieve this purpose:
Array.prototype.unique1 = function() { var n = []; //一个新的临时数组 for(var i = 0; i < this.length; i++) //遍历当前数组 { //如果当前数组的第i已经保存进了临时数组,那么跳过, //否则把当前项push到临时数组里面 if (n.indexOf(this[i]) == -1) n.push(this[i]); } return n; } Array.prototype.unique2 = function() { var n = {},r=[]; //n为hash表,r为临时数组 for(var i = 0; i < this.length; i++) //遍历当前数组 { if (!n[this[i]]) //如果hash表中没有当前项 { n[this[i]] = true; //存入hash表 r.push(this[i]); //把当前数组的当前项push到临时数组里面 } } return r; } Array.prototype.unique3 = function() { var n = [this[0]]; //结果数组 for(var i = 1; i < this.length; i++) //从第二项开始遍历 { //如果当前数组的第i项在当前数组中第一次出现的位置不是i, //那么表示第i项是重复的,忽略掉。否则存入结果数组 if (this.indexOf(this[i]) == i) n.push(this[i]); } return n; }
The first and third methods both use the indexOf method of the array. The purpose of this method is to find the first occurrence of the stored parameter in the array. Obviously, the js engine will traverse the array until it finds the target when implementing this method. So this function wastes a lot of time. The second method uses a hash table. Store the occurrences in an object in the form of subscripts. Subscripted references are much faster than searching the array using indexOf.
In order to judge the efficiency of these three methods, I made a test program to generate an array of random numbers with a length of 10,000, and then used several methods to test the execution time. The results show that the second method is much faster than the other two methods. However, in terms of memory usage, the second method is more likely to be used because there is an additional hash table. This is what is called space for time. This is the test page, you can also check it out.
According to the ideas of hpl experts, I wrote the fourth method:
Array.prototype.unique4 = function() { this.sort(); var re=[this[0]]; for(var i = 1; i < this.length; i++) { if( this[i] !== re[re.length-1]) { re.push(this[i]); } } return re; }
The idea of this method is to sort the array first, and then compare two adjacent values. The JS native sort method is used when sorting. The JS engine should use quick sort internally. The final test result is that the running time of this method is about three times that of the second method on average, but it is much faster than the first and third methods.
The fifth method
I recently used the [Search History] function and started using the indexOf method. This method is only supported in ECMA5, but is not supported in IE8-.
We can write a function ourselves (the methods of the Array object are all defined on the prototype object), as follows:
Array.prototype.unique = function(){ var length = this.length; if(length <= 1){ return this; } if(!Array.prototype.indexOf){ Array.prototype.indexOf = function(item){ var l = this.length, i = 0, r = -1; if(l <= 0){ return -1; } for(; i < l; i++){ if(this[i] === item){ r = i; } } return r; } } var result = []; //去重数组 for(var i = 0; i < length; i++){ if(result.indexOf(this[i]) === -1){ result.push(this[i]); } } return result; }
The sixth method
The Array type does not provide a deduplication method. If you want to remove duplicate elements from the array, you have to find a way yourself:
function unique(arr) { var result = [], isRepeated; for (var i = 0, len = arr.length; i < len; i++) { isRepeated = false; for (var j = 0, len = result.length; j < len; j++) { if (arr[i] == result[j]) { isRepeated = true; break; } } if (!isRepeated) { result.push(arr[i]); } } return result; }
The general idea is to transfer the array elements to another array one by one. During the transfer process, check whether the element is duplicated, and if so, discard it directly. As can be seen from nested loops, this method is extremely inefficient. We can use a hashtable structure to record existing elements, so that the inner loop can be avoided.