JavaScript array deduplication often appears in written test questions for front-end recruitment, for example:
There is an array var arr = ['a', 'b', 'c', '1', 0, 'c', 1, '', 1, 0]. Please implement a deduplication function unique in JavaScript, so that unique(arr) returns ['a', 'b', 'c', '1', 0, 1, ''].
As a written test question, it has two test points:
1. Correctness. Don't underestimate this point. Since JavaScript usually runs in browsers, ensuring that a function is correct across the various browser environments is not an easy task. If you don't believe me, keep reading.
2. Performance. Although in most cases the JavaScript language itself (in the narrow sense, excluding extensions such as the DOM) does not cause performance problems, this is unfortunately an exam question, so interviewers will still treat performance as a test point.
Before reading further, it is recommended that you first implement the version you think is best.
Intuitive solution
For array deduplication, anyone who has written programs can come up with a first solution right away:
function unique(arr) {
  var ret = []

  for (var i = 0; i < arr.length; i++) {
    var item = arr[i]
    if (ret.indexOf(item) === -1) {
      ret.push(item)
    }
  }

  return ret
}
Intuition is often very reliable. In modern browsers, the function above is correct and performs well. But the biggest tragedy, and challenge, of front-end development is having to support all kinds of runtime environments. In IE6-8, arrays do not yet have an indexOf method, so the intuitive solution needs a small modification:
var indexOf = [].indexOf ?
    function(arr, item) {
      return arr.indexOf(item)
    } :
    function(arr, item) {
      for (var i = 0; i < arr.length; i++) {
        if (arr[i] === item) {
          return i
        }
      }
      return -1
    }
function unique(arr) {
  var ret = []

  for (var i = 0; i < arr.length; i++) {
    var item = arr[i]
    if (indexOf(ret, item) === -1) {
      ret.push(item)
    }
  }

  return ret
}
At this point correctness is no longer a problem, but in terms of performance the double loop will make interviewers unhappy.
Optimization Plan
When it comes to optimization, it is often like the Eight Immortals crossing the sea, each showing their own powers, with a hundred flowers blooming. But the Eight Immortals are often not down to earth, and a hundred flowers easily attract bugs. The various optimization schemes for array deduplication will not be discussed one by one here; we will only look at the most commonly used one, which works very well.
function unique(arr) {
  var ret = []
  var hash = {}

  for (var i = 0; i < arr.length; i++) {
    var item = arr[i]
    var key = typeof(item) + item
    if (hash[key] !== 1) {
      ret.push(item)
      hash[key] = 1
    }
  }

  return ret
}
The core idea is to construct a hash object to replace indexOf. Note that in JavaScript, object keys can only be strings, so var key = typeof(item) + item is needed to distinguish the number 1 from the string '1'.
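For example, the composite keys for the number 1 and the string '1' come out differently:

typeof(1) + 1      // 'number1'
typeof('1') + '1'  // 'string1'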
But optimization can easily introduce pitfalls. For example, the implementation above cannot handle the following input correctly:
unique([ new String(1), new Number(1) ])
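A correct unique should keep both wrapper objects, since they are distinct objects, but both produce the same composite key, so the second one is silently dropped:

typeof(new String(1)) + new String(1)   // 'object1'
typeof(new Number(1)) + new Number(1)   // 'object1'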
You can keep tweaking the code to get both good performance and correctness, but the result is often not pretty.
The real requirement
Having written this much, this post can finally get to the point. Programmers all carry some dreams in their hearts, such as writing a universal function that is both general-purpose and fast. Such dreams are an important driving force for becoming an excellent programmer, but left unchecked they can easily lead you astray.
Back to performance optimization. These days there are all kinds of optimizations: core systems, databases, networks, front end, and so on. Every one of them has to answer the following questions:
1. What is the current situation? In what scenario is the optimization performed, and what specific constraints does that scenario impose? Sorting out the constraints is important; constraints often bring freedom.
2. What exactly do you want? What is the purpose of the optimization: improving stability, increasing throughput, or reducing user waiting time? Until this question is answered, optimization is pointless. A precise answer gives the optimization concrete, measurable metrics, so that it has a goal.
3. What can you give up? You can't have your cake and eat it too. The essence of optimization is choice and trade-off in a specific scenario. If you are unwilling to give anything up, optimization is often difficult.
I am not writing this post to answer the written test question; that question is a bit boring. The original motivation for this post is that I have recently been doing performance tuning on SeaJS, where one of the requirements is:
define(function(require, exports) {
  var a = require('./a')
  var b = require('./b')
  ...
  require('./a').fn(...)
})
SeaJS parses the factory function of the module above as a string to obtain the module's dependency array: ['./a', './b', './a']. This dependency array may contain duplicate entries, so it must be deduplicated.
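A minimal sketch of how such dependencies might be collected is shown below; parseDependencies is a hypothetical name, and SeaJS's real parser is more involved, since it has to skip comments and string literals:

function parseDependencies(factory) {
  var code = factory.toString()
  var re = /\brequire\s*\(\s*["']([^"']+)["']\s*\)/g
  var deps = []
  var match

  // Collect every require('...') argument found in the factory source.
  while ((match = re.exec(code))) {
    deps.push(match[1])
  }

  return deps  // e.g. ['./a', './b', './a'], which is then passed to unique()
}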
For this specific scenario, let’s answer the above three questions:
1. What is the current situation? The input is constrained: only strings need to be considered.
2. What exactly do you want? This one is simple: we want the unique method to be as fast as possible. The metric is the time spent in unique, as shown in the Profiles panel of Chrome's developer tools on a given test page, with a target of under 5ms.
3. What can you give up? Only strings need to be handled; other types are not supported. Talking about what to give up is often the interesting part; this question is not as simple as it looks, and we will come back to it below.
Solution under SeaJS
Once the specific scenario is clearly analyzed, the solution is relatively simple:
function unique(arr) {
  var obj = {}

  forEach(arr, function(item) {
    obj[item] = 1
  })

  return keys(obj)
}
The code above relies on the forEach and keys helpers and cannot be separated from its surrounding context (the environment matters a lot). For the complete code, see util-lang.js.
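For reference, simplified stand-ins for these two helpers might look like the sketch below; the real versions in util-lang.js differ and also deal with old-IE quirks:

// Simplified, illustrative stand-ins for the helpers used above.
var forEach = function(arr, fn) {
  for (var i = 0; i < arr.length; i++) {
    fn(arr[i], i, arr)
  }
}

var keys = Object.keys || function(obj) {
  var result = []
  for (var p in obj) {
    if (obj.hasOwnProperty(p)) {
      result.push(p)
    }
  }
  return result
}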
This unique implementation is very good in terms of code size, correctness, and overall performance across browsers.
Until one day such a test case appeared:
define(function(require, exports) {
  var a = require('toString')
  var b = require('hasOwnProperty')
  ...
})
"Perfect" solution
The above test case will call
unique([ 'toString', 'hasOwnProperty' ]) // Expect to return [ 'toString', 'hasOwnProperty' ]
IE has all kinds of bugs; here is a less famous but very real one:
var obj = { toString: 1, hasOwnProperty: 1 }

for (var p in obj) {
  console.log(p)
}
In modern browsers, the code above correctly outputs both property names, but in old IE nothing is output at all. This is IE's enumeration (DontEnum) bug; for details, see "A safer Object.keys compatibility implementation". The "perfect" solution is as follows:
var keys = Object.keys || (function () {
  var hasOwnProperty = Object.prototype.hasOwnProperty,
      hasDontEnumBug = !({ toString: null }).propertyIsEnumerable('toString'),
      DontEnums = [
        'toString',
        'toLocaleString',
        'valueOf',
        'hasOwnProperty',
        'isPrototypeOf',
        'propertyIsEnumerable',
        'constructor'
      ],
      DontEnumsLength = DontEnums.length

  return function (o) {
    if (typeof o != "object" && typeof o != "function" || o === null)
      throw new TypeError("Object.keys called on a non-object")

    var result = []
    for (var name in o) {
      if (hasOwnProperty.call(o, name)) result.push(name)
    }

    // Old IE skips own properties named after DontEnum entries; add them back.
    if (hasDontEnumBug) {
      for (var i = 0; i < DontEnumsLength; i++) {
        if (hasOwnProperty.call(o, DontEnums[i])) result.push(DontEnums[i])
      }
    }

    return result
  }
})()
Besides the DontEnums array, pay particular attention to how hasOwnProperty is handled. **For the front end, it is not easy to ensure even "correctness".**
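With this keys in place, the SeaJS unique above returns the expected result for those problematic module names, even under old IE:

unique([ 'toString', 'hasOwnProperty' ])
// => [ 'toString', 'hasOwnProperty' ]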