In JavaScript, a function can easily be serialized (stringified), that is, its source code can be obtained as a string. The engine's internal implementation of this operation, however, is not as simple as you might think. SpiderMonkey has used two function serialization techniques: one uses a decompiler to turn the function's compiled bytecode back into a source code string; the other compresses and stores the function's source code before compiling it to bytecode, then decompresses and restores it when needed.
How to serialize functions
In SpiderMonkey, three methods or functions can serialize a function: Function.prototype.toString, Function.prototype.toSource, and uneval. Only toString is standard, that is, common to all engines. However, the ES standard (ES5 15.3.4.2) says only a few words about Function.prototype.toString. In other words, there is essentially no standard, and each engine decides how to implement it.
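As a quick illustration of the three entry points, here is a minimal sketch. The function add is made up for the example, and toSource and uneval are guarded because they are non-standard and may be missing in the engine you run this in.

```javascript
// Hypothetical function used only for this illustration.
function add(a, b) {
  return a + b;
}

// Standard: available in every engine.
const viaToString = add.toString();

// Converting a function to a string calls toString implicitly.
const viaConcat = add + "";

// Non-standard extensions: check that they exist before calling them.
const viaToSource = typeof add.toSource === "function" ? add.toSource() : null;
const viaUneval = typeof uneval === "function" ? uneval(add) : null;

console.log(viaToString === viaConcat);
console.log(viaToSource, viaUneval);
```

The string concatenation form is the one used in the strict mode examples later in this article.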
The role of function serialization
The main use of function serialization is to take the source code it produces, modify it, and use the result to redefine the function.
function a() {
...
alert("a")
...
}
a() // pops up "a" when executed
a = eval("(" + a.toString().replace('alert("a")', 'alert("b")') + ")")
a() // pops up "b" when executed
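The same pattern can be tried outside a browser with a function that returns a value instead of calling alert. This is only a minimal sketch: the names greet and patched are made up for illustration, and it assumes the string being replaced appears literally in the serialized source.

```javascript
// Hypothetical function standing in for code we cannot edit directly.
function greet() {
  return "a";
}

// Serialize the function, patch its source text, and re-evaluate it.
// The surrounding parentheses make eval parse the text as a function
// expression rather than a function declaration.
const patchedSource = greet.toString().replace('return "a"', 'return "b"');
const patched = eval("(" + patchedSource + ")");

console.log(greet());   // original behaviour
console.log(patched()); // patched behaviour
```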
You may be thinking: "I have been writing JavaScript for so many years; why have I never needed this?" Indeed, if it is your own website and you have complete control over the JS files, there is no need to patch functions this way; just edit them directly. But if the source file is not under your control, you may well have to do it like this. A common case is a Greasemonkey script, where you may need to disable or modify a function on a website. Another is a Firefox extension, where you need to modify a function of Firefox itself (one could say that Firefox is written in JS). Here is an example I wrote myself.
Example of Firefox script:
location == "chrome://browser/content/browser.xul" && eval("gURLBar.handleCommand=" + gURLBar.handleCommand.toString().replace(/^\s*(load.+);/gm, "/^javascript:/.test(url)||(content.location=='about:blank'||content.location=='about:newtab')?$1:gBrowser.loadOneTab(url,{postData:postData,inBackground:false, allowThirdPartyFixup: true});"))
What this code does: when you press Enter in the address bar, Firefox opens the page in a new tab instead of taking over the current tab. It works by using the toString method to read the source code of the gURLBar.handleCommand function, patching it with a regular expression replacement, and passing the result to eval to redefine the function.
Why not define it directly, that is, rewrite the function directly:
gURLBar.handleCommand = function(){...//Change the original function in a small place}
We cannot do this because we have to consider compatibility, so we should change the function's source code as little as possible. If the script were written as above, it would break as soon as the source code of Firefox's gURLBar.handleCommand changed. For example, both Firefox 3 and Firefox 4 have this function, but its contents are very different. If you use a regular expression to replace only a few keywords, then as long as those keywords do not change, there is no incompatibility.
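One weakness of the replace-and-eval approach is that it fails silently when the keyword does change. The sketch below shows one way to make that failure visible. The helper patchFunction and the stand-in function openUrl are hypothetical names invented for this example, not part of Firefox.

```javascript
// Hypothetical helper: patch a function's source and fail loudly when the
// pattern no longer matches, instead of silently leaving it unpatched.
function patchFunction(fn, pattern, replacement) {
  const source = fn.toString();
  const patchedSource = source.replace(pattern, replacement);
  if (patchedSource === source) {
    throw new Error("patchFunction: nothing was replaced in " + (fn.name || "anonymous function"));
  }
  // Indirect eval evaluates in the global scope, so the patched function
  // can only see global names, not the closure of the original.
  return (0, eval)("(" + patchedSource + ")");
}

// Stand-in for a function owned by someone else.
function openUrl(url) {
  return "current tab: " + url;
}

const openInNewTab = patchFunction(openUrl, /current tab/, "new tab");
console.log(openInNewTab("about:blank"));
```

Because the patched function is rebuilt from text, it loses access to any local variables the original closed over. That is acceptable for functions that only use globals, as in the Firefox example, but it is a limitation to keep in mind.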
Decompiling bytecode
In SpiderMonkey, a function is compiled to bytecode after it has been parsed; that is to say, the original function source code is not kept in memory. SpiderMonkey contains a decompiler whose main job is to turn a function's bytecode back into the form of function source code.
In Firefox 16 and earlier versions, SpiderMonkey used this method. If you are using one of these versions of Firefox, you can try the following code:
alert(function () {
"String";
//Comment
return 1 + 2 + 3
}.toString())
The returned string is
function () {
return 6;
}
The output is completely different from other browsers:
1. Meaningless primitive value literals are deleted during compilation; in this example, that is the string "String". You may think: "That doesn't seem to be a problem; these values have no effect on how the function runs anyway." Wait, did you forget something? What about the strict mode directive "use strict"?
In versions that do not support strict mode, such as Firefox 3.6, "use strict" is no different from any other string and is deleted during compilation. After SpiderMonkey implemented strict mode, the string "use strict" is still dropped during compilation, but the decompiler checks for it: if the function is in strict mode, it adds a "use strict" line at the beginning of the function body. The corresponding engine source code follows.
static JSBool
DecompileBody(JSPrinter *jp, JSScript *script, jsbytecode *pc)
{
/* Print a strict mode code directive, if needed. */
if (script->strictModeCode && !jp->strict) {
if (jp->fun && (jp->fun->flags & JSFUN_EXPR_CLOSURE)) {
/*
* We have no syntax for strict function expressions;
* at least give a hint.
*/
js_printf(jp, "\t/* use strict */ \n");
} else {
js_printf(jp, "\t\"use strict\";\n");
}
jp->strict = true;
}
jsbytecode *end = script->code + script->length;
return DecompileCode(jp, script, pc, end - pc, 0);
}
2. Comments are also deleted during compilation. This does not seem to matter much, but some people like to use comments inside a function to implement multi-line strings. That trick does not work in versions before Firefox 17.
function hereDoc(f) {
return f.toString().replace(/^.+\s/,"").replace(/.+$/,"");
}
var string = hereDoc(function () {/*
I
You
He
*/});
console.log(string)
I
You
He
3. Operations on primitive value literals are performed at compile time.
This is an optimization technique (constant folding), which is also mentioned in "High Performance JavaScript".
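To check which of the three differences above apply to the engine you are running, a probe like the following can be used. It is only a sketch; the report object and its property names are invented for this example.

```javascript
// Serialize a probe function that contains all three kinds of content
// a decompiler would drop or rewrite.
const probeSource = (function () {
  "String";
  // Comment
  return 1 + 2 + 3;
}).toString();

// Hypothetical report object; the property names are made up for this sketch.
const report = {
  keepsStringStatement: probeSource.includes('"String"'),
  keepsComments: probeSource.includes("// Comment"),
  keepsUnfoldedConstants: probeSource.includes("1 + 2 + 3"),
};

console.log(report);
```

An engine that decompiles bytecode would be expected to report false for all three properties, while an engine that stores the source text would report true.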
Disadvantages of decompilation
Because new language features keep appearing (such as strict mode) and other related bugs keep being fixed, the implementation of the decompiler often has to change, and each change may introduce new bugs. I ran into one such bug myself, probably around Firefox 10. I cannot remember the details clearly, but it was about whether parentheses should be kept during decompilation. It looked roughly like this:
>(function (a,b,c){return (a + b) + c}).toString()
"function (a, b, c) {
return a + b + c;
}"
When decompiling, the parentheses in (a + b) are omitted. Since addition associates from left to right, that does not matter. But the bug I encountered was this:
>(function (a,b,c){return a + (b + c)}).toString()
"function (a, b, c) {
return a + b + c;
}"
This is wrong: a + b + c is not equivalent to a + (b + c). For example, with a=1, b=2, c="3", a + b + c equals "33", while a + (b + c) equals "123".
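The difference is easy to verify. The snippet below is a small check using the same values; the variable names leftAssociated and rightGrouped are chosen only for this example.

```javascript
const a = 1, b = 2, c = "3";

// Left to right: 1 + 2 is numeric addition, then 3 + "3" concatenates.
const leftAssociated = a + b + c;

// Parenthesized: 2 + "3" concatenates first, then 1 + "23" concatenates.
const rightGrouped = a + (b + c);

console.log(leftAssociated); // the string "33"
console.log(rightGrouped);   // the string "123"
```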
Regarding decompilers, Mozilla engineer Luke Wagner pointed out that decompilers have greatly hindered them from implementing some new features, and often have some bugs:
Not to pile on, but I too have felt an immense drag from the decompiler in the last year. Testing coverage is also poor and any non-trivial change inevitably produces fuzz bugs. The sooner we remove this drag the sooner we start reaping the benefits. In particular, I think now is a much better time to remove it than after doing significant frontend/bytecode hacking for new language features.
Brendan Eich also said that the decompiler does have many shortcomings:
I have no love for the decompiler, it has been hacked over for 17 years.
Storing the function source code
Starting with Firefox 17, SpiderMonkey switched to the second implementation method. Other browsers presumably implement it this way too. The string produced by serializing a function is exactly the same as the source code, including whitespace, comments, and so on. With this approach, most of the problems above should disappear. However, I still have one question, and it concerns strict mode.
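Why stored source preserves comments and whitespace can be seen in a toy model. This is an assumption-laden sketch, not SpiderMonkey's actual data structures: the class ScriptSource is invented for illustration, the compression step is left out, and the use of start and end offsets is an assumption of this sketch.

```javascript
// Toy model: the engine keeps the whole script text, and each function
// remembers where its own text starts and ends.
class ScriptSource {
  constructor(text) {
    this.text = text;
  }
  // Return the exact characters between two offsets, untouched.
  slice(start, end) {
    return this.text.slice(start, end);
  }
}

const scriptText = 'var f = function () { /* kept */ return 1 + 2 + 3 };';
const source = new ScriptSource(scriptText);

// Offsets that a parser would record for the function expression.
const start = scriptText.indexOf("function");
const end = scriptText.lastIndexOf("}") + 1;

console.log(source.slice(start, end));
```

Since serialization in this model is just a slice of the original text, nothing is reformatted, folded, or dropped.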
For example:
(function A() {
"use strict";
alert("A");
}) + ""
Of course, the returned source code should also contain "use strict", and every browser does this:
function A() {
"use strict";
alert("A");
}
But what if this is the case:
(function A() {
"use strict";
return function B() {
alert("B")
}
})() + ""
The inner function B is also in strict mode. Should "use strict" be added to the source code output for B? Let's try it out.
As mentioned above, versions from Firefox 4 up to (but not including) Firefox 17 decide whether to output "use strict" by checking whether the current function is in strict mode. Function B inherits the strict mode of function A, so "use strict" is output.
The function source code is also neatly indented, because SpiderMonkey formats the decompiled source code during decompilation, even if the original source had no indentation at all:
function B() {
"use strict";
alert("B");
}
Will versions from Firefox 17 onward have "use strict"? The function source code is now saved directly, and the source of function B indeed contains no "use strict". The test result: "use strict" is still added, but the indentation is off, because there is no formatting step.
function B() {
"use strict";
alert("B")
}
The latest version of SpiderMonkey's jsfun.cpp has a corresponding comment:
// If a function inherits strict mode because an enclosing function has "use strict",
// we also insert "use strict" into the body of the inner function.
// This ensures that if the return value of the function's toString method is re-evaluated,
// the regenerated function has the same semantics as the original function.
The difference is that other browsers do not add "use strict":
function B() {
alert("B")
}
Although this will not have much impact, I think Firefox's implementation is more reasonable.
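The practical consequence can be checked with a round trip through toString and eval. The following is a sketch under a few assumptions: makeB, B2, and the other variable names are invented here, strictness is detected by checking whether this is undefined in a plain call, and an indirect eval is used so that the re-evaluated function does not pick up strictness from the calling code.

```javascript
// Returns a function B that inherits strict mode from its enclosing function.
function makeB() {
  "use strict";
  return function B() {
    // A plain call sees `this` as undefined only when the function is strict;
    // otherwise `this` is the global object.
    return this === undefined;
  };
}

const B = makeB();
const serialized = B.toString();

// Indirect eval: the code runs as global code, so it is strict only if
// the serialized text itself carries the directive.
const indirectEval = eval;
const B2 = indirectEval("(" + serialized + ")");

const originalIsStrict = B();
const roundTrippedIsStrict = B2();
const sourceHasDirective = /["']use strict["']/.test(serialized);

console.log({ originalIsStrict, roundTrippedIsStrict, sourceHasDirective });
```

In an engine that inserts the directive, the round-tripped function stays strict; in an engine that returns B's source as written, it silently becomes non-strict, which is exactly the change in semantics that the jsfun.cpp comment is guarding against.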