Detailed introduction to JavaScript obfuscation and deobfuscation (with code)


JavaScript obfuscation and deobfuscation belong to the same category as software encryption and cracking: every advance on one side is answered by a bigger one on the other. Nothing stays permanently black or white. Everything is driven by the market, where the popular question is what problem you can solve for people, followed by how many players solving that problem the market can support. JS has no secrets.

Personally, I do not favor hash-style obfuscation of JavaScript. First, it slows down execution; second, it bloats the file. Front-end JS code is delivered to the browser, so it is inherently "open source" and can be inspected in Chrome DevTools. Obfuscation that does not also compress the code runs directly against front-end optimization principles.

The JS obfuscation tools currently searchable on the Internet are nothing more than the following:

eval obfuscation is the earliest form of obfuscated "encryption" to appear in JS. It is said to have been cracked on its first day: change eval to alert in the code and the original source is displayed. The method was pointless from the day it was born. JS encryption (obfuscation) is really only a matter of reducing readability. What is genuinely useful is compression-style obfuscation such as uglify, which reduces both file size and readability.
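The "swap out eval" trick can be sketched as follows. This is only an illustration: the packed payload is a hypothetical stand-in built from character codes, not the output of any real packer, and unpack is a name chosen for this example. Running a packed script executes everything around the eval call, so this should only be tried on code you are prepared to run.

```javascript
// Hypothetical eval-style payload: the source is stored as character codes
// and rebuilt at run time. Real packers differ in detail, but they all end
// by handing the recovered source to eval.
const source = "console.log('hello')";
const codes = Array.from(source, (ch) => ch.charCodeAt(0));
const packed = `eval(String.fromCharCode(${codes.join(',')}))`;

// Deobfuscation: run the packed script with `eval` shadowed by a function
// that records its argument instead of executing it.
function unpack(packedScript) {
  let captured = null;
  const fakeEval = (code) => {
    captured = code;
  };
  // Code created by the Function constructor is non-strict, so a parameter
  // named `eval` is allowed and hides the real eval inside the script.
  new Function('eval', packedScript)(fakeEval);
  return captured;
}

console.log(unpack(packed)); // console.log('hello')
```

Replacing eval with alert in a browser does the same thing by hand: the string that would have been executed is shown instead.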

However, some commercial code does use hash-style obfuscation, for example the JSA encryption used by miniui and the javascript-obfuscator used by fundebug.

The following code illustrates the difference between JSA encryption and javascript-obfuscator:

Code to be obfuscated:

function logG(message) {
  console.log('\x1b[32m%s\x1b[0m', message); 
}
function logR(message) {
  console.log('\x1b[41m%s\x1b[0m', message); 
}
logG('logR');
logR('logG');


Code generated by JSA obfuscation:

function o00($){console.log("\x1b[32m%s\x1b[0m",$)}function o01($){console.log("\x1b[41m%s\x1b[0m",$)}o00("logR");o01("logG")


After running it through a beautifier:

function o00($) {
  console.log("\x1b[32m%s\x1b[0m", $)
}

function o01($) {
  console.log("\x1b[41m%s\x1b[0m", $)
}
o00("logR");
o01("logG")


As you can see, the logic is untouched; only the identifiers were renamed, so it is relatively easy to restore. Hardly anyone uses it, so I will not treat it as the representative case here.

Code generated by javascript-obfuscator:

var _0xd6ac=['\x1b[41m%s\x1b[0m','logG','log'];(function(_0x203a66,_0x6dd4f4){var _0x3c5c81=function(_0x4f427c){while(--_0x4f427c){_0x203a66['push'](_0x203a66['shift']());}};_0x3c5c81(++_0x6dd4f4);}(_0xd6ac,0x6e));var _0x5b26=function(_0x2d8f05,_0x4b81bb){_0x2d8f05=_0x2d8f05-0x0;var _0x4d74cb=_0xd6ac[_0x2d8f05];return _0x4d74cb;};function logG(_0x4f1daa){console[_0x5b26('0x0')]('\x1b[32m%s\x1b[0m',_0x4f1daa);}function logR(_0x38b325){console[_0x5b26('0x0')](_0x5b26('0x1'),_0x38b325);}logG('logR');logR(_0x5b26('0x2'));


After beautifying:

var _0xd6ac = ['\x1b[41m%s\x1b[0m', 'logG', 'log'];
(function(_0x203a66, _0x6dd4f4) {
  var _0x3c5c81 = function(_0x4f427c) {
    while (--_0x4f427c) {
      _0x203a66['push'](_0x203a66['shift']());
    }
  };
  _0x3c5c81(++_0x6dd4f4);
}(_0xd6ac, 0x6e));
var _0x5b26 = function(_0x2d8f05, _0x4b81bb) {
  _0x2d8f05 = _0x2d8f05 - 0x0;
  var _0x4d74cb = _0xd6ac[_0x2d8f05];
  return _0x4d74cb;
};

function logG(_0x4f1daa) {
  console[_0x5b26('0x0')]('\x1b[32m%s\x1b[0m', _0x4f1daa);
}

function logR(_0x38b325) {
  console[_0x5b26('0x0')](_0x5b26('0x1'), _0x38b325);
}
logG('logR');
logR(_0x5b26('0x2'));


This looks much more complicated, but analysis shows that the obfuscator has simply added a dictionary: method names and string literals may all be moved into it. At each call site, the dictionary function is called first to recover the real name, and only then is the method executed.
In other words, every entry point follows the same regular pattern of variables.

Dictionary function:

var _0xd6ac = ['\x1b[41m%s\x1b[0m', 'logG', 'log'];
(function(_0x203a66, _0x6dd4f4) {
  var _0x3c5c81 = function(_0x4f427c) {
    while (--_0x4f427c) {
      _0x203a66['push'](_0x203a66['shift']());
    }
  };
  _0x3c5c81(++_0x6dd4f4);
}(_0xd6ac, 0x6e));
var _0x5b26 = function(_0x2d8f05, _0x4b81bb) {
  _0x2d8f05 = _0x2d8f05 - 0x0;
  var _0x4d74cb = _0xd6ac[_0x2d8f05];
  return _0x4d74cb;
};

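To see what the dictionary code actually does, it can be rewritten with readable names. This is a sketch for illustration only: dictionary, rotateLeft, and lookup are names of my own choosing, and the \x1b escapes assume the string carries the same ANSI color codes as the original source.

```javascript
// The same string array as in the obfuscated output.
const dictionary = ['\x1b[41m%s\x1b[0m', 'logG', 'log'];

// Equivalent of the IIFE. The original passes 0x6e, increments it, and then
// loops with `while (--n)`, so the body runs exactly 0x6e (110) times.
function rotateLeft(array, times) {
  for (let i = 0; i < times; i++) {
    array.push(array.shift());
  }
}
rotateLeft(dictionary, 0x6e);

// Equivalent of _0x5b26. Subtracting 0x0 coerces the hex string '0x2'
// to the number 2 before it is used as an index.
function lookup(hexIndex) {
  return dictionary[hexIndex - 0x0];
}

console.log(lookup('0x0')); // log
console.log(lookup('0x2')); // logG
```

Since 110 mod 3 is 2, the array ends up rotated two places to the left, which is why index 0x0 resolves to log even though log is written last in the array literal.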

From the above, JS obfuscation can be classified into three categories: eval type, hash type, and compression type. The compression type, represented by uglify, is now a commonly used tool for front-end performance optimization.

Commonly used front-end compression optimization tools:

JavaScript:

babel-minify
terser
uglify-js
uglify-es
Google Closure Compiler
YUI Compressor

CSS:

PostCSS
clean-css
CSSO
YUI Compressor

HTML:

html-minifier

In terms of build workflow, whether you use webpack or gulp, the most popular JavaScript tool at present is uglify.

The corresponding deobfuscation tools:

eval: tools such as jspacker, which can be found by searching on Baidu
JSA: unjsa
javascript-obfuscator: crack.js
Compression type (uglify): UnuglifyJS, or its online version jsnice

A deobfuscation strategy is derived from the regular patterns in the generated code. It is nothing more than observing the features, analyzing them, observing again, and adjusting constantly. The patterns are all easy to spot.

There is no real difficulty; all you need is patience. For example, a deobfuscation tool for javascript-obfuscator can be broken down into N sub-problems:

How do you determine the scope of a function?
Which kinds of variables can be pre-executed and replaced with their values?
...

For example:

var _0xd6ac = ['\x1b[41m%s\x1b[0m', 'logG', 'log'];
(function(_0x203a66, _0x6dd4f4) {
  var _0x3c5c81 = function(_0x4f427c) {
    while (--_0x4f427c) {
      _0x203a66['push'](_0x203a66['shift']());
    }
  };
  _0x3c5c81(++_0x6dd4f4);
}(_0xd6ac, 0x6e));
var _0x5b26 = function(_0x2d8f05, _0x4b81bb) {
  _0x2d8f05 = _0x2d8f05 - 0x0;
  var _0x4d74cb = _0xd6ac[_0x2d8f05];
  return _0x4d74cb;
};

function logG(_0x4f1daa) {
  console[_0x5b26('0x0')]('\x1b[32m%s\x1b[0m', _0x4f1daa);
}

function logR(_0x38b325) {
  console[_0x5b26('0x0')](_0x5b26('0x1'), _0x38b325);
}
logG('logR');
logR(_0x5b26('0x2'));


The goal is to restore it to:

function logG(message) {
  console.log('\x1b[32m%s\x1b[0m', message); 
}
function logR(message) {
  console.log('\x1b[41m%s\x1b[0m', message); 
}
logG('logR');
logR('logG');


The first step is to identify the dictionary function and then execute it, so that _0x5b26('0x0') is restored to log.

After that, the rest of the code is much easier to write.
For example, see https://github.com/jscck/crac...
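A minimal sketch of that first step, under stated assumptions, might look like this. It is not the code of the tool linked above. The function name deobfuscate, the manual split into prelude and body, and the regular expressions are all my own; a robust tool would have to locate the dictionary itself and cope with more call shapes. The prelude is executed for real, so only run this on code you are willing to execute.

```javascript
// The obfuscated sample, split by hand into the dictionary prelude and the
// body that uses it. String.raw keeps the \x1b escapes as literal text.
const prelude = String.raw`var _0xd6ac=['\x1b[41m%s\x1b[0m','logG','log'];(function(_0x203a66,_0x6dd4f4){var _0x3c5c81=function(_0x4f427c){while(--_0x4f427c){_0x203a66['push'](_0x203a66['shift']());}};_0x3c5c81(++_0x6dd4f4);}(_0xd6ac,0x6e));var _0x5b26=function(_0x2d8f05,_0x4b81bb){_0x2d8f05=_0x2d8f05-0x0;var _0x4d74cb=_0xd6ac[_0x2d8f05];return _0x4d74cb;};`;
const body = String.raw`function logG(_0x4f1daa){console[_0x5b26('0x0')]('\x1b[32m%s\x1b[0m',_0x4f1daa);}function logR(_0x38b325){console[_0x5b26('0x0')](_0x5b26('0x1'),_0x38b325);}logG('logR');logR(_0x5b26('0x2'));`;

function deobfuscate(preludeSource, bodySource, lookupName) {
  // Step 1: execute only the prelude to obtain a working dictionary function.
  const lookup = new Function(`${preludeSource}; return ${lookupName};`)();

  // Step 2: replace every lookupName('0x..') call with the string it returns.
  // Assumes lookupName contains no regular-expression metacharacters.
  const callPattern = new RegExp(lookupName + String.raw`\('(0x[0-9a-f]+)'\)`, 'g');
  const inlined = bodySource.replace(callPattern, (match, index) =>
    JSON.stringify(lookup(index)));

  // Step 3: rewrite obj["name"] as obj.name when name is a plain identifier.
  return inlined.replace(/\["([A-Za-z_$][\w$]*)"\]/g, '.$1');
}

console.log(deobfuscate(prelude, body, '_0x5b26'));
```

The result calls console.log directly and passes "logG" as a plain string. Parameter names such as _0x4f1daa stay as they are, because the original names are not stored anywhere in the obfuscated file.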

To reconstruct the structure of the code after restoration, you also need to know which tool was used to bundle it before it was obfuscated: webpack, or something else?

For example, webpack wraps its output in various headers and footers:
https://webpack.js.org/config...

(function webpackUniversalModuleDefinition(root, factory) {
  if(typeof exports === 'object' && typeof module === 'object')
    module.exports = factory();
  else if(typeof define === 'function' && define.amd)
    define([], factory);
  else if(typeof exports === 'object')
    exports['MyLibrary'] = factory();
  else
    root['MyLibrary'] = factory();
})(typeof self !== 'undefined' ? self : this, function() {
  return _entry_return_;
});


Going deeper may involve JS interpreters and the AST (abstract syntax tree).

Tools that currently deal with JS interpretation and the AST include:

prepack, esprima, babel

Or you can read "Programming Language Implementation Patterns", which involves antlr4.

Of course, you can also use tools such as esprima to deobfuscate, but the workload is a bit larger and it’s not worth it.

In the future, the direction for protecting commercial JS source code may be WebAssembly: compile the code to wasm on the server side first, and the source can be truly closed.

Where there are people there is a way, and where there is obfuscation there is deobfuscation. Deobfuscation tools based on machine learning for programming are also doing quite well, for example:

Machine Learning for Programming products
nice2predict, jsnice...
See https://www.sri.inf.ethz.ch/r...

Extended Reference

AST Abstract Syntax Tree

Why discuss the AST separately? Because with an input -> AST -> output pipeline you can produce anything.

For example, by converting JSX to mini program template syntax you can write mini programs in React syntax, as Taro does.
mpvue, wepy, postcss... are all tools that build and convert through an AST. Transpiling ES6 -> ES5 with babel also uses an AST.

The general AST workflow:

Parse the input into an AST

Then check each node's type and apply the corresponding conversion
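The two steps above, plus printing the result back out, can be sketched on a tiny scale. To keep the example self-contained the AST is written by hand in the ESTree node shapes that parsers such as esprima produce. The transform and generate functions are illustrative names of my own and handle only the four node types used here.

```javascript
// Hand-built ESTree-style AST for the expression: console['log']('hi')
const ast = {
  type: 'CallExpression',
  callee: {
    type: 'MemberExpression',
    computed: true,
    object: { type: 'Identifier', name: 'console' },
    property: { type: 'Literal', value: 'log' },
  },
  arguments: [{ type: 'Literal', value: 'hi' }],
};

// Conversion: check each node's type and rewrite obj['name'] as obj.name.
function transform(node) {
  switch (node.type) {
    case 'CallExpression':
      return {
        ...node,
        callee: transform(node.callee),
        arguments: node.arguments.map(transform),
      };
    case 'MemberExpression': {
      const object = transform(node.object);
      const property = transform(node.property);
      const isPlainName = node.computed &&
        property.type === 'Literal' &&
        typeof property.value === 'string' &&
        /^[A-Za-z_$][\w$]*$/.test(property.value);
      return isPlainName
        ? { type: 'MemberExpression', computed: false, object,
            property: { type: 'Identifier', name: property.value } }
        : { ...node, object, property };
    }
    default:
      return node; // Identifier and Literal are leaves
  }
}

// Output: print an AST back to source text.
function generate(node) {
  switch (node.type) {
    case 'Identifier':
      return node.name;
    case 'Literal':
      return JSON.stringify(node.value);
    case 'MemberExpression':
      return node.computed
        ? `${generate(node.object)}[${generate(node.property)}]`
        : `${generate(node.object)}.${generate(node.property)}`;
    case 'CallExpression':
      return `${generate(node.callee)}(${node.arguments.map(generate).join(', ')})`;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

console.log(generate(ast));            // console["log"]("hi")
console.log(generate(transform(ast))); // console.log("hi")
```

A real tool would obtain the tree from a parser and cover every node type, but the shape stays the same: walk the tree, branch on node.type, and return a replacement node.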
