A detailed explanation of how to use C programs in JavaScript

JavaScript is a flexible scripting language that handles business logic with ease. When data has to be transmitted, we usually choose a text format such as JSON or XML.

But when the size budget is very tight, text protocols are too inefficient and a binary format has to be used.

On this day last year, I ran into exactly this problem while tinkering with a WAF that combined the front end and the back end.

The front-end script needs to collect a lot of data, which ends up hidden in a cookie, so the available length is very limited: only a few dozen bytes.

If you use JSON without thinking, a single flag field such as {"enableXX": true} takes up half of that length. In binary, however, a true/false flag needs only 1 bit, which is more than a hundred times smaller.
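As a rough check of that ratio, the sketch below measures the JSON text for a single flag. Note that JSON.stringify omits the space after the colon, so it yields 17 characters rather than the 18 in the literal above; the variable names are only for illustration.

```javascript
// Size of one boolean flag as JSON text versus as a single bit.
var json = JSON.stringify({ enableXX: true }); // '{"enableXX":true}'
var jsonBits = json.length * 8;                // ASCII text: 8 bits per character
var binaryBits = 1;                            // a true/false flag fits in 1 bit
var ratio = jsonBits / binaryBits;
```

The exact saving depends on the field name, but it stays in the range of two orders of magnitude for any readable name.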

The data also has to go through verification, encryption and so on, and these algorithms are only convenient to call on binary data.

Elegant implementation

However, JavaScript does not support binary.

"Not supported" here does not mean that it cannot be done, only that it cannot be done elegantly. Programming languages were invented to solve problems elegantly; even without them, people could still write programs in raw machine instructions.

If you insist on manipulating binary data in JavaScript, you end up with something like this:

var flags = +enableXX1 << 16 | +enableXX2 << 15 | ...

It works, but it is very ugly: hard-coded offsets and bit operations everywhere.
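To make the hard coding concrete, here is a minimal sketch of packing and unpacking three flags by hand. The bit positions and the third flag, enableXX3, are assumptions made up for this example, not the layout used in the original project.

```javascript
// Hypothetical layout: bit 0 = enableXX1, bit 1 = enableXX2, bit 2 = enableXX3.
function packFlags(f) {
  // Unary + turns true/false into 1/0 before shifting.
  return (+f.enableXX1 << 0) | (+f.enableXX2 << 1) | (+f.enableXX3 << 2);
}

function unpackFlags(value) {
  return {
    enableXX1: ((value >> 0) & 1) !== 0,
    enableXX2: ((value >> 1) & 1) !== 0,
    enableXX3: ((value >> 2) & 1) !== 0
  };
}

var packed = packFlags({ enableXX1: true, enableXX2: false, enableXX3: true });
var unpacked = unpackFlags(packed);
```

Every offset appears twice, once in the writer and once in the reader, which is exactly the kind of duplication that gets out of sync when a field is added.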

In a language with native support for binary data, however, it looks very elegant:

#include <stdint.h>

union {
    struct {
        /* unsigned, so that a 1-bit field reliably holds 0 or 1 */
        unsigned int enableXX1: 1;
        unsigned int enableXX2: 1;
        ...
    };
    int16_t value;
} flags;

flags.enableXX1 = enableXX1;
flags.enableXX2 = enableXX2;

The developer only needs to declare the layout. When using it, there is no need to worry about details such as each field's offset or how it is read and written.

To achieve a similar effect, we first wrapped up a JS version of a struct:

// Initial approach: wrap a struct in JS
var s = new Struct([
    {name: 'month', bit: 4, signed: false},
    ...
]);

s.set('month', 12);
s.get('month');

The details are hidden and it looks much more elegant.
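The Struct class itself is not shown in this article. The following is only a sketch of how such a wrapper could work, assuming fields are packed in declaration order from the least significant bit, each field is at most 31 bits wide, and the total fits in 32 bits. The 'delta' field is made up to exercise the signed case.

```javascript
// Sketch of a bit-field Struct; not the original implementation.
function Struct(fields) {
  this.value = 0;    // all fields packed into one 32-bit integer
  this.layout = {};  // field name -> { offset, bit, signed }
  var offset = 0;
  for (var i = 0; i < fields.length; i++) {
    var f = fields[i];
    if (f.bit < 1 || f.bit > 31) throw new Error('unsupported field width');
    this.layout[f.name] = { offset: offset, bit: f.bit, signed: !!f.signed };
    offset += f.bit;
  }
  if (offset > 32) throw new Error('Struct wider than 32 bits');
}

Struct.prototype.set = function (name, val) {
  var f = this.layout[name];
  var mask = (1 << f.bit) - 1;
  // Clear the field, then write the new value truncated to the field width.
  this.value = (this.value & ~(mask << f.offset)) | ((val & mask) << f.offset);
};

Struct.prototype.get = function (name) {
  var f = this.layout[name];
  var mask = (1 << f.bit) - 1;
  var raw = (this.value >>> f.offset) & mask;
  // Sign-extend when the field is signed and its top bit is set.
  if (f.signed && (raw & (1 << (f.bit - 1)))) raw -= Math.pow(2, f.bit);
  return raw;
};

var demo = new Struct([
  { name: 'month', bit: 4, signed: false },
  { name: 'delta', bit: 4, signed: true }  // hypothetical signed field
]);
demo.set('month', 12);
demo.set('delta', -3);
```

The offsets are computed once from the description, so callers only ever deal with field names.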

Elegant but not perfect

However, this still does not feel ideal. Something like a struct should be provided by the language, yet here it has to be implemented with extra code, and at run time at that.

In addition, the back-end decoder is written in C, so two codebases have to be maintained. Whenever a data structure or algorithm changes, updating the JS and the C in step is a real nuisance.

So I wondered: could the front end and the back end share a single set of C code?

In other words, the C code would need to be compiled into JS in order to run.

Get to know emscripten

There are many tools that can compile C into JS, and the most professional of them is emscripten.

Using emscripten is very simple. It works like a traditional C compiler, except that it generates JS code.

./emcc hello.c -o hello.html

// hello.c
#include <stdio.h>
#include <time.h>  

int main() {
    time_t now;
    time(&now);
    printf("Hello World: %s", ctime(&now));
    return 0;
}

After compilation, the generated page can be run in a browser.

It is quite fun, so give it a try; I will not go into detail here.

Practical flaws

What we care about, however, is practicality, not fun.

In fact, even a Hello World compiles to more than 10,000 lines of JS, hundreds of KB in size. Even compressed with gzip, it is still tens of KB.

In addition, emscripten follows the asm.js specification, and memory access is implemented through TypedArray.

This means that users on browsers older than IE10 cannot run it, which is also unacceptable.

Therefore, we have to make the following improvements:

Reduce size
Increase compatibility

First, let's see whether emscripten itself can get us there through compiler options.

After some attempts, it could not. We have to do it ourselves.

Reduce size

Why is the final script so big, and what is in it? Analysis of its content shows roughly these parts:

Auxiliary functions
Interface simulation
Initialization operation
Runtime functions
Program logic

Auxiliary functions

These include string and binary conversion, callback wrappers and so on. They are basically unnecessary; we can write a dedicated callback function ourselves.

Interface simulation

This part provides file, terminal, network, rendering and other interfaces. I have seen client games ported with emscripten before, and apparently many interfaces are simulated.

Initialization operation

Initialization of global memory, runtime, and various modules.

Runtime functions

Pure C by itself can only do simple computation; many features rely on runtime functions.

However, the implementation behind some commonly used functions is extremely complicated. For example, the JS corresponding to malloc and free is nearly 2000 lines long!

Program logic

This is the JS code that actually corresponds to the C program. Because LLVM optimizes it during compilation, the logic may be unrecognizable.

This part of the code is not large, and it is what we really want.

In fact, if the program does not use any special functions, the logic functions still run when extracted on their own!

Since our C program is very simple, extracting them in this simple and crude way is no problem.

The JS logic corresponding to the C program sits between // EMSCRIPTEN_START_FUNCS and // EMSCRIPTEN_END_FUNCS. After filtering out the runtime functions, what remains is 100% logic code.
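The first half of that step, cutting out the text between the two markers, can be sketched as below. The function name extractFuncs and the toy input are assumptions for illustration; filtering out runtime functions that also sit between the markers is not shown.

```javascript
// Sketch: slice out the text between the two emscripten marker comments.
function extractFuncs(source) {
  var START = '// EMSCRIPTEN_START_FUNCS';
  var END = '// EMSCRIPTEN_END_FUNCS';
  var from = source.indexOf(START);
  var to = source.indexOf(END);
  if (from === -1 || to === -1 || to < from) {
    throw new Error('emscripten function markers not found');
  }
  return source.slice(from + START.length, to);
}

// Toy stand-in for real emscripten output.
var sample = [
  'var runtimeSetup = 1;',
  '// EMSCRIPTEN_START_FUNCS',
  'function _add(a, b) { a = a | 0; b = b | 0; return (a + b) | 0; }',
  '// EMSCRIPTEN_END_FUNCS',
  'var runtimeTeardown = 2;'
].join('\n');

var logic = extractFuncs(sample).trim();
```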

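Increase compatibility

Next, we solve the compatibility problem of memory access.

First, let's understand why TypedArray is used.

emscripten allocates one large ArrayBuffer to simulate memory, and then associates it with several variables whose names start with HEAP.

These HEAP views of different types share the same block of memory, which makes pointer operations efficient.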
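A small sketch of that sharing, using the same HEAP8, HEAPU8 and HEAP32 names that emscripten uses. The 16-byte buffer size and the address 4 are arbitrary choices for the example.

```javascript
// Several typed views over one ArrayBuffer, as in emscripten's heap.
var buffer = new ArrayBuffer(16);
var HEAP8 = new Int8Array(buffer);
var HEAPU8 = new Uint8Array(buffer);
var HEAP32 = new Int32Array(buffer);

// Write a 32-bit int at byte address 4; compiled code indexes HEAP32 with ptr >> 2.
var ptr = 4;
HEAP32[ptr >> 2] = -1;

// The same bytes are visible through the other views.
var asUnsignedByte = HEAPU8[ptr]; // every byte of -1 is 0xFF
var asSignedByte = HEAP8[ptr];
```

Because every view reads the same bytes, a C pointer can be reinterpreted as any integer width without copying.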

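However, browsers that do not support TypedArray obviously cannot run this, so a polyfill is needed for compatibility.

But analysis shows that this is almost impossible, because a TypedArray, like an array, is accessed by index: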

var buf = new Uint8Array(100);
buf[0] = 123;     // set
alert(buf[0]);    // get

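However, the [] operator cannot be overridden in JS, so it is hard to turn it into a setter and getter. Besides, the browsers that lack TypedArray are all old versions of IE, so ES6 features are out of the question.

So I looked into IE's proprietary interfaces, for example using the onpropertychange event to simulate a setter. But that is extremely inefficient, and a getter is still hard to implement.

After some thought, I decided not to use hooks at all, but to solve the problem at the source: change the syntax!

We use a regular expression to find the assignment operations in the source:

HEAP[index] = val;

and replace them with:

HEAP_SET(index, val);

Similarly, read operations:

HEAP[index]

are replaced with:

HEAP_GET(index)

This way, the original index operations become function calls. We can take over memory reads and writes, with no compatibility problems at all!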
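The actual regular expressions are not given here, so the following is only a sketch under simplifying assumptions: the index expression contains no nested [], and a write is a single statement ending in a semicolon. The per-view names such as HEAP32_SET and HEAP32_GET are made up for the example.

```javascript
// Sketch of the source rewrite; not the original regular expressions.
function rewriteHeapAccess(code) {
  // Writes first:  HEAP32[i] = v;  ->  HEAP32_SET(i, v);
  // (?!=) keeps comparisons such as == from being treated as assignments.
  var writes = /\b(HEAPU?(?:8|16|32))\[([^\[\]]+)\]\s*=(?!=)\s*([^;]+);/g;
  // Then every remaining access is a read:  HEAP32[i]  ->  HEAP32_GET(i)
  var reads = /\b(HEAPU?(?:8|16|32))\[([^\[\]]+)\]/g;
  return code
    .replace(writes, '$1_SET($2, $3);')
    .replace(reads, '$1_GET($2)');
}

var rewrittenCopy = rewriteHeapAccess('HEAP32[a >> 2] = HEAP32[b >> 2] | 0;');
var rewrittenRead = rewriteHeapAccess('x = HEAPU8[p] | 0;');
```

Rewriting the writes before the reads matters: once a write has become a HEAP32_SET(...) call, the read pattern no longer matches its left-hand side, but it still converts any reads inside the value expression.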

Then we implement signed and unsigned versions for 8, 16 and 32 bits. Simulating them with a JS Array is very simple.
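One possible shape for these accessors is sketched below, assuming a plain Array of byte values, little-endian layout, and indices that count elements of the view's width (so 32-bit accessors take ptr >> 2). The names, the MEM array and the 64-byte size are assumptions for illustration.

```javascript
// Sketch: emulate the heap with a plain Array of bytes (each 0..255).
var MEM = [];
for (var i = 0; i < 64; i++) MEM[i] = 0; // hypothetical 64-byte heap

// 8-bit: the index is a byte address. Signed and unsigned writes store the same bits.
function HEAPU8_SET(i, v) { MEM[i] = v & 0xFF; }
function HEAPU8_GET(i) { return MEM[i]; }
function HEAP8_GET(i) { return (MEM[i] << 24) >> 24; } // sign-extend from 8 bits

// 16-bit: the index counts 2-byte units.
function HEAP16_SET(i, v) {
  var p = i << 1;
  MEM[p] = v & 0xFF;
  MEM[p + 1] = (v >> 8) & 0xFF;
}
function HEAPU16_GET(i) { var p = i << 1; return MEM[p] | (MEM[p + 1] << 8); }
function HEAP16_GET(i) { return (HEAPU16_GET(i) << 16) >> 16; }

// 32-bit: the index counts 4-byte words.
function HEAP32_SET(i, v) {
  var p = i << 2;
  MEM[p] = v & 0xFF;
  MEM[p + 1] = (v >> 8) & 0xFF;
  MEM[p + 2] = (v >> 16) & 0xFF;
  MEM[p + 3] = (v >>> 24) & 0xFF;
}
function HEAP32_GET(i) {
  var p = i << 2;
  return MEM[p] | (MEM[p + 1] << 8) | (MEM[p + 2] << 16) | (MEM[p + 3] << 24);
}
function HEAPU32_GET(i) { return HEAP32_GET(i) >>> 0; }

HEAP32_SET(1, -2);      // word 1 occupies bytes 4..7
HEAP16_SET(0, 0x8001);  // unit 0 occupies bytes 0..1
```

Storing bytes rather than whole words keeps the views consistent with each other, at the cost of several array accesses per 32-bit operation.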

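The troublesome part is simulating the Float32 and Float64 types. However, this C program does not use floating point, so they are left unimplemented for now.

With that, the compatibility problem is solved.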

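All done

With these flaws fixed, we can happily use C logic in JS.

The script itself only needs to care about which data to collect, which keeps the JS code very elegant.

The low-level data operations, such as storage, encryption and encoding, are implemented in C.

We compile with the -Os flag to optimize for size. After obfuscation and minification, the final JS is under 2 KB, very small and compact.

Even better, we only need to maintain one set of code, from which both the front-end and the back-end versions are compiled.

So developing this "front-end and back-end WAF" became much easier.

All data structures and algorithms are implemented in C. The front end is compiled to JS code, and the back end is compiled to a Lua module for use by nginx-lua.

The scripts on both sides only need to focus on business features, without touching any data-level details.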

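Test version

In fact, there is also a third version: the local version.

Because all the C code lives together, it is easy to write a test program.

There is no need to start a web server and open a browser to test. Just simulate some data and run the program directly, which is very lightweight.

With the help of an IDE, debugging is easier as well.

Summary

Every language has its own strengths and weaknesses. Combining the strengths of different languages can make a program more elegant and more complete.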
