In my previous blog post, Don't say it's impossible, implement sleep in nodejs, I introduced the use of nodejs addons. Today's topic is still addons: we continue to explore what C/C++ can do to make up for the weaknesses of nodejs.
I have mentioned the performance issues of nodejs many times. In fact, as far as the language itself is concerned, nodejs performs very well. It is slower than most static languages, but the gap is small, and it is clearly faster than other dynamic languages. So why do we often say that nodejs is not suited to CPU-intensive scenarios? Because it is single-threaded, it cannot make full use of a multi-core CPU in such scenarios. Computer science has a famous result, Amdahl's law:
speedup = (Ws + Wp) / (Ws + Wp / p)
Assume the total workload W can be decomposed into two parts: Ws, which can only be computed serially, and Wp, which can be computed in parallel. Then, with p CPUs working in parallel, performance improves by a factor of speedup. Amdahl's law describes what parallelism can and cannot do. This is the ideal case, and reality is much more complex. For example, concurrency is likely to cause contention for resources, which requires locks and often leaves parallel work waiting; concurrency also costs the operating system extra time for thread scheduling and switching, which increases Ws. However, when Wp is much larger than Ws in a task and multiple CPU cores are available, the performance gain from parallelism is considerable.
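To get a feel for the numbers, here is a minimal sketch of Amdahl's law in JavaScript. The function name amdahlSpeedup and the 5% serial / 95% parallel workload are assumptions chosen for illustration only; they are not measurements from this experiment.

```javascript
// Amdahl's law: speedup = (Ws + Wp) / (Ws + Wp / p)
// ws: serial work, wp: parallelizable work, p: number of CPUs.
function amdahlSpeedup(ws, wp, p) {
  return (ws + wp) / (ws + wp / p);
}

// Hypothetical workload: 5 units of serial work, 95 units of parallel work.
var cpus = [1, 2, 4, 8, 100];
for (var k = 0; k < cpus.length; k++) {
  console.log(cpus[k] + " CPUs: " + amdahlSpeedup(5, 95, cpus[k]).toFixed(2) + "x");
}
```

With this workload the speedup can never exceed (5 + 95) / 5 = 20, no matter how many CPUs are added, because the serial part always has to be paid in full.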
Okay, back to nodejs. Let's imagine a computation: count the prime numbers below 4,000,000. The program for this mostly performs division and involves almost no memory or object operations. In theory, this lets nodejs run at a relatively high speed without lagging too far behind C, which makes the comparison convenient.
A JavaScript function for testing whether a number is prime was given earlier on this blog, and I reuse it directly.
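The JavaScript listing itself is not shown here. A minimal sketch, assuming it mirrors the C version that follows, might look like this. Only the name zhishu comes from this post; the body and the small limit of 100 used for the sanity check are assumptions.

```javascript
// Sketch of a trial-division primality test, modelled on the C version.
function zhishu(num) {
  if (num === 1) return false; // 1 is not prime
  if (num === 2) return true;
  // Any composite number has a divisor no larger than its square root.
  for (var i = 2; i <= Math.sqrt(num); i++) {
    if (num % i === 0) return false;
  }
  return true;
}

// Sanity check on a small, hypothetical range.
var found = 0;
for (var j = 1; j < 100; j++) {
  if (zhishu(j)) found++;
}
console.log(found); // 25 primes below 100
```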
Then write a C version:
#include <cmath> // sqrt

// Returns true if num is prime, using trial division.
bool zhishu(int num) {
    if (num == 1) {
        return false;
    }
    if (num == 2) {
        return true;
    }
    for (int i = 2; i <= sqrt((double)num); i++) {
        if (num % i == 0) {
            return false;
        }
    }
    return true;
}
In nodejs, we loop from 1 to 4000000 and test each number. In C, we start several threads and define count as 4000000. Each thread does the following: while count is greater than 0, it takes the current value of count, decrements count by 1, and checks whether the value it took is prime. Following this idea, the JavaScript version is easy to write:
for (j = 1; j < 4000000; j++) {
    if (zhishu(j)) {
        count++;
    }
}
The main difficulty is multi-threaded programming in C. In the early days of C/C++, parallel computing was not a consideration, so the standard library provided no multi-threading support, and different operating systems usually have different implementations. To avoid this trouble, we use pthread to handle threads.
Download the latest version of pthread. Since I am not familiar with gyp, it took me a long time to get the link dependency on the lib working. In the end, I put the pthread source code directly in the project directory and added pthread.c to the source list in binding.gyp, so that pthread is compiled once together with the project. The modified binding.gyp looks like this:
Of course, my method is cumbersome. If simply adding references to pthread's lib and include directories works for you without dependency problems, that is the better option and there is no need to use my method.
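Now let's get into C/C++ multi-threading proper and define the function that each thread runs:
void *thread_p(void *null) {
    int num, x = 0; // x counts the primes found by this thread
    do {
        // Take the next number under the lock so no two threads get the same one.
        pthread_mutex_lock(&lock);
        num = count--;
        pthread_mutex_unlock(&lock);
        if (num > 0) {
            if (zhishu(num)) x++;
        } else {
            break;
        }
    } while (true);
    std::cout << ' ' << x; // print this thread's tally
    return null;
}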
The threads compete for the count variable, so we must ensure that only one thread can operate on count at a time. We add a mutex with pthread_mutex_t lock;. When pthread_mutex_lock(&lock); executes, the thread checks the state of the lock. If it is already locked, the thread waits and checks again, which blocks the code that follows; if the lock is free, the thread takes it and goes on to execute the following code. Correspondingly, pthread_mutex_unlock(&lock); releases the lock.
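To see what the lock protects, the claim-and-test loop can be modelled in plain JavaScript. This is only a single-threaded model, not real parallelism: a generator stands in for each thread, and because JavaScript never interrupts the line that reads and decrements the counter, that line behaves like the locked section. The names simulate, shared and tallies, and the range of 1000, are hypothetical.

```javascript
// Trial-division primality test (same idea as the C function).
function zhishu(num) {
  if (num < 2) return false;
  for (var i = 2; i * i <= num; i++) {
    if (num % i === 0) return false;
  }
  return true;
}

// Model `workerCount` workers that claim numbers from one shared counter.
function simulate(total, workerCount) {
  var shared = { count: total };   // plays the role of the global `count`
  var tallies = [];                // each worker's private `x`
  for (var t = 0; t < workerCount; t++) tallies.push(0);

  function* worker(id) {
    while (true) {
      // "Critical section": read and decrement in one uninterrupted step.
      var num = shared.count--;
      if (num <= 0) return;
      if (zhishu(num)) tallies[id]++;
      yield; // hand control to the next worker, like a context switch
    }
  }

  var running = [];
  for (var id = 0; id < workerCount; id++) running.push(worker(id));
  // Round-robin scheduler: step every live worker once per pass.
  while (running.length > 0) {
    running = running.filter(function (w) { return !w.next().done; });
  }
  return tallies;
}

var tallies = simulate(1000, 4);
var totalPrimes = tallies.reduce(function (a, b) { return a + b; }, 0);
console.log(tallies, totalPrimes); // the tallies always sum to 168
```

Every number is claimed exactly once, so the per-worker tallies add up to the same total regardless of how many workers there are. In the real C code, removing the mutex would break exactly this guarantee, because two threads could read the same value of count before either had decremented it.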
The compiler optimizes while it compiles: if a statement does nothing observable and has no effect on the execution of other statements, the compiler removes it. That is why the code above counts the primes it finds and prints the count. Without that, a loop that merely called zhishu(num) and discarded the result could be skipped by the compiler entirely and never actually run.
How to write an addon has already been covered. We receive one parameter from JavaScript, the number of threads, and then create that many threads in C to carry out the prime search. Complete code:
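int count = 4000000;        // next number to test, shared by all threads
pthread_t tid[MAX_THREAD];
pthread_mutex_t lock;       // protects count
void *thread_p(void *null) {
    int num, x = 0; // x counts the primes found by this thread
    do {
        pthread_mutex_lock(&lock);
        num = count--;
        pthread_mutex_unlock(&lock);
        if (num > 0) {
            if (zhishu(num)) x++;
        } else {
            break;
        }
    } while (true);
    std::cout << ' ' << x; // print this thread's tally
    return null;
}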
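NAN_METHOD(Zhishu) {
    NanScope();
    pthread_mutex_init(&lock, NULL);
    double arg0 = args[0]->NumberValue(); // requested number of threads
    int c = 0;
    // Start the threads, capped at MAX_THREAD.
    for (int j = 0; j < arg0 && j < MAX_THREAD; j++) {
        pthread_create(&tid[j], NULL, thread_p, NULL);
    }
    // Wait for every thread to finish before returning to JavaScript.
    for (int j = 0; j < arg0 && j < MAX_THREAD; j++) {
        pthread_join(tid[j], NULL);
    }
    NanReturnUndefined();
}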
void Init(Handle
NODE_MODULE(hello, Init);
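pthread_create creates a thread, which is joinable by default; in that state the child thread is tied to the main thread. pthread_join blocks the main thread and waits for the child thread until it exits. If the child thread has already exited, pthread_join returns immediately without waiting. So calling pthread_join on every thread guarantees that the main thread continues only after all of the threads have exited.
Now complete the nodejs script: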
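// Time the multi-threaded C/C++ addon (100 threads requested).
console.time("c");
zhishu_c(100);
console.timeEnd("c");
// Time the plain JavaScript loop.
console.time("js");
var count = 0;
for (var j = 1; j < 4000000; j++) {
    if (zhishu(j)) {
        count++;
    }
}
console.log(count);
console.timeEnd("js");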
Take a look at the test results:
With a single thread, C/C++ runs at 181% of the speed of nodejs, which I still consider a very good result for a dynamic language. The improvement is most obvious with two threads, because my computer has a dual-core, four-thread CPU and both cores are probably in use at that point. Speed peaks at 4 threads, which should be the limit of what two cores and four hardware threads can deliver. Adding more threads after that brings no further improvement: in Amdahl's law above, p has reached its upper limit of 4. Extra threads add operating system scheduling time and lock time. Although more threads also compete for more CPU time, the increase in Ws dominates overall, and performance drops. If you run this experiment on an idle machine, the figures should be better.
From this experiment we can conclude that handing CPU-intensive operations to a static language improves efficiency considerably. If the computation involves more work with memory, strings, arrays, recursion and the like (to be verified later), the improvement should be even more striking. At the same time, sensible use of multiple threads can effectively improve throughput, but more threads are not always better; the number must be chosen to suit the machine.
Nodejs itself is indeed not good at CPU-intensive tasks, but with the experience from this article, I think this obstacle is not impossible to overcome.