May I ask: what kind of workload is computing-intensive, and what kind is IO-intensive? And why is it said that PHP was originally designed for computation-intensive work while Node.js is designed for IO-intensive work?
Yesterday I read a chapter on Liao Xuefeng's blog about computing-intensive vs. IO-intensive tasks; an excerpt follows:
We can divide tasks into computing-intensive and IO-intensive.
Computing-intensive tasks are characterized by heavy computation and high CPU consumption, for example calculating pi or decoding high-definition video; they all rely on the CPU's computing power. Such tasks can also be completed with multitasking, but the more tasks there are, the more time is spent switching between them and the lower the CPU's efficiency at executing them. Therefore, to use the CPU most efficiently, the number of computing-intensive tasks running simultaneously should equal the number of CPU cores.
Computing-intensive tasks mainly consume CPU resources, so code execution efficiency is crucial. A scripting language such as Python runs very inefficiently and is completely unsuitable for computing-intensive tasks; for those, it is better to write in C.
IO-intensive: tasks involving network or disk IO are all IO-intensive tasks. Their characteristic is that CPU consumption is very low and most of the time is spent waiting for IO operations to complete (because IO is far slower than the CPU and memory). For IO-intensive tasks, the more tasks there are, the higher the CPU utilization, though there is a limit. Most everyday tasks are IO-intensive, such as web applications.
During the execution of an IO-intensive task, 99% of the time is spent on IO and very little on the CPU, so replacing an extremely slow scripting language like Python with extremely fast C cannot improve the running efficiency at all. For IO-intensive tasks, the most suitable language is the one with the highest development efficiency (the least code); a scripting language is the first choice, and C is the worst.
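To make the excerpt's distinction concrete, here is a minimal Node.js sketch of my own (not part of the quoted text) that contrasts a CPU-bound loop, where the process is busy computing the whole time, with an IO-bound file read, where the process mostly waits; the file path is a hypothetical placeholder.

    const fs = require('fs');

    // CPU-bound: the process is busy computing the whole time.
    console.time('cpu-bound');
    let sum = 0;
    for (let i = 0; i < 1e8; i++) {
      sum += Math.sqrt(i);
    }
    console.timeEnd('cpu-bound');

    // IO-bound: the process mostly waits for the disk; the event loop is free meanwhile.
    console.time('io-bound');
    fs.readFile('./some-large-file.bin', (err, data) => {
      console.timeEnd('io-bound'); // most of this time was spent waiting, not computing
    });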
There is a commonly used example that explains Node.js's non-blocking, event-driven concept; it goes roughly like this:
When you go to a restaurant, the waiter hands your order straight to the kitchen after taking it and then immediately moves on to the next customer instead of waiting in the kitchen (blocking). When the cooking is finished, the chef shouts "order up!" (an event). The waiter runs back to the kitchen and brings the dish to your table (event handling / callback).
In this example, we can roughly think of the waiter as the CPU and the kitchen's work as I/O. Clearly, in a restaurant the waiter's job is not complicated; most of the time spent "waiting for the food" is really spent waiting on the kitchen. This is similar to most websites today: a website usually does not need to perform many complex operations, but it does spend a lot of time waiting for I/O, such as database queries or reading images and videos.
So Node.js abandoned the traditional web-server practice of "open a separate thread to handle each incoming request" and adopted an event-driven model instead: by default, a single thread can carry a large amount of concurrency.
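As an illustration of that event-driven model (a minimal sketch of my own, not code from the answer), a single-threaded Node.js server keeps accepting new connections while every request waits on a simulated slow IO operation:

    const http = require('http');

    // Simulate a slow IO operation (e.g. a database query) with a timer.
    function fakeDbQuery(callback) {
      setTimeout(() => callback(null, { rows: [] }), 200);
    }

    http.createServer((req, res) => {
      // The handler returns immediately; the callback runs once the "IO" finishes.
      fakeDbQuery((err, result) => {
        res.end('done');
      });
      // Meanwhile the single thread is free to accept other connections.
    }).listen(3000);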
So in this situation we say: Node.js is better suited to handling IO-intensive tasks.
But suppose we change the example above: instead of a restaurant, we open a bank. Whenever a customer comes in to withdraw a given amount of money, you, as the teller, have to work out the rate, the interest, the dividends and so on according to the customer's account level... (an off-the-cuff metaphor that may not be entirely apt), while the action of "withdrawing the money and handing it to the customer" is itself not complicated.
In that case we cannot expect a single teller to serve a large number of customers the way a restaurant waiter can, because each request takes up a large amount of the teller's time (unavoidable blocking). In this situation, a traditional model such as PHP may be a better fit.
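To show the blocking problem concretely, here is a rough sketch of my own (the computeInterest function is a made-up stand-in for heavy per-request calculation): while the loop runs, the single-threaded Node.js server cannot answer anyone else.

    const http = require('http');

    // Stand-in for heavy per-request calculation; it blocks the event loop while it runs.
    function computeInterest() {
      let total = 0;
      for (let i = 0; i < 5e8; i++) total += i % 7;
      return total;
    }

    http.createServer((req, res) => {
      // No other request can be accepted or answered until this returns.
      res.end(String(computeInterest()));
    }).listen(3000);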
IO-intensive: applications with a lot of IO, such as network transfers and database calls. Most web applications are like this.
Computation-intensive: as the name suggests, the kind of application that requires a lot of CPU computation. Applications such as cloud computing should fall into this category.
What is computationally intensive? For example, put a SQLite database on the Linux in-memory file system /dev/shm and run a SELECT query against 1 million rows. If the query uses a B+ tree index, the binary search performed inside that index is typical intensive computation; if no index is used, simply scanning the whole table is also intensive computation. This is why relational databases are generally implemented in C/C++, to guarantee performance and keep memory usage under control.
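A rough Node.js sketch of my own of the two access paths described above: looking up a value in a sorted array of one million entries with a binary search (the indexed case) versus a full scan (the unindexed case); both are pure CPU work.

    // Build a sorted "table" of one million ids, purely in memory.
    const ids = Array.from({ length: 1e6 }, (_, i) => i * 2);

    // Indexed lookup: binary search, O(log n) comparisons.
    function binarySearch(arr, target) {
      let lo = 0, hi = arr.length - 1;
      while (lo <= hi) {
        const mid = (lo + hi) >> 1;
        if (arr[mid] === target) return mid;
        if (arr[mid] < target) lo = mid + 1;
        else hi = mid - 1;
      }
      return -1;
    }

    // No index: full table scan, O(n) comparisons.
    function fullScan(arr, target) {
      for (let i = 0; i < arr.length; i++) {
        if (arr[i] === target) return i;
      }
      return -1;
    }

    console.log(binarySearch(ids, 1999998)); // 999999
    console.log(fullScan(ids, 1999998));     // 999999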
What is IO-intensive? For example, if the SQLite database is not in memory but on an ordinary mechanical disk, then write operations (INSERT/UPDATE/DELETE) are typical IO-intensive operations: no matter how fast the CPU and the SQLite engine are, they will still be slowed down by the mechanical disk's writes. Therefore, for the sake of concurrency, SQLite later introduced WAL (write-ahead log) support, which is enabled by executing the SQLite statement PRAGMA journal_mode = WAL.
The principle of the WAL mechanism is that modifications are not written directly to the database file but to a separate file called the WAL (data.db3-wal). If a transaction fails, its records in the WAL are ignored and the modification is undone; if the transaction succeeds, the changes are written back to the database file at a later time (PRAGMA synchronous = NORMAL) and the modification is committed. Synchronizing the WAL file with the database file is called a checkpoint. SQLite performs it automatically, by default once the WAL file has accumulated 1000 pages of modifications (PRAGMA wal_autocheckpoint); you can also run a checkpoint manually at an appropriate time through the interfaces SQLite provides, and after executing PRAGMA wal_checkpoint the WAL file is emptied. When reading, SQLite searches the WAL file for the last write point, remembers it, and ignores write points after it (this is what allows reads and writes to proceed in parallel). It then checks whether the page containing the data to be read is in the WAL file: if it is, the data is read from the WAL file; if not, it is read directly from the database file. When writing, SQLite appends to the WAL file, but exclusive write access must be guaranteed, so writes cannot run in parallel with other writes. The WAL implementation uses shared memory (data.db3-shm), so all reading and writing processes must be on the same machine; otherwise data consistency cannot be guaranteed.
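For concreteness, here is a minimal sketch of enabling WAL and running a manual checkpoint from Node.js; the choice of the sqlite3 npm package and the data.db3 filename are my assumptions, not something specified in the answer.

    // npm install sqlite3   (assumed dependency, not part of the original answer)
    const sqlite3 = require('sqlite3');
    const db = new sqlite3.Database('data.db3');

    db.serialize(() => {
      db.run('PRAGMA journal_mode = WAL');        // switch to write-ahead logging
      db.run('PRAGMA synchronous = NORMAL');      // in WAL mode, sync only at checkpoints
      db.run('PRAGMA wal_autocheckpoint = 1000'); // auto-checkpoint after ~1000 WAL pages (the default)
      // ... normal INSERT/UPDATE/DELETE work goes here ...
      db.run('PRAGMA wal_checkpoint(TRUNCATE)');  // manual checkpoint; empties the WAL file
    });

    db.close();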
Concepts like WAL and checkpoint also exist in other databases such as MySQL, although MySQL is more complex and can support larger-scale concurrent writes. You can think of this WAL + checkpoint way of writing as a kind of asynchronous write.
Node's JS interpreter is based on Chromium's V8, and V8 has a JIT (just-in-time compilation) mechanism, so Node's performance on intensive computation is better than PHP's. Although the PHP team is now working on a JIT for PHP, its performance is still not as good as V8's. Even so, although Node computes well, almost no one would recommend doing large-scale intensive computation in Node, because that computation will inevitably block the Node service, which goes against the non-blocking philosophy Node advocates.
Node uses JS's event-driven nature to achieve non-blocking behavior, but callback-based event programming is not good for code maintainability. Moreover, a Node service cannot use a single process with multiple threads across multiple cores the way a Java service such as Tomcat can, and V8 was not designed for multithreading, so Node officially provides the cluster multi-process module to take advantage of multiple cores.
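A minimal sketch of my own of that cluster approach, using only Node.js built-in modules:

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isMaster) {
      // Fork one worker process per CPU core.
      for (let i = 0; i < os.cpus().length; i++) {
        cluster.fork();
      }
    } else {
      // Each worker runs its own single-threaded event loop and shares the listening port.
      http.createServer((req, res) => {
        res.end('handled by worker ' + process.pid);
      }).listen(3000);
    }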
PHP sits somewhere in the middle: its computation speed is not as fast as JIT languages, but among general-purpose scripting languages without a JIT, PHP is not slow. For example, PHP 5 was already faster than Python, and PHP 7 is much faster still; in the php-src/Zend/bench.php test, PHP 7.1 takes only about a quarter of the time that PHP 5.4 does.
In addition, PHP's multi-process execution models, such as PHP-FPM (FastCGI) and Apache mod_php, can easily make use of multiple cores. You can also enable opcache to cache the opcodes of PHP scripts in shared memory. In other words, suppose you define many functions in functions.php: once opcache has cached the script in memory, PHP no longer needs to re-parse the script on every request and can execute the opcodes directly. The performance improvement is very noticeable, especially for complex PHP applications.