


Load balancing difficulties in multi-core programming
To fully exploit a multi-core CPU, the tasks assigned to each core must be well balanced. Otherwise some cores are busy while others sit idle, and the advantage of having multiple cores is lost.
There are usually two ways to achieve good load balance: static load balancing and dynamic load balancing.
1. Static load balancing
In static load balancing, the programmer manually divides the program into multiple parts that can execute in parallel and must ensure that these parts can be distributed evenly across the CPU cores. In other words, the workload must be spread evenly among the tasks to achieve a high speedup factor.
Mathematically speaking, static load balancing is an NP-complete problem. Between 1972 and 1983, Richard M. Karp, Jeffrey D. Ullman, Christos H. Papadimitriou, M. Garey, D. Johnson, and others successively proved the NP-completeness of the static load balancing problem under several different constraints.
Although NP-completeness is a hard problem in mathematics, it is not the difficulty referred to in the title, because in practice problems of this kind can generally be handled well by efficient approximation algorithms.
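As one illustration of such an approximation algorithm, the sketch below uses the Longest Processing Time (LPT) greedy rule: sort the tasks from most to least expensive and always hand the next task to the core with the smallest load so far. This is only one possible heuristic, not a method named in the article. It assumes identical cores and task costs that are known in advance, and the function name `lpt_partition` and the sample costs are made up for the example.

```python
import heapq

def lpt_partition(task_costs, num_cores):
    """Assign tasks to cores using the Longest Processing Time greedy rule."""
    # Min-heap of (current load, core index): the least-loaded core pops first.
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_cores)]
    # Place the most expensive tasks first; each goes to the least-loaded core.
    for cost in sorted(task_costs, reverse=True):
        load, core = heapq.heappop(heap)
        assignment[core].append(cost)
        heapq.heappush(heap, (load + cost, core))
    loads = [sum(tasks) for tasks in assignment]
    return assignment, loads

# Hypothetical task costs in arbitrary time units.
assignment, loads = lpt_partition([7, 5, 4, 4, 3, 3, 2], num_cores=3)
print(assignment)  # tasks placed on each core
print(loads)       # resulting load per core
```

The heap keeps each placement at O(log m) for m cores, so the whole partition costs little more than the initial sort. The result is not guaranteed to be optimal, only close to it.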
2. Dynamic load balancing
Dynamic load balancing assigns tasks while the program is running. In practice, many problems cannot be handled by static load balancing. For example, if the iteration count of a large loop is supplied as external input and is not known in advance, a static partitioning strategy can hardly achieve load balance.
Task scheduling in dynamic load balancing is generally implemented by the system. Programmers can usually only choose among the available dynamic scheduling strategies and cannot modify them. Because real tasks involve many uncertain factors, the scheduling algorithm cannot always do a very good job, so dynamic load balancing may sometimes fail to meet the intended load balancing requirements.
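The idea of handing out work at run time can be sketched with a shared queue from which idle workers pull the next iteration, so the split never has to be fixed before the iteration count is known. This is a minimal illustration rather than how any particular runtime schedules tasks. The names `run_dynamic` and `work` are hypothetical, and in standard CPython the global interpreter lock means these threads show the scheduling pattern without giving real CPU parallelism.

```python
import queue
import threading

def run_dynamic(num_iterations, num_workers, work):
    """Run work(i) for every i, letting each idle worker pull the next i."""
    tasks = queue.Queue()
    for i in range(num_iterations):
        tasks.put(i)
    results = [None] * num_iterations
    handled = [0] * num_workers  # iterations taken by each worker

    def worker(worker_id):
        while True:
            try:
                i = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, nothing left to do
            results[i] = work(i)      # each i is written by exactly one worker
            handled[worker_id] += 1   # each worker updates only its own slot

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, handled

# The iteration count would normally arrive at run time, e.g. from user input.
results, handled = run_dynamic(10, 4, lambda i: i * i)
print(results)
print(handled)  # how the iterations were split; varies from run to run
```

A worker that finishes a cheap iteration early simply takes another one, which is what evens out the load when iteration costs differ.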
3. What is the problem with load balancing?
The difficulty of load balancing does not lie in the degree of balance. Even if the execution times of the tasks assigned to the individual cores differ somewhat, the total execution time can always be reduced as the number of CPU cores increases, so the speedup factor still grows with the core count.
The difficulty of load balancing is that many of the parallel execution blocks in a program must be divided by the programmer. When the number of CPU cores is small, such as dual-core or 4-core, this division is not very difficult. But as the number of cores increases, the granularity of the division becomes ever finer. Once the number of cores exceeds 16, programmers will probably go crazy over how to divide the tasks. For example, if a piece of sequentially executed code is to run on a 128-core CPU, it must be manually divided into 128 tasks, and the difficulty of that division can be imagined.
The error in load division is amplified as the number of CPU cores increases. For example, suppose a program that takes 16 time units is divided into 4 tasks, so the average load per task is 4 time units. If the division error is 1 time unit, the speedup factor becomes 16/(4+1) = 3.2, which is 80% of the ideal speedup factor of 4. On a 16-core CPU, if the division error of one task is 0.5 time units, the speedup factor becomes 16/(1+0.5) = 10.67, which is only 66.7% of the ideal speedup factor of 16. As the number of cores increases further, the ratio of the actual to the ideal speedup factor keeps falling because the error is amplified.
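The arithmetic in this example can be checked with a few lines of code. The sketch assumes, as the example does, that the total time is set by the slowest task, which runs for the average load plus the division error. The function names are chosen here for illustration.

```python
def speedup(total_time, num_cores, division_error):
    """Speedup when the slowest task takes the average load plus an error."""
    average_load = total_time / num_cores
    slowest_task = average_load + division_error
    return total_time / slowest_task

def efficiency(total_time, num_cores, division_error):
    """Ratio of the achieved speedup to the ideal speedup (the core count)."""
    return speedup(total_time, num_cores, division_error) / num_cores

print(round(speedup(16, 4, 1.0), 2), round(efficiency(16, 4, 1.0), 3))
# 3.2 0.8
print(round(speedup(16, 16, 0.5), 2), round(efficiency(16, 16, 0.5), 3))
# 10.67 0.667
```

A smaller absolute error costs more on 16 cores because the average load per task has shrunk from 4 time units to 1, so the same error is a larger fraction of each task.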
The problem of load division also shows up when the CPU or the software is upgraded. For example, a load division that is balanced on a 4-core CPU may become unbalanced on an 8-core or 16-core CPU. The same goes for software upgrades: when functions are added to the software, the load balance is destroyed and the load must be divided again to restore it. This greatly increases the difficulty and trouble of software design.
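A small numeric sketch shows how a division that is balanced on 4 cores can lose balance on a larger CPU. It assumes a fixed list of equal-cost tasks dealt to the cores in round-robin order. Both the function name and the choice of 12 tasks are made up for illustration.

```python
def efficiency_of_fixed_division(task_costs, num_cores):
    """Parallel efficiency when a fixed task list is dealt round-robin to cores."""
    loads = [0.0] * num_cores
    for index, cost in enumerate(task_costs):
        loads[index % num_cores] += cost
    # Total time is set by the most heavily loaded core.
    achieved_speedup = sum(task_costs) / max(loads)
    return achieved_speedup / num_cores

tasks = [1.0] * 12  # a division designed with a 4-core CPU in mind
for cores in (4, 8, 16):
    print(cores, efficiency_of_fixed_division(tasks, cores))
# 4 1.0
# 8 0.75
# 16 0.75
```

On 8 cores, four cores receive two tasks and four receive one. On 16 cores, four cores receive nothing. In both cases the old division has to be redone to use the hardware fully.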
If locks are used, some seemingly balanced loads may become unbalanced because of lock contention.
4. Load balancing strategies
Software with a small amount of computation runs very fast even on a single-core CPU, so whether its load is balanced well or badly makes little difference. In practice, load balancing matters for large-scale, computation-heavy software, which needs to be balanced across multiple cores in order to use them well and improve performance.
For large-scale software, one strategy is to develop a macro-level method of dividing parallel blocks, partitioning at the level of the whole software system rather than decomposing individual local programs and algorithms in parallel as traditional methods do, because a local program is usually hard to decompose into more than a few dozen tasks.
Another strategy works at the tool level: compilation tools can assist the manual decomposition of parallel blocks and help find good decompositions. Intel has made some efforts in this area, but more work is needed to make the tools powerful enough to cope with large numbers of cores.
