This article brings you a comprehensive analysis of the synchronization primitives in the operating system. It has certain reference value. Friends in need can refer to it. I hope it will be helpful to you.
In a general operating system, different processes may share a common storage area, such as memory or files on the hard disk. These processes are allowed to run in these areas. Read and write on.
The operating system has some responsibilities to coordinate the processes using the common area to perform the desired operations in the correct way. These processes need to communicate with each other and have discussions to ensure that the actions of one process will not affect the normal actions of another process, which will result in the process not getting the expected results after running. In the concept of operating systems, the term IPC (Inter Process Communication) is usually used to represent communication between multiple processes.
In order to explain what a race condition is, we introduce a simple example to illustrate:
A number n is stored in a file, and both process A and process B want to read this file. number, add 1 to this number, and save it back to the file. Assuming that the initial value of n is 0, in our ideal case, after process A and process B run, the value of n in the file should be 2, but in fact it may happen that the value of n is 1. We can consider what steps each process needs to go through when doing this:
Read the value of n in the file
Let n = n 1
Save the new n value back to the file.
Before further explaining race conditions, we must first review several knowledge points in operating system concepts:
Processes can run concurrently (even when there is only one CPU)
The clock interrupt of the operating system will cause the process to be rescheduled.
In addition to the clock interrupt, Interrupts from other devices will also cause the process to be rescheduled
Assume that a clock interrupt occurs when process A has finished running steps 1 and 2 but has not yet started running step 3. , at this time, the operating system schedules process B to start running. When process B runs step 1, it finds that the value of n is 0, so it runs steps 2 and 3, and eventually saves n = 1 to a file. When process A continues to run later, because it does not know that process B has modified the value in the file before it runs step 3, process A will also write n = 1 back to the file. This is the problem. While process A is running, other processes will operate on the data it operates.
The only way to make n = 2 is to expect process A and process B to complete all steps in order.
We can define a race condition.
Two or more processes read and write certain shared data, and the final result depends on the exact timing of the process running, which is called a race condition.
In the above text, we use processes as objects to discuss race conditions. In fact, the same applies to threads. The threads here include but are not limited to kernel threads and user threads. Because in the operating system, processes actually rely on threads to run programs. What's more, in the thread safety of the Java language, the concept of race conditions also applies. (Refer to Chapter 2 of "Java Concurrent Programming Practice")
How to avoid race conditions, we need to use some means to ensure that when a process is using a shared variable Or files, other processes cannot do the same operation. In other words, we need "mutual exclusion".
Taking the above example as an example, we can define the program fragment in steps 1-3 as a critical section. The critical section means that this area is sensitive, because once the process runs into this area, it means that public data will be Operating an area or file also means that other processes may be running into the critical section. If appropriate measures are taken so that the two processes are not in critical sections at the same time, race conditions can be avoided.
In other words, we need to think about how to achieve "mutual exclusion".
The essence of mutual exclusion is to prevent multiple processes from entering the critical section at the same time
In the example mentioned before, the reason why process B was able to enter the critical section is because process A was interrupted in the critical section. If we ask process A to mask all interrupts immediately after entering the critical section, and only respond to interrupts after leaving the critical section, then even if an interrupt occurs, the CPU will not switch to other processes, so process A can safely Modify the contents of the file without worrying that other processes will interfere with its work.
However, this idea is beautiful, but in fact it is not feasible. First, if there are multiple CPUs, process A cannot mask interrupts to other CPUs. It can only mask the CPU that is scheduling it, so processes scheduled by other CPUs can still enter the critical section; second, about power Question, can the power of blocking interrupts be given to the user process? If this process masks interrupts and never responds to interrupts, then a process may hang the entire operating system.
Maybe you can set a lock flag to set its initial value to 0. When a process wants to enter the critical section, first check whether the value of the lock is 0. If If it is 0, set it to 1, then enter the critical section, and change the value of the lock to 0 after exiting the critical section; if the value of the lock is already 1 when checking, it means that other processes are already in the critical section, so the process loops Wait and keep checking the lock value until it becomes 0.
But there is also a race condition in this method. The reason is that when a process reads the value of the lock as 0, before it sets its value to 1, another process is scheduled and it also reads The value of the lock is 0, so there is a situation where both processes are in the critical section at the same time.
The reason why there is a problem with the lock variable is actually because the action of changing the lock variable from 0 to 1 is performed by the process that wants to acquire the lock. If we change this action to be performed by the process that has obtained the lock, then there will be no race condition.
First set a variable turn, which represents who is currently allowed to obtain the lock. Assume there are two processes. The logic of process A is as follows:
while (turn != 0){// 如果还没轮到自己获取锁,则进入循环等待 } do_critical_region();// 执行临界区的程序 turn = 1;// 由获得锁的一方将锁变量修改为其它值,允许其它进程获得锁 do_non_critical_region();// 执行非临界区的程序
The logic of process B is as follows:
while (turn != 1) {// 如果还没轮到自己获取锁,则进入循环等待 } do_critical_region();// 执行临界区的程序 turn = 0;// 由获得锁的一方将锁变量修改为其它值,允许其它进程获得锁 do_non_critical_region();// 执行非临界区的程序
But one thing needs to be taken into consideration here. Suppose that do_non_critical_region() of process A needs to be executed for a long time, that is, the logic of the non-critical region of process A needs to be executed for a long time, while the logic of the non-critical region of process B will be executed quickly. After execution, it is obvious that process A will enter the critical section less frequently than process B. Ideally, process B should enter the critical section several times more. However, because process B will set turn to 0 before executing the logic of the non-critical section, after it quickly executes the logic of the non-critical section, when it comes back to check the value of turn, it is found that the value of turn has never been 1. The value requires process A to set it to 1, but process A is currently running a long non-critical section logic code, so process B cannot enter the critical section.
This shows that strict rotation is not a good idea when one process is much slower than another process.
The problem with the strict rotation method lies in the word strict, that is, multiple processes enter the critical section in turns. The fundamental reason is the process that wants to obtain the lock. It needs to rely on other processes to modify the lock variable, and other processes must first execute the non-critical section logic before modifying the lock variable.
The turn variable in the strict rotation method is not only used to indicate whose turn it is to acquire the lock, but also before its value changes, it means that it still prevents other processes from entering the critical section. It just so happens that a process always The value of turn will be changed only after passing through the logic of the non-critical section.
Therefore, we can use two variables to represent it. One variable represents whose turn it is to obtain the lock currently, and the other variable represents that the current process has left the critical section. This method is actually called the Peterson algorithm, which was developed by Dutch mathematics Proposed by T. Dekker.
static final int N = 2; int turn = 0; boolean[] interested = new boolean[N]; void enter_region(int process) { int other = 1 - process; interested[process] = true; turn = process; while(turn == process && !interested[other]) { } } void exit_region(int process) { interested[process] = false; }
When the process needs to enter the critical section, it first calls enter_region, and after leaving the critical section, it calls exit_region. Peterson's algorithm causes the process to immediately eliminate its "interest" in the critical section after leaving the critical section, so other processes can judge whether they can legally enter the critical section based on the value of turn.
Recall the "lock variable" method we mentioned before. One of its fatal shortcomings is when changing the state variable, such as from 0 to 1 or from 1 When changed to 0, it can be interrupted by interrupts, so there is a race condition.
Later, on the basis of the lock variable, we proposed that if the modification of the lock variable is not modified by the process that wants to enter the critical section, but by the process that wants to leave the critical section after entering the critical section, then Race conditions can be avoided, which leads to the strict rotation method and the Peterson algorithm improved on the basis of the strict rotation method. These methods are all considered from a software perspective. In fact, with the support of hardware CPU, it is possible to ensure that changes in lock variables are not interrupted, making lock variables a good method to solve process mutual exclusion.
Most of the current computer CPUs support the TSL instruction, which is called Test and Set Lock. It reads a memory variable (word) into the register RX, and then stores a non-zero value at the memory address. , read operations and write operations are guaranteed to be uninterruptible at the hardware level, that is to say, atomic. The method it uses is to lock the memory bus when executing the TSL instruction, prohibiting other CPUs from accessing the memory before the TSL instruction ends. This is also what we often call CAS (Compare And Set).
When you need to change the lock variable from 0 to 1, first copy the memory value to the register, set the memory value to 1, and then check whether the register value is 0. If it is 0, the operation is successful. If it is non-0, repeat the detection until the register value becomes 0. If the register value becomes 0, it means that another process has left the critical section. When the process leaves the critical section, it needs to set the value of this variable in memory to 0.
The Peterson algorithm and TSL method mentioned above actually have one characteristic, that is, when waiting to enter the critical section, they use busy waiting. Way, we also often call it spin. Its disadvantage is that it wastes CPU time slices and can cause priority inversion problems.
Consider a computer with two processes, H has a higher priority and L has a lower priority. The scheduling rules state that as long as H is in the ready state, it can run. At a certain moment, L is in the critical region, and H is in a ready state and ready to run. But H needs to be busy waiting, and L cannot be scheduled due to its low priority, and therefore cannot leave the critical section. Therefore, H will be busy waiting forever, and L can never leave the critical section. This situation is called priority inversion problem(priority inversion problem)
The operating system provides some primitives, sleep and wakeup .
The process or function provided by the kernel for calls outside the core becomes a primitive, and primitives do not allow interruption during execution.
sleep is a system call that blocks the calling process until another process calls the wakeup method, taking the blocked process as a parameter to wake it up. The biggest difference between blocking and busy waiting is that after the process is blocked, the CPU will not allocate a time slice to it, while busy waiting is always idling and consuming the CPU's time slice.
The first thing you need to understand is what problem the semaphore appears to solve. Since the blocking and awakening of a process are caused in different processes, for example, process A calls sleep() will enter blocking, and process B's call to wakeup(A) will wake up process A. Because it is carried out in different processes, there is also the problem of interruption. According to the logic, when joining process A, it needs to call sleep() to enter the blocking state. However, just before it calls the sleep method, due to the clock interruption, process B starts running. According to the logic, it calls the wakeup() method to wake up process A, but Since process A has not yet entered the blocking state, the wakeup signal is lost. When process A continues to run from the previously interrupted position and enters blocking, no process may no longer wake it up.
Therefore, the blocking and awakening of the process should require the introduction of an additional variable to record. This variable records the number of awakenings. Each time it is awakened, the value of the variable is increased by 1. With this variable, even if the wakeup operation precedes the sleep operation, the wakeup operation will be recorded in the variable. When the process sleeps, because other processes have already woken up, it is considered that the process does not need to enter the blocking state.
This variable, in the operating system concept, is called a semaphore, a method proposed by Dijkstra in 1965.
There are two operations on semaphores, down and up.
The down operation actually corresponds to sleep. It will first detect whether the value of the semaphore is greater than 0. If it is greater than 0, then decrement it by 1. The process does not need to block at this time, which is equivalent to consuming a wakeup; if the semaphore is 0 , the process will enter a blocking state.
The up operation corresponds to wakeup. After the up operation, if a process is found to be blocked on this semaphore, the system will select one of the processes to wake it up. At this time, the value of the semaphore does not need to change, but it is blocked. There is already one less process; if no process is blocked on the semaphore during the up operation, it will increase the value of the semaphore by 1.
In some places, the down and up operations are called PV operations. This is because in Dijkstra's paper, P and V are used to represent down and up respectively.
The down and up operations of semaphores are primitives supported by the operating system. They are atomic operations and will not be interrupted.
Mutex (mutex) is actually a special case of semaphore. Its value is only 0 and 1. When we do not need to use the counting ability of semaphore , we can use a mutex, which actually means that the critical section value only allows one process to enter at the same time, while the semaphore allows multiple processes to enter the critical section at the same time.
The above is the detailed content of Comprehensive analysis of synchronization primitives in operating systems. For more information, please follow other related articles on the PHP Chinese website!