Well-known kernel threads in Linux systems (1): ksoftirqd and events
By lvyilong316
A Linux system creates many kernel threads (kthreads), and the system depends on them to run normally. Here we look at two of the better-known ones: ksoftirqd and events.
Talking about ksoftirqd means talking about softirqs (soft interrupts), because this thread exists to execute softirqs (more precisely, to execute softirqs when too many of them pile up). In terms of priority, hardware interrupts > softirqs > user processes: a hardware interrupt can preempt a softirq, and a softirq can preempt a user process.
The kernel executes softirqs at a few specific points. (Note the difference between executing and scheduling: scheduling, or raising, a softirq only marks it as pending and does not run it.) The most common point is when a hardware interrupt handler returns. Softirqs can be raised very frequently, for example under heavy network traffic. Worse, a softirq handler sometimes re-raises itself. If softirqs already fire frequently and can also put themselves back into the pending state, user-space processes may not get enough CPU time and starve. To avoid starving user processes, the kernel developers settled on a compromise: the kernel does not immediately handle softirqs that were re-raised by a softirq handler itself (softirqs do not nest). Instead, it wakes up a set of kernel threads to handle the excess softirqs. These threads run at the lowest priority (nice value 19), so they do not compete with more important tasks, yet they are guaranteed to run eventually. Under heavy softirq load, user processes are therefore not starved of CPU time, and the excess softirqs still get processed in the end.
There is one such thread per processor. The threads are all named ksoftirqd/n, where n is the processor number.
Let's look in detail at how ksoftirqd executes softirqs. First, consider how softirqs are scheduled and processed. A softirq must be scheduled (activated) before it can run; the usual term is "raising the softirq". A raised softirq is normally not executed immediately. Instead, at some later point the kernel checks whether any softirq is pending on the current CPU and, if so, executes it. The function that executes softirqs in Linux is do_softirq(), and it is called from two places: on return from a hardware interrupt, and from the ksoftirqd kernel thread discussed here. Let's look at the interrupt-return case first.
//This function is called when the do_IRQ function exits after executing the hardware ISR.
<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li>void irq_exit(void) <br /></li><li>{ <br /></li><li>account_system_vtime(current); <br /></li><li>trace_hardirq_exit(); <br /></li><li>sub_preempt_count(IRQ_EXIT_OFFSET); // preempt_count is modified here<br /></li><li>// Check whether we are still inside a nested hardware interrupt and whether any softirq is pending. Note: do_softirq() can be called only when both conditions hold, that is, all hardware interrupt handling on this CPU has finished and some hardware interrupt handler has raised a softirq. in_interrupt() is analyzed in detail below.<br /></li><li>if (!in_interrupt() && local_softirq_pending()) <br /></li><li>// This effectively calls do_softirq() <br /></li><li>invoke_softirq(); <br /></li><li>preempt_enable_no_resched(); <br /></li><li>}</li></ol>
Here we need to look closely at what in_interrupt() means. To make it easy to tell which context the current execution path is in, the Linux kernel defines several macros:
<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li>#define hardirq_count() (preempt_count() & HARDIRQ_MASK)<br /></li><li>#define softirq_count() (preempt_count() & SOFTIRQ_MASK)<br /></li><li>#define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK))<br /></li><li>/*<br /></li><li>* Are we doing bottom half or hardware interrupt processing?<br /></li><li>* Are we in a softirq context? Interrupt context?<br /></li><li>*/<br /></li><li>#define in_irq() (hardirq_count())<br /></li><li>#define in_softirq() (softirq_count())<br /></li><li>#define in_interrupt() (irq_count())<br /></li><li>/*<br /></li><li>* Are we in NMI context?<br /></li><li>*/<br /></li><li>#define in_nmi() (preempt_count() & NMI_MASK)</li></ol>
As the comments show, the contexts covered are hardware interrupt context, softirq context, and non-maskable interrupt (NMI) context. All of these macros build on preempt_count(), an important macro that is documented in detail in the Linux source:
<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li>/*<br /></li><li>* We put the hardirq and softirq counter into the preemption<br /></li><li>* counter. The bitmask has the following meaning:<br /></li><li>*<br /></li><li>* - bits 0-7 are the preemption count (max preemption depth: 256)<br /></li><li>* - bits 8-15 are the softirq count (max # of softirqs: 256)<br /></li><li>*<br /></li><li>* The hardirq count can in theory reach the same as NR_IRQS.<br /></li><li>* In reality, the number of nested IRQS is limited to the stack<br /></li><li>* size as well. For archs with over 1000 IRQS it is not practical<br /></li><li>* to expect that they will all nest. We give a max of 10 bits for<br /></li><li>* hardirq nesting. An arch may choose to give less than 10 bits.<br /></li><li>* m68k expects it to be 8.<br /></li><li>*<br /></li><li>* - bits 16-25 are the hardirq count (max # of nested hardirqs: 1024)<br /></li><li>* - bit 26 is the NMI_MASK<br /></li><li>* - bit 28 is the PREEMPT_ACTIVE flag<br /></li><li>*<br /></li><li>* PREEMPT_MASK: 0x000000ff<br /></li><li>* SOFTIRQ_MASK: 0x0000ff00<br /></li><li>* HARDIRQ_MASK: 0x03ff0000<br /></li><li>* NMI_MASK: 0x04000000<br /></li><li>*/</li></ol>
As the comment shows, the bits of preempt_count have the following meaning:
(1) Bits 0~7 hold the preemption count, so the maximum supported preemption depth is 256.
(2) Bits 8~15 hold the softirq count, so at most 256 softirqs are supported. Note that softirqs are also limited by the pending bitmask, which is a 32-bit variable, so in practice at most 32 softirqs can be supported.
(3) Bits 16~25 hold the hardware interrupt nesting depth, so the maximum supported nesting depth is 1024. In practice this depth is never reached, because interrupt nesting is also limited by the size of the stack used for interrupt handling.
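To make the bit layout concrete, here is a small user-space sketch that packs and unpacks a preempt_count-style value. Only the mask values come from the kernel comment quoted above; the `pc_*` helper names and the shift constants are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values copied from the kernel comment quoted above. */
#define PREEMPT_MASK 0x000000ffu
#define SOFTIRQ_MASK 0x0000ff00u
#define HARDIRQ_MASK 0x03ff0000u
#define NMI_MASK     0x04000000u

/* Shift amounts implied by the masks (assumed names, not kernel macros). */
#define SOFTIRQ_SHIFT 8
#define HARDIRQ_SHIFT 16
#define NMI_SHIFT     26

/* Hypothetical helper: build a counter value from its three count fields. */
static inline uint32_t pc_make(unsigned hardirq, unsigned softirq, unsigned preempt)
{
    assert(hardirq < 1024 && softirq < 256 && preempt < 256);
    return ((uint32_t)hardirq << HARDIRQ_SHIFT) |
           ((uint32_t)softirq << SOFTIRQ_SHIFT) |
           (uint32_t)preempt;
}

/* Hypothetical helpers: each one extracts a single field. */
static inline unsigned pc_preempt_depth(uint32_t pc) { return pc & PREEMPT_MASK; }
static inline unsigned pc_softirq_count(uint32_t pc) { return (pc & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT; }
static inline unsigned pc_hardirq_depth(uint32_t pc) { return (pc & HARDIRQ_MASK) >> HARDIRQ_SHIFT; }
static inline unsigned pc_in_nmi(uint32_t pc)        { return (pc & NMI_MASK) >> NMI_SHIFT; }

/* Same test as the in_interrupt() macro, applied to an explicit value. */
static inline int pc_in_interrupt(uint32_t pc)
{
    return (pc & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) != 0;
}
```

For instance, `pc_make(2, 1, 3)` gives 0x00020103: two nested hardware interrupts, a softirq count of 1, and a preemption depth of 3, so `pc_in_interrupt()` reports interrupt context. A value with only the preemption depth set does not.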
With that background, let's return to the question of what in_interrupt() actually means.
<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li>#define in_interrupt() (irq_count())<br /></li><li>#define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK \<br /></li><li>| NMI_MASK))</li></ol>
As the macro definition shows, the value of in_interrupt() combines the hardware interrupt nesting count, the softirq count, and the NMI bit. If in_interrupt() is nonzero, softirqs are not processed. In other words, softirqs are not handled (a) while hardware interrupts are nested, (b) while softirqs are disabled, or (c) in NMI context. One might ask: softirqs are entered from irq_exit() after interrupt processing, so isn't the hardirq field of preempt_count still set at that point? No: it has already been decremented by sub_preempt_count() in irq_exit(). Once sub_preempt_count() has run, the interrupt handler is considered to have exited.
Note: disabling softirqs increments the softirq count:
<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li>__local_bh_disable((unsigned long)__builtin_return_address(0));<br /></li><li>static inline void __local_bh_disable(unsigned long ip)<br /></li><li>{<br /></li><li>add_preempt_count(SOFTIRQ_OFFSET);<br /></li><li>barrier();<br /></li><li>}<br /></li><li># define add_preempt_count(val) do { preempt_count() += (val); } while (0)</li></ol>
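The gating described above can be modeled in user space. The sketch below is a simplified simulation, not kernel code: `sim_preempt_count`, `sim_pending`, and all `sim_*` functions are hypothetical stand-ins, and only the offsets and masks follow the layout quoted earlier. It shows that a softirq raised by an interrupt runs on interrupt exit unless softirqs are disabled, in which case it merely stays pending.

```c
#include <assert.h>
#include <stdint.h>

#define SOFTIRQ_OFFSET (1u << 8)
#define HARDIRQ_OFFSET (1u << 16)
#define SOFTIRQ_MASK   0x0000ff00u
#define HARDIRQ_MASK   0x03ff0000u
#define NMI_MASK       0x04000000u

static uint32_t sim_preempt_count; /* stand-in for preempt_count() */
static uint32_t sim_pending;       /* stand-in for the per-CPU pending bitmask */
static int sim_softirq_runs;       /* number of times softirq processing ran */

static int sim_in_interrupt(void)
{
    return (sim_preempt_count & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) != 0;
}

/* Models __local_bh_disable(): bump the softirq field. */
static void sim_local_bh_disable(void)
{
    sim_preempt_count += SOFTIRQ_OFFSET;
}

/* Models re-enabling softirqs without running any pending ones. */
static void sim_local_bh_enable_no_run(void)
{
    assert(sim_preempt_count & SOFTIRQ_MASK);
    sim_preempt_count -= SOFTIRQ_OFFSET;
}

/*
 * Simulated hardware interrupt whose handler raises softirq `nr`.
 * Returns 1 if softirq processing ran on interrupt exit, 0 otherwise.
 */
static int sim_hardirq(unsigned nr)
{
    int before = sim_softirq_runs;

    assert(nr < 32);
    sim_preempt_count += HARDIRQ_OFFSET;  /* interrupt entry */
    sim_pending |= 1u << nr;              /* handler raises a softirq */
    sim_preempt_count -= HARDIRQ_OFFSET;  /* irq_exit(): sub_preempt_count() */

    if (!sim_in_interrupt() && sim_pending) {
        sim_pending = 0;                  /* pretend every pending handler ran */
        sim_softirq_runs++;
    }
    return sim_softirq_runs != before;
}
```

Calling `sim_hardirq(3)` in the normal state runs the softirq on exit. After `sim_local_bh_disable()`, the same call leaves bit 3 set in `sim_pending` and returns 0, because the softirq field makes `sim_in_interrupt()` nonzero.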
Next, let's analyze do_softirq() to see how the Linux kernel actually handles softirqs.
<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li>asmlinkage void do_softirq(void)<br /></li><li>{<br /></li><li>__u32 pending;<br /></li><li> unsigned long flags;<br /></li><li>// Return immediately if we are inside a (nested) hardware interrupt or softirqs are disabled. This entry check mainly provides mutual exclusion with ksoftirqd.<br /></li><li>if (in_interrupt())<br /></li><li> return;<br /></li><li>// Run the following code with interrupts disabled<br /></li><li> local_irq_save(flags); <br /></li><li>// Check whether any pending softirq needs handling.<br /></li><li> pending = local_softirq_pending();<br /></li><li>// If so, call __do_softirq() to do the actual work<br /></li><li>if (pending)<br /></li><li> __do_softirq();<br /></li><li>// Re-enable interrupts and continue<br /></li><li> local_irq_restore(flags);<br /></li><li>}</li></ol>
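Note that reading the pending mask with local_softirq_pending() and clearing it to 0 must both happen with interrupts disabled. Otherwise an interrupt could arrive between the two operations, and the pending bit set when that interrupt raises a softirq would be lost.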
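The lost-bit problem can be shown with a deterministic user-space model. This is only a sketch: `sim_pending`, `sim_irq_raise`, and the two `take_pending_*` functions are hypothetical names, and the "interrupt" is simulated by a callback invoked at the point where a real interrupt could fire.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t sim_pending;   /* stand-in for the per-CPU pending bitmask */

/* Simulated interrupt handler that raises softirq number 2. */
static void sim_irq_raise(void)
{
    sim_pending |= 1u << 2;
}

/* Unsafe variant: an interrupt can fire between the read and the clear. */
static uint32_t take_pending_irqs_on(void (*irq)(void))
{
    uint32_t snapshot = sim_pending;  /* read the pending mask */

    assert(irq);
    irq();                            /* interrupt sneaks in here */
    sim_pending = 0;                  /* wipes the bit the interrupt just set */
    return snapshot;
}

/* Safe variant: the interrupt is held off until both steps are done. */
static uint32_t take_pending_irqs_off(void (*irq)(void))
{
    uint32_t snapshot = sim_pending;  /* read with "interrupts disabled" */

    assert(irq);
    sim_pending = 0;                  /* clear with "interrupts disabled" */
    irq();                            /* held-off interrupt is delivered now */
    return snapshot;
}
```

In the unsafe variant the bit raised by the simulated interrupt appears neither in the returned snapshot nor in `sim_pending`, so that softirq would never run. In the safe variant it remains set in `sim_pending` and is picked up on the next pass.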
The actual softirq processing takes place in __do_softirq().
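<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li>// Softirq processing is restarted at most 10 times.<br /></li><li>#define MAX_SOFTIRQ_RESTART 10<br /></li><li>asmlinkage void __do_softirq(void)<br /></li><li>{<br /></li><li>// Softirq descriptor; it holds the softirq callback function.<br /></li><li> struct softirq_action *h;<br /></li><li> __u32 pending;<br /></li><li>int max_restart = MAX_SOFTIRQ_RESTART;<br /></li><li>int cpu;<br /></li><li>// Get all currently pending softirqs.<br /></li><li> pending = local_softirq_pending();<br /></li><li> account_system_vtime(current);<br /></li><li>// From here on other softirqs are disabled, which is why only one softirq can run on a given CPU at a time.<br /></li><li> __local_bh_disable((unsigned long)__builtin_return_address(0));<br /></li><li> trace_softirq_enter();<br /></li><li>// On SMP, get the CPU we are currently running on<br /></li><li> cpu = smp_processor_id();<br /></li><li>restart:<br /></li><li>// On each pass, reset the softirq pending bits before allowing hardware interrupts to preempt us.<br /></li><li>/* Reset the pending bitmask before enabling irqs */<br /></li><li> set_softirq_pending(0); // must be called with interrupts disabled<br /></li><li> // Interrupts are enabled only from this point on. Note: up to here we have been running with interrupts disabled, so only now can softirq processing be preempted by a hardware interrupt. In other words, softirq processing cannot be preempted by hardware interrupts right from its start, only from this point onward.<br /></li><li> local_irq_enable();<br /></li><li> // Note that the code below can be preempted by a hardware interrupt, but the softirq raised by that interrupt cannot run right away when the interrupt finishes. Remember that although hardware interrupts are now enabled, the earlier __local_bh_disable() call masks softirqs. So in this state we can only be preempted by hardware interrupts, and the softirq callbacks they raise cannot run. The reason is that __local_bh_disable() sets a flag that acts as a mutex, and this flag is one of the conditions tested by in_interrupt() in irq_exit() and do_softirq(); in_interrupt() checks for softirq context as well as hardirq context. A softirq raised by a hardware interrupt in this state therefore cannot re-enter this function. It is only marked pending and waits for the restart loop below (at most MAX_SOFTIRQ_RESTART passes) to process it. Get the softirq vector table.<br /></li><li> h = softirq_vec;<br /></li><li> // Loop over all registered softirq handlers.</li><li> do {</li><li>// If this softirq's pending bit is set, its registered function needs to run.<br /></li><li>if (pending & 1) {<br /></li><li>// Run the callback registered for this softirq.<br /></li><li> h->action(h);<br /></li><li> rcu_bh_qsctr_inc(cpu);<br /></li><li>}<br /></li><li>// Keep going until every pending softirq in the vector table has been handled.<br /></li><li> h++;<br /></li><li> // The bitwise shift shows that one pass handles at most 32 softirq callbacks.<br /></li><li> pending >>= 1; <br /></li><li>} while (pending);<br /></li><li>// Run the following code with interrupts disabled. Note: interrupts are disabled again here, so hardware interrupts cannot preempt the code below.<br /></li><li> local_irq_disable();<br /></li><li>// As mentioned above, while interrupts were enabled we could be preempted by hardware interrupts but could not process softirqs. Each such interrupt may have raised a softirq, so the pending mask is read again here, so that the code below can jump back to restart and process them. <br /></li><li> pending = local_softirq_pending(); <br /></li><li>// If a hardware interrupt fired while interrupts were enabled above and raised a softirq, that softirq's pending bit is set, but it could not run because softirqs have been masked the whole time: as described in detail above, irq_exit() and do_softirq() cannot enter this processing path. Here it gets a chance to run. Once you understand the softirq mechanism, you will see that it amounts to calling, at certain specific points, the functions that ISRs have marked in the softirq vector table. If the hardware interrupt raised a softirq and we have not yet restarted 10 times, jump to the restart label and repeat all the steps above: reset the pending bits, re-enable interrupts, and so on. <br /></li><li>// Note: the steps above are repeated only if both conditions hold. <br /></li><li>if (pending && --max_restart) <br /></li><li> goto restart; <br /></li><li>// If softirqs are still pending after 10 passes, the system has probably hit a load peak. To balance this, the kernel has a dedicated ksoftirqd thread to handle them, which avoids excessive load within a short period. The ksoftirqd thread is itself one big loop. To keep the load down it can be preempted by other processes under certain conditions, but note that it is preempted or switched out only where it explicitly calls preempt_xxx() and schedule(). The reason is that as soon as its call to local_softirq_pending() finds pending softirqs, it explicitly calls do_softirq() to process them. In other words, the ksoftirqd thread woken below may end up back in this function, especially when the system has to service a large number of softirqs. Its entry point is do_softirq(), which is why do_softirq() also tests in_interrupt() on entry to see whether a softirq is already being processed: the goal is again to prevent re-entry. See the analysis of ksoftirqd() below for its implementation. <br /></li><li>if (pending) <br /></li><li>// This function actually calls wake_up_process() to wake ksoftirqd <br /></li><li> wakeup_softirqd(); <br /></li><li>trace_softirq_exit(); <br /></li><li>account_system_vtime(current); <br /></li><li>// Only at the very end are softirqs re-enabled. Note: this uses _local_bh_enable() rather than local_bh_enable(), so it does not trigger another call to do_softirq(). <br /></li><li>_local_bh_enable(); <br /></li><li>}</li></ol>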
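The restart limit is easier to see in isolation. The following user-space sketch models only the control flow of the listing above, with no real interrupt masking; `sim_vec`, `sim_raise_softirq`, `sim_greedy_action`, and the `sim_ksoftirqd_woken` flag are hypothetical names for this illustration. The "greedy" handler re-raises itself on every run, which is the situation the limit is meant to bound.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SOFTIRQ_RESTART 10
#define NR_SIM_SOFTIRQS     4

typedef void (*sim_action_t)(unsigned nr);

static uint32_t sim_pending;                  /* pending bitmask */
static sim_action_t sim_vec[NR_SIM_SOFTIRQS]; /* stand-in for softirq_vec */
static int sim_ksoftirqd_woken;               /* set instead of waking a thread */
static int sim_greedy_runs;                   /* how often the greedy handler ran */

/* Raising a softirq only marks it pending; nothing runs here. */
static void sim_raise_softirq(unsigned nr)
{
    assert(nr < NR_SIM_SOFTIRQS);
    sim_pending |= 1u << nr;
}

/* A handler that always re-raises itself, like a softirq under constant load. */
static void sim_greedy_action(unsigned nr)
{
    sim_greedy_runs++;
    sim_raise_softirq(nr);
}

/* Simplified model of the restart loop in __do_softirq(). */
static void sim_do_softirq(void)
{
    int max_restart = MAX_SOFTIRQ_RESTART;
    uint32_t pending = sim_pending;

    do {
        unsigned nr = 0;

        sim_pending = 0;                      /* reset before handlers run */
        while (pending && nr < NR_SIM_SOFTIRQS) {
            if ((pending & 1u) && sim_vec[nr])
                sim_vec[nr](nr);              /* run the registered callback */
            nr++;
            pending >>= 1;
        }
        pending = sim_pending;                /* pick up anything raised meanwhile */
    } while (pending && --max_restart);

    if (pending)
        sim_ksoftirqd_woken = 1;              /* leave the rest to ksoftirqd */
}
```

With `sim_greedy_action` registered and raised once, `sim_do_softirq()` runs the handler 10 times, then gives up with the bit still pending and sets `sim_ksoftirqd_woken`, mirroring the hand-off to ksoftirqd.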
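The function below is the thread function of the ksoftirqd kernel thread. As long as there are pending softirqs (detected with local_softirq_pending()), ksoftirqd calls do_softirq() to process them. By repeating this, softirqs that are re-raised also get executed. When necessary, schedule() is called after each iteration so that more important processes get a chance to run. When all the work is done, the kernel thread sets itself to the TASK_INTERRUPTIBLE state and invokes the scheduler to pick another runnable process.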
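<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li>static int ksoftirqd(void * __bind_cpu)<br /></li><li>{<br /></li><li>// Explicitly set the static priority of the current thread. The effective priority may still vary with scheduler policy.<br /></li><li>set_user_nice(current, 19);<br /></li><li>// Mark the current thread as not freezable<br /></li><li>current->flags |= PF_NOFREEZE;<br /></li><li>// Set the current thread state to interruptible; in this sleep state it can respond to signals and the like.<br /></li><li>set_current_state(TASK_INTERRUPTIBLE);<br /></li><li>// The big loop below keeps checking whether this thread should stop; if not, it goes on to check whether any pending softirq needs handling.<br /></li><li>while (!kthread_should_stop()) {<br /></li><li> // While processing, do not allow this thread to be preempted.<br /></li><li> preempt_disable();<br /></li><li> // First check whether there is no pending softirq to handle<br /></li><li> if (!local_softirq_pending()) {<br /></li><li>// If there is none, re-enable preemption before giving up the CPU, since the code so far has run with preemption disabled.<br /></li><li>preempt_enable_no_resched();<br /></li><li>// Explicitly call schedule() to give up the CPU: the current thread goes to sleep and another process is switched in (scheduler details are not covered here).<br /></li><li>schedule();<br /></li><li>// Note: when a thread that called schedule() explicitly is scheduled again, it resumes at the statement after the call. So when this thread runs again, it executes the preempt_disable() below, and preemption stays disabled during the processing that follows.<br /></li><li>preempt_disable();<br /></li><li>}<br /></li><li>/* Set the current thread to the running state. Note: preemption is already disabled here, and both paths through the branch above end up at this point: either softirqs were already pending on entering the loop, or none were pending and the thread has just been scheduled back onto the CPU. */<br /></li><li>__set_current_state(TASK_RUNNING);<br /></li><li>/* Loop while softirqs are pending, calling do_softirq() to process them. Note: this is another entry point into do_softirq(). After __do_softirq() has made 10 passes over the softirq callbacks and softirqs are still pending, control ends up here, and from here __do_softirq() may be called again to run the callbacks. As mentioned in the discussion of __do_softirq(), failing to finish within 10 passes means the system is busy. It follows that when the system is very busy, this thread and do_softirq() alternate, and the thread's CPU usage can be very high. The cond_resched() call below mitigates this: after one round of softirq processing the thread may be scheduled out, which reduces CPU load. Even so, under very heavy load this thread can still consume a lot of CPU. */<br /></li><li>while (local_softirq_pending()) {<br /></li><li>/* Preempt disable stops cpu going offline. If already offline, we'll be on wrong CPU: don't process */<br /></li><li>if (cpu_is_offline((long)__bind_cpu))<br /></li><li>/* If the CPU this thread is bound to can no longer process work, jump to the wait_to_die label, wait to be stopped, and exit. */<br /></li><li>goto wait_to_die;<br /></li><li>/* Call do_softirq() to run the actual softirq callbacks. Note: if a softirq is already being processed at this point, do_softirq() returns immediately; recall the in_interrupt() check described earlier. */<br /></li><li>do_softirq();<br /></li><li>/* Allow this thread to be preempted. */<br /></li><li>preempt_enable_no_resched();<br /></li><li>/* This function may indirectly call schedule() to switch away from the current thread, and preemption has just been re-enabled above. So after one round of softirq callbacks, another process may be switched in. I believe the purpose is twofold: to keep this thread from hogging the CPU for long periods under excessive load, and to keep other processes responsive when there are many softirqs to handle. */<br /></li><li> cond_resched();<br /></li><li> /* Disable preemption of this thread. */<br /></li><li>preempt_disable();<br /></li><li>/* Have all softirqs been handled? If not, repeat the steps above. */<br /></li><li> }<br /></li><li>/* Once everything has been handled, allow preemption again, set the thread state to interruptible, and go around the outer loop again. */<br /></li><li> preempt_enable();<br /></li><li> set_current_state(TASK_INTERRUPTIBLE);<br /></li><li>}<br /></li><li>/* If the thread is about to stop, set its state to running and return. The scheduler runs the thread according to its priority. */<br /></li><li>__set_current_state(TASK_RUNNING);<br /></li><li>return 0;<br /></li><li>/* Wait until this thread is stopped */</li><li>wait_to_die:</li><li>/* Allow this thread to be preempted. */<br /></li><li>preempt_enable();<br /></li><li>/* Wait for kthread_stop */<br /></li><li>/* Set the current thread state to interruptible; in this sleep state it can respond to signals and the like. */<br /></li><li>set_current_state(TASK_INTERRUPTIBLE);<br /></li><li>/* Check whether this thread should stop; if not, set its state to interruptible and give up the CPU voluntarily. In other words, this loop keeps waiting until the thread is told to stop. */<br /></li><li>while (!kthread_should_stop()) {<br /></li><li>schedule();<br /></li><li>set_current_state(TASK_INTERRUPTIBLE);<br /></li><li>}<br /></li><li>/* If the thread is about to stop, set its state to running and return. The scheduler runs the thread according to its priority. */<br /></li><li>__set_current_state(TASK_RUNNING);<br /></li><li>return 0;<br /></li><li>}</li></ol>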
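One final note: tasklets are also implemented on top of softirqs, so too many tasklets will likewise cause the ksoftirqd thread to be scheduled, and the tasklets then run in process context. (ksoftirqd runs the softirq handlers, and the softirq handler for tasklets runs all scheduled tasklets.)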
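The relationship between tasklets and their softirq can be sketched in user space. This is a rough model under stated assumptions, not the kernel's implementation: `struct sim_tasklet`, `sim_tasklet_schedule`, `sim_tasklet_action`, and the `scheduled` flag are hypothetical names. Scheduling a tasklet queues it and marks the tasklet softirq pending; the softirq handler then runs every queued tasklet.

```c
#include <assert.h>
#include <stddef.h>

struct sim_tasklet {
    struct sim_tasklet *next;
    int scheduled;                  /* 1 while queued (assumed flag) */
    void (*func)(unsigned long);    /* function to run */
    unsigned long data;             /* argument passed to func */
};

static struct sim_tasklet *sim_head;              /* queue of scheduled tasklets */
static struct sim_tasklet **sim_tail = &sim_head; /* where the next one is linked */
static int sim_tasklet_softirq_pending;           /* "tasklet softirq raised" flag */

/* Queue a tasklet and mark the tasklet softirq pending; nothing runs yet. */
static void sim_tasklet_schedule(struct sim_tasklet *t)
{
    assert(t->func != NULL);
    if (t->scheduled)
        return;                     /* already queued: scheduling again is a no-op */
    t->scheduled = 1;
    t->next = NULL;
    *sim_tail = t;
    sim_tail = &t->next;
    sim_tasklet_softirq_pending = 1;
}

/* The softirq handler: detach the queue and run every tasklet on it once. */
static int sim_tasklet_action(void)
{
    struct sim_tasklet *list = sim_head;
    int ran = 0;

    sim_head = NULL;
    sim_tail = &sim_head;
    sim_tasklet_softirq_pending = 0;

    while (list) {
        struct sim_tasklet *t = list;

        list = list->next;
        t->scheduled = 0;           /* may be scheduled again from now on */
        t->func(t->data);
        ran++;
    }
    return ran;
}

/* Sample tasklet function used for illustration: accumulate the argument. */
static unsigned long sim_sum;
static void sim_add(unsigned long d)
{
    sim_sum += d;
}
```

Whoever ends up calling `sim_tasklet_action()`, whether the interrupt-exit path or a ksoftirqd-like thread, determines the context in which the tasklets run, which is the point made above.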
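Now for the events thread. To discuss it we have to talk about workqueues: this thread is what a workqueue uses to execute the work items in its queue.
A workqueue is also one of the ways Linux implements bottom halves (softirqs, tasklets, and workqueues). The workqueue mechanism exists to simplify the creation of kernel threads: calling the workqueue interface creates the kernel threads for you. It can also create as many threads as there are CPUs in the system, so that the work they handle can proceed in parallel.
The workqueue is a simple yet effective kernel mechanism. It clearly simplifies the creation of kernel daemons and makes programming easier for its users.
The workqueue mechanism defines two important data structures:
1. cpu_workqueue_struct. This structure binds a CPU to a kernel thread. When a workqueue is created, Linux creates one cpu_workqueue_struct per CPU in the system. The structure mainly maintains a queue of work items (work_struct) and the wait queue on which the kernel thread sleeps; it also keeps the thread's task context, that is, its task_struct.
2. work_struct. This structure is the abstraction of a work item. It holds the function to run, the data to process, and the time at which to process it. It is defined as follows: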
<ol style="margin:0 1px 0 0px;padding-left:40px;" start="1" class="dp-css"><li>struct work_struct {<br /></li><li> unsigned long pending;<br /></li><li> struct list_head entry; /* link for attaching the work item to the queue */<br /></li><li> void (*func)(void *); /* work function */<br /></li><li> void *data; /* data passed to the work function */<br /></li><li> void *wq_data; /* owner of the work item */<br /></li><li> struct timer_list timer; /* timer for delayed processing */<br /></li><li>};</li></ol>
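When a user calls one of the workqueue initialization interfaces, create_workqueue or create_singlethread_workqueue, the kernel allocates a workqueue object and links it into a global list of workqueues. Linux then allocates one cpu_workqueue_struct per CPU for the workqueue object, and each cpu_workqueue_struct has its own work queue. Next, Linux allocates a kernel thread (a kernel daemon) for each cpu_workqueue_struct to process the work items in that queue. At this point the workqueue is fully initialized, and a pointer to it is returned to the caller.
As part of initializing a workqueue, the kernel sets up the kernel threads. The thread function is simple: it keeps scanning the work queue in its cpu_workqueue_struct, takes a valid work item from it, and executes it. If the queue is empty, the kernel daemon sleeps on the wait queue in the cpu_workqueue_struct until someone wakes it to process the queue.
Once the workqueue is initialized, the context in which work runs is in place, but there is no work to run yet. The user therefore needs to define a concrete work_struct object and add it to the work queue, at which point Linux wakes the daemon to process it.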
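The queue-then-wake flow just described can be modeled in a few lines of user-space C. This is a single-threaded sketch under stated assumptions, not the kernel's code: `struct sim_work`, `struct sim_cpu_workqueue`, `sim_queue_work`, and `sim_worker_run` are hypothetical names that loosely mirror work_struct, cpu_workqueue_struct, queuing a work item, and one wake-up of the worker thread. A flag stands in for waking a real thread.

```c
#include <assert.h>
#include <stddef.h>

struct sim_work {
    struct sim_work *next;       /* stand-in for the list_head entry */
    int pending;                 /* 1 while queued */
    void (*func)(void *);        /* work function */
    void *data;                  /* data passed to the work function */
};

struct sim_cpu_workqueue {
    struct sim_work *head;       /* FIFO of queued work items */
    struct sim_work **tail;
    int worker_awake;            /* stand-in for waking the worker thread */
};

static void sim_wq_init(struct sim_cpu_workqueue *cwq)
{
    cwq->head = NULL;
    cwq->tail = &cwq->head;
    cwq->worker_awake = 0;
}

/* Queue a work item and "wake" the worker. Returns 0 if it was already queued. */
static int sim_queue_work(struct sim_cpu_workqueue *cwq, struct sim_work *w)
{
    assert(w->func != NULL);
    if (w->pending)
        return 0;
    w->pending = 1;
    w->next = NULL;
    *cwq->tail = w;
    cwq->tail = &w->next;
    cwq->worker_awake = 1;
    return 1;
}

/* One wake-up of the worker: run items in FIFO order, then go back to sleep. */
static int sim_worker_run(struct sim_cpu_workqueue *cwq)
{
    int ran = 0;

    while (cwq->head) {
        struct sim_work *w = cwq->head;

        cwq->head = w->next;
        if (!cwq->head)
            cwq->tail = &cwq->head;
        w->pending = 0;          /* the item may be queued again from now on */
        w->func(w->data);
        ran++;
    }
    cwq->worker_awake = 0;       /* queue empty: sleep until woken again */
    return ran;
}

/* Sample work function used for illustration: record the int it was given. */
static int sim_log[8];
static int sim_log_len;
static void sim_log_value(void *data)
{
    assert(sim_log_len < 8);
    sim_log[sim_log_len++] = *(int *)data;
}
```

Queuing two items and then calling `sim_worker_run()` executes them in the order they were queued and leaves the worker "asleep" again. A real workqueue has one such queue and worker per CPU.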
[Figure: kernel implementation of the workqueue mechanism described above]
The workqueue mechanism provides a system default workqueue, keventd_wq, which Linux creates during system initialization. Users can simply initialize a work_struct object and schedule it on this queue, which is more convenient.
The events/0, events/1, ... kernel threads that we see are the kthreads that this default workqueue creates on each CPU to execute work items.
What if we create a workqueue ourselves? If it is created with create_singlethread_workqueue, only one kthread is created. If it is created with create_workqueue, one kthread is created per CPU, just as for the default workqueue. The name of the kthread is passed in as a parameter.
Workqueue programming interface
| No. | Interface function | Description |
|---|---|---|
| 2 | create_singlethread_workqueue | Creates a workqueue with only one kernel thread. Input parameters: @name: workqueue name |
| 3 | destroy_workqueue | Releases a workqueue. Input parameters: @workqueue_struct: pointer to the workqueue to release |
| 4 | schedule_work | Schedules a work item for execution; the work item is queued on the workqueue provided by the Linux system, keventd_wq. Input parameters: @work_struct: pointer to the work item |
| 5 | schedule_delayed_work | Executes a work item after a delay. Similar to schedule_work, with an additional delay. Input parameters: @work_struct: pointer to the work item; @delay: delay time |
| 6 | queue_work | Schedules a work item for execution on a specified workqueue. Input parameters: @workqueue_struct: pointer to the specified workqueue; @work_struct: pointer to the work item |
| 7 | queue_delayed_work | Schedules a work item for delayed execution on a specified workqueue. Similar to queue_work, with an additional delay input parameter. |