Overview of processes and threads:
Many students have heard that modern operating systems, such as Mac OS X, UNIX, Linux, Windows, etc., all support "multitasking".
What is "multitasking"? Simply put, the operating system can run multiple tasks at the same time. For example, you may be browsing the Internet, listening to music in an MP3 player, and catching up on homework in Word. That is multitasking: at least three tasks are running at once. Many more tasks run quietly in the background without appearing on the desktop.
Multi-core CPUs are now very common, but even the single-core CPUs of the past could run multiple tasks. Since a CPU executes instructions sequentially, how does a single-core CPU do this?
The answer is that the operating system lets the tasks take turns. Task 1 runs for 0.01 seconds, then the system switches to task 2, which runs for 0.01 seconds, then to task 3, which runs for 0.01 seconds, and so on, over and over. Strictly speaking the tasks run alternately, but the CPU is so fast that it feels as if all of them run at the same time.
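The turn-taking described above can be imitated in plain Python. The following is only a toy simulation, not how an operating system is actually implemented: each task is a generator, and `round_robin` (a name made up for this sketch) lets every task run one step before switching to the next.

```python
from collections import deque


def task(name, steps):
    """A toy task: each yield marks the end of one time slice."""
    for i in range(1, steps + 1):
        yield "%s step %d" % (name, i)


def round_robin(tasks):
    """Run each task for one slice in turn until all tasks are finished."""
    queue = deque(tasks)
    log = []
    while queue:
        current = queue.popleft()
        try:
            log.append(next(current))  # run one time slice
        except StopIteration:
            continue                   # task finished: do not reschedule it
        queue.append(current)          # otherwise go to the back of the line
    return log


if __name__ == "__main__":
    for line in round_robin([task("task1", 2), task("task2", 3), task("task3", 1)]):
        print(line)
```

The log shows the tasks interleaved one slice at a time, which is the effect a user perceives as "simultaneous" on a single core.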
True parallel execution of multiple tasks is only possible on a multi-core CPU. However, because the number of tasks is usually far greater than the number of CPU cores, the operating system automatically schedules the many tasks onto each core in turn.
For the operating system, a task is a process. For example, opening a browser starts a browser process, opening Notepad starts a Notepad process, opening two Notepads starts two Notepad processes, and opening Word starts a Word process.
Some processes do more than one thing at a time. Word, for example, can handle typing, spell checking, and printing simultaneously. To do several things at once inside a process, you need to run several "subtasks" at the same time. These subtasks within a process are called threads.
Since each process has to do at least one thing, a process has at least one thread. A complex process like Word can have multiple threads, which can execute at the same time. Multiple threads are executed the same way as multiple processes: the operating system switches rapidly between threads so that each runs briefly in turn and they appear to execute simultaneously. Truly executing multiple threads at the same time requires a multi-core CPU. All the Python programs we wrote earlier are processes that perform a single task, that is, they have only one thread. What if we want to perform multiple tasks at the same time?
There are two solutions:
One is to start multiple processes. Each process has only one thread, but together the processes can perform multiple tasks.
The other is to start one process and start multiple threads inside it, so that the threads perform multiple tasks together.
Of course there is a third method: start multiple processes, each of which starts multiple threads, so that even more tasks can run at the same time. This model is more complex and is rarely used in practice.
To sum up, there are three ways to implement multitasking:
Multi-process mode;
Multi-thread mode;
Multi-process + multi-thread mode.
When multiple tasks run at the same time, they are usually not independent; they need to communicate and coordinate with each other. Sometimes task 1 must pause and wait for task 2 to finish before it can continue. Sometimes task 3 and task 4 cannot run at the same time. As a result, multi-process and multi-thread programs are far more complex than the single-process, single-thread programs we wrote earlier.
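As a rough illustration of these two kinds of coordination, here is a minimal sketch using the standard `threading` module. The function names and the `order` list are invented for this example: an `Event` makes task 1 wait for task 2, and a `Lock` keeps task 3 and task 4 from running their critical part at the same time.

```python
import threading


def run_demo():
    order = []                       # shared record of what happened
    lock = threading.Lock()          # only one holder at a time
    task2_done = threading.Event()   # set by task 2, awaited by task 1

    def task1():
        task2_done.wait()            # pause until task 2 has completed
        with lock:
            order.append("task1")

    def task2():
        with lock:
            order.append("task2")
        task2_done.set()             # let task 1 continue

    def exclusive(name):
        # Tasks 3 and 4: the lock guarantees they never overlap here.
        with lock:
            order.append(name + " start")
            order.append(name + " end")

    threads = [
        threading.Thread(target=task1),
        threading.Thread(target=task2),
        threading.Thread(target=exclusive, args=("task3",)),
        threading.Thread(target=exclusive, args=("task4",)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


if __name__ == "__main__":
    print(run_demo())
```

The exact order of the entries varies from run to run, but "task2" always appears before "task1", and each "start" is immediately followed by its own "end".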
Because of this complexity and the difficulty of debugging, we avoid writing multitasking programs unless we have to. However, there are many cases where multitasking is unavoidable. Think about watching a movie on a computer: one thread must play the video while another plays the audio. With a single thread, the video would have to be played first and then the audio, or the audio first and then the video, which obviously does not work.
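The movie example can be sketched with two threads. This is only an illustration: `play` and `play_movie` are hypothetical names, and a short `time.sleep` stands in for real decoding and output work.

```python
import threading
import time


def play(track, chunks, log):
    """Pretend to play a media track chunk by chunk."""
    for i in range(chunks):
        time.sleep(0.01)              # stand-in for decoding/output work
        log.append((track, i))        # appending to a list is safe across threads in CPython


def play_movie(chunks=3):
    log = []
    video = threading.Thread(target=play, args=("video", chunks, log))
    audio = threading.Thread(target=play, args=("audio", chunks, log))
    video.start()                     # both tracks now progress side by side
    audio.start()
    video.join()
    audio.join()
    return log


if __name__ == "__main__":
    for entry in play_movie():
        print(entry)
```

In the printed log the video and audio chunks are typically interleaved, rather than all of one track followed by all of the other.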
Python supports both multi-processing and multi-threading. We will discuss how to write both kinds of multitasking programs.
First acquaintance:
To write multi-process programs in Python, we first need some operating system background. Unix/Linux provides the fork() system call, which is quite special. An ordinary function is called once and returns once, but fork() is called once and returns twice, because the operating system automatically makes a copy (the child process) of the current process (the parent process) and then returns in both the parent and the child. In the child process fork() always returns 0, while in the parent process it returns the ID of the child process. The reason is that a parent can fork many children, so the parent must record the ID of each child, whereas a child only needs to call getppid() to get the ID of its parent. Python's os module wraps the common system calls, including fork, so a child process can easily be created in a Python program:
    import os

    print('Process (%s) start...' % os.getpid())
    # Only works on Unix/Linux/Mac:
    pid = os.fork()
    if pid == 0:
        print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
    else:
        print('I (%s) just created a child process (%s).' % (os.getpid(), pid))

    # Process (44587) start...
    # I (44587) just created a child process (44588).
    # I am child process (44588) and my parent is 44587.
Since Windows does not have a fork call, the above code cannot run on Windows. Since the Mac system is based on a BSD (a type of Unix) kernel, it runs on a Mac without problems. It is recommended that you use a Mac to learn Python!
With the fork call, a process that receives a new task can copy itself into a child process to handle it. The common Apache server, for example, has the parent process listen on the port; whenever a new HTTP request arrives, it forks a child process to handle the request.
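The "fork a child for each new task" pattern might look roughly like the sketch below. It is Unix-only and deliberately simplified: there is no real network server, `handle` and `serve` are hypothetical names, and the "requests" are just strings. The parent forks once per request, and afterwards collects each child with `os.waitpid` so that no zombie processes are left behind.

```python
import os
import sys


def handle(request):
    """Hypothetical request handler run inside the child process."""
    print("child %d handling %r" % (os.getpid(), request))


def serve(requests):
    """Fork one child per request and return the children's exit codes."""
    children = []
    for request in requests:
        sys.stdout.flush()               # avoid duplicating buffered output in the child
        pid = os.fork()                  # Unix only
        if pid == 0:                     # child: do the work, then exit at once
            code = 1
            try:
                handle(request)
                code = 0
            finally:
                sys.stdout.flush()
                os._exit(code)           # never fall back into the parent's loop
        children.append(pid)             # parent: remember the child and move on
    codes = []
    for pid in children:
        _, status = os.waitpid(pid, 0)   # wait for this child to finish
        codes.append(os.WEXITSTATUS(status))
    return codes


if __name__ == "__main__":
    print(serve(["GET /a", "GET /b"]))
```

The child leaves through `os._exit` rather than returning, because a returning child would continue executing the parent's loop and fork children of its own.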
multiprocessing module:
If you plan to write a multi-process service program, Unix/Linux is undoubtedly the right choice. Since Windows does not have the fork call, is it impossible to write multi-process programs in Python on Windows? Since Python is cross-platform, it naturally provides cross-platform multi-process support: the multiprocessing module is the cross-platform multi-process module. The multiprocessing module provides a Process class to represent a process object. The following example starts a child process and waits for it to end:
    import os
    import time
    from multiprocessing import Process

    # Code executed by the child process
    def run_proc(name):
        time.sleep(1)
        print('Run child process %s (%s)...' % (name, os.getpid()))

    if __name__ == '__main__':
        print('Parent process %s.' % os.getpid())
        p = Process(target=run_proc, args=('test',))  # Why the trailing comma in args? See the note below.
        p.start()  # Start the child process; without this call it never runs
        p.join()   # Wait for child process p to finish before continuing; without it the main program runs on while the child keeps running unaffected
        print('Child process end.')

    # Parent process 8428.
    # Run child process test (9392)...
    # Child process end.
When Process is instantiated it executes self._args = tuple(args). If the comma is omitted, the resulting self._args consists of individual letters. When two or more arguments are passed, the trailing comma is no longer needed, as shown below:
    # Excerpt from the Process constructor (not runnable on its own)
    def __init__(self, group=None, target=None, name=None, args=(), kwargs={},
                 *, daemon=None):
        assert group is None, 'group argument must be None for now'
        count = next(_process_counter)
        self._identity = _current_process._identity + (count,)
        self._config = _current_process._config.copy()
        self._parent_pid = os.getpid()
        self._popen = None
        self._target = target
        self._args = tuple(args)

    a = ('ers')          # parentheses alone do not make a tuple: this is a plain string
    b = tuple(a)
    print(b)
    # ('e', 'r', 's')

    a1 = ('ers', 'gte')  # two elements: already a tuple
    b1 = tuple(a1)
    print(b1)
    # ('ers', 'gte')
Pool process pool:
If you want to start a large number of child processes, you can use the process pool to create child processes in batches:
    from multiprocessing import Pool, cpu_count
    import os, time, random

    def long_time_task(name):
        print('Run task %s (%s)...' % (name, os.getpid()))
        start = time.time()
        time.sleep(random.random() * 3)
        end = time.time()
        print('Task %s runs %0.2f seconds.' % (name, (end - start)))

    def Bar(arg):
        print('-->exec done:', arg, os.getpid())

    if __name__ == '__main__':
        print('Parent process %s.' % os.getpid())
        p = Pool(cpu_count())  # Number of CPU cores; processes only run truly in parallel on a multi-core CPU
        for i in range(5):
            # p.apply_async(func=long_time_task, args=(i,), callback=Bar)  # callback: runs in the main process after func has finished
            p.apply_async(long_time_task, args=(i,))
        print('Waiting for all subprocesses done...')
        p.close()
        p.join()  # Wait for the pool to finish; otherwise the pool is shut down as soon as the main process ends
        print('All subprocesses done.')

    # Parent process 4492.
    # Waiting for all subprocesses done...
    # Run task 0 (3108)...
    # Run task 1 (7936)...
    # Run task 2 (11236)...
    # Run task 3 (8284)...
    # Task 2 runs 0.86 seconds.
    # Run task 4 (11236)...
    # Task 0 runs 1.34 seconds.
    # Task 1 runs 1.49 seconds.
    # Task 3 runs 2.62 seconds.
    # Task 4 runs 1.90 seconds.
    # All subprocesses done.
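Key point: once the processes in a pool have finished, they are closed and destroyed automatically and no longer occupy memory. Likewise, child processes created without a pool are destroyed automatically when they finish. The following test demonstrates this: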
    from multiprocessing import Pool, cpu_count
    import os, time, random

    def long_time_task(name):
        print('Run task %s (%s)...' % (name, os.getpid()))
        start = time.time()
        time.sleep(random.random() * 3)
        end = time.time()
        print('Task %s runs %0.2f seconds.' % (name, (end - start)))

    def count_process():
        import psutil  # third-party package
        pids = psutil.pids()
        process_name = []
        for pid in pids:
            p = psutil.Process(pid)
            process_name.append(p.name())  # Get the process name
            # process_name.append(p.num_threads())  # Get the number of threads in the process
        # print(process_name)
        print(len(process_name))

    if __name__ == '__main__':
        print('Parent process %s.' % os.getpid())
        p = Pool(4)
        for i in range(5):
            p.apply_async(long_time_task, args=(i,))
        print('Waiting for all subprocesses done...')
        count_process()  # Process count when the pool starts (includes other applications on the system)
        p.close()
        p.join()
        count_process()  # Process count after the pool has closed
        print('All subprocesses done.')

    # Parent process 8860.
    # Waiting for all subprocesses done...
    # Run task 0 (2156)...
    # Run task 1 (1992)...
    # Run task 2 (10680)...
    # Run task 3 (11216)...
    # 109    <- at start
    # Task 2 runs 0.93 seconds.
    # Run task 4 (10680)...
    # Task 1 runs 1.71 seconds.
    # Task 3 runs 2.01 seconds.
    # Task 0 runs 2.31 seconds.
    # Task 4 runs 2.79 seconds.
    # 105    <- at end
    # All subprocesses done.
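Code walkthrough:
Calling join() on a Pool object waits for all child processes to finish. close() must be called before join(), and after close() no new Process can be added.
Note the output: tasks 0, 1, 2 and 3 start immediately, while task 4 has to wait for one of the earlier tasks to finish. This is because the default size of the Pool on my computer is 4, so at most 4 processes run at the same time. This is a deliberate design limit of Pool, not a limit of the operating system. If you change it to:
    p = Pool(5)
then 5 processes can run at the same time.
Since the default size of the Pool is the number of CPU cores, if you are unlucky enough to have an 8-core CPU, you need to submit at least 9 child processes to see the waiting effect described above.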
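When the tasks return values, `Pool.map` is often more convenient than calling `apply_async` in a loop. The sketch below is one possible way to write it, not the approach used in the examples above; `square` and `parallel_squares` are names invented for illustration.

```python
from multiprocessing import Pool


def square(n):
    return n * n


def parallel_squares(values, workers=4):
    """Square every value using a pool of worker processes."""
    with Pool(workers) as pool:          # the pool is shut down when the block exits
        return pool.map(square, values)  # blocks until all results are ready, in input order


if __name__ == "__main__":
    print(parallel_squares(range(10)))
    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because `map` only returns once every result is available, no explicit close() and join() pair is needed to wait for the tasks here.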
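Inter-process communication:
Processes certainly need to communicate with each other, and the operating system provides many mechanisms for this. Python's multiprocessing module wraps the underlying mechanisms and provides Queue, Pipes, and other ways to exchange data.
Taking Queue as an example, we create two child processes in the parent process: one writes data to the Queue and the other reads data from it: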
    from multiprocessing import Process, Queue
    import os, time, random

    # Code executed by the writer process:
    def write(q):
        print('Process to write: %s' % os.getpid())
        for value in ['A', 'B', 'C']:
            print('Put %s to queue...' % value)
            q.put(value)
            time.sleep(random.random())

    # Code executed by the reader process:
    def read(q):
        print('Process to read: %s' % os.getpid())
        while True:
            value = q.get(True)
            print('Get %s from queue.' % value)

    if __name__ == '__main__':
        # The parent process creates the Queue and passes it to each child:
        q = Queue()
        pw = Process(target=write, args=(q,))
        pr = Process(target=read, args=(q,))
        # Start child process pw, which writes:
        pw.start()
        # Start child process pr, which reads:
        pr.start()
        # Wait for pw to finish:
        pw.join()
        # pr runs an infinite loop, so we cannot wait for it to end and have to terminate it:
        pr.terminate()  # Forcibly stop the child process

    # Process to write: 9472
    # Put A to queue...
    # Process to read: 3948
    # Get A from queue.
    # Put B to queue...
    # Get B from queue.
    # Put C to queue...
    # Get C from queue.
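The example above stops the reader with terminate() because its loop never ends. A common alternative, sketched here under the assumption that None is never a real data value, is to have the writer put a sentinel on the queue so that the reader can leave its loop on its own. The `SENTINEL` name and the return value of `read` are additions made for this sketch.

```python
from multiprocessing import Process, Queue

SENTINEL = None   # hypothetical "no more data" marker chosen for this sketch


def write(q, values):
    for value in values:
        q.put(value)
    q.put(SENTINEL)                  # tell the reader that we are done


def read(q):
    received = []
    while True:
        value = q.get()              # blocks until an item is available
        if value is SENTINEL:
            break                    # leave the loop instead of being killed
        print('Get %s from queue.' % value)
        received.append(value)
    return received


if __name__ == '__main__':
    q = Queue()
    pw = Process(target=write, args=(q, ['A', 'B', 'C']))
    pr = Process(target=read, args=(q,))
    pw.start()
    pr.start()
    pw.join()
    pr.join()                        # the reader now ends by itself
```

With several readers, one sentinel per reader would be needed, since each sentinel is consumed by exactly one get().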
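On Unix/Linux, the multiprocessing module wraps the fork() call so that we do not need to care about the details of fork(). Since Windows has no fork call, multiprocessing has to "simulate" the effect of fork: all Python objects of the parent process must be serialized with pickle and then passed to the child process. So if a multiprocessing call fails on Windows, first consider whether pickling failed.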
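One quick way to check this is to try pickling the object yourself before handing it to multiprocessing. The helper below is a hypothetical convenience function written for this article, not part of the multiprocessing API.

```python
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    print(is_picklable(("test", 42)))       # plain data: True
    print(is_picklable(lambda x: x * 2))    # lambdas cannot be pickled: False
```

Functions are pickled by name, which is why a target function defined at module level works while a lambda does not.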
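Sharing data between processes:
Sometimes we need not only to transfer data between processes but also to share data among them, that is, to use the same global variable. For example, why does the following program print an empty list?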
    from multiprocessing import Process, Manager
    import os

    # manager = Manager()
    vip_list = []
    # vip_list = manager.list()

    def testFunc(cc):
        vip_list.append(cc)
        print('process id:', os.getpid())

    if __name__ == '__main__':
        threads = []
        for ll in range(10):
            t = Process(target=testFunc, args=(ll,))
            t.daemon = True
            threads.append(t)
        for i in range(len(threads)):
            threads[i].start()
        for j in range(len(threads)):
            threads[j].join()
        print("------------------------")
        print('process id:', os.getpid())
        print(vip_list)

    # process id: 9436
    # process id: 11120
    # process id: 10636
    # process id: 1380
    # process id: 10976
    # process id: 10708
    # process id: 2524
    # process id: 9392
    # process id: 10060
    # process id: 8516
    # ------------------------
    # process id: 9836
    # []
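If you understand Python's threading model, the GIL, and how multithreading and multiprocessing work, the question above is not hard to answer. If you do not, that is fine too: run the code above and you will see the problem. The reason is that each process has its own independent memory: every child appends to its own copy of vip_list, so the list in the parent stays empty.
As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes. However, if you really do need to use some shared data, multiprocessing provides a couple of ways of doing so.
① Shared memory:
Data can be stored in a shared memory map using Value or Array. For example, the following code: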
    from multiprocessing import Process, Value, Array

    def f(n, a):
        n.value = 3.1415927
        for i in range(len(a)):
            a[i] = -a[i]

    if __name__ == '__main__':
        num = Value('d', 0.0)
        arr = Array('i', range(10))

        p = Process(target=f, args=(num, arr))
        p.start()
        p.join()

        print(num.value)
        print(arr[:])

    # 3.1415927
    # [0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
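The 'd' and 'i' arguments used when creating num and arr are typecodes of the kind used by the array module: 'd' indicates a double precision float and 'i' indicates a signed integer. These shared objects will be process and thread-safe. For more flexibility in using shared memory, you can use the multiprocessing.sharedctypes module, which supports the creation of arbitrary ctypes objects allocated from shared memory.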
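One caveat worth illustrating: an operation such as `counter.value += 1` is a read followed by a write, so it is not atomic even on a shared Value. The sketch below (with the invented names `add_many` and `counter`) uses the lock that comes with the Value to protect the increment.

```python
from multiprocessing import Process, Value


def add_many(counter, times):
    """Increment the shared counter `times` times."""
    for _ in range(times):
        with counter.get_lock():     # make the read-modify-write atomic
            counter.value += 1


if __name__ == '__main__':
    counter = Value('i', 0)
    workers = [Process(target=add_many, args=(counter, 1000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)
    # 4000
```

Without the `get_lock()` block, increments from different processes can overwrite each other and the final value may come out lower than 4000.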
② Server process:
A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies. A manager returned by Manager() will support the types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array. For example:
    from multiprocessing import Process, Manager

    def f(d, l):
        d[1] = '1'
        d['2'] = 2
        d[0.25] = None
        l.reverse()

    if __name__ == '__main__':
        manager = Manager()

        d = manager.dict()
        l = manager.list(range(10))

        p = Process(target=f, args=(d, l))
        p.start()
        p.join()

        print(d)
        print(l)

    # {1: '1', '2': 2, 0.25: None}
    # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
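Server process managers are more flexible than shared memory objects because they can be made to support arbitrary object types. Also, a single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory.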
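Returning to the earlier vip_list question, a manager list is one way to make the appends from the child processes visible in the parent. The following is a minimal sketch rather than a drop-in fix; `record` and `collect` are names invented for this example.

```python
from multiprocessing import Manager, Process


def record(shared, item):
    shared.append(item)              # on a manager list this is forwarded to the server process


def collect(items):
    """Append each item from its own child process and return the result."""
    with Manager() as manager:
        shared = manager.list()
        workers = [Process(target=record, args=(shared, i)) for i in items]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return sorted(shared[:])     # copy the data out before the manager shuts down


if __name__ == '__main__':
    print(collect(range(10)))
    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The result is sorted because the children may finish in any order, so the order of the appends is not predictable.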
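Summary
On Unix/Linux, multiple processes can be created with the fork() call.
For cross-platform multiprocessing, use the multiprocessing module.
Inter-process communication is implemented through Queue (among multiple processes), Pipes (between two processes), and so on.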
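Pipes are mentioned in the summary but not demonstrated in the sections above. As a minimal sketch, `Pipe()` returns two connected endpoints; here a child process sends one message through its end and the parent receives it. The function name `child` and the message content are made up for illustration.

```python
from multiprocessing import Pipe, Process


def child(conn):
    conn.send(['hello', 42])         # any picklable object can be sent
    conn.close()


if __name__ == '__main__':
    parent_conn, child_conn = Pipe()  # two connected endpoints
    p = Process(target=child, args=(child_conn,))
    p.start()
    print(parent_conn.recv())
    # ['hello', 42]
    p.join()
```

A pipe connects exactly two endpoints, which is why it suits communication between two processes, while a Queue can be shared among many.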
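Supplementary tip: a parent process starts a child process, and the child starts a grandchild process. If the child is killed, is the grandchild killed too?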
    import time
    from multiprocessing import Process
    import os

    def count_process():
        import psutil  # third-party package
        pids = psutil.pids()
        print(len(pids))

    def test3():
        count_process()
        for i in range(10):
            print("test3 %s" % os.getpid())
            time.sleep(0.5)

    def test1():
        print("test1 %s" % os.getpid())
        p2 = Process(target=test3, name="protest2")
        p2.start()
        p2.join()

    if __name__ == '__main__':
        count_process()
        p1 = Process(target=test1, name="protest1")
        p1.start()
        time.sleep(2)
        p1.terminate()
        time.sleep(2)
        count_process()
        for i in range(10):
            print(i)
            time.sleep(1)

    # 86
    # test1 9500
    # 88
    # test3 3964
    # test3 3964
    # test3 3964
    # test3 3964
    # test3 3964
    # test3 3964
    # test3 3964
    # test3 3964
    # 87
    # 0
    # test3 3964
    # test3 3964
    # 1
    # 2
    # 3
    # 4
    # 5
    # 6
    # 7
    # 8
    # 9
    #
    # test3 keeps printing after p1.terminate(), and the process count only drops
    # from 88 to 87: killing the child did not kill the grandchild.