What is the producer-consumer model
In practice, you often encounter a situation where one module generates data and another module processes it (module is used broadly here: it can be a class, function, thread, process, and so on). The module that generates data is called the producer; the module that processes data is called the consumer. A buffer is placed between the two, which we can picture as a warehouse: the producer puts goods into the warehouse, and the consumer takes goods out of it. Together these make up the producer-consumer model. The structure is as follows: producer -> buffer (warehouse) -> consumer.
Advantages of the producer-consumer model:
1. Decoupling
Suppose the producer and the consumer are two classes. If the producer calls a method of the consumer directly, the producer depends on the consumer (that is, they are coupled), and a future change to the consumer's code may affect the producer. If both depend only on a shared buffer, neither depends directly on the other, and the coupling is reduced accordingly.
For example, suppose you want to send a letter. Without a mailbox (the buffer), you must hand the letter directly to the postman. You might ask: isn't handing it to the postman simple enough? Not really. You have to know who the postman is before you can give him the letter (trusting anyone who wears the uniform would be a disaster if someone were impersonating him). This creates a dependency between you and the postman (strong coupling between producer and consumer). If the postman is replaced one day, you have to get to know the new one (a change in the consumer forces a change in the producer's code). The mailbox, by contrast, stays where it is, and relying on it costs little (weak coupling with the buffer).
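The mailbox analogy can be sketched in a few lines. This is only an illustration: the class names `Mailbox`, `Sender`, and `Postman` are hypothetical and chosen to mirror the analogy. The point is that `Sender` knows only the mailbox, so the postman can be replaced without touching the sender.

```python
import queue


class Mailbox:
    """Hypothetical buffer: the only object both sides know about."""

    def __init__(self):
        self._letters = queue.Queue()

    def drop(self, letter):
        self._letters.put(letter)

    def collect(self):
        """Return every letter currently in the box, oldest first."""
        letters = []
        while True:
            try:
                letters.append(self._letters.get_nowait())
            except queue.Empty:
                return letters


class Sender:
    """Producer: depends on the mailbox, never on a particular postman."""

    def __init__(self, mailbox):
        self._mailbox = mailbox

    def send(self, text):
        self._mailbox.drop(text)


class Postman:
    """Consumer: also depends only on the mailbox."""

    def __init__(self, name):
        self.name = name

    def deliver_from(self, mailbox):
        return ["%s delivered: %s" % (self.name, letter)
                for letter in mailbox.collect()]


if __name__ == "__main__":
    box = Mailbox()
    sender = Sender(box)
    sender.send("hello")
    print(Postman("Postman A").deliver_from(box))
    # A different postman takes over; Sender is unchanged.
    sender.send("hello again")
    print(Postman("Postman B").deliver_from(box))
```

Swapping `Postman("Postman A")` for `Postman("Postman B")` requires no change to `Sender`, which is the reduced coupling described above.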
2. Support for concurrency
Because the producer and the consumer are two independent concurrent entities connected only by the buffer, the producer just drops data into the buffer and moves on to producing the next item, while the consumer just takes data from the buffer. As long as the buffer has room, neither is held up by the other's processing speed.
Continuing the example above: without a mailbox, we would have to wait at the post office until the postman returns so that we can hand him the letter, and we could do nothing else in the meantime (the producer blocks). Alternatively, the postman would have to go door to door asking who wants to send a letter (the consumer polls).
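A minimal sketch of this effect, assuming an unbounded `queue.Queue` as the buffer and a consumer that is artificially slowed with `time.sleep`. The function name `run_demo` and its parameters are made up for the illustration. The producer finishes queuing almost immediately, while the total time is dominated by the slow consumer.

```python
import queue
import threading
import time


def run_demo(item_count=5, work_seconds=0.05):
    """Return (producer_seconds, total_seconds, processed_items)."""
    buffer = queue.Queue()  # unbounded, so put() never blocks
    processed = []

    def consumer():
        while True:
            item = buffer.get()  # blocks while the buffer is empty
            if item is None:  # None is the stop sentinel in this sketch
                break
            time.sleep(work_seconds)  # simulate slow processing
            processed.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()

    start = time.monotonic()
    for i in range(item_count):
        buffer.put(i)  # returns at once; no waiting for the consumer
    buffer.put(None)
    producer_seconds = time.monotonic() - start

    worker.join()
    total_seconds = time.monotonic() - start
    return producer_seconds, total_seconds, processed


if __name__ == "__main__":
    produced_in, finished_in, items = run_demo()
    print("producer done after %.4fs" % produced_in)
    print("consumer done after %.4fs, items: %s" % (finished_in, items))
```

With a bounded queue (`queue.Queue(maxsize=n)`), `put()` would block once the buffer is full, so the producer is only free to run ahead while there is room.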
3. Support for uneven workloads
The buffer has another benefit: it absorbs fluctuations in the production rate. When data is produced faster than the consumer can process it, the unprocessed data waits in the buffer. When production slows down, the consumer gradually works through the backlog.
To reuse the letter example once more: suppose the postman can carry only 1,000 letters at a time. If people send more than 1,000 greeting cards on Valentine's Day (or Christmas), the mailbox's role as a buffer comes in handy. The letters that the postman cannot take this time stay in the mailbox until his next visit.
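The smoothing effect can be shown with a small single-threaded simulation. This is a sketch under assumed numbers: the daily arrival counts and the function name `simulate` are invented for illustration, and only the 1,000-letter capacity comes from the example above.

```python
from collections import deque


def simulate(arrivals, capacity_per_visit):
    """Simulate one postman visit per day.

    arrivals: number of letters dropped into the mailbox each day.
    Returns (total_delivered, backlog_after_each_visit).
    """
    mailbox = deque()
    delivered = 0
    backlog_after_each_visit = []
    for letters_today in arrivals:
        # Producer side: today's letters land in the buffer.
        mailbox.extend(range(letters_today))
        # Consumer side: the postman takes at most capacity_per_visit.
        for _ in range(min(capacity_per_visit, len(mailbox))):
            mailbox.popleft()
            delivered += 1
        backlog_after_each_visit.append(len(mailbox))
    return delivered, backlog_after_each_visit


if __name__ == "__main__":
    # A quiet day, a holiday burst, then two slow days.
    total, backlog = simulate([200, 2500, 300, 0], capacity_per_visit=1000)
    print(total)    # 3000
    print(backlog)  # [0, 1500, 800, 0]
```

The burst of 2,500 letters exceeds one visit's capacity, but nothing is lost: the backlog sits in the mailbox and drains over the following slower days.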
Python example:
Use a queue to implement a simple producer-consumer model. The producer puts the current time into the queue, and the consumer prints each value it takes out.
import threading
import time
from queue import Queue


class Consumer(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue

    def run(self):
        while True:
            # Block until the producer puts a message into the queue
            msg = self._queue.get()
            # 'quit' is the sentinel that tells the consumer to stop
            if isinstance(msg, str) and msg == 'quit':
                break
            print("I'm a thread, and I received %s!!" % msg)
        print('Bye byes!')


def producer():
    queue = Queue()
    worker = Consumer(queue)
    worker.start()  # start the consumer thread
    start_time = time.time()
    # Produce one message per second for about five seconds
    while time.time() - start_time < 5:
        queue.put('something at %s' % time.time())
        time.sleep(1)
    queue.put('quit')
    worker.join()


if __name__ == '__main__':
    producer()
The same pattern applies to a multithreaded crawler: the producer generates the URLs, and the consumers fetch them. Because the queue connects the two sides, several consumer threads can download at the same time, which speeds up the crawl.
import threading
import time
import urllib.request
from queue import Queue


class Consumer(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue

    def run(self):
        while True:
            content = self._queue.get()
            print(content)
            # 'quit' is the sentinel that tells this worker to stop
            if isinstance(content, str) and content == 'quit':
                break
            # Fetch the page; the response is not used further here
            response = urllib.request.urlopen(content)
        print('Bye byes!')


def Producer():
    urls = [
        'http://211.103.242.133:8080/Disease/Details.aspx?id=2258',
        'http://211.103.242.133:8080/Disease/Details.aspx?id=2258',
        'http://211.103.242.133:8080/Disease/Details.aspx?id=2258',
        'http://211.103.242.133:8080/Disease/Details.aspx?id=2258'
    ]
    queue = Queue()
    worker_threads = build_worker_pool(queue, 4)
    start_time = time.time()

    # Queue the URLs, then one 'quit' sentinel per worker
    for url in urls:
        queue.put(url)
    for worker in worker_threads:
        queue.put('quit')
    # Wait for every worker to finish
    for worker in worker_threads:
        worker.join()

    print('Done! Time taken: {}'.format(time.time() - start_time))


def build_worker_pool(queue, size):
    workers = []
    for _ in range(size):
        worker = Consumer(queue)
        worker.start()
        workers.append(worker)
    return workers


if __name__ == '__main__':
    Producer()
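The crawler above discards each response, and a failed request would end its worker thread. As one possible variation (not the original author's code), the sketch below collects results and records errors, and it uses `Queue.task_done()` with `Queue.join()` to wait for the work to finish. The function `crawl` and its `fetch` parameter are hypothetical: `fetch` is any callable that takes a URL and returns its content, so the example can run without network access.

```python
import queue
import threading


def crawl(urls, fetch, worker_count=4):
    """Fetch every URL with a pool of consumer threads.

    Returns a dict mapping URL -> fetched content, or the exception
    raised for that URL. Duplicate URLs collapse into one entry.
    """
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            try:
                if url is None:  # stop sentinel in this sketch
                    return
                try:
                    outcome = fetch(url)
                except Exception as exc:  # keep the worker alive on errors
                    outcome = exc
                with lock:
                    results[url] = outcome
            finally:
                tasks.task_done()  # one call per get(), sentinel included

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()

    for url in urls:  # producer side
        tasks.put(url)
    tasks.join()  # wait until every queued URL has been processed

    for _ in threads:  # one sentinel per worker
        tasks.put(None)
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Stand-in for a real download function such as one built on urllib.
    def fake_fetch(url):
        return "content of " + url

    print(crawl(["page-1", "page-2", "page-3"], fake_fetch, worker_count=2))
```

To fetch real pages, `fetch` could wrap `urllib.request.urlopen(url).read()`. It is passed in as a parameter here only so that the queue logic stays independent of the download code.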