Developers and data scientists often need to complete computationally demanding tasks efficiently. Fortunately, Python offers powerful parallel processing capabilities thanks to its flexibility and broad ecosystem. By breaking a hard problem into smaller, more manageable subtasks and working on them simultaneously, we can achieve substantial performance improvements.
Python's parallel processing capabilities let us put available computing resources to work, making tasks such as web scraping, scientific simulations, and data analysis faster and more efficient. In this article, we will take a tour of parallel processing in Python. We'll examine several approaches, including multithreading, multiprocessing, and asynchronous programming, and learn how to use each of them effectively to clear performance bottlenecks.
Parallel processing means splitting a job into smaller subtasks and running them simultaneously on multiple processors or cores. By making efficient use of the available computing resources, it can significantly reduce a program's overall execution time. Python offers several parallel processing approaches, including multithreading, multiprocessing, and asynchronous programming.
With multithreading, multiple threads run concurrently within the same process and share the same memory. Multithreading is easy to implement with Python's threading module. However, it may not speed up CPU-intensive operations, because the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time. Multithreading is nevertheless useful for I/O-intensive tasks, since a thread can make progress while other threads wait for I/O operations to complete.
Let's look at an example of using multithreading to download multiple web pages:
import threading
import requests

def download_page(url):
    response = requests.get(url)
    print(f"Downloaded {url}")

urls = [
    "https://example.com",
    "https://google.com",
    "https://openai.com",
]

threads = []
for url in urls:
    # Each URL is downloaded in its own thread.
    thread = threading.Thread(target=download_page, args=(url,))
    thread.start()
    threads.append(thread)

# Wait for all downloads to finish before continuing.
for thread in threads:
    thread.join()
Downloaded https://example.com
Downloaded https://google.com
Downloaded https://openai.com
This snippet downloads each URL in its own thread, so the downloads run concurrently (the completion order may vary between runs). The join() calls make the main thread wait for every download thread to finish before continuing.
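Because all threads share the same memory, collecting results from worker threads needs a little care. Below is a minimal sketch, assuming the same URL list as above, that gathers results into a shared list guarded by a threading.Lock; the results list and the lock are illustrative additions, not part of the original example. Strictly speaking, a single list.append() is already atomic in CPython, but the lock pattern generalizes to compound updates:

import threading
import requests

results = []                 # shared by every thread
lock = threading.Lock()      # guards updates to the shared list

def download_page(url):
    response = requests.get(url)
    with lock:  # only one thread mutates the list at a time
        results.append((url, len(response.text)))

urls = ["https://example.com", "https://google.com", "https://openai.com"]

threads = [threading.Thread(target=download_page, args=(url,)) for url in urls]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results)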
Multiprocessing is the counterpart to multithreading. With multiple processes, each process has its own memory space, providing true parallelism. Python's multiprocessing module provides a high-level interface for working with multiple processes. Multiprocessing suits CPU-intensive tasks because each process runs in its own Python interpreter, sidestepping the GIL limitation that constrains multithreading.
The code below uses multiple processes. The Pool class spawns a set of worker processes, and the map() method distributes the workload across them, collecting the return values into a result list.
Consider the following example, in which we use multiple processes to calculate the square of each integer in a list:
import multiprocessing

def square(number):
    return number ** 2

numbers = [1, 2, 3, 4, 5]

if __name__ == "__main__":
    # Pool() starts one worker process per CPU core by default.
    with multiprocessing.Pool() as pool:
        # map() splits the list across the workers and gathers the results in order.
        results = pool.map(square, numbers)
    print(results)
[1, 4, 9, 16, 25]
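Note that the pool is created inside an if __name__ == "__main__": guard, which the original snippet omitted. On platforms that start new processes by spawning a fresh interpreter (the default on Windows and, since Python 3.8, on macOS), each worker re-imports the main module, and without the guard the workers would try to create pools of their own.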
By taking advantage of non-blocking operations, asynchronous programming enables efficient execution of I/O-intensive tasks. With the asyncio package, Python can express asynchronous code using coroutines, event loops, and futures. As web applications and APIs have grown in popularity, asynchronous programming has become increasingly important.
Let's look at an example of using asyncio and aiohttp to fetch multiple web pages asynchronously. The fetch_page() coroutine uses aiohttp to retrieve a page without blocking the event loop. The main() coroutine builds a list of tasks and runs them concurrently with asyncio.gather(); the await keyword suspends main() until all the tasks complete and their results are available:
import asyncio
import aiohttp

async def fetch_page(url):
    # Fetch one page; the event loop is free to run other
    # coroutines while this one waits on the network.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    urls = [
        "https://example.com",
        "https://google.com",
        "https://openai.com",
    ]
    tasks = [fetch_page(url) for url in urls]
    # gather() runs all fetches concurrently and returns results in order.
    pages = await asyncio.gather(*tasks)
    print(pages)

asyncio.run(main())
['<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title> ...', '<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head>...<title>Google</title> ...', ...]
(Output truncated: the list contains the full HTML of each page, in the same order as the URL list.)
The best parallel processing technique for a Python program depends on the nature of the task. Here are some guidelines to help you make an informed decision:
Multithreading is appropriate for I/O-intensive tasks, where most of the execution time is spent waiting on input/output. It suits jobs such as downloading files, calling APIs, and reading and writing files. Due to Python's Global Interpreter Lock (GIL), multithreading will not significantly speed up CPU-intensive work.
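For I/O-bound work, the standard library's concurrent.futures module offers a higher-level alternative to managing threads by hand. Here is a minimal sketch, in which the URL list and the max_workers value are illustrative choices:

from concurrent.futures import ThreadPoolExecutor
import requests

def fetch(url):
    # Blocking I/O: CPython releases the GIL while waiting on the network,
    # so the other threads keep making progress.
    return url, requests.get(url).status_code

urls = ["https://example.com", "https://google.com", "https://openai.com"]

# max_workers caps how many downloads run at the same time.
with ThreadPoolExecutor(max_workers=5) as executor:
    for url, status in executor.map(fetch, urls):
        print(url, status)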
On the other hand, multiprocessing is suitable for CPU-bound tasks involving intensive computation. It achieves true parallelism by using multiple processes, each with its own memory space, bypassing the limitations of the GIL. However, it incurs extra overhead in memory consumption and inter-process communication.
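As a sketch of this trade-off, assume a deliberately heavy stand-in computation like cpu_heavy() below; with multiprocessing, each call runs in its own interpreter with its own GIL, so the calls execute genuinely in parallel:

import multiprocessing

def cpu_heavy(n):
    # Pure computation with no I/O: exactly the kind of work
    # that multithreading cannot parallelize but multiprocessing can.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [10_000_000] * 4
    with multiprocessing.Pool() as pool:
        totals = pool.map(cpu_heavy, inputs)
    print(totals)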
Asynchronous programming, using libraries such as asyncio, is useful for I/O-intensive tasks involving network operations. It relies on non-blocking I/O, so work can continue without waiting for each operation to finish. Because it manages many concurrent connections efficiently, it is well suited to web servers, API clients, and web scraping. By minimizing the time spent waiting on I/O, asynchronous programming keeps applications responsive and scalable.
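A common pattern for keeping many concurrent connections under control is to cap them with asyncio.Semaphore. Below is a minimal sketch building on the earlier aiohttp example; the limit of 2 and the shared session are illustrative choices:

import asyncio
import aiohttp

async def fetch_page(session, url, semaphore):
    async with semaphore:  # at most 2 requests in flight at once
        async with session.get(url) as response:
            return url, response.status

async def main():
    urls = ["https://example.com", "https://google.com", "https://openai.com"]
    semaphore = asyncio.Semaphore(2)
    # One shared session reuses connections across all requests.
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(fetch_page(session, url, semaphore) for url in urls)
        )
    print(results)

asyncio.run(main())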
Python's parallel processing capabilities open the door to substantial efficiency gains in demanding workloads. Whether you choose multithreading, multiprocessing, or asynchronous programming, Python provides the tools and modules to exploit concurrency effectively. By understanding the nature of the task and choosing the appropriate technique, you can maximize the benefits of parallel processing and reduce execution time. Keep exploring Python's parallelism to build faster, more efficient applications.