Concurrency in Python with Threading and Multiprocessing

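Concurrency is an important idea in modern programming: it improves application performance by letting multiple tasks make progress at the same time.

There are several ways to achieve concurrency in Python, with threading and multiprocessing being the best known.

In this article, we'll explore these two approaches in detail, understand how they work, and discuss when to use each one, with practical code examples.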

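What is Concurrency?

Before we talk about threading and multiprocessing, it's important to understand what concurrency means.

Concurrency means that a program can have multiple tasks or processes in progress over the same period of time.

This can help the program use resources more efficiently and run faster, especially when it needs to do things like read files or perform many calculations.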

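Two related ideas are worth distinguishing here:

Parallelism: Running multiple tasks at the exact same moment on different parts of the computer's processor.
Concurrency: Handling multiple tasks during the same time period, but not necessarily at the exact same moment.

Python provides two main ways to achieve concurrency:

Threading: For tasks that can be managed concurrently.
Multiprocessing: For tasks that need to run truly simultaneously on different processor cores.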

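Threading in Python

Threading allows you to run multiple smaller units of execution, called threads, within the same process, all sharing the same memory space.

Threads are lighter than processes, and switching between them is faster.

However, threading in CPython (the standard Python implementation) is subject to the Global Interpreter Lock (GIL), which lets only one thread execute Python code at a time.

How Threading Works

Python's threading module provides a simple and flexible way to create and manage threads.

Let's start with a basic example: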

import threading
import time


def print_numbers():
    for i in range(5):
        print(f"Number: {i}")
        time.sleep(1)


# Creating a thread
thread = threading.Thread(target=print_numbers)

# Starting the thread
thread.start()

# Wait for the thread to complete
thread.join()

print("Thread has finished executing")


# Output:
# Number: 0
# Number: 1
# Number: 2
# Number: 3
# Number: 4
# Thread has finished executing

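In this example:

We define a function print_numbers() that prints the numbers 0 through 4, with a 1-second delay between prints.
We create a thread using threading.Thread() and pass print_numbers as the target function.
The start() method begins the thread's execution, and join() makes the main program wait for the thread to finish before moving on.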
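
Because threads share their process's memory, a thread can hand a result back simply by writing into an object that the main thread also holds. The sketch below illustrates this under a few assumptions: the square_into helper and the shared results dictionary are hypothetical names, not part of the original example, and each thread writes to its own key so that no two threads touch the same entry.

```python
import threading


def square_into(results, number):
    # Each thread writes to a distinct key, so the writes do not conflict
    results[number] = number * number


results = {}  # shared by every thread in this process
threads = [
    threading.Thread(target=square_into, args=(results, n))
    for n in range(5)
]

for thread in threads:
    thread.start()

# join() guarantees that every write has happened before we read
for thread in threads:
    thread.join()

print(sorted(results.items()))
```

Reading results only after every join() call is what makes this safe; reading earlier could observe a partially filled dictionary.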

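Example: Threading for I/O-Bound Tasks

Threading is particularly useful for I/O-bound tasks, such as file operations, network requests, or database queries, where the program spends most of its time waiting for external resources.

Here's an example that simulates downloading files using threads: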

import threading
import time


def download_file(file_name):
    print(f"Starting download of {file_name}...")
    time.sleep(2)  # Simulate download time
    print(f"Finished downloading {file_name}")


files = ["file1.zip", "file2.zip", "file3.zip"]

threads = []

# Create and start threads
for file in files:
    thread = threading.Thread(target=download_file, args=(file,))
    thread.start()
    threads.append(thread)

# Ensure all threads have finished
for thread in threads:
    thread.join()

print("All files have been downloaded.")

# Output (the order of lines may vary between runs):
# Starting download of file1.zip...
# Starting download of file2.zip...
# Starting download of file3.zip...
# Finished downloading file1.zip
# Finished downloading file2.zip
# Finished downloading file3.zip
# All files have been downloaded.

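By creating and managing a separate thread for each file download, the program can handle multiple tasks concurrently, improving overall efficiency. Here the three simulated 2-second downloads overlap, so the whole run takes roughly 2 seconds instead of 6.

Here are the key steps in the code:

A function download_file is defined to simulate the download process.
A list of file names is created to represent the files that need to be downloaded.
For each file in the list, a new thread is created with download_file as the target function. Each thread is started immediately after creation and added to the list of threads.
The main program waits for all threads to finish using the join() method, so it does not proceed until every download is complete.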
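
To make the benefit concrete, the following sketch times the same simulated downloads run one after another and then in threads. It is only an illustration: the delay parameter and the shortened 0.2-second sleep are assumptions added here to keep the run quick, and exact timings will differ from machine to machine.

```python
import threading
import time


def download_file(file_name, delay=0.2):
    time.sleep(delay)  # stand-in for waiting on the network


files = ["file1.zip", "file2.zip", "file3.zip"]

# Sequential: each simulated download waits for the previous one
start = time.perf_counter()
for file in files:
    download_file(file)
sequential_time = time.perf_counter() - start

# Threaded: all simulated downloads wait at the same time
start = time.perf_counter()
threads = [threading.Thread(target=download_file, args=(f,)) for f in files]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
threaded_time = time.perf_counter() - start

print(f"Sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

The sequential run should take about three times the delay, while the threaded run should take about one delay plus a small thread start-up cost.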

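Limitations of Threading

While threading can improve performance for I/O-bound tasks, it has limitations:

Global Interpreter Lock (GIL): The GIL restricts execution to one thread at a time for CPU-bound tasks, limiting the effectiveness of threading on multi-core processors.
Race Conditions: Since threads share the same memory space, improper synchronization can lead to race conditions, where the outcome of a program depends on the timing of threads.
Deadlocks: Threads waiting for each other to release resources can lead to deadlocks, where no progress is made.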
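
As a minimal sketch of how a race condition is usually avoided, the example below protects a shared counter with threading.Lock. The counter, counter_lock, and increment names are made up for this illustration. Without the lock, the read-modify-write in counter += 1 could interleave between threads and lose updates.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(times):
    global counter
    for _ in range(times):
        # Only one thread at a time may run the read-modify-write below
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 40000
```

Using the lock as a context manager releases it even if the body raises. When a program needs several locks, a common precaution against deadlock is to always acquire them in the same order.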

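Multiprocessing in Python

Multiprocessing addresses the limitations of threading by using separate processes instead of threads.

Each process has its own memory space and its own Python interpreter, allowing true parallelism on multi-core systems.

This makes multiprocessing ideal for tasks that require heavy computation.

How Multiprocessing Works

Python's multiprocessing module makes it easy to create and manage processes.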

Let’s start with a basic example:

import multiprocessing
import time


def print_numbers():
    for i in range(5):
        print(f"Number: {i}")
        time.sleep(1)


if __name__ == "__main__":
    # Creating a process
    process = multiprocessing.Process(target=print_numbers)

    # Starting the process
    process.start()

    # Wait for the process to complete
    process.join()

    print("Process has finished executing")

# Output:
# Number: 0
# Number: 1
# Number: 2
# Number: 3
# Number: 4
# Process has finished executing

This example is similar to the threading example, but with processes.

Notice that process creation and management look much like threading. The difference is that each process has its own memory space and its own interpreter, so processes can run truly in parallel on different CPU cores.
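
The separate memory spaces are easy to observe. In the sketch below (the counter global and the increment function are hypothetical names chosen for this illustration), the child process modifies its own copy of a global variable, and the parent's copy is left untouched.

```python
import multiprocessing

counter = 0  # each process gets its own copy of this global


def increment():
    global counter
    counter += 1
    print(f"Child sees counter = {counter}")


if __name__ == "__main__":
    process = multiprocessing.Process(target=increment)
    process.start()
    process.join()

    # The child changed only its own copy
    print(f"Parent sees counter = {counter}")
```

The child reports 1 while the parent still reports 0. Doing the same thing with a thread instead of a process would change the value the main program sees.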

Example: Multiprocessing for CPU-Bound Tasks

Multiprocessing is particularly beneficial for tasks that are CPU-bound, such as numerical computations or data processing.

Here’s an example that calculates the square of numbers using multiple processes:

import multiprocessing


def compute_square(number):
    return number * number


if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    # Create a pool of processes
    with multiprocessing.Pool() as pool:
        # Map function to numbers using multiple processes
        results = pool.map(compute_square, numbers)

    print("Squares:", results)

# Output:
# Squares: [1, 4, 9, 16, 25]

Here are the key steps in the code:

A function compute_square is defined to take a number as input and return its square.
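The code within the if __name__ == "__main__": block runs only when the script is executed directly. This guard matters for multiprocessing: on platforms that start child processes by importing the main module (such as Windows), unguarded process-creating code would run again in every child.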
A list of numbers is defined, which will be squared.
A pool of worker processes is created using multiprocessing.Pool().
The map method is used to apply the compute_square function to each number in the list, distributing the workload across multiple processes.
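
Squaring five numbers is too little work for extra processes to pay off, so here is a sketch of the same Pool.map pattern applied to a heavier job. The is_prime and count_primes helpers, the range size, and the choice of four chunks are all assumptions made for this illustration.

```python
import math
import multiprocessing


def is_prime(n):
    if n < 2:
        return False
    for divisor in range(2, math.isqrt(n) + 1):
        if n % divisor == 0:
            return False
    return True


def count_primes(bounds):
    # Count the primes in the half-open range [start, stop)
    start, stop = bounds
    return sum(1 for n in range(start, stop) if is_prime(n))


if __name__ == "__main__":
    # Split 0..200,000 into four chunks, one task per chunk
    chunks = [(i, i + 50_000) for i in range(0, 200_000, 50_000)]

    with multiprocessing.Pool(processes=4) as pool:
        counts = pool.map(count_primes, chunks)

    print("Primes per chunk:", counts)
    print("Total:", sum(counts))
```

Passing a few large chunks rather than many single numbers keeps the cost of sending work to the processes small compared with the computation itself.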

Inter-Process Communication (IPC)

Since each process has its own memory space, sharing data between processes requires inter-process communication (IPC) mechanisms.

The multiprocessing module provides several tools for IPC, such as Queue, Pipe, and Value.

Here’s an example using Queue to share data between processes:

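import multiprocessing
from queue import Empty


def worker(queue):
    # Retrieve and process data from the queue.
    # queue.empty() is unreliable across processes, so instead of checking it
    # we stop once no item arrives within the timeout.
    while True:
        try:
            item = queue.get(timeout=1)
        except Empty:
            break
        print(f"Processing {item}")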

if __name__ == "__main__":
    queue = multiprocessing.Queue()

    # Add items to the queue
    for i in range(10):
        queue.put(i)

    # Create a pool of processes to process the queue
    processes = []
    for _ in range(4):
        process = multiprocessing.Process(target=worker, args=(queue,))
        processes.append(process)
        process.start()

    # Wait for all processes to complete
    for process in processes:
        process.join()

    print("All processes have finished.")


# Output (the order may vary because four processes print concurrently):
# Processing 0
# Processing 1
# Processing 2
# Processing 3
# Processing 4
# Processing 5
# Processing 6
# Processing 7
# Processing 8
# Processing 9
# All processes have finished.

In this example:

def worker(queue): Defines a function worker that takes a queue as an argument. The function retrieves and processes items from the queue until it is empty.
if __name__ == "__main__":: Ensures that the following code runs only if the script is executed directly, not if it is imported as a module.
queue = multiprocessing.Queue(): Creates a queue object for inter-process communication.
for i in range(10): queue.put(i): Adds items (numbers 0 through 9) to the queue.
processes = []: Initializes an empty list to store process objects.
The for loop for _ in range(4): Creates four worker processes.
process = multiprocessing.Process(target=worker, args=(queue,)): Creates a new process with worker as the target function and passes the queue as an argument.
processes.append(process): Adds the process object to the processes list.
process.start(): Starts the process.
The for loop for process in processes: Waits for each process to complete using the join() method.
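
A common alternative for telling workers when to stop is to send an explicit sentinel value through the queue. The sketch below is one way to do that, not the original author's approach: the STOP sentinel, the second result_queue, and the squaring step are assumptions added for illustration.

```python
import multiprocessing

STOP = None  # sentinel value: tells a worker there is no more work


def worker(task_queue, result_queue):
    while True:
        item = task_queue.get()  # blocks until an item is available
        if item is STOP:
            break
        result_queue.put(item * item)


if __name__ == "__main__":
    task_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    num_workers = 4

    processes = [
        multiprocessing.Process(target=worker, args=(task_queue, result_queue))
        for _ in range(num_workers)
    ]
    for process in processes:
        process.start()

    for i in range(10):
        task_queue.put(i)
    # One sentinel per worker, so every worker receives exactly one
    for _ in range(num_workers):
        task_queue.put(STOP)

    # Collect the results before joining the workers
    results = sorted(result_queue.get() for _ in range(10))

    for process in processes:
        process.join()

    print(results)
```

With a sentinel, workers can be started before any work exists, and they never have to guess whether the queue is empty for good or just momentarily.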

Challenges of Multiprocessing

While multiprocessing provides true parallelism, it comes with its own set of challenges:

Higher Overhead: Creating and managing processes is more resource-intensive than threads due to separate memory spaces.
Complexity: Communication and synchronization between processes are more complex than threading, requiring IPC mechanisms.
Memory Usage: Each process has its own memory space, leading to higher memory usage compared to threading.
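
To give a sense of the extra plumbing involved, here is a minimal sketch of the two other IPC tools mentioned earlier, Value and Pipe. The add_to_total and report functions and the values passed around are hypothetical, chosen only for illustration.

```python
import multiprocessing


def add_to_total(total, amount):
    # get_lock() guards the read-modify-write on the shared integer
    with total.get_lock():
        total.value += amount


def report(conn, message):
    # Send one message through the pipe, then close this end
    conn.send(message)
    conn.close()


if __name__ == "__main__":
    total = multiprocessing.Value("i", 0)  # shared C int, starts at 0
    workers = [
        multiprocessing.Process(target=add_to_total, args=(total, n))
        for n in (1, 2, 3)
    ]
    for process in workers:
        process.start()
    for process in workers:
        process.join()
    print("Total:", total.value)  # 6

    parent_conn, child_conn = multiprocessing.Pipe()
    sender = multiprocessing.Process(target=report, args=(child_conn, "done"))
    sender.start()
    print("Received:", parent_conn.recv())
    sender.join()
```

Value suits a small piece of shared state, while Pipe suits a two-way conversation between exactly two processes. With threads, a plain variable and a Lock would cover the first case.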

When to Use Threading vs. Multiprocessing

Choosing between threading and multiprocessing depends on the type of task you're dealing with:

Use Threading:

For tasks that involve a lot of waiting, such as network operations or reading/writing files (I/O-bound tasks).
When you need to share memory between tasks and can manage potential issues like race conditions.
For lightweight concurrency without the extra overhead of creating multiple processes.

Use Multiprocessing:

For tasks that require heavy computations or data processing (CPU-bound tasks) and can benefit from running on multiple CPU cores at the same time.
When you need true parallelism and the Global Interpreter Lock (GIL) in threading becomes a limitation.
For tasks that can run independently and don’t require frequent communication or shared memory.
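
One way to see the difference in practice is to run the same CPU-bound function through a thread pool and a process pool and compare the timings. This sketch uses the standard library's concurrent.futures module, which the article itself does not cover; the sum_of_squares and timed_map helpers and the input sizes are assumptions for illustration.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_squares(n):
    # Pure-Python loop: CPU-bound, never waits on I/O
    return sum(i * i for i in range(n))


def timed_map(executor_class, func, inputs):
    # Run func over inputs with the given executor; return (results, seconds)
    start = time.perf_counter()
    with executor_class(max_workers=4) as executor:
        results = list(executor.map(func, inputs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    inputs = [2_000_000] * 4

    _, thread_time = timed_map(ThreadPoolExecutor, sum_of_squares, inputs)
    _, process_time = timed_map(ProcessPoolExecutor, sum_of_squares, inputs)

    print(f"Threads:   {thread_time:.2f}s")
    print(f"Processes: {process_time:.2f}s")
```

On a multi-core machine running a standard CPython build with the GIL, the process version is typically faster for this kind of work. Exact numbers depend on the hardware, and for very small inputs the cost of starting processes can outweigh the gain.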

Conclusion

Concurrency in Python is a powerful way to make your applications run faster.

Threading is great for tasks that involve a lot of waiting, like network operations or reading/writing files, but it's not as effective for tasks that require heavy computations because of the Global Interpreter Lock (GIL).

On the other hand, multiprocessing allows for true parallelism, making it perfect for CPU-intensive tasks, although it comes with higher overhead and complexity.

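Whether you're processing data, handling multiple network requests, or performing complex calculations, Python's threading and multiprocessing tools give you what you need to make your programs as efficient and fast as possible.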
