Python의 스레딩 및 다중 처리 이해: 종합 안내서-파이썬 튜토리얼-php.cn

Python의 스레딩 및 다중 처리 이해: 종합 안내서

PHPz

풀어 주다： 2024-09-12 14:17:02

원래의

425명이 탐색했습니다.

Understanding Threading and Multiprocessing in Python: A Comprehensive Guide

소개

Python에서 스레딩 및 멀티프로세싱 개념은 성능을 위해 애플리케이션을 최적화할 때, 특히 동시 또는 병렬 실행이 관련된 경우 자주 논의됩니다. 용어가 중복됨에도 불구하고 이 두 가지 접근 방식은 근본적으로 다릅니다.

이 블로그는 스레딩과 멀티프로세싱에 대한 혼동을 명확히 하고, 각각을 언제 사용해야 하는지 설명하고, 각 개념에 대한 관련 예를 제공합니다.

스레딩과 멀티프로세싱: 주요 차이점

예제와 사용 사례를 살펴보기 전에 주요 차이점을 간략히 살펴보겠습니다.

스레딩: 단일 프로세스 내에서 여러 스레드(프로세스의 작은 단위)를 실행하는 것을 의미합니다. 스레드는 동일한 메모리 공간을 공유하므로 스레드가 가벼워집니다. 그러나 Python의 Global Interpreter Lock(GIL)은 CPU 바인딩 작업에 대한 스레딩의 실제 병렬 처리를 제한합니다.
멀티프로세싱: 각각 자체 메모리 공간이 있는 여러 프로세스를 실행하는 작업이 포함됩니다. 프로세스는 스레드보다 무겁지만 메모리를 공유하지 않기 때문에 진정한 병렬성을 달성할 수 있습니다. 이 접근 방식은 전체 코어 활용이 필요한 CPU 바인딩 작업에 이상적입니다.

스레딩이란 무엇인가요?

스레딩은 동일한 프로세스 내에서 여러 작업을 동시에 실행하는 방법입니다. 이러한 작업은 동일한 메모리 공간을 공유하는 별도의 경량 실행 단위인 스레드에 의해 처리됩니다. 스레딩은 기본 프로그램이 외부 리소스를 기다리는 데 많은 시간을 소비하는 파일 읽기, 네트워크 요청 또는 데이터베이스 쿼리와 같은 I/O 중심 작업에 유용합니다.

스레딩을 사용하는 경우

프로그램이 I/O 바인딩된 경우(예: 파일 읽기/쓰기, 네트워크 요청 만들기)
작업이 입력 또는 출력 작업을 기다리는 데 많은 시간을 소비하는 경우
단일 프로세스 내에서 가벼운 동시성이 필요한 경우

예: 기본 스레딩

import threading
import time

def print_numbers():
    for i in range(5):
        print(i)
        time.sleep(1)

def print_letters():
    for letter in ['a', 'b', 'c', 'd', 'e']:
        print(letter)
        time.sleep(1)

# Create two threads
t1 = threading.Thread(target=print_numbers)
t2 = threading.Thread(target=print_letters)

# Start both threads
t1.start()
t2.start()

# Wait for both threads to complete
t1.join()
t2.join()

print("Both threads finished execution.")

로그인 후 복사

위의 예에서는 두 개의 스레드가 동시에 실행됩니다. 하나는 숫자를 인쇄하고 다른 하나는 문자를 인쇄합니다. sleep() 호출은 I/O 작업을 시뮬레이션하고 프로그램은 이러한 대기 중에 스레드 간에 전환할 수 있습니다.

스레딩 문제: GIL(Global Interpreter Lock)

Python의 GIL은 여러 네이티브 스레드가 Python 바이트코드를 동시에 실행하는 것을 방지하는 메커니즘입니다. 프로세스에서 여러 스레드가 활성화되어 있어도 한 번에 하나의 스레드만 실행됩니다.

이 제한으로 인해 스레드는 GIL로 인해 여러 코어를 완전히 활용할 수 없기 때문에 실제 병렬 처리가 필요한 CPU 바인딩 작업에 적합하지 않습니다.

멀티프로세싱이란 무엇인가요?

멀티프로세싱을 사용하면 여러 프로세스를 동시에 실행할 수 있으며 각 프로세스에는 자체 메모리 공간이 있습니다. 프로세스는 메모리를 공유하지 않으므로 GIL 제한이 없으므로 여러 CPU 코어에서 진정한 병렬 실행이 가능합니다. 멀티프로세싱은 CPU 사용량을 최대화해야 하는 CPU 바인딩 작업에 이상적입니다.

멀티프로세싱을 사용해야 하는 경우

프로그램이 CPU에 바인딩된 경우(예: 과도한 계산 수행, 데이터 처리)
메모리 공유 없이 진정한 병렬화가 필요한 경우
독립적인 작업의 여러 인스턴스를 동시에 실행하려는 경우

예: 기본 다중 처리

import multiprocessing
import time

def print_numbers():
    for i in range(5):
        print(i)
        time.sleep(1)

def print_letters():
    for letter in ['a', 'b', 'c', 'd', 'e']:
        print(letter)
        time.sleep(1)

if __name__ == "__main__":
    # Create two processes
    p1 = multiprocessing.Process(target=print_numbers)
    p2 = multiprocessing.Process(target=print_letters)

    # Start both processes
    p1.start()
    p2.start()

    # Wait for both processes to complete
    p1.join()
    p2.join()

    print("Both processes finished execution.")

로그인 후 복사

이 예에서는 두 개의 개별 프로세스가 동시에 실행됩니다. 스레드와 달리 각 프로세스는 자체 메모리 공간을 가지며 GIL의 간섭 없이 독립적으로 실행됩니다.

다중 처리의 메모리 격리

스레딩과 다중 처리의 주요 차이점 중 하나는 프로세스가 메모리를 공유하지 않는다는 것입니다. 이는 프로세스 간에 간섭이 없음을 보장하지만 프로세스 간에 데이터를 공유하려면 멀티프로세싱 모듈에서 제공하는 큐, 파이프 또는 관리자 개체와 같은 특별한 메커니즘이 필요하다는 의미이기도 합니다.

스레딩과 멀티프로세싱: 올바른 도구 선택

이제 두 가지 접근 방식이 어떻게 작동하는지 이해했으므로 작업 유형에 따라 스레딩 또는 멀티프로세싱을 선택해야 하는 경우를 분석해 보겠습니다.

Use Case	Type	Why?
Network requests, I/O-bound tasks (file read/write, DB calls)	Threading	Multiple threads can handle I/O waits concurrently.
CPU-bound tasks (data processing, calculations)	Multiprocessing	True parallelism is possible by utilizing multiple cores.
Task requires shared memory or lightweight concurrency	Threading	Threads share memory and are cheaper in terms of resources.
Independent tasks needing complete isolation (e.g., separate processes)	Multiprocessing	Processes have isolated memory, making them safer for independent tasks.

Performance Considerations

Threading Performance

Threading excels in scenarios where the program waits on external resources (disk I/O, network). Since threads can work concurrently during these wait times, threading can help boost performance.

However, due to the GIL, CPU-bound tasks do not benefit much from threading because only one thread can execute at a time.

Multiprocessing Performance

Multiprocessing allows true parallelism by running multiple processes across different CPU cores. Each process runs in its own memory space, bypassing the GIL and making it ideal for CPU-bound tasks.

However, creating processes is more resource-intensive than creating threads, and inter-process communication can slow things down if there's a lot of data sharing between processes.

A Practical Example: Threading vs. Multiprocessing for CPU-bound Tasks

Let's compare threading and multiprocessing for a CPU-bound task like calculating the sum of squares for a large list.

Threading Example for CPU-bound Task

import threading

def calculate_squares(numbers):
    result = sum([n * n for n in numbers])
    print(result)

numbers = range(1, 10000000)
t1 = threading.Thread(target=calculate_squares, args=(numbers,))
t2 = threading.Thread(target=calculate_squares, args=(numbers,))

t1.start()
t2.start()

t1.join()
t2.join()

로그인 후 복사

Due to the GIL, this example will not see significant performance improvements over a single-threaded version because the threads can't run simultaneously for CPU-bound operations.

Multiprocessing Example for CPU-bound Task

import multiprocessing

def calculate_squares(numbers):
    result = sum([n * n for n in numbers])
    print(result)

if __name__ == "__main__":
    numbers = range(1, 10000000)
    p1 = multiprocessing.Process(target=calculate_squares, args=(numbers,))
    p2 = multiprocessing.Process(target=calculate_squares, args=(numbers,))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

로그인 후 복사

In the multiprocessing example, you'll notice a performance boost since both processes run in parallel across different CPU cores, fully utilizing the machine's computational resources.

Conclusion

Understanding the difference between threading and multiprocessing is crucial for writing efficient Python programs. Here’s a quick recap:

Use threading for I/O-bound tasks where your program spends a lot of time waiting for resources.
Use multiprocessing for CPU-bound tasks to maximize performance through parallel execution.

Knowing when to use which approach can lead to significant performance improvements and efficient use of resources.

위 내용은 Python의 스레딩 및 다중 처리 이해: 종합 안내서의 상세 내용입니다. 자세한 내용은 PHP 중국어 웹사이트의 기타 관련 기사를 참조하세요!