How to do parallel programming in Python?

Learn how to do parallel programming in Python with practical examples, diagrams, and best practices. Covers Python parallel-processing techniques with visual explanations.

Unlocking Concurrency: A Guide to Parallel Programming in Python


Explore the fundamentals of parallel programming in Python, understanding the Global Interpreter Lock (GIL) and leveraging modules like multiprocessing and threading for efficient concurrent execution.

Python, often celebrated for its simplicity and readability, presents unique challenges and opportunities when it comes to parallel programming. While the Global Interpreter Lock (GIL) can limit true parallel execution of CPU-bound tasks within a single process, Python offers robust modules like multiprocessing and threading to achieve concurrency and parallelism. This article will guide you through the core concepts, practical implementations, and best practices for writing efficient parallel Python code.

Understanding Concurrency vs. Parallelism and the GIL

Before diving into code, it's crucial to distinguish between concurrency and parallelism. Concurrency is about dealing with many things at once (e.g., interleaving tasks on a single core), while parallelism is about doing many things at once (e.g., running on multiple cores simultaneously). The GIL in CPython, the reference implementation of Python, is a mutex that protects access to Python objects, preventing multiple native threads from executing Python bytecode at the same time. This means that even on multi-core systems, only one thread in a Python process executes bytecode at any given moment, so threads do not speed up CPU-bound work. For I/O-bound tasks, however, a thread releases the GIL while it waits on an external resource, allowing other threads to run in the meantime.
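The effect of the GIL on CPU-bound work can be observed with a small timing experiment. The sketch below is illustrative only: `count_down`, `time_sequential`, and `time_threaded` are hypothetical helper names, and the workload size is arbitrary. On a standard CPython build with the GIL enabled, the threaded run would typically take about as long as the sequential one, though exact timings depend on the machine and interpreter version.

```python
import threading
import time


def count_down(n):
    """Pure-Python busy loop used as a stand-in for CPU-bound work."""
    while n > 0:
        n -= 1


def time_sequential(n, repeats=2):
    """Run count_down `repeats` times, one after another, and time it."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, repeats=2):
    """Run count_down in `repeats` threads at once and time it."""
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000  # arbitrary workload size, chosen only for illustration
    print(f"sequential: {time_sequential(n):.2f}s")
    print(f"threaded:   {time_threaded(n):.2f}s")
```

If the two timings come out close, the threads were taking turns on the interpreter rather than running in parallel.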

flowchart TD
    A[Python Program Start]
    B{Task Type?}
    C[CPU-Bound Task]
    D[I/O-Bound Task]
    E[GIL Acquired]
    F[GIL Released]
    G[Single Thread Execution]
    H["Multiple Threads (Concurrent)"]
    I["Multiprocessing (Parallel)"]
    J[Program End]

    A --> B
    B -->|CPU-Bound| C
    B -->|I/O-Bound| D
    C --> E
    E --> G
    D --> F
    F --> H
    G --> J
    H --> J
    C --> I
    I --> J

Decision flow for Python concurrency and parallelism based on task type.

Achieving Parallelism with multiprocessing

For CPU-bound tasks, the multiprocessing module is your go-to solution. It sidesteps the GIL by starting separate processes, each with its own Python interpreter, its own GIL, and its own memory space. This allows true parallel execution across multiple CPU cores. Because processes do not share memory, arguments and return values are pickled and copied between them, so each task should do enough work to outweigh that overhead. The module provides a Process class for creating individual processes and a Pool class for managing a pool of worker processes, which is ideal for applying a function to a large dataset in parallel.

import multiprocessing
import os

def square(number):
    print(f"Process ID: {os.getpid()} - Squaring {number}")
    return number * number

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    # Create a Pool of worker processes
    with multiprocessing.Pool(processes=4) as pool:
        # Map the square function to the numbers list
        results = pool.map(square, numbers)
    
    print(f"\nOriginal numbers: {numbers}")
    print(f"Squared results: {results}")

Example of using multiprocessing.Pool for parallel execution of a CPU-bound task.
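The Process class mentioned above gives finer control than Pool when you want to manage individual workers yourself. The following is a minimal sketch, not a definitive pattern: `sum_of_squares`, `worker`, and `split_in_two` are hypothetical helpers introduced for illustration, and a `multiprocessing.Queue` is assumed as the way to send results back to the parent.

```python
import multiprocessing


def sum_of_squares(numbers):
    """CPU-bound stand-in: sum the squares of a list of integers."""
    return sum(n * n for n in numbers)


def worker(label, numbers, queue):
    """Runs in a child process and sends (label, result) to the parent."""
    queue.put((label, sum_of_squares(numbers)))


def split_in_two(numbers):
    """Split a list into two halves so each process gets one part."""
    middle = len(numbers) // 2
    return numbers[:middle], numbers[middle:]


if __name__ == "__main__":
    numbers = list(range(1, 11))
    first_half, second_half = split_in_two(numbers)

    queue = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=worker, args=("first", first_half, queue)),
        multiprocessing.Process(target=worker, args=("second", second_half, queue)),
    ]
    for process in processes:
        process.start()

    # Read one result per process before joining, so no child is left
    # blocked while trying to flush its data into the queue.
    partial_results = dict(queue.get() for _ in processes)
    for process in processes:
        process.join()

    print(f"Partial sums: {partial_results}")
    print(f"Total: {sum(partial_results.values())}")
```

The `if __name__ == "__main__":` guard matters here for the same reason as in the Pool example: on platforms that start child processes by re-importing the main module, unguarded code would try to spawn processes again in every child.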

Achieving Concurrency with threading

The threading module allows you to run multiple functions concurrently within the same process. Due to the GIL, this is most effective for I/O-bound tasks, such as network requests, file operations, or database queries. While one thread is waiting for an I/O operation to complete, the GIL is released, allowing another thread to execute Python bytecode. This can significantly improve the responsiveness and throughput of applications that spend a lot of time waiting.

import threading
import time

def fetch_url(url):
    print(f"Starting to fetch {url}...")
    time.sleep(2) # Simulate network request
    print(f"Finished fetching {url}")

urls = [
    "http://example.com/page1",
    "http://example.com/page2",
    "http://example.com/page3"
]

threads = []
for url in urls:
    thread = threading.Thread(target=fetch_url, args=(url,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All URLs fetched.")

Example of using threading for concurrent execution of I/O-bound tasks.
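A `threading.Thread` does not hand back its target's return value, so real code usually needs somewhere to collect results. One possible approach, sketched below under stated assumptions, is a shared dictionary guarded by a `threading.Lock`. The names `fake_fetch` and `fetch_all` are hypothetical, and `time.sleep` again stands in for the network call rather than performing a real request.

```python
import threading
import time


def fake_fetch(url, delay, results, lock):
    """Hypothetical stand-in for a network call: wait, then record a result."""
    time.sleep(delay)  # simulated I/O wait; other threads can run meanwhile
    with lock:  # guard the shared dict while writing to it
        results[url] = f"response from {url}"


def fetch_all(urls, delay=0.2):
    """Fetch all URLs in separate threads; return (results, elapsed seconds)."""
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=fake_fetch, args=(url, delay, results, lock))
        for url in urls
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = [
        "http://example.com/page1",
        "http://example.com/page2",
        "http://example.com/page3",
    ]
    results, elapsed = fetch_all(urls)
    for url in urls:
        print(results[url])
    print(f"Fetched {len(results)} URLs in {elapsed:.2f}s")
```

Because the simulated waits overlap, the elapsed time should usually be much closer to one delay than to the sum of all three, which is the throughput gain described above.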