Python multiprocessing lets you divide a workload among multiple processes, cutting down on overall execution time. This is especially useful for heavy computations or for handling large datasets.

What is Python multiprocessing?

Multiprocessing in Python refers to running multiple processes simultaneously, allowing you to make the most of multicore systems. Unlike single-threaded approaches that handle tasks one after another, multiprocessing lets different parts of a program run in parallel, each independently. Each process gets its own memory space and its own Python interpreter and can run on a separate processor core, which sidesteps Python's Global Interpreter Lock (GIL) and cuts execution time for computation-heavy or time-sensitive operations.

Python multiprocessing has a wide range of applications. It is often used in data processing and analysis to handle large data sets faster and to accelerate complex analyses. It can also speed up simulation and modelling calculations, for example in scientific applications, by shortening the execution times of complex computations. In addition to powering web scraping by fetching data from multiple sites simultaneously, it boosts efficiency in image processing and computer vision, enabling quicker image analysis.

How to implement Python multiprocessing

Python offers various options for implementing multiprocessing. In the following sections, we’ll introduce three common tools: the multiprocessing module, the concurrent.futures library and the joblib package.

multiprocessing module

The multiprocessing module is the standard module for Python multiprocessing. With this module, you can create processes, share data between them and synchronise them using locks, queues and other tools.

import multiprocessing

def task(n):
    # Compute and print the square of the given number
    result = n * n
    print(f"Result: {result}")

if __name__ == "__main__":
    processes = []
    for i in range(1, 6):
        # Create and start a separate process for each number from 1 to 5
        process = multiprocessing.Process(target=task, args=(i,))
        processes.append(process)
        process.start()
    for process in processes:
        # Wait for all processes to finish before continuing
        process.join()

In the example above, we use the multiprocessing.Process class to spawn processes that execute the task() function, which computes the square of a given number. After starting the processes, we wait for them to complete before proceeding with the main program. The result is displayed using an f-string, a Python string formatting method that embeds expressions directly in the string. Note that the output order is non-deterministic, since the operating system schedules the processes independently.
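
Since the module also provides queues for exchanging data between processes, the results can be collected in the main process instead of being printed by the workers. Here is a minimal sketch under that assumption; the helper name worker is our own:

import multiprocessing

def worker(n, queue):
    # Put the input and its square onto the shared queue
    queue.put((n, n * n))

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    processes = [multiprocessing.Process(target=worker, args=(i, queue))
                 for i in range(1, 6)]
    for process in processes:
        process.start()
    # Collect exactly one result per process before joining
    results = dict(queue.get() for _ in processes)
    for process in processes:
        process.join()
    print(results)  # e.g. {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

You can also create a process pool with Python multiprocessing: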

import multiprocessing

def task(n):
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        # Distribute the numbers 1 to 5 across the pool's worker processes
        results = pool.map(task, range(1, 6))
        print(results)  # Output: [1, 4, 9, 16, 25]

With pool.map(), the task() function is applied to each element of a sequence; the results are collected in the order of the inputs and returned as a list.
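
For functions that take more than one argument, the pool also offers starmap(), which unpacks argument tuples. A brief sketch; the function multiply is our own example:

import multiprocessing

def multiply(a, b):
    return a * b

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        # Each tuple is unpacked into the arguments of multiply()
        results = pool.starmap(multiply, [(1, 2), (3, 4), (5, 6)])
        print(results)  # Output: [2, 12, 30]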

concurrent.futures library

This module provides a high-level interface for asynchronous execution and parallel processing. Its executor classes, ProcessPoolExecutor and ThreadPoolExecutor, run tasks on a pool of processes or threads. The concurrent.futures module offers a simpler way to handle asynchronous tasks and is in many cases easier to work with than the Python multiprocessing module.

import concurrent.futures

def task(n):
    return n * n

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Submit one task per number; each call returns a Future object
        futures = [executor.submit(task, i) for i in range(1, 6)]
        for future in concurrent.futures.as_completed(futures):
            print(future.result())  # results arrive in completion order

The code uses the concurrent.futures module to process tasks in parallel with ProcessPoolExecutor. The task(n) function is submitted for the numbers 1 to 5. The as_completed() method yields each future as soon as it finishes, so the results are printed in completion order rather than input order. As in the earlier examples, the if __name__ == "__main__" guard is required, because worker processes re-import the main module on platforms that spawn new processes.
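
If the order of results matters, the executor also provides a map() method that returns results in the order of the inputs, unlike as_completed(). A minimal sketch:

import concurrent.futures

def task(n):
    return n * n

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # map() yields results in input order, regardless of completion order
        for result in executor.map(task, range(1, 6)):
            print(result)  # 1, 4, 9, 16, 25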

joblib package

joblib is an external Python library designed to simplify parallel processing in Python, for example for repetitive tasks such as executing a function with different input parameters or working with large amounts of data. The main features of joblib are the parallelisation of tasks, the caching of function results and the optimisation of memory and computing resources.

from joblib import Parallel, delayed

def task(n):
    return n * n

# Run task() for the numbers 1 to 10 on up to four parallel workers
results = Parallel(n_jobs=4)(delayed(task)(i) for i in range(1, 11))
print(results)  # Output: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

The expression Parallel(n_jobs=4)(delayed(task)(i) for i in range(1, 11)) initiates the parallel execution of the task() function for the numbers 1 to 10. Parallel is configured with n_jobs=4, meaning up to four jobs run in parallel. Calling delayed(task)(i) creates a deferred task for each number i in the range, which Parallel then distributes across the workers. The results for the numbers 1 to 10 are collected in results, in input order, and printed.
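
Besides parallelisation, joblib can also cache function results on disk via its Memory class, so repeated calls with the same arguments are not recomputed. A minimal sketch; the cache directory name is our own choice:

from joblib import Memory

# Cache results in the local directory "joblib_cache" (hypothetical name)
memory = Memory("joblib_cache", verbose=0)

@memory.cache
def task(n):
    return n * n

print(task(4))  # computed on the first call
print(task(4))  # loaded from the on-disk cache on subsequent calls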
