Python multiprocessing lets you divide a workload among multiple processes, cutting down on overall execution time. This is especially useful for heavy computations or for handling large datasets.

What is Python multiprocessing?

Multiprocessing in Python refers to running multiple processes simultaneously, allowing you to make the most of multicore systems. Unlike single-threaded approaches that handle tasks one after another, multiprocessing lets different parts of a program run in parallel, each independently. Each process gets its own memory space and its own Python interpreter and can run on a separate processor core, which sidesteps Python's Global Interpreter Lock (GIL) and cuts execution time for computation-heavy or time-sensitive operations.

Python multiprocessing has a wide range of applications. It is often used in data processing and analysis to handle large data sets faster and to accelerate complex analyses. It can also speed up simulation and modelling calculations, for example in scientific applications, by shortening the execution times of complex computations. In addition to powering web scraping by fetching data from multiple sites simultaneously, it boosts efficiency in image processing and computer vision, enabling quicker image analysis.

How to implement Python multiprocessing

Python offers various options for implementing multiprocessing. In the following sections, we’ll introduce three common tools: the multiprocessing module, the concurrent.futures library and the joblib package.

multiprocessing module

The multiprocessing module is the standard module for Python multiprocessing. With this module, you can create processes, share data between them and synchronise them using locks, queues and other tools.

import multiprocessing

def task(n):
    # Compute and print the square of the given number
    result = n * n
    print(f"Result: {result}")

if __name__ == "__main__":
    processes = []
    for i in range(1, 6):
        # Create and start a separate process for each number from 1 to 5
        process = multiprocessing.Process(target=task, args=(i,))
        processes.append(process)
        process.start()
    for process in processes:
        # Wait for all processes to finish before continuing
        process.join()

In the example above, we use the multiprocessing.Process class to spawn processes that execute the task() function, which computes the square of a given number. After starting the processes, we wait for them to complete before proceeding with the main program. The result is displayed using an f-string, a Python string formatting method that embeds expressions directly in the string. Note that the output order is non-deterministic, since the operating system schedules the processes independently.
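
Since the module also provides queues for exchanging data between processes, the results can be collected in the main process instead of being printed by the workers. Here is a minimal sketch under that assumption; the helper name worker is our own:

import multiprocessing

def worker(n, queue):
    # Put the input and its square onto the shared queue
    queue.put((n, n * n))

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    processes = [multiprocessing.Process(target=worker, args=(i, queue))
                 for i in range(1, 6)]
    for process in processes:
        process.start()
    # Collect exactly one result per process before joining
    results = dict(queue.get() for _ in processes)
    for process in processes:
        process.join()
    print(results)  # e.g. {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

You can also create a process pool with Python multiprocessing: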

import multiprocessing

def task(n):
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        # Distribute the numbers 1 to 5 across the pool's worker processes
        results = pool.map(task, range(1, 6))
        print(results)  # Output: [1, 4, 9, 16, 25]

With pool.map(), the task() function is applied to each element of a sequence; the results are collected in the order of the inputs and returned as a list.
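
For functions that take more than one argument, the pool also offers starmap(), which unpacks argument tuples. A brief sketch; the function multiply is our own example:

import multiprocessing

def multiply(a, b):
    return a * b

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        # Each tuple is unpacked into the arguments of multiply()
        results = pool.starmap(multiply, [(1, 2), (3, 4), (5, 6)])
        print(results)  # Output: [2, 12, 30]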

concurrent.futures library

This module provides a high-level interface for asynchronous execution and parallel processing. Its executor classes, ProcessPoolExecutor and ThreadPoolExecutor, run tasks on a pool of processes or threads. The concurrent.futures module offers a simpler way to handle asynchronous tasks and is in many cases easier to work with than the Python multiprocessing module.

import concurrent.futures

def task(n):
    return n * n

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Submit one task per number; each call returns a Future object
        futures = [executor.submit(task, i) for i in range(1, 6)]
        for future in concurrent.futures.as_completed(futures):
            print(future.result())  # results arrive in completion order

The code uses the concurrent.futures module to process tasks in parallel with ProcessPoolExecutor. The task(n) function is submitted for the numbers 1 to 5. The as_completed() method yields each future as soon as it finishes, so the results are printed in completion order rather than input order. As in the earlier examples, the if __name__ == "__main__" guard is required, because worker processes re-import the main module on platforms that spawn new processes.
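
If the order of results matters, the executor also provides a map() method that returns results in the order of the inputs, unlike as_completed(). A minimal sketch:

import concurrent.futures

def task(n):
    return n * n

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # map() yields results in input order, regardless of completion order
        for result in executor.map(task, range(1, 6)):
            print(result)  # 1, 4, 9, 16, 25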

joblib package

joblib is an external Python library designed to simplify parallel processing in Python, for example for repetitive tasks such as executing a function with different input parameters or working with large amounts of data. The main features of joblib are the parallelisation of tasks, the caching of function results and the optimisation of memory and computing resources.

from joblib import Parallel, delayed

def task(n):
    return n * n

# Run task() for the numbers 1 to 10 on up to four parallel workers
results = Parallel(n_jobs=4)(delayed(task)(i) for i in range(1, 11))
print(results)  # Output: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

The expression Parallel(n_jobs=4)(delayed(task)(i) for i in range(1, 11)) initiates the parallel execution of the task() function for the numbers 1 to 10. Parallel is configured with n_jobs=4, meaning up to four jobs run in parallel. Calling delayed(task)(i) creates a deferred task for each number i in the range, which Parallel then distributes across the workers. The results for the numbers 1 to 10 are collected in results, in input order, and printed.
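
Besides parallelisation, joblib can also cache function results on disk via its Memory class, so repeated calls with the same arguments are not recomputed. A minimal sketch; the cache directory name is our own choice:

from joblib import Memory

# Cache results in the local directory "joblib_cache" (hypothetical name)
memory = Memory("joblib_cache", verbose=0)

@memory.cache
def task(n):
    return n * n

print(task(4))  # computed on the first call
print(task(4))  # loaded from the on-disk cache on subsequent calls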
