Maintaining Shared Readonly Data in Multiprocessing
Question:
In a Python multiprocessing environment, how can you ensure that a sizeable read-only array (e.g., 3 GB) is shared among multiple processes without creating copies?
Answer:
Using the shared memory facilities of the multiprocessing module together with NumPy allows data to be shared between processes efficiently. The example below allocates a shared block of doubles and wraps it as a NumPy array:
<code class="python">import multiprocessing
import ctypes
import numpy as np

# Allocate a process-shared block of doubles (10x10 here for illustration)
shared_array_base = multiprocessing.Array(ctypes.c_double, 10 * 10)

# Wrap the underlying buffer as a NumPy array and give it the desired shape
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)</code>
This approach also benefits from the fact that Linux uses copy-on-write semantics for fork(): memory pages are duplicated only when a process modifies them. As a result, even without explicitly using multiprocessing.Array, data created before the fork is effectively shared between processes as long as it is not altered.
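To illustrate the copy-on-write behaviour on its own, here is a minimal sketch in which worker processes only read a plain NumPy array created before the pool is forked. It assumes a Linux/fork start method, and the names big_readonly and chunk_sum are illustrative rather than taken from the original answer.
<code class="python">import multiprocessing
import numpy as np

# Hypothetical read-only data created at module level, before the pool forks.
# (Shown small here; a real 3 GB array would be shared the same way.)
big_readonly = np.arange(1_000_000, dtype=np.float64)

def chunk_sum(i):
    # Workers only read the array, so copy-on-write keeps the pages shared.
    return big_readonly[i * 100_000:(i + 1) * 100_000].sum()

if __name__ == '__main__':
    # The 'fork' start method (the Linux default) is what makes COW sharing work.
    with multiprocessing.get_context('fork').Pool(processes=4) as pool:
        print(pool.map(chunk_sum, range(10)))</code>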
<code class="python"># Parallel processing
def my_func(i, def_param=shared_array):
    # Each worker writes its index into one row of the shared array
    shared_array[i, :] = i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(my_func, range(10))
    pool.close()
    pool.join()
    print(shared_array)</code>
This code modifies the shared array concurrently from several worker processes and demonstrates that the data is genuinely shared among them:
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]
 [ 3.  3.  3.  3.  3.  3.  3.  3.  3.  3.]
 [ 4.  4.  4.  4.  4.  4.  4.  4.  4.  4.]
 [ 5.  5.  5.  5.  5.  5.  5.  5.  5.  5.]
 [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]
 [ 7.  7.  7.  7.  7.  7.  7.  7.  7.  7.]
 [ 8.  8.  8.  8.  8.  8.  8.  8.  8.  8.]
 [ 9.  9.  9.  9.  9.  9.  9.  9.  9.  9.]]
By combining shared memory with copy-on-write semantics, this approach provides an efficient way to share large amounts of read-only data between processes in a multiprocessing environment.
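As a side note, on platforms where the fork/copy-on-write route is unavailable (for example, the spawn start method used on Windows), Python 3.8+ also provides the multiprocessing.shared_memory module. The following is only a rough sketch of that alternative; the names data, shm, and shared are illustrative and not part of the original answer.
<code class="python">from multiprocessing import shared_memory
import numpy as np

# Build the read-only data once and copy it into a named shared-memory block.
data = np.arange(100, dtype=np.float64)
shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
shared[:] = data  # one-time copy into shared memory

# A worker process would attach by name instead of copying the data:
#   existing = shared_memory.SharedMemory(name=shm.name)
#   view = np.ndarray((100,), dtype=np.float64, buffer=existing.buf)

# Release the block when every process is done with it.
shm.close()
shm.unlink()</code>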