When parallelizing CPU-intensive loops using joblib, you may encounter an issue where all worker processes are assigned to a single core, resulting in no performance gain.
This issue stems from importing modules such as Numpy, Scipy, Pandas, and Sklearn that link against a multithreaded OpenBLAS build: on import, OpenBLAS resets the CPU affinity of the process, pinning it (and every worker it later spawns) to a single core.
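To confirm that an import is the culprit, you can inspect the process's CPU mask before and after the import. A minimal check, assuming Linux and Python 3.3+ (os.sched_getaffinity is not available on every platform):
<code class="python">import os

# Mask before the suspect import: normally every core, e.g. {0, 1, ..., 7}
print("affinity before import:", os.sched_getaffinity(0))

import numpy as np  # an affinity-resetting OpenBLAS build shrinks the mask here

# If OpenBLAS interfered, this prints a single core, e.g. {0}
print("affinity after import: ", os.sched_getaffinity(0))</code>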
To resolve this issue, you can reset the task affinity using the following command:
<code class="python">os.system("taskset -p 0xff %d" % os.getpid())</code>
This command resets the affinity of the current process (the 0xff mask covers the first eight cores; widen it on machines with more). Here's an updated version of the example with the workaround applied:
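If you would rather not shell out to taskset, the same reset can be done in pure Python. A sketch, assuming Linux and Python 3.3+ (os.sched_setaffinity is platform-specific):
<code class="python">import os

# Re-pin the current process to every core the OS reports;
# worker processes spawned afterwards inherit this mask.
os.sched_setaffinity(0, range(os.cpu_count() or 1))</code>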
<code class="python">from joblib import Parallel, delayed import numpy as np import os def testfunc(data): # some very boneheaded CPU work for nn in xrange(1000): for ii in data[0, :]: for jj in data[1, :]: ii*jj def run(niter=10): data = (np.random.randn(2, 100) for ii in xrange(niter)) pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all') # Reset task affinity os.system("taskset -p 0xff %d" % os.getpid()) results = pool(delayed(testfunc)(dd) for dd in data) if __name__ == '__main__': run()</code>
After applying this workaround, the worker processes (which inherit the parent's restored CPU mask when they are spawned) should be spread across different cores, utilizing all available CPUs for the parallel work.
In addition to the per-process workaround, you can disable OpenBLAS's CPU affinity-resetting behavior at the source, either at run time via an environment variable:
OPENBLAS_MAIN_FREE=1 python myscript.py
or at build time, by compiling OpenBLAS with NO_AFFINITY=1 set in its Makefile.rule.
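The run-time variable can also be set from inside Python if you cannot change the shell environment, provided it happens before the first Numpy import. A sketch of that ordering (the variable name comes from OpenBLAS; the rest is illustrative):
<code class="python">import os

# Must be set before numpy (and thus OpenBLAS) is loaded
os.environ["OPENBLAS_MAIN_FREE"] = "1"

import numpy as np  # OpenBLAS now skips its affinity reset</code>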