Hey, I have a 32-core EC2 Linux m5 instance. Python is installed via Anaconda. I noticed that NumPy cannot use more than 16 cores.
It looks like my NumPy is linked against libmkl_rt.so:
In [2]: np.show_config()
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/anaconda3/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/anaconda3/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/anaconda3/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ec2-user/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/anaconda3/include']
When I set MKL_NUM_THREADS to a value below 16, it works:
(base) ec2-user@ip-172-31-18-3:~$ export MKL_NUM_THREADS=12 && python -c "import ctypes; mkl_rt = ctypes.CDLL('libmkl_rt.so'); print (mkl_rt.mkl_get_max_threads())"
12
When I set it to 24, the reported value is capped at 16:
(base) ec2-user@ip-172-31-18-3:~$ export MKL_NUM_THREADS=24 && python -c "import ctypes; mkl_rt = ctypes.CDLL('libmkl_rt.so'); print (mkl_rt.mkl_get_max_threads())"
16
But the instance does report 32 CPUs:
In [2]: os.cpu_count()
Out[2]: 32
Are there any other settings I need to check?
Thanks,
Bill
This is it! Exporting MKL_DYNAMIC=FALSE and setting MKL_NUM_THREADS did the magic for me.
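For anyone who wants to check this from Python rather than the shell, here is a minimal sketch that repeats the ctypes probe from the question in a fresh interpreter, with MKL_DYNAMIC and MKL_NUM_THREADS set in that process's environment. The helper names mkl_env and probe_max_threads are made up for illustration. The likely explanation, stated as an assumption: a 32-vCPU m5 instance has 16 physical cores with two hardware threads each, and with MKL_DYNAMIC at its default MKL may scale a larger request down to the physical core count. The variables are set before the library is loaded because MKL is assumed to read them when it initializes.

```python
import os
import subprocess
import sys

# Same probe as in the question: load libmkl_rt.so and ask MKL for its max threads.
PROBE = (
    "import ctypes; "
    "mkl_rt = ctypes.CDLL('libmkl_rt.so'); "
    "print(mkl_rt.mkl_get_max_threads())"
)


def mkl_env(num_threads, dynamic=False, base_env=None):
    """Return a copy of the environment with the MKL threading variables set.

    Hypothetical helper for this example; it only builds a dict.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_NUM_THREADS"] = str(num_threads)
    env["MKL_DYNAMIC"] = "TRUE" if dynamic else "FALSE"
    return env


def probe_max_threads(num_threads, dynamic=False):
    """Run the probe in a fresh interpreter so MKL sees the variables at load time.

    Returns the reported thread count, or None if libmkl_rt.so cannot be loaded.
    """
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=mkl_env(num_threads, dynamic),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return int(result.stdout.strip())


if __name__ == "__main__":
    # Compare the reported value with and without dynamic thread adjustment.
    for dynamic in (True, False):
        print("MKL_DYNAMIC =", dynamic, "->", probe_max_threads(24, dynamic))
```

In a long-running session the same idea applies: set both variables (in the shell, or via os.environ at the very top of the script) before numpy is imported, since changing them after MKL has been loaded may have no effect.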