Setting MKL_NUM_THREADS to be more than 16 for m5 instances

Question

Hey, I have a 32-core EC2 Linux m5 instance. Python is installed via Anaconda. I notice that NumPy cannot use more than 16 cores.

Looks like my numpy uses libmkl_rt.so:
```
[2]: np.show_config()                                                                                
blas_mkl_info:                                                                                          
    libraries = ['mkl_rt', 'pthread']                                                                   
    library_dirs = ['/home/ec2-user/anaconda3/lib']                                                     
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/anaconda3/include']
blas_opt_info:                                                                                          
    libraries = ['mkl_rt', 'pthread']                                                                   
    library_dirs = ['/home/ec2-user/anaconda3/lib']                                                     
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/anaconda3/include']
lapack_mkl_info:                                                                                        
    libraries = ['mkl_rt', 'pthread']                                                                   
    library_dirs = ['/home/ec2-user/anaconda3/lib']                                                     
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/anaconda3/include']
lapack_opt_info:                                                                                        
    libraries = ['mkl_rt', 'pthread']                                                                   
    library_dirs = ['/home/ec2-user/anaconda3/lib']                                                     
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ec2-user/anaconda3/include']

```

When I set MKL_NUM_THREADS below 16, it works:
```
(base) ec2-user@ip-172-31-18-3:~$ export MKL_NUM_THREADS=12 && python -c "import ctypes; mkl_rt = ctypes.CDLL('libmkl_rt.so'); print (mkl_rt.mkl_get_max_threads())"                                             
12
```
When I set it to 24, it stops at 16:
```                                                                                                    
(base) ec2-user@ip-172-31-18-3:~$ export MKL_NUM_THREADS=24 && python -c "import ctypes; mkl_rt = ctypes.CDLL('libmkl_rt.so'); print (mkl_rt.mkl_get_max_threads())"                                             
16                                                                                                    
```

But I do have 32 cores
```
In [2]: os.cpu_count()
Out[2]: 32
```

Are there any other settings I need to check?

Thanks,
Bill

Accepted Answer

If you're using an m5.8xlarge, it has 32 vCPUs, but that's with hyperthreading enabled by default, so it corresponds to 16 physical cores. I've seen some discussion (https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-threading-behavior-on-Hyper-Threading-systems/td-p/896702) suggesting that MKL caps out at the number of physical cores when hyperthreading is enabled unless you set `MKL_DYNAMIC=FALSE`.
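To make the vCPU versus physical core distinction concrete, here is a rough sketch in Python. The helper names `count_physical_cores` and `mkl_env` are made up for illustration. It assumes an x86 Linux `/proc/cpuinfo` layout with `physical id` and `core id` fields, which some virtualized hosts may not expose. Whether `MKL_DYNAMIC=FALSE` actually lifts the cap on a given instance is not something this sketch can confirm.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count unique (physical id, core id) pairs in /proc/cpuinfo-style text.

    Hypothetical helper: returns 0 if the fields are not present.
    """
    cores = set()
    # Each logical CPU is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "physical id" in fields and "core id" in fields:
            cores.add((fields["physical id"], fields["core id"]))
    return len(cores)


def mkl_env(num_threads, dynamic=False, base_env=None):
    """Return a copy of the environment with the MKL threading variables set.

    Hypothetical helper: the variables must be set before MKL is loaded,
    so the result is meant to be passed to a fresh Python process.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_NUM_THREADS"] = str(num_threads)
    env["MKL_DYNAMIC"] = "TRUE" if dynamic else "FALSE"
    return env
```

On the instance itself, `count_physical_cores(open("/proc/cpuinfo").read())` can be compared with `os.cpu_count()`, which counts logical CPUs (hyperthreads included). The dictionary from `mkl_env(24)` can be passed as `env=` to `subprocess.run` when re-running the `mkl_get_max_threads()` check from the question in a new process.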

Answer

No luck. I plan to create a new m5 instance with 32 cores and set up MKL there.

Answer

FYI, I suspect that MKL needs to be updated, since I upgraded my m5 instance from 16 cores to 32 cores. Not sure if this would help. I am looking at this link: https://docs.anaconda.com/mkl-optimizations/index.html
