Speed test for running large Python Matrix Manipulation


Hi,

I have written a Python program for large matrix manipulation. It doesn't use any fancy imports; it's just straightforward for-loop iterations across large matrices.
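The original code isn't shown, but a typical pure-Python matrix routine of the kind described (a hypothetical sketch, not the poster's actual program) looks like this triple loop, which runs entirely in the interpreter:

```python
def matmul(a, b):
    """Naive pure-Python matrix multiply: an O(n^3) interpreted loop."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]  # hoist the repeated lookup out of the inner loop
            for j in range(p):
                out[i][j] += aik * b[k][j]
    return out

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```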

When I run this on a fairly large matrix it takes around 400 secs on my MacBook Air M2, so I was hoping to run it on an AWS server for much faster speed. I tested several instance families (r6, c6, m6, g4, ...) but found the performance (in terms of speed) was not even equal to my M2; it was about half of it.

MacBook Air M2: 8 CPU, 8 GB, took 412 secs
r6i.xlarge: 4 CPU, 32 GB, took 709 secs
c6a.4xlarge: 16 CPU, 32 GB, took 705 secs
m6i.4xlarge: 16 CPU, 64 GB, took 700 secs
g4dn.2xlarge: 1 GPU, 8 CPU, 32 GB, took 995 secs

I am really surprised by the above. I was running this test to see which instance family (compute, memory, general-purpose, GPU) would serve me best, but all four were disappointing. Am I missing something?

-- rishi

Rishi
asked 10 months ago · 263 views
1 Answer
Accepted Answer

Without looking at the code it's difficult to say, but I suspect that if you dug into the details you'd find that the operations are happening on a single CPU core (this is a guess, but go with me). This is potentially a Python problem (well known; look up the Python Global Interpreter Lock, or GIL), but it might also come down to the type of operations that you're trying to do.
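A quick way to see the GIL in action: in standard CPython, running a CPU-bound function on two threads is no faster than running it twice in sequence, because only one thread can execute Python bytecode at a time. This small benchmark (illustrative; exact timings vary by machine) demonstrates it:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count(n):
    """CPU-bound work: sum the first n integers with a plain loop."""
    total = 0
    for i in range(n):
        total += i
    return total

N = 5_000_000

# Run the work twice serially.
t0 = time.perf_counter()
count(N); count(N)
serial = time.perf_counter() - t0

# Run the same two jobs on two threads.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as ex:
    list(ex.map(count, [N, N]))
threaded = time.perf_counter() - t0

# Under the GIL the threaded version gives little or no speedup for
# CPU-bound code (threads still help for I/O-bound work).
print(f"serial: {serial:.2f}s, 2 threads: {threaded:.2f}s")
```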

You can confirm this by using something like top to see which CPU cores are busy; if only one is, having more of them doesn't help much. Also, many applications require specific code to use a GPU, so without that you aren't getting any benefit from having one.

If that is the case (and again, I'm only guessing here) then the next step would be to find a way to run the calculations across multiple cores at the same time. You might run into other bottlenecks there though (generally in terms of inter-process communications and data synchronisation). Welcome to the world of HPC! ;-)
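One common way to get around the GIL for CPU-bound work is `multiprocessing`, which uses separate processes (each with its own interpreter) rather than threads. A minimal sketch, splitting a matrix multiply by rows across a worker pool (`row_dot` and `matmul_parallel` are illustrative names, not the poster's code):

```python
import os
from multiprocessing import Pool

def row_dot(args):
    """Multiply one row of A by matrix B (pure Python, for illustration)."""
    row, b = args
    return [sum(r * c for r, c in zip(row, col)) for col in zip(*b)]

def matmul_parallel(a, b, workers=None):
    # Each row of the result is independent, so rows can be
    # computed in parallel across processes.
    with Pool(workers or os.cpu_count()) as pool:
        return pool.map(row_dot, [(row, b) for row in a])

if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(matmul_parallel(a, b))  # [[19, 22], [43, 50]]
```

Note the inter-process cost the answer warns about: B is pickled and sent to every worker, so for very large matrices the communication overhead can eat into the speedup.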

AWS
EXPERT
answered 10 months ago
  • Thanks a lot. So does this mean that when I run on my Mac M2, it automatically uses as many CPUs as it wants, and this is not the case on AWS?

    I get the GPU part, but for r6, c6, m6 this should have been automatic, no?

    Anyway, good to enter the world of HPC :)

  • Again, "it depends". The Python interpreter on the Mac might offload tasks to the GPU. Or it might not. It might be better at multi-threading or it might not. And it also depends on the code itself. Different CPU and mainboard architectures are going to be better or worse at single or multi-threaded tasks. And some tasks can be done more efficiently on some processor types. For example, given that the Mac is running the ARM architecture, you might try using the AWS Graviton instances which have a similar CPU type.

  • Thanks Brettski-AWS, I got it :) I tried a sample multi-CPU code and it worked. So now I will redo my Python code so it can use multiple CPUs, and run it on multiple servers... That was very helpful and to the point! Thanks again
