Distributed processing in AWS


I am trying to create a prototype that processes camera data using C++. The application is becoming slow because it struggles to process the large amount of data I am receiving, even though we have a high-spec PC. Since performance will remain an issue in the future, we decided to make it a distributed application so that we can run it on a PC cluster.

After doing some research, we decided to use the cloud services (AWS) instead of setting up the PC cluster by ourselves, to avoid the infrastructure cost.

To test this, I wrote a small application in C++ using MPI, with which I am able to launch multiple processes. Now I would like to run the test MPI application in AWS.
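For readers following along, here is a rough sketch of how such a test application might divide camera frames among its processes. It is not the code from the question: the `FrameRange` type and `frames_for_rank` function are hypothetical names, and the sketch assumes the rank and process count come from `MPI_Comm_rank` and `MPI_Comm_size` in the real program.

```cpp
#include <cassert>
#include <cstddef>

// Half-open range [first, last) of frame indices assigned to one process.
// Hypothetical helper type, not part of MPI.
struct FrameRange {
    std::size_t first;
    std::size_t last;
};

// Split `total_frames` into `world_size` contiguous blocks whose sizes
// differ by at most one frame. In a real MPI program, `rank` and
// `world_size` would be obtained from MPI_Comm_rank and MPI_Comm_size.
FrameRange frames_for_rank(std::size_t total_frames, int rank, int world_size) {
    assert(world_size > 0 && rank >= 0 && rank < world_size);
    const std::size_t n = static_cast<std::size_t>(world_size);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t base = total_frames / n;   // frames every rank gets
    const std::size_t extra = total_frames % n;  // first `extra` ranks get one more
    const std::size_t first = r * base + (r < extra ? r : extra);
    const std::size_t count = base + (r < extra ? 1 : 0);
    return FrameRange{first, first + count};
}
```

Each process would call `frames_for_rank(total, rank, size)` after `MPI_Init` and work only on its own range, so no coordination is needed to decide who handles which frame.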

I am keeping all the binaries in S3 and copying them to EC2 as part of the EC2 launch configuration (user data). Can I do that, or is there a simpler way?

Also, can I use AWS Auto Scaling to create multiple EC2 instances and run the application on the group (I am still not sure how this will work)? Or do I need to create an AWS EMR cluster or an AWS ParallelCluster?

I am new to both distributed processing and AWS, and I am still not clear about all the services that I can use in AWS. Could you please tell me whether I am going in the right direction?

3 Answers

You might consider creating a custom image with your software stack preinstalled, using EC2 Image Builder.

If each video segment is processed independently, AWS Batch might be easier to use. It also allows you to use GPU instances if your image processing application can use GPU libraries to speed up image processing. You should be able to create an image that supports MPI on a single node.
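As a rough sketch of the per-segment approach: if the work is submitted as an AWS Batch array job, each child job receives its index in the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable and can use it to pick one segment. The use of an array job is an assumption here, and the helper names `parse_index` and `segment_for_job` and the segment list are hypothetical.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

// Parse a non-negative decimal integer.
// Returns nullopt for null, empty, or malformed text.
std::optional<std::size_t> parse_index(const char* text) {
    if (text == nullptr || *text == '\0') return std::nullopt;
    std::size_t value = 0;
    const char* end = text + std::strlen(text);
    const auto result = std::from_chars(text, end, value);
    if (result.ec != std::errc{} || result.ptr != end) return std::nullopt;
    return value;
}

// Pick the segment this job should process. `segments` is a hypothetical
// list of input names; `index_text` is the raw environment variable value.
// Returns nullopt if the index is missing, malformed, or out of range.
std::optional<std::string> segment_for_job(const std::vector<std::string>& segments,
                                           const char* index_text) {
    assert(!segments.empty());
    const auto index = parse_index(index_text);
    if (!index || *index >= segments.size()) return std::nullopt;
    return segments[*index];
}
```

A job would call `segment_for_job(segments, std::getenv("AWS_BATCH_JOB_ARRAY_INDEX"))` at startup and exit with an error if it gets no value, which keeps each job independent of the others.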

AWS ParallelCluster is more helpful when the nodes need to communicate with each other, and it allows MPI jobs to run across multiple nodes.

For both of the above, check the Spot pricing options if the video processing does not need to happen on demand. Also try out the tutorials to see which will suit your use case best.

answered 7 months ago
  • Thank you bakwms.

  • Spot availability on GPU instances can be a bit harder to find than on other instance types. If you go this route, make sure the application is size flexible (that is, it works on GPU instances of different sizes), and that you use every AZ in the region for the maximum possible number of Spot capacity pools. If there's not a ton of data to move around (watch the data egress costs if there is), then you might want to look into a couple of regions for better availability.


Hi! This blog post can help you to choose between AWS ParallelCluster and AWS Batch:

In ParallelCluster you can use Slurm as the scheduler, and you can customize your cluster by using a Custom Bootstrap Action or by building a custom AWS ParallelCluster AMI.

ParallelCluster 3.x has the pcluster build-image command, which lets you build your custom AMI and then use it for your clusters. Detailed instructions are in the link above.


answered 7 months ago

Typical HPC MPI setups use a shared file system, so you can put your MPI application on one, for example EFS (NFS) or FSx for Lustre. An MPI application follows a multi-process model and can run across multiple hosts.

When you submit an MPI job, the HPC scheduler allocates the host resources, so you MUST define the resource requirements when you submit the job; it cannot scale dynamically while the job is running. Once the scheduler has allocated the hosts, it runs the first process of your MPI application on the first allocated host, and MPI is then responsible for starting the other processes on this host or the other allocated hosts. You do not need to handle that yourself.

If the data is stored in S3, you can create an FSx file system linked to that S3 bucket, so the compute nodes can access and process the data as a POSIX file system.

answered 5 months ago
