Running a multi-node parallel job in AWS Batch with R can be achieved by combining R's parallel processing capabilities with AWS Batch's built-in support for distributed computing.
To containerize your R application code, you can use a Docker image and push it to Amazon Elastic Container Registry (ECR). Once the image is available in ECR, you can use it to create a job definition in AWS Batch.
As for the parallel logic, it can be placed inside the R code. You can use R's built-in parallel processing libraries such as 'parallel' or 'foreach' to split your job into chunks and run them in parallel. For example, you can use the 'foreach' package to define a parallel loop that will run the models for different users in parallel. In this case, you don't need to define the parallel logic in the Dockerfile.
To define how many chunks your job will be split into, you can use the 'registerDoParallel' function to specify the number of parallel workers. You can also set the number of vCPUs allocated for the container in the job definition.
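For example, a minimal sketch of this pattern could look like the following (the `user_ids` vector and `fit_model()` function are hypothetical placeholders, not anything from the original question):

```r
# Minimal sketch: fit one model per user in parallel with foreach/doParallel.
# 'user_ids' and 'fit_model()' are placeholders for your own data and
# modelling function.
library(foreach)
library(doParallel)

# Register as many workers as there are cores visible to the container;
# in Batch you would typically match this to the vCPUs in the job definition.
n_workers <- parallel::detectCores()
cl <- parallel::makeCluster(n_workers)
registerDoParallel(cl)

user_ids <- 1:100  # placeholder: the users you want to model

results <- foreach(id = user_ids, .combine = rbind) %dopar% {
  fit_model(id)  # placeholder: fits the model for one user, returns one row
}

parallel::stopCluster(cl)
```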
AWS Batch will automatically scale the number of instances based on the number of jobs that are waiting to be executed and the number of available instances. You can also configure the number of instances in the compute environment.
In summary, you can place the parallel logic inside the R code, and use the same image to create the job definition. AWS Batch will automatically split the job into chunks based on the parallel workers specified in the R code and the number of vCPUs allocated for the container.
Hi Victor and thank you very much for your reply.
This is exactly how I have written my R code. I am using:
The part that I was missing was "how Batch knows exactly how to chunk/distribute the jobs in different instances". But based on your answer, it must be that Batch is clever enough to understand this, scale the instances, and distribute tasks based on the foreach queue created and the EC2 resources/containers that you have specified in the job definition. I will try to run this R code with the same image and multiple EC2 instances defined in the submission, and fingers crossed this will do the work. I will leave an update.
Just an update: Batch multi-node parallel processing works fine in R using the logic and code described above. It also looks like Batch is clever enough to merge the nodes' separate results into a single object after the job is done. Make sure that your Batch Subnet, Security group, VPC, Placement group, and IAM roles have been set up appropriately. Also make sure that your EC2 quota limits are high enough to support your compute requirements. Finally, make sure you set a large enough container memory (and CPU) value so that your nodes don't get killed unexpectedly (exit code 137).
I am also posting a Dockerfile template that helped me do my containerized work with R in AWS Batch.
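A minimal sketch of what such a Dockerfile might look like is below; it assumes a `rocker/r-ver` base image and a single `main.R` entrypoint script, so adapt the packages and code layout to your own project:

```dockerfile
# Illustrative sketch only: containerize an R job for AWS Batch.
# Assumes your application lives in a single main.R at the build context root.
FROM rocker/r-ver:4.3.1

# Install the parallel back-end packages used by the job
RUN R -e "install.packages(c('foreach', 'doParallel'), repos = 'https://cloud.r-project.org')"

# Copy the application code into the image
WORKDIR /app
COPY main.R /app/main.R

# AWS Batch runs this command in each container
CMD ["Rscript", "/app/main.R"]
```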
I hope people find this conversation useful.